1
|
Kuo HC, Hsieh YP, Tseng HH, Wang CT, Fang SH, Tsao Y. Toward Real-World Voice Disorder Classification. IEEE Trans Biomed Eng 2023; 70:2922-2932. [PMID: 37099463 DOI: 10.1109/tbme.2023.3270532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
OBJECTIVE: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems for home use are desirable for people who lack access to clinical disease assessments. However, the performance of such systems may be weakened by constrained resources and by the domain mismatch between clinical data and noisy real-world data. METHODS: This study develops a compact and domain-robust voice disorder classification system that classifies utterances as healthy, neoplasm, or benign structural disease. The proposed system uses a feature extractor composed of factorized convolutional neural networks and then applies domain adversarial training to reconcile the domain mismatch by extracting domain-invariant features. RESULTS: The unweighted average recall in the noisy real-world domain improved by 13% and remained at 80% in the clinic domain with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced the usage of both memory and computation by over 73.9%. CONCLUSION: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by considering the domain mismatch. SIGNIFICANCE: To the best of our knowledge, this is the first study that jointly considers real-world model compression and noise-robustness issues in voice disorder classification. The proposed system is intended for application to embedded systems with limited resources.
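The unweighted average recall (UAR) cited in this abstract is conventionally the mean of the per-class recalls, so that each class counts equally regardless of how many samples it has. The following is a minimal Python sketch under that assumed definition; the labels and predictions are hypothetical illustrations and are not data from the study.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical, imbalanced three-class example (not from the paper).
y_true = ["healthy"] * 6 + ["neoplasm"] * 2 + ["benign"] * 2
y_pred = ["healthy"] * 6 + ["neoplasm", "healthy"] + ["benign", "benign"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the plain accuracy is higher than the UAR because the single error falls in a small class, which is the situation UAR is meant to expose.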
|
2
|
Shahbazi-Gahrouei D, Bagherzadeh S, Torabinezhad F, Mahdavi SM, Fadavi P, Salmanian S. Binary logistic regression modeling of voice impairment and voice assessment in Iranian patients with nonlaryngeal head-and-neck cancers after chemoradiation therapy: Objective and subjective voice evaluation. JOURNAL OF MEDICAL SIGNALS & SENSORS 2023. [DOI: 10.4103/jmss.jmss_143_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
|
3
|
Rong P, Hansen O, Heidrick L. Relationship between rate-elicited changes in muscular-kinematic control strategies and acoustic performance in individuals with ALS-A multimodal investigation. JOURNAL OF COMMUNICATION DISORDERS 2022; 99:106253. [PMID: 36007484 DOI: 10.1016/j.jcomdis.2022.106253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 06/15/2023]
Abstract
INTRODUCTION: As a key control variable, duration has been long suspected to mediate the organization of speech motor control strategies, which has management implications for neuromotor speech disorders. This study aimed to experimentally delineate the role of duration in organizing speech motor control in neurologically healthy and impaired speakers using a voluntary speaking rate manipulation paradigm. METHODS: Thirteen individuals with amyotrophic lateral sclerosis (ALS) and 10 healthy controls performed a sentence reading task three times, first at their habitual rate, then at a slower rate. A multimodal approach combining surface electromyography, kinematic, and acoustic technologies was used to record jaw muscle activities, jaw kinematics, and speech acoustics. Six muscular-kinematic features were extracted and factor-analyzed to characterize the organization of the mandibular control hierarchy. Five acoustic features were extracted, measuring the spectrotemporal properties of the diphthong /ɑɪ/ and the plosives /t/ and /k/. RESULTS: The muscular-kinematic features converged into two interpretable latent factors, reflecting the level and cohesiveness/flexibility of mandibular control, respectively. Voluntary rate reduction led to a trend toward (1) finer, less cohesive, and more flexible mandibular control, and (2) increased range and decreased transition slope of the diphthong formants, across neurologically healthy and impaired groups. Differential correlations were found between the rate-elicited changes in mandibular control and acoustic performance for neurologically healthy and impaired speakers. CONCLUSIONS: The results provided empirical evidence for the long-suspected but previously unsubstantiated role of duration in (re)organizing speech motor control strategies. The rate-elicited reorganization of muscular-kinematic control contributed to the acoustic performance of healthy speakers, in ways consistent with theoretical predictions. Such contributions were less consistent in impaired speakers, implying the complex nature of speaking rate reduction in ALS, possibly reflecting an interplay of disease-related constraints and volitional duration control. This information may help to stratify and identify candidates for the rate manipulation therapy.
Affiliation(s)
- Panying Rong
  - Department of Speech-Language-Hearing: Sciences & Disorders, University of Kansas, Lawrence KS, USA.
- Olivia Hansen
  - Department of Speech-Language-Hearing: Sciences & Disorders, University of Kansas, Lawrence KS, USA; Department of Hearing & Speech, University of Kansas Medical Center, Kansas City, KS, USA
- Lindsey Heidrick
  - Department of Hearing & Speech, University of Kansas Medical Center, Kansas City, KS, USA
|
4
|
Bao G, Lin M, Sang X, Hou Y, Liu Y, Wu Y. Classification of Dysphonic Voices in Parkinson's Disease with Semi-Supervised Competitive Learning Algorithm. BIOSENSORS 2022; 12:502. [PMID: 35884305 PMCID: PMC9312485 DOI: 10.3390/bios12070502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2022] [Revised: 07/04/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
This article proposes a novel semi-supervised competitive learning (SSCL) algorithm for vocal pattern classification in Parkinson’s disease (PD). The acoustic parameters of voice records were grouped into the families of jitter, shimmer, harmonic-to-noise, frequency, and nonlinear measures. The linear correlations were computed within each acoustic parameter family. According to the correlation matrix results, the jitter, shimmer, and harmonic-to-noise parameters were highly correlated in terms of Pearson’s correlation coefficients. Then, principal component analysis (PCA) was applied to eliminate the redundant dimensions of the acoustic parameters for each family. The Mann-Whitney-Wilcoxon hypothesis test was used to evaluate the significance of the difference in the PCA-projected features between the healthy subjects and PD patients. Eight dominant PCA-projected features were selected based on the eigenvalue threshold criterion and the statistical significance level (p < 0.05) of the hypothesis test. The SSCL algorithm proposed in this paper comprises competitive prototype seed selection, K-means optimization, and nearest neighbor classification. The pattern classification experiments showed that the proposed SSCL method can provide excellent diagnostic performance in terms of accuracy (0.838), recall (0.825), specificity (0.85), precision (0.846), F-score (0.835), Matthews correlation coefficient (0.675), area under the receiver operating characteristic curve (0.939), and Kappa coefficient (0.675), which were consistently better than those of conventional KNN or SVM classifiers.
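Most of the performance figures listed in this abstract are functions of a single binary confusion matrix. The sketch below, in Python, shows the standard formulas for accuracy, recall, specificity, precision, F-score, Matthews correlation coefficient, and Cohen's kappa. The counts passed in are hypothetical: they were picked only because they agree with the reported values to within rounding, and the abstract does not state the actual confusion matrix or test-set size.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
    }

# Hypothetical counts (assumption, not taken from the paper).
metrics = binary_metrics(tp=33, fn=7, tn=34, fp=6)
```

The area under the ROC curve is not included because it depends on the ranking of classifier scores rather than on a single thresholded confusion matrix.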
|
5
|
Ghasemzadeh H, Doyle PC, Searl J. Image representation of the acoustic signal: An effective tool for modeling spectral and temporal dynamics of connected speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:580. [PMID: 35931551 PMCID: PMC9458292 DOI: 10.1121/10.0012734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 06/09/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
Abstract
Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing works on connected speech utilize methods originally designed for analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along horizontal and vertical axes corresponding to the temporal and spectral dynamics of speech were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As the first step of demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. Data obtained revealed that the proposed method has (at minimum) significant discriminatory power over the existing alternative approaches.
Affiliation(s)
- Hamzeh Ghasemzadeh
  - Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, One Bowdoin Square, 11th Floor, Boston, Massachusetts 02114, USA
- Philip C Doyle
  - Department of Otolaryngology Head and Neck Surgery, Division of Laryngology, Stanford University School of Medicine, Stanford University, 801 Welch Road, Stanford, California 94305, USA
- Jeff Searl
  - Department of Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, Oyer Speech & Hearing Building, East Lansing, Michigan 48824, USA
|
6
|
Romana A, Bandon J, Carlozzi N, Roberts A, Provost EM. Classification of Manifest Huntington Disease using Vowel Distortion Measures. INTERSPEECH 2020; 2020:4966-4970. [PMID: 33244474 PMCID: PMC7685306 DOI: 10.21437/interspeech.2020-2724] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
Abstract
Huntington disease (HD) is a fatal autosomal dominant neurocognitive disorder that causes cognitive disturbances, neuropsychiatric symptoms, and impaired motor abilities (e.g., gait, speech, voice). Due to its progressive nature, HD treatment requires ongoing clinical monitoring of symptoms. Individuals with the Huntingtin gene mutation, which causes HD, may exhibit a range of speech symptoms as they progress from premanifest to manifest HD. Speech-based passive monitoring has the potential to augment clinical information by more continuously tracking manifestation symptoms. Differentiating between premanifest and manifest HD is an important yet under-studied problem, as this distinction marks the need for increased treatment. In this work we present the first demonstration of how changes in speech can be measured to differentiate between premanifest and manifest HD. To do so, we focus on one speech symptom of HD: distorted vowels. We introduce a set of Filtered Vowel Distortion Measures (FVDM) which we extract from read speech. We show that FVDM, coupled with features from existing literature, can differentiate between premanifest and manifest HD with 80% accuracy.
Affiliation(s)
- Amrit Romana
  - Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
- John Bandon
  - Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Noelle Carlozzi
  - Physical Medicine & Rehabilitation, University of Michigan, Ann Arbor, Michigan, USA
- Angela Roberts
  - Communication Sciences and Disorders, Northwestern University, Evanston, Illinois, USA
- Emily Mower Provost
  - Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
|
7
|
Chen L, Chen J. Deep Neural Network for Automatic Classification of Pathological Voice Signals. J Voice 2020; 36:288.e15-288.e24. [PMID: 32660846 DOI: 10.1016/j.jvoice.2020.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/09/2020] [Revised: 05/17/2020] [Accepted: 05/26/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVES: Computer-aided pathological voice detection is efficient for initial screening of pathological voice, and has received high academic and clinical attention. This paper proposes an automatic diagnosis method for pathological voice based on a deep neural network (DNN). Two other classification models (support vector machines and random forests) were used to verify the effectiveness of the DNN. METHODS: In this paper, we extracted 12 Mel frequency cepstral coefficients of each voice sample as raw features. The constructed DNN consists of a two-layer stacked sparse autoencoder network and a softmax layer. The stacked sparse autoencoder layers can learn high-level features from the raw Mel frequency cepstral coefficient features. Then, the softmax layer can diagnose pathological voice according to the high-level features. The DNN and the other two comparison models used the same training and test sets for the experiment. RESULTS: Experimental results reveal that the sensitivity, specificity, precision, accuracy, and F1 score of the DNN can reach 97.8%, 99.4%, 99.4%, 98.6%, and 98.4%, respectively. The five indexes of the DNN classification results are at least 6.2%, 5%, 5.6%, 5.7%, and 6.2% higher than those of the comparison models (support vector machine and random forest). CONCLUSIONS: The proposed DNN can learn advanced features from raw acoustic features, and distinguish pathological voice from healthy voice. Building on this preliminary study, future studies can further explore the application of DNN in other experiments and clinical practice.
Affiliation(s)
- Lili Chen
  - School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, China; Chongqing Survey Institute, Chongqing, China.
- Junjiang Chen
  - School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing, China
|
8
|
Complexity Measures of Voice Recordings as a Discriminative Tool for Parkinson's Disease. BIOSENSORS-BASEL 2019; 10:bios10010001. [PMID: 31861890 PMCID: PMC7168233 DOI: 10.3390/bios10010001] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/03/2019] [Revised: 12/17/2019] [Accepted: 12/17/2019] [Indexed: 11/24/2022]
Abstract
In this paper, we have investigated the differences in the voices of Parkinson’s disease (PD) and age-matched control (CO) subjects when uttering three phonemes using two complexity measures: fractal dimension (FD) and normalised mutual information (NMI). Three sustained phonetic voice recordings, /a/, /u/ and /m/, from 22 CO (mean age = 66.91) and 24 PD (mean age = 71.83) participants were analysed. FD was first computed for PD and CO voice recordings, followed by the computation of NMI between the test groups: PD–CO, PD–PD and CO–CO. Four features reported in the literature, namely normalised pitch period entropy (Norm. PPE), glottal-to-noise excitation ratio (GNE), detrended fluctuation analysis (DFA) and glottal closing quotient (ClQ), were also computed for comparison with the proposed complexity measures. The statistical significance of the features was tested using a one-way ANOVA test. Support vector machine (SVM) with a linear kernel was used to classify the test groups, using a leave-one-out validation method. The results showed that PD voice recordings had lower FD compared to CO (p < 0.008). It was also observed that the average NMI between CO voice recordings was significantly lower compared with the CO–PD and PD–PD groups (p < 0.036) for the three phonetic sounds. The average NMI and FD demonstrated higher accuracy (>80%) in differentiating the test groups compared with other speech feature-based classifications. This study has demonstrated that the voices of PD patients have reduced FD, and that NMI between voice recordings of PD–CO and PD–PD is higher compared with CO–CO. This suggests that the NMI obtained from a sample voice, when paired with known groups of CO and PD, can be used to identify PD voices. These findings could have applications for population screening.
|
9
|
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
|
10
|
Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach. J Voice 2018; 33:634-641. [PMID: 29567049 DOI: 10.1016/j.jvoice.2018.02.003] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 10/24/2017] [Accepted: 02/06/2018] [Indexed: 01/20/2023]
Abstract
OBJECTIVES: Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and utility compared with other automatic classification algorithms. METHODS: This study retrospectively collected 60 normal voice samples and 402 pathological voice samples of 8 common clinical voice disorders in a voice clinic of a tertiary teaching hospital. We extracted Mel frequency cepstral coefficients from 3-second samples of a sustained vowel. The performances of three machine learning algorithms, namely, deep neural network (DNN), support vector machine, and Gaussian mixture model, were evaluated based on a fivefold cross-validation. Collective cases from the voice disorder database of MEEI (Massachusetts Eye and Ear Infirmary) were used to verify the performance of the classification mechanisms. RESULTS: The experimental results demonstrated that the DNN outperforms the Gaussian mixture model and support vector machine. Its accuracy in detecting voice pathologies reached 94.26% and 90.52% in male and female subjects, respectively, based on three representative Mel frequency cepstral coefficient features. When applied to the MEEI database for validation, the DNN also achieved a higher accuracy (99.32%) than the other two classification algorithms. CONCLUSIONS: By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Based on this pilot study, future research may proceed to explore further applications of DNN from laboratory and clinical perspectives.
|
11
|
Wu Y, Chen P, Yao Y, Ye X, Xiao Y, Liao L, Wu M, Chen J. Dysphonic Voice Pattern Analysis of Patients in Parkinson's Disease Using Minimum Interclass Probability Risk Feature Selection and Bagging Ensemble Learning Methods. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2017; 2017:4201984. [PMID: 28553366 PMCID: PMC5434464 DOI: 10.1155/2017/4201984] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/21/2016] [Revised: 03/08/2017] [Accepted: 04/06/2017] [Indexed: 11/17/2022]
Abstract
Analysis of quantified voice patterns is useful in the detection and assessment of dysphonia and related phonation disorders. In this paper, we first study the linear correlations between 22 voice parameters of fundamental frequency variability, amplitude variations, and nonlinear measures. The highly correlated vocal parameters are combined by using the linear discriminant analysis method. Based on the probability density functions estimated by the Parzen-window technique, we propose an interclass probability risk (ICPR) method to select the vocal parameters with small ICPR values as dominant features, and compare it with the modified Kullback-Leibler divergence (MKLD) feature selection approach. The experimental results show that the generalized logistic regression analysis (GLRA), support vector machine (SVM), and Bagging ensemble algorithm given the ICPR features as input can provide better classification results than the same classifiers with the MKLD-selected features. The SVM is much better at distinguishing normal vocal patterns, with a specificity of 0.8542. Among the three classification methods, the Bagging ensemble algorithm with ICPR features can identify 90.77% of vocal patterns, with the highest sensitivity of 0.9796 and the largest area under the receiver operating characteristic curve of 0.9558. The classification results demonstrate the effectiveness of our feature selection and pattern analysis methods for dysphonic voice detection and measurement.
Affiliation(s)
- Yunfeng Wu
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Pinnan Chen
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Yuchen Yao
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Xiaoquan Ye
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Yugui Xiao
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Lifang Liao
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Meihong Wu
  - School of Information Science and Technology, Xiamen University, 422 Si Ming South Road, Xiamen, Fujian 361005, China
- Jian Chen
  - Department of Rehabilitation, Zhongshan Hospital, Xiamen University, 201 Hubin South Road, Xiamen, Fujian 361004, China
|
12
|
Lopes LW, Batista Simões L, Delfino da Silva J, da Silva Evangelista D, da Nóbrega e Ugulino AC, Oliveira Costa Silva P, Jefferson Dias Vieira V. Accuracy of Acoustic Analysis Measurements in the Evaluation of Patients With Different Laryngeal Diagnoses. J Voice 2017; 31:382.e15-382.e26. [DOI: 10.1016/j.jvoice.2016.08.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/05/2016] [Revised: 08/20/2016] [Accepted: 08/23/2016] [Indexed: 11/29/2022]
|
13
|
Speech disorders in Parkinson’s disease: early diagnostics and effects of medication and brain stimulation. J Neural Transm (Vienna) 2017; 124:303-334. [PMID: 28101650 DOI: 10.1007/s00702-017-1676-0] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Received: 09/19/2016] [Accepted: 01/04/2017] [Indexed: 01/31/2023]
|
14
|
Mekyska J, Janousova E, Gomez-Vilda P, Smekal Z, Rektorova I, Eliasova I, Kostalova M, Mrackova M, Alonso-Hernandez JB, Faundez-Zanuy M, López-de-Ipiña K. Robust and complex approach of pathological speech signal analysis. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.02.085] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Indexed: 11/24/2022]
|
15
|
Ghasemzadeh H, Tajik Khass M, Khalil Arjmandi M, Pooyan M. Detection of vocal disorders based on phase space parameters and Lyapunov spectrum. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2015.07.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
|
16
|
|
17
|
Yang S, Zheng F, Luo X, Cai S, Wu Y, Liu K, Wu M, Chen J, Krishnan S. Effective dysphonia detection using feature dimension reduction and kernel density estimation for patients with Parkinson's disease. PLoS One 2014; 9:e88825. [PMID: 24586406 PMCID: PMC3930574 DOI: 10.1371/journal.pone.0088825] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 10/22/2013] [Accepted: 01/12/2014] [Indexed: 11/19/2022] Open
Abstract
Detection of dysphonia is useful for monitoring the progression of phonatory impairment in patients with Parkinson's disease (PD), and also helps assess the disease severity. This paper describes statistical pattern analysis methods for studying different vocal measurements of sustained phonations. The feature dimension reduction procedure was implemented by using the sequential forward selection (SFS) and kernel principal component analysis (KPCA) methods. Four selected vocal measures were projected by the KPCA onto the bivariate feature space, in which the class-conditional feature densities can be approximated with the nonparametric kernel density estimation technique. In the vocal pattern classification experiments, Fisher's linear discriminant analysis (FLDA) was applied to perform the linear classification of voice records for healthy control subjects and PD patients, and the maximum a posteriori (MAP) decision rule and support vector machine (SVM) with radial basis function kernels were employed for the nonlinear classification tasks. Based on the KPCA-mapped feature densities, the MAP classifier successfully distinguished 91.8% of voice records, with a sensitivity of 0.986, a specificity of 0.708, and an area of 0.94 under the receiver operating characteristic (ROC) curve. The diagnostic performance provided by the MAP classifier was superior to those of the FLDA and SVM classifiers. In addition, the classification results indicated that dysphonia detection is insensitive to gender, and that the sustained phonations of PD patients with minimal functional disability are more difficult to identify correctly.
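The MAP decision rule with kernel-density class-conditional estimates described in this abstract can be illustrated with a short Python sketch. It is deliberately simplified and rests on several assumptions: it works on a single feature rather than the bivariate KPCA-mapped space used in the paper, it uses a Gaussian kernel with a fixed hand-picked bandwidth, it estimates priors from class frequencies, and the training values are hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def map_classify(x, class_samples, bandwidth=0.5):
    """Choose the class that maximizes prior * estimated likelihood."""
    n_total = sum(len(s) for s in class_samples.values())
    best_label, best_score = None, -1.0
    for label, samples in class_samples.items():
        prior = len(samples) / n_total             # assumed: empirical prior
        likelihood = gaussian_kde(samples, bandwidth)(x)
        score = prior * likelihood                 # proportional to posterior
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical one-dimensional feature values per class (not study data).
training = {
    "control": [0.1, 0.3, 0.2, 0.4],
    "PD": [1.8, 2.1, 2.0, 2.3, 1.9, 2.2],
}

label_low = map_classify(0.25, training)
label_high = map_classify(2.0, training)
```

Because the posterior is proportional to the prior times the class-conditional density, the shared normalizing constant can be dropped when only the arg-max is needed.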
Affiliation(s)
- Shanshan Yang
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Fang Zheng
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Xin Luo
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Suxian Cai
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Yunfeng Wu
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Kaizhi Liu
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Meihong Wu
  - School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
- Jian Chen
  - Department of Rehabilitation, Zhongshan Hospital Xiamen University, Xiamen, Fujian, China
- Sridhar Krishnan
  - Department of Electrical and Computer Engineering, Ryerson University, Toronto, Ontario, Canada
|
18
|
Erfanian Saeedi N, Almasganj F. Wavelet adaptation for automatic voice disorders sorting. Comput Biol Med 2013; 43:699-704. [PMID: 23668345 DOI: 10.1016/j.compbiomed.2013.03.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/27/2011] [Revised: 11/22/2012] [Accepted: 03/17/2013] [Indexed: 10/27/2022]
Abstract
Early diagnosis of voice disorders and abnormalities by means of digital speech processing is a subject of interest for many researchers. Various methods have been introduced in the literature, some of which are able to discriminate pathological voices from normal ones with high reliability. Voice disorder sorting, on the other hand, has received less attention due to the complexity of the problem. Although previous publications show satisfactory results in classifying one type of disordered voice from normal cases, or two different types of abnormalities from each other, no comprehensive approach for automatic sorting of vocal abnormalities has been offered yet. In this paper, a solution to this problem is suggested. We create a powerful wavelet feature extraction approach in which, instead of standard wavelets, adaptive wavelets are generated and applied to the voice signals. Orthogonal wavelets are parameterized via a lattice structure, and the optimal parameters are then searched for through an iterative process using the genetic algorithm (GA). The GA is guided by the classifier results. Based on the generated wavelet, a wavelet filterbank is constructed and the voice signals are decomposed to compute eight energy-based features. A support vector machine (SVM) then classifies the signals using the extracted features. Experimental results show that six types of vocal disorders (paralysis, nodules, polyps, edema, spasmodic dysphonia and keratosis) are fully sorted via the proposed method. This could be a successful step toward sorting a larger number of abnormalities associated with the vocal system.
Affiliation(s)
- Nafise Erfanian Saeedi
  - Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia.
|
19
|
Todder D, Avissar S, Schreiber G. Non-Linear Dynamic Analysis of Inter-Word Time Intervals in Psychotic Speech. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE-JTEHM 2013; 1:2200107. [PMID: 27170852 PMCID: PMC4819231 DOI: 10.1109/jtehm.2013.2268850] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/12/2013] [Revised: 05/27/2013] [Accepted: 06/07/2013] [Indexed: 11/23/2022]
Abstract
“Language is a form and not a substance” (Ferdinand de Saussure)
Objective: Analyses of speech processes in schizophrenia are invariably focused on words as vocal signals. The results of such analyses are, however, strongly related to content, and may be language- and culture-dependent. Little attention has been paid to a pure measure of the form of speech, unrelated to its content: inter-word time intervals (ITIs). Method: 15 patients with schizophrenia and 15 healthy volunteers are recorded speaking spontaneously for 10–15 min. Recordings are analyzed for inter-word time intervals using the following non-linear dynamical methods: unstable periodic orbits, correlation dimension, bi-spectral analysis, and symbolic dynamics. Results: The series of inter-word time intervals in normal speech has the characteristics of a low-dimensional chaotic attractor with a correlation dimension of $3.2 \pm 1.1$. Deconstruction of the attractor appears in psychosis, with re-establishment after anti-psychotic treatment. Shannon entropy, a measure of the complexity in the time series, calculated from symbolic dynamics, is higher for psychotic speech, which is also characterized by higher levels of phase coupling: higher bicoherence, obtained using bi-spectral analysis. Conclusion: Non-linear dynamical methods applied to ITIs thus enable a content-independent, pure measure of the form of normal thought, its distortion in psychosis, and its restoration under treatment.
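To make the symbolic-dynamics entropy mentioned in this abstract concrete, the Python sketch below converts a series of inter-word time intervals into binary symbols and computes the Shannon entropy of short symbol words. The abstract does not state the symbolization scheme, so the median threshold, the word length of three, and the function names here are assumptions chosen only for illustration.

```python
import math
from collections import Counter

def symbolize(intervals):
    """Map each interval to 1 if it exceeds the series median, else 0 (assumed scheme)."""
    ordered = sorted(intervals)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return [1 if x > median else 0 for x in intervals]

def word_entropy(symbols, word_length=3):
    """Shannon entropy, in bits, of overlapping symbol words."""
    words = [tuple(symbols[i:i + word_length])
             for i in range(len(symbols) - word_length + 1)]
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical interval series in seconds: strictly alternating short and long pauses.
regular = [0.2, 0.8] * 20
entropy_regular = word_entropy(symbolize(regular))
```

A strictly alternating series yields only two distinct words, so its entropy is low; a less predictable series spreads probability over more words and scores higher, up to `word_length` bits for binary symbols.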
Affiliation(s)
- Doron Todder
- Ben Gurion University of the Negev Psychiatry Department Beer Sheva Israel 84105
- Sofia Avissar
- Ben Gurion University of the Negev Clinical Pharmacology Department Beer Sheva Israel 84105
- Gabriel Schreiber
- Ben Gurion University of the Negev Faculty of Health Sciences Beer Sheva Israel 84105
|
20
|
Henríquez Rodríguez P, Alonso Hernández JB, Ferrer Ballester MA, Travieso González CM, Orozco-Arroyave JR. Global Selection of Features for Nonlinear Dynamics Characterization of Emotional Speech. Cognit Comput 2012. [DOI: 10.1007/s12559-012-9157-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
|
21
|
Veiga J, Lopes AJ, Jansen JM, Melo PL. Airflow pattern complexity and airway obstruction in asthma. J Appl Physiol (1985) 2011; 111:412-9. [PMID: 21565988 DOI: 10.1152/japplphysiol.00267.2011] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/22/2022] Open
Abstract
The scientific and clinical value of a measure of complexity is potentially enormous because complexity appears to be lost in the presence of illness. The authors examined the effect of elevated airway obstruction on the complexity of the airflow (Q) pattern of asthmatic patients analyzing the airflow approximate entropy (ApEnQ). This study involved 11 healthy controls, 11 asthmatics with normal spirometric exams, and 40 asthmatics with mild (14), moderate (14), and severe (12) airway obstructions. A significant (P < 0.02) reduction in the ApEnQ was observed in the asthmatic patients. This reduction was significantly correlated with spirometric indexes of airway obstruction [FEV(1) (%): R = 0.31, P = 0.013] and the total respiratory impedance (R = -0.39; P < 0.002). These results are in close agreement with pathophysiological fundamentals and suggest that the airflow pattern becomes less complex in asthmatic patients, which may reduce the adaptability of the respiratory system to perform the exercise that is associated with daily life activities. This analysis was able to identify respiratory changes in patients with mild obstruction with an adequate accuracy (83%). Higher accuracies were obtained in patients with moderate and severe obstructions. The analysis of airflow pattern complexity by the ApEnQ was able to provide new information concerning the changes associated with asthma. In addition, this analysis was also able to contribute to the detection of the adverse effects of asthma. Because these measurements are easy to perform, such a technique may represent an alternative and/or a complement to other conventional exams to help the clinical evaluations of asthmatic patients.
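For readers unfamiliar with approximate entropy, the statistic behind ApEnQ, the Python sketch below computes it for a generic time series. It follows the standard definition with self-matches included. The embedding dimension `m = 2` and the tolerance of 0.2 standard deviations are common conventions assumed here, not values taken from the abstract, and the function name is illustrative.

```python
import math

def approximate_entropy(series, m=2, r_factor=0.2):
    """Approximate entropy ApEn(m, r) with r = r_factor * standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * std

    def phi(dim):
        # All overlapping templates of length dim.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(1 for b in templates
                          if max(abs(x - y) for x, y in zip(a, b)) <= r)
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

# Hypothetical signals: a strictly periodic one and a chaotic logistic-map one.
periodic = [0.0, 1.0] * 50
x, chaotic = 0.3, []
for _ in range(100):
    x = 4.0 * x * (1.0 - x)
    chaotic.append(x)
apen_periodic = approximate_entropy(periodic)
apen_chaotic = approximate_entropy(chaotic)
```

Lower values indicate a more regular, less complex pattern, which is the direction of change the abstract reports for the airflow of asthmatic patients.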
Affiliation(s)
- Juliana Veiga
- Biomedical Instrumentation Laboratory, Institute of Biology and Faculty of Engineering, State University of Rio de Janeiro, Rio de Janeiro, Brazil
|
22
|
Arias-Londoño JD, Godino-Llorente JI, Sáenz-Lechón N, Osma-Ruiz V, Castellanos-Domínguez G. Automatic detection of pathological voices using complexity measures, noise parameters, and mel-cepstral coefficients. IEEE Trans Biomed Eng 2011; 58:370-9. [PMID: 21257362 DOI: 10.1109/tbme.2010.2089052] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Indexed: 11/09/2022]
Abstract
This paper proposes a new approach to increase the amount of information extracted from speech, with the aim of improving the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest Lyapunov exponent and correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining seven are based on different estimations of the entropy. Moreover, this paper uses a strategy based on combining classifiers to fuse the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). The classification was carried out in two steps using, first, a generative and, later, a discriminative approach. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001.
|
23
|
|