1
Shaikh AAS, Bhargavi MS, Naik GR. Unraveling the complexities of pathological voice through saliency analysis. Comput Biol Med 2023; 166:107566. [PMID: 37857135 DOI: 10.1016/j.compbiomed.2023.107566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 09/14/2023] [Accepted: 10/10/2023] [Indexed: 10/21/2023]
Abstract
The human voice is an essential communication tool, but various disorders and habits can disrupt it. Accurate diagnosis of pathological and abnormal voices is important, yet conventional diagnosis of voice pathologies can be invasive and costly. Voice pathologies can be detected effectively using Artificial Intelligence and computer-aided voice pathology classification tools. Previous studies focused primarily on binary classification and paid limited attention to multi-class classification. This study proposes three neural network architectures to investigate the feature characteristics of three voice pathologies (Hyperkinetic Dysphonia, Hypokinetic Dysphonia, and Reflux Laryngitis) and healthy voices using multi-class classification on the Voice ICar fEDerico II (VOICED) dataset. To overcome noisy data, the study proposes UNet++ autoencoder-based denoising techniques for accurate feature extraction. The architectures include a Multi-Layer Perceptron (MLP) trained on structured feature sets, a Short-Time Fourier Transform (STFT) model, and a Mel-Frequency Cepstral Coefficients (MFCC) model. The MLP model, trained on 143 features, achieved 97.1% accuracy, while the STFT model showed similar performance with an increased sensitivity of 99.8%. The MFCC model maintained 97.1% accuracy with a smaller model size and improved accuracy on the Reflux Laryngitis class. The study identifies crucial features through saliency analysis and reveals that detecting voice abnormalities requires identifying regions of inaudible high-pitch sounds. Additionally, the study highlights the challenges posed by limited and disjointed pathological voice databases and proposes solutions for enhancing the performance of voice abnormality classification. Overall, the study's findings have potential uses in clinical practice and in specialized audio-capturing tools.
Affiliation(s)
- Abdullah Abdul Sattar Shaikh
- Department of Computer Science and Engineering, Bangalore Institute of Technology, Bangalore, 560004, Karnataka, India.
- M S Bhargavi
- Department of Computer Science and Engineering, Bangalore Institute of Technology, Bangalore, 560004, Karnataka, India.
- Ganesh R Naik
- Adelaide Institute for Sleep Health, Flinders University, Bedford Park 5042, Adelaide, SA, Australia.
2
Alhussain G, Shuweihdi F, Abd-alrazaq A, Alali H, Househ M. The Effectiveness of Supervised Machine Learning in Screening and Diagnosing Voice Disorders: A Systematic Review and Meta-Analysis (Preprint). [DOI: 10.2196/preprints.38472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
BACKGROUND
Voice screening and diagnosis are processes that are used during voice disorders investigations. Both have limited standardized tests, which are affected by the clinician’s experience and subjective judgment. Machine learning (ML) algorithms were introduced and employed in screening/diagnosing patients’ voices as an objective tool. The effectiveness of ML algorithms in assessing and diagnosing voice disorders has been investigated by numerous studies.
OBJECTIVE
This systematic review aims to assess the effectiveness of ML algorithms in screening and diagnosing voice disorders.
METHODS
An electronic search was conducted in five databases. We included studies that examined the performance (accuracy, sensitivity, and specificity) of any ML algorithm in detecting abnormal voice samples. Two reviewers independently selected the studies, extracted data from them, and assessed their risk of bias. The methodological quality of each study was assessed using the QUADAS-2 tool. Characteristics of studies, population, and index tests were extracted. Meta-analyses were conducted to pool the accuracy, sensitivity, and specificity of ML techniques. Heterogeneity was addressed by excluding some studies and discussing its possible sources.
RESULTS
Out of 1409 records retrieved, 13 studies (4079 participants) were included in this review. Thirteen machine learning techniques were used in the included studies; the most commonly used technique was SVM. The pooled accuracy, sensitivity, and specificity of ML techniques in screening voice disorders were 93%, 96%, and 93%, respectively. LS-SVM had the highest accuracy (99%), while K-NN had the highest sensitivity (98%) and specificity (98%). Quadratic discriminant analysis (QDA) achieved the lowest accuracy (91%), sensitivity (89%), and specificity (89%).
CONCLUSIONS
ML showed promising findings in screening voice disorders. However, the findings were not conclusive for diagnosing voice disorders because few studies used ML for diagnostic purposes; more investigations are needed. Accordingly, it might not be possible to use ML as a substitute for the current diagnostic tools. Instead, it might be used as a decision support tool for clinicians assessing their patients, which could improve the management process for voice disorder assessment.
3
Al-Hussain G, Shuweihdi F, Alali H, Househ M, Abd-Alrazaq A. The Effectiveness of Supervised Machine Learning in Screening and Diagnosing Voice Disorders: A Systematic Review and Meta-Analysis (Preprint). J Med Internet Res 2022; 24:e38472. [PMID: 36239999 PMCID: PMC9617188 DOI: 10.2196/38472] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/04/2022] [Revised: 06/17/2022] [Accepted: 07/28/2022] [Indexed: 11/13/2022]
Affiliation(s)
- Ghada Al-Hussain
- Department of Unified Health Record, Lean for Business Services, Riyadh, Saudi Arabia
- Farag Shuweihdi
- Leeds Institute of Health Sciences, School of Medicine, University of Leeds, Leeds, United Kingdom
- Haitham Alali
- Health Management Department, Faculty of Medical and Health Sciences, Liwa College of Technology, Abu Dhabi, United Arab Emirates
- Mowafa Househ
- Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine, Doha, Qatar
4
Diagnosis of Parkinson’s Disease at an Early Stage Using Volume Rendering SPECT Image Slices. Arabian Journal for Science and Engineering 2019. [DOI: 10.1007/s13369-019-04152-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
5
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
6
Hegde S, Shetty S, Rai S, Dodderi T. A Survey on Machine Learning Approaches for Automatic Detection of Voice Disorders. J Voice 2018; 33:947.e11-947.e33. [PMID: 30316551 DOI: 10.1016/j.jvoice.2018.07.014] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 03/29/2018] [Revised: 07/06/2018] [Accepted: 07/10/2018] [Indexed: 10/28/2022]
Abstract
The human voice production system is an intricate biological device capable of modulating pitch and loudness. Inherent internal and/or external factors often damage the vocal folds and result in some change of voice. The consequences are reflected in body functioning and emotional standing. Hence, it is paramount to identify voice changes at an early stage and provide the patient with an opportunity to overcome any ramification and enhance their quality of life. In this line of work, automatic detection of voice disorders using machine learning techniques plays a key role, as it is proven to help ease the process of understanding the voice disorder. In recent years, many researchers have investigated techniques for an automated system that helps clinicians with early diagnosis of voice disorders. In this paper, we present a survey of research work conducted on automatic detection of voice disorders and explore how it is able to identify the different types of voice disorders. We also analyze different databases, feature extraction techniques, and machine learning approaches used in these research works.
Affiliation(s)
- Sarika Hegde
- NMAM Institute of Technology, Udupi, Karnataka, India.
- Smitha Rai
- NMAM Institute of Technology, Udupi, Karnataka, India
- Thejaswi Dodderi
- Nitte Institute of Speech & Hearing, Mangaluru, Karnataka, India
7
Deshpande PS, Manikandan MS. Effective Glottal Instant Detection and Electroglottographic Parameter Extraction for Automated Voice Pathology Assessment. IEEE J Biomed Health Inform 2018; 22:398-408. [DOI: 10.1109/jbhi.2017.2654683] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/08/2022]
8
An Expert Diagnosis System for Parkinson Disease Based on Genetic Algorithm-Wavelet Kernel-Extreme Learning Machine. Parkinson's Disease 2016; 2016:5264743. [PMID: 27274882 PMCID: PMC4871978 DOI: 10.1155/2016/5264743] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 01/05/2016] [Revised: 03/24/2016] [Accepted: 03/29/2016] [Indexed: 11/17/2022]
Abstract
Parkinson disease is a major public health problem all around the world. This paper proposes an expert disease diagnosis system for Parkinson disease based on genetic algorithm- (GA-) wavelet kernel- (WK-) Extreme Learning Machines (ELM). The classifier used in this paper is a single layer neural network (SLNN) trained by the ELM learning method. The Parkinson disease datasets are obtained from the UCI machine learning database. In the wavelet kernel-Extreme Learning Machine (WK-ELM) structure, the wavelet kernel has three adjustable parameters. These parameters and the number of hidden neurons play a major role in the performance of ELM. In this study, the optimum values of these parameters and the number of hidden neurons of ELM were obtained by using a genetic algorithm (GA). The performance of the proposed GA-WK-ELM method is evaluated using statistical methods such as classification accuracy, sensitivity and specificity analysis, and ROC curves. The highest classification accuracy achieved by the proposed GA-WK-ELM method is 96.81%.
9
Diagnosing Parkinson's Diseases Using Fuzzy Neural System. Computational and Mathematical Methods in Medicine 2016; 2016:1267919. [PMID: 26881009 PMCID: PMC4736962 DOI: 10.1155/2016/1267919] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 09/30/2015] [Accepted: 12/14/2015] [Indexed: 11/25/2022]
Abstract
This study presents the design of a recognition system that discriminates between healthy people and people with Parkinson's disease. Diagnosis of Parkinson's disease is performed using a fusion of a fuzzy system and neural networks. The structure and learning algorithms of the proposed fuzzy neural system (FNS) are presented. The approach described in this paper enhances the capability of the designed system and allows it to distinguish healthy individuals efficiently. This was demonstrated through simulation of the system using data obtained from the UCI machine learning repository. A comparative study was carried out, and the simulation results demonstrated that the proposed fuzzy neural system improves the recognition rate of the designed system.
10
Mehta DD, Van Stan JH, Zañartu M, Ghassemi M, Guttag JV, Espinoza VM, Cortés JP, Cheyne HA, Hillman RE. Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update. Front Bioeng Biotechnol 2015; 3:155. [PMID: 26528472 PMCID: PMC4607864 DOI: 10.3389/fbioe.2015.00155] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 06/17/2015] [Accepted: 09/23/2015] [Indexed: 11/28/2022]
Abstract
Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders.
Affiliation(s)
- Daryush D Mehta
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, MA, USA; Department of Surgery, Harvard Medical School, Boston, MA, USA; MGH Institute of Health Professions, Massachusetts General Hospital, Boston, MA, USA
- Jarrad H Van Stan
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, MA, USA; MGH Institute of Health Professions, Massachusetts General Hospital, Boston, MA, USA
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Marzyeh Ghassemi
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- John V Guttag
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Víctor M Espinoza
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile; Department of Music and Sonology, Faculty of Arts, Universidad de Chile, Santiago, Chile
- Juan P Cortés
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Harold A Cheyne
- Bioacoustics Research Laboratory, Laboratory of Ornithology, Cornell University, Ithaca, NY, USA
- Robert E Hillman
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, MA, USA; Department of Surgery, Harvard Medical School, Boston, MA, USA; MGH Institute of Health Professions, Massachusetts General Hospital, Boston, MA, USA
11
12
Akbari A, Arjmandi MK. An efficient voice pathology classification scheme based on applying multi-layer linear discriminant analysis to wavelet packet-based features. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2013.11.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]
13
Roy N, Barkmeier-Kraemer J, Eadie T, Sivasankar MP, Mehta D, Paul D, Hillman R. Evidence-based clinical voice assessment: a systematic review. American Journal of Speech-Language Pathology 2013. [PMID: 23184134 DOI: 10.1044/1058-0360(2012/12-0014)] [Citation(s) in RCA: 203] [Impact Index Per Article: 18.5] [Indexed: 05/16/2023]
Abstract
PURPOSE
To determine what research evidence exists to support the use of voice measures in the clinical assessment of patients with voice disorders.
METHOD
The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language articles between January 1930 and April 2009 that included key words pertaining to objective and subjective voice measures, voice disorders, and diagnostic accuracy. The identified articles were systematically assessed by an ASHA-appointed committee employing a modification of the critical appraisal of diagnostic evidence rating system.
RESULTS
One hundred articles met the search criteria. The majority of studies investigated acoustic measures (60%) and focused on how well a test method identified the presence or absence of a voice disorder (78%). Only 17 of the 100 articles were judged to contain adequate evidence for the measures studied to be formally considered for inclusion in clinical voice assessment.
CONCLUSION
Results provide evidence for selected acoustic, laryngeal imaging-based, auditory-perceptual, functional, and aerodynamic measures to be used as effective components in a clinical voice evaluation. However, there is clearly a pressing need for further high-quality research to produce sufficient evidence on which to recommend a comprehensive set of methods for a standard clinical voice evaluation.
Affiliation(s)
- Nelson Roy
- University of Utah, Salt Lake City, UT, USA.
14
An optimum algorithm in pathological voice quality assessment using wavelet-packet-based features, linear discriminant analysis and support vector machine. Biomed Signal Process Control 2012. [DOI: 10.1016/j.bspc.2011.03.010] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 11/22/2022]
15
Pathological Likelihood Index as a Measurement of the Degree of Voice Normality and Perceived Hoarseness. J Voice 2010; 24:667-77. [DOI: 10.1016/j.jvoice.2009.04.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 02/26/2009] [Accepted: 04/20/2009] [Indexed: 11/22/2022]
16
Verikas A, Gelzinis A, Bacauskiene M, Hållander M, Uloza V, Kaseta M. Combining image, voice, and the patient’s questionnaire data to categorize laryngeal disorders. Artif Intell Med 2010; 49:43-50. [DOI: 10.1016/j.artmed.2010.02.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/16/2008] [Revised: 01/19/2010] [Accepted: 02/16/2010] [Indexed: 11/28/2022]
17
18
Godino-Llorente JI, Osma-Ruiz V, Sáenz-Lechón N, Gómez-Vilda P, Blanco-Velasco M, Cruz-Roldán F. The Effectiveness of the Glottal to Noise Excitation Ratio for the Screening of Voice Disorders. J Voice 2010; 24:47-56. [DOI: 10.1016/j.jvoice.2008.04.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 03/07/2008] [Accepted: 04/22/2008] [Indexed: 10/21/2022]
19
Khadivi Heris H, Seyed Aghazadeh B, Nikkhah-Bahrami M. Optimal feature selection for the assessment of vocal fold disorders. Comput Biol Med 2009; 39:860-8. [DOI: 10.1016/j.compbiomed.2009.06.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 08/14/2007] [Revised: 02/22/2009] [Accepted: 06/25/2009] [Indexed: 11/16/2022]
20
Advances in laryngeal imaging. Eur Arch Otorhinolaryngol 2009; 266:1509-20. [PMID: 19618198 DOI: 10.1007/s00405-009-1050-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 12/05/2008] [Accepted: 07/07/2009] [Indexed: 10/20/2022]
Abstract
Imaging and image analysis have become an important issue in laryngeal diagnostics. Various techniques, such as videostroboscopy, videokymography, digital kymography, and ultrasonography, are available and are used in research and clinical practice. This paper reviews recent advances in imaging for laryngeal diagnostics.
21
Godino-Llorente J, Fraile R, Sáenz-Lechón N, Osma-Ruiz V, Gómez-Vilda P. Automatic detection of voice impairments from text-dependent running speech. Biomed Signal Process Control 2009. [DOI: 10.1016/j.bspc.2009.01.007] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/25/2022]
22
Little MA, McSharry PE, Hunter EJ, Spielman J, Ramig LO. Suitability of dysphonia measurements for telemonitoring of Parkinson's disease. IEEE Trans Biomed Eng 2009; 56:1015. [PMID: 21399744 PMCID: PMC3051371 DOI: 10.1109/tbme.2008.2005954] [Citation(s) in RCA: 229] [Impact Index Per Article: 15.3] [Indexed: 11/07/2022]
Abstract
We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson's disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrollable confounding effects including noisy acoustic environments and normal, healthy variations in voice frequency. We collected sustained phonations from 31 people, 23 with PD. We then selected 10 highly uncorrelated measures, and an exhaustive search of all possible combinations of these measures finds four that in combination lead to overall correct classification performance of 91.4%, using a kernel support vector machine. In conclusion, we find that non-standard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects. The selected non-standard methods are robust to many uncontrollable variations in acoustic environment and individual subjects, and are thus well-suited to telemonitoring applications.
Affiliation(s)
- Max A Little
- Systems Analysis, Modelling and Prediction Group, University of Oxford, UK
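The exhaustive search over combinations of measures mentioned in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the feature names `f0` to `f9`, the set `USEFUL`, and the function `toy_score` are hypothetical stand-ins. In the study itself, the score for a subset would be the classification performance of a kernel support vector machine trained on those measures.

```python
from itertools import combinations


def exhaustive_subset_search(features, score_fn):
    """Score every non-empty subset of `features` and return the best one."""
    best_subset, best_score, n_evaluated = None, float("-inf"), 0
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            n_evaluated += 1
            score = score_fn(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated


# Hypothetical feature names standing in for the 10 selected measures.
FEATURES = [f"f{i}" for i in range(10)]
# Hypothetical "informative" features, used only by the toy score below.
USEFUL = {"f1", "f3", "f5", "f7"}


def toy_score(subset):
    """Toy objective (assumption): reward useful features, penalise the rest."""
    return sum(1.0 if name in USEFUL else -0.5 for name in subset)


best_subset, best_score, n_evaluated = exhaustive_subset_search(FEATURES, toy_score)
print(best_subset, best_score, n_evaluated)
```

With 10 measures there are 2**10 - 1 = 1023 non-empty subsets, so trying all of them is cheap; the count doubles with every added measure, which is why exhaustive search is only practical for small feature sets.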
23
Verikas A, Gelzinis A, Bacauskiene M, Uloza V, Kaseta M. Using the patient's questionnaire data to screen laryngeal disorders. Comput Biol Med 2009; 39:148-55. [PMID: 19144329 DOI: 10.1016/j.compbiomed.2008.11.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/30/2008] [Revised: 11/28/2008] [Accepted: 11/28/2008] [Indexed: 10/21/2022]
Abstract
This paper is concerned with soft computing techniques for screening laryngeal disorders based on patient's questionnaire data. By applying the genetic search, the most important questionnaire statements are determined and a support vector machine (SVM) classifier is designed for categorizing the questionnaire data into the healthy, nodular and diffuse classes. To explore the obtained automated decisions, the curvilinear component analysis (CCA) in the space of decisions as well as questionnaire statements is applied. When testing the developed tools on the set of data collected from 180 patients, the classification accuracy of 85.0% was obtained. Bearing in mind the subjective nature of the data, the obtained classification accuracy is rather encouraging. The CCA allows obtaining ordered two-dimensional maps of the data in various spaces and facilitates the exploration of automated decisions provided by the system and determination of relevant groups of patients for various comparisons.
Affiliation(s)
- A Verikas
- Department of Applied Electronics, Kaunas University of Technology, Lithuania.
24
Wormald RN, Moran RJ, Reilly RB, Lacy PD. Performance of an Automated, Remote System to Detect Vocal Fold Paralysis. Ann Otol Rhinol Laryngol 2008; 117:834-8. [DOI: 10.1177/000348940811701107] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
Objectives: The aim of this project was to evaluate the diagnostic accuracy of an automated, remote system for correctly identifying vocal fold paralysis. Methods: Consecutive patients presenting for vocal analysis at the Beaumont Hospital Voice Clinic were enrolled in this prospective, blinded study. Control patients were enlisted from routine otolaryngology clinics. All patients were assessed by standard history, clinical examination, and flexible laryngoscopy or videostroboscopy. The subjects were blindly assessed by remote voice analysis. Sustained phonation was recorded over a standard telephone network. Each recording was subjected to automated, remote analysis of extracted features, including measures of pitch perturbation, amplitude perturbation, and harmonics-to-noise ratio. The presence or absence of a vocal fold paralysis as determined by the automated classifier was recorded and correlated with clinical findings. Results: Seventy-eight consecutive patients were enrolled in the study. The automated speech analysis system demonstrated 92% sensitivity and 75% specificity for detecting vocal fold paralysis. Conclusions: This pilot study, assessing an automated system that analyzes audiological data remotely over the standard telephone network, suggests that with further “training” it may become a reliable, simple, and convenient means for screening patients for voice disorders.
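For readers less familiar with the two metrics reported in this abstract, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical and are not taken from the study; they were chosen only so that the resulting rates match the reported 92% and 75%.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected cases the test flags (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of unaffected cases the test clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT from the study: picked only to reproduce
# the reported rates of 92% sensitivity and 75% specificity.
tp, fn = 23, 2
tn, fp = 15, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A screening tool with high sensitivity and lower specificity, as here, misses few true cases but refers more healthy subjects for follow-up examination.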
25
Gelzinis A, Verikas A, Bacauskiene M. Automated speech analysis applied to laryngeal disease categorization. Computer Methods and Programs in Biomedicine 2008; 91:36-47. [PMID: 18346812 DOI: 10.1016/j.cmpb.2008.01.008] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 09/08/2006] [Revised: 01/22/2008] [Accepted: 01/31/2008] [Indexed: 05/26/2023]
Abstract
The long-term goal of the work is a decision support system for diagnostics of laryngeal diseases. Colour images of vocal folds, a voice signal, and questionnaire data are the information sources to be used in the analysis. This paper is concerned with automated analysis of a voice signal applied to screening of laryngeal diseases. The effectiveness of 11 different feature sets in classifying voice recordings of the sustained phonation of the vowel sound /a/ into a healthy class and two pathological classes, diffuse and nodular, is investigated. A k-NN classifier, an SVM, and a committee built using various aggregation options are used for the classification. The study used a mixed-gender database containing 312 voice recordings. A correct classification rate of 84.6% was achieved when using an SVM committee consisting of four members. The pitch and amplitude perturbation measures, cepstral energy features, autocorrelation features, and linear prediction cosine transform coefficients were amongst the feature sets providing the best performance. In the case of two-class classification, using recordings from 79 subjects representing the pathological class and 69 representing the healthy class, a correct classification rate of 95.5% was obtained from a five-member committee. Again, the pitch and amplitude perturbation measures provided the best performance.
Affiliation(s)
- A Gelzinis
- Department of Applied Electronics, Kaunas University of Technology, LT-51368, Kaunas, Lithuania.
26
Crovato CDP, Schuck A. The use of wavelet packet transform and artificial neural networks in analysis and classification of dysphonic voices. IEEE Trans Biomed Eng 2007; 54:1898-900. [PMID: 17926690 DOI: 10.1109/tbme.2006.889780] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
This paper presents a dysphonic voice classification system that uses the wavelet packet transform and the best basis algorithm (BBA) for dimensionality reduction and six artificial neural networks (ANNs) acting as specialist systems. Each ANN was a three-layer multilayer perceptron with 64 input nodes and one output node; the number of neurons in the intermediate layer depended on the related training pathology group. The dysphonic voice database was separated into five pathology groups and one healthy control group. Each ANN was trained and associated with one of the six groups and fed with the entropy values of the best base tree (BBT) nodes, using the multiple cross validation (MCV) method and the leave-one-out (LOO) variation technique. The success rates obtained were 87.5%, 95.31%, 87.5%, 100%, 96.87%, and 89.06% for groups 1 to 6, respectively.
Affiliation(s)
- César David Paredes Crovato
- Departamento de Engenharia Elétrica, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, CEP 90.035-190, Brazil.
27
Fonseca ES, Guido RC, Scalassara PR, Maciel CD, Pereira JC. Wavelet time-frequency analysis and least squares support vector machines for the identification of voice disorders. Comput Biol Med 2007; 37:571-8. [PMID: 17078942 DOI: 10.1016/j.compbiomed.2006.08.008] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.3] [Indexed: 11/29/2022]
Abstract
This work describes a novel algorithm to identify laryngeal pathologies by digital analysis of the voice. It is based on Daubechies' discrete wavelet transform (DWT-db), linear prediction coefficients (LPC), and least squares support vector machines (LS-SVM). Wavelets with different support sizes and three LS-SVM kernels are compared. In particular, the proposed approach, implemented with modest computer requirements, yields an adequate larynx pathology classifier for identifying nodules in vocal folds. It achieves a classification accuracy of over 90% and has a low order of computational complexity in relation to the speech signal's length.
Affiliation(s)
- Everthon Silva Fonseca
- SEL/EESC/USP and IFSC/USP - Department of Electrical Engineering, School of Engineering at São Carlos and Institute of Physics at São Carlos, University of São Paulo, SP, Brazil.
|
28
|
Godino-Llorente JI, Gómez-Vilda P, Blanco-Velasco M. Dimensionality Reduction of a Pathological Voice Quality Assessment System Based on Gaussian Mixture Models and Short-Term Cepstral Parameters. IEEE Trans Biomed Eng 2006; 53:1943-53. [PMID: 17019858 DOI: 10.1109/tbme.2006.871883] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Voice diseases have been increasing dramatically in recent times, mainly due to unhealthy social habits and voice abuse. These diseases must be diagnosed and treated at an early stage, especially in the case of larynx cancer. It is widely recognized that vocal and voice diseases do not necessarily cause changes in voice quality as perceived by a listener. Acoustic analysis could be a useful tool to diagnose this type of disease. Preliminary research has shown that the detection of voice alterations can be carried out by means of Gaussian mixture models and short-term mel cepstral parameters complemented by frame energy together with first and second derivatives. This paper, using the F-Ratio and Fisher's discriminant ratio, demonstrates that the detection of voice impairments can be performed using both mel cepstral vectors and their first derivative, ignoring the second derivative.
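As a rough illustration of how a discriminant ratio can rank features, the sketch below scores each feature by the squared difference of the class means divided by the sum of the class variances. This two-class form, the function names `fisher_ratio` and `rank_features`, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Fisher's discriminant ratio of one feature for two classes.

    Assumed form: (mean_a - mean_b)^2 / (var_a + var_b).
    """
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        raise ValueError("both classes have zero variance")
    return (mean(class_a) - mean(class_b)) ** 2 / spread


def rank_features(normal, pathological):
    """Return (feature indices sorted by decreasing ratio, ratio per feature)."""
    n_features = len(normal[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in normal]
        b = [row[j] for row in pathological]
        scores.append(fisher_ratio(a, b))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: rows are frames, columns are two features.
normal = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]]
pathological = [[7.0, 5.2], [8.0, 4.8], [9.0, 5.0]]
order, scores = rank_features(normal, pathological)
```

In this toy data the first feature separates the two classes and the second does not, so the first feature ranks higher. A feature group with consistently low ratios would be a candidate for removal.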
|
29
|
Moran RJ, Reilly RB, de Chazal P, Lacy PD. Telephony-Based Voice Pathology Assessment Using Automated Speech Analysis. IEEE Trans Biomed Eng 2006; 53:468-77. [PMID: 16532773 DOI: 10.1109/tbme.2005.869776] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
A system for remotely detecting vocal fold pathologies using telephone-quality speech is presented. The system uses a linear classifier, processing measurements of pitch perturbation, amplitude perturbation and harmonic-to-noise ratio derived from digitized speech recordings. Voice recordings from the Disordered Voice Database Model 4337 system were used to develop and validate the system. Results show that while a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with an accuracy of 89.1%, telephone-quality speech can be classified as normal or pathologic with an accuracy of 74.2%, using the same scheme. Amplitude perturbation features prove most robust for telephone-quality speech. The pathologic recordings were then subcategorized into four groups, comprising normal, neuromuscular pathologic, physical pathologic and mixed (neuromuscular with physical) pathologic. A separate classifier was developed for classifying the normal group from each pathologic subcategory. Results show that neuromuscular disorders could be detected remotely with an accuracy of 87%, physical abnormalities with an accuracy of 78% and mixed pathology voice with an accuracy of 61%. This study highlights the real possibility for remote detection and diagnosis of voice pathology.
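Pitch and amplitude perturbation are commonly summarized as local jitter and shimmer: the mean absolute difference between consecutive glottal cycles, divided by the mean value. The sketch below assumes that common definition, which may differ from the exact measures used in the paper. The function name `local_perturbation` and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Applied to cycle periods it gives local jitter; applied to cycle peak
    amplitudes it gives local shimmer (assumed definitions).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical measurements from four consecutive glottal cycles.
periods_ms = [10.0, 10.2, 9.8, 10.0]
amplitudes = [1.0, 0.9, 1.1, 1.0]

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(amplitudes)
steady = local_perturbation([10.0, 10.0, 10.0])  # perfectly periodic signal
```

A perfectly periodic signal yields zero perturbation, and larger values indicate less regular phonation. Values like these could then be passed to a linear classifier as features.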
Affiliation(s)
- Rosalyn J Moran
- Department of Electronic and Electrical Engineering, University College Dublin, Dublin 4, Ireland.
|
30
|
Godino-Llorente JI, Gómez-Vilda P. Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors. IEEE Trans Biomed Eng 2004; 51:380-4. [PMID: 14765711 DOI: 10.1109/tbme.2003.820386] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful tool to diagnose voice diseases, being a complementary technique to other methods based on direct observation of the vocal folds by laryngoscopy. This paper studies two neural-network-based classification approaches applied to the automatic detection of voice disorders. The structures studied are the multilayer perceptron and learning vector quantization, fed with short-term vectors calculated according to the well-known Mel Frequency Coefficient cepstral parameterization. The paper shows that these architectures allow the detection of voice disorders, including glottic cancer, under highly reliable conditions. Within this context, the learning vector quantization methodology proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.
Affiliation(s)
- J I Godino-Llorente
- Universidad Politécnica de Madrid, Escuela Universitaria de Ingeniería Técnica de Telecomunicación, Dpt. of Ingeniería de Circuitos y Sistemas, Ctra. Valencia Km. 7, 28031, Madrid.
|