Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chen L, Chen J. Deep Neural Network for Automatic Classification of Pathological Voice Signals. J Voice 2020;36:288.e15-288.e24. [PMID: 32660846 DOI: 10.1016/j.jvoice.2020.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Revised: 05/17/2020] [Accepted: 05/26/2020] [Indexed: 10/23/2022]

For:	Chen L, Chen J. Deep Neural Network for Automatic Classification of Pathological Voice Signals. J Voice 2020;36:288.e15-288.e24. [PMID: 32660846 DOI: 10.1016/j.jvoice.2020.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Revised: 05/17/2020] [Accepted: 05/26/2020] [Indexed: 10/23/2022]

Number

Cited by Other Article(s)

Barlow J, Sragi Z, Rivera-Rivera G, Al-Awady A, Daşdöğen Ü, Courey MS, Kirke DN. The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review. Otolaryngol Head Neck Surg 2024;170:1531-1543. [PMID: 38168017 DOI: 10.1002/ohn.636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Revised: 11/30/2023] [Accepted: 12/07/2023] [Indexed: 01/05/2024]

Shaikh AAS, Bhargavi MS, Naik GR. Unraveling the complexities of pathological voice through saliency analysis. Comput Biol Med 2023;166:107566. [PMID: 37857135 DOI: 10.1016/j.compbiomed.2023.107566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Revised: 09/14/2023] [Accepted: 10/10/2023] [Indexed: 10/21/2023]

Abstract

The human voice is an essential communication tool, but various disorders and habits can disrupt it. Diagnosis of pathological and abnormal voices is very important. Conventional diagnosis of these voice pathologies can be invasive and costly. Voice pathology disorders can be effectively detected using Artificial Intelligence and computer-aided voice pathology classification tools. Previous studies focused primarily on binary classification, leaving limited attention to multi-class classification. This study proposes three different neural network architectures to investigate the feature characteristics of three voice pathologies-Hyperkinetic Dysphonia, Hypokinetic Dysphonia, Reflux Laryngitis, and healthy voices using multi-class classification and the Voice ICar fEDerico II (VOICED) dataset. The study proposes UNet++ autoencoder-based denoiser techniques for accurate feature extraction to overcome noisy data. The architectures include a Multi-Layer Perceptron (MLP) trained on structured feature sets, a Short-Time Fourier Transform (STFT) model, and a Mel-Frequency Cepstral Coefficients (MFCC) model. The MLP model on 143 features achieved 97.1% accuracy, while the STFT model showed similar performance with increased sensitivity of 99.8%. The MFCC model maintained 97.1% accuracy but with a smaller model size and improved accuracy on the Reflux Laryngitis class. The study identifies crucial features through saliency analysis and reveals that detecting voice abnormalities requires the identification of regions of inaudible high-pitch sounds. Additionally, the study highlights the challenges posed by limited and disjointed pathological voice databases and proposes solutions for enhancing the performance of voice abnormality classification. Overall, the study's findings have potential applications in clinical applications and specialized audio-capturing tools.

Collapse

Liu GS, Hodges JM, Yu J, Sung CK, Erickson‐DiRenzo E, Doyle PC. End-to-end deep learning classification of vocal pathology using stacked vowels. Laryngoscope Investig Otolaryngol 2023;8:1312-1318. [PMID: 37899847 PMCID: PMC10601590 DOI: 10.1002/lio2.1144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2023] [Accepted: 08/13/2023] [Indexed: 10/31/2023] Open

Abstract

Objectives

Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.

Methods

Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1-dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.

Results

For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class-specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).

Conclusions

This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI-driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.

Lay Summary

AI analysis of multiple vowel recordings can improve classification of voice pathologies compared with models using a single sustained vowel and offer a strategy to enhance AI-driven screening of voice disorders.

Level of Evidence

Collapse

Zhang J, Wu J, Qiu Y, Song A, Li W, Li X, Liu Y. Intelligent speech technologies for transcription, disease diagnosis, and medical equipment interactive control in smart hospitals: A review. Comput Biol Med 2023;153:106517. [PMID: 36623438 PMCID: PMC9814440 DOI: 10.1016/j.compbiomed.2022.106517] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2022] [Revised: 12/23/2022] [Accepted: 12/31/2022] [Indexed: 01/07/2023]

Wang J, Xu H, Peng X, Liu J, He C. Pathological voice classification based on multi-domain features and deep hierarchical extreme learning machine. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023;153:423. [PMID: 36732280 DOI: 10.1121/10.0016869] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/05/2022] [Accepted: 12/29/2022] [Indexed: 06/18/2023]

Maskeliūnas R, Kulikajevas A, Damaševičius R, Pribuišis K, Ulozaitė-Stanienė N, Uloza V. Lightweight Deep Learning Model for Assessment of Substitution Voicing and Speech after Laryngeal Carcinoma Surgery. Cancers (Basel) 2022;14:cancers14102366. [PMID: 35625971 PMCID: PMC9139213 DOI: 10.3390/cancers14102366] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 05/03/2022] [Accepted: 05/04/2022] [Indexed: 11/16/2022] Open