1
Jegan R, Jayagowri R. Voice pathology detection using optimized convolutional neural networks and explainable artificial intelligence-based analysis. Comput Methods Biomech Biomed Engin 2023:1-17. [PMID: 37850553 DOI: 10.1080/10255842.2023.2270102] [Received: 06/15/2023] [Accepted: 10/08/2023] [Indexed: 10/19/2023]
Abstract
This article proposes a noninvasive, computer-aided assessment approach based on an optimized convolutional neural network (CNN) for detecting healthy and pathological voices. First, the input voice samples are transformed into mel-spectrogram time-frequency images, which are used to train the CNN model. These time-frequency images capture inherent speech variations that help distinguish healthy from pathological voice samples. The weights and biases of the trained CNN are then further optimized with the artificial bee colony (ABC) optimization algorithm, and the resulting optimized CNN is used to test unseen data. The proposed approach is evaluated on three popular, publicly available datasets: SVD, AVPD, and VOICED. Experimental results show that the proposed ABC-optimized CNN improves accuracy by 1.02% compared with a conventional CNN, illustrating a data-independent discriminative representation ability. Finally, gradient-weighted class activation mapping (Grad-CAM), an explainable artificial intelligence (XAI) technique, is used to make the model's decisions understandable.
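The abstract says the trained CNN's weights and biases are refined with artificial bee colony (ABC) optimization, but it does not give the ABC variant, colony size, fitness function, or stopping rule. The following is therefore only a minimal, generic sketch of ABC in the style of Karaboga's original scheme (employed, onlooker, and scout bee phases). It minimizes a toy least-squares loss over a two-element parameter vector as a stand-in for a CNN's flattened parameters. The function name `abc_minimize`, the bounds, `n_food`, `limit`, and the toy loss are all illustrative assumptions, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def abc_minimize(
    loss: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float] = (-1.0, 1.0),
    n_food: int = 10,
    limit: int = 20,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimal artificial bee colony (ABC) minimizer; returns (best_params, best_loss)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source() -> List[float]:
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def fitness(value: float) -> float:
        # Usual ABC fitness transform: a smaller loss gives a larger fitness.
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n_food)]
    losses = [loss(s) for s in sources]
    trials = [0] * n_food

    def try_neighbor(i: int) -> None:
        # Move one coordinate of source i relative to a random partner k != i.
        k = rng.choice([j for j in range(n_food) if j != i])
        d = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lo), hi)
        cand_loss = loss(candidate)
        # Greedy selection between the old source and the candidate.
        if cand_loss < losses[i]:
            sources[i], losses[i], trials[i] = candidate, cand_loss, 0
        else:
            trials[i] += 1

    best_i = min(range(n_food), key=lambda i: losses[i])
    best_x, best_loss = list(sources[best_i]), losses[best_i]
    for _ in range(iterations):
        for i in range(n_food):  # employed-bee phase: every source is explored once
            try_neighbor(i)
        fits = [fitness(v) for v in losses]
        for _ in range(n_food):  # onlooker-bee phase: fitness-proportional choice
            try_neighbor(rng.choices(range(n_food), weights=fits)[0])
        for i in range(n_food):  # remember the best source found so far
            if losses[i] < best_loss:
                best_x, best_loss = list(sources[i]), losses[i]
        stale = max(range(n_food), key=lambda i: trials[i])
        if trials[stale] > limit:  # scout-bee phase: abandon an exhausted source
            sources[stale] = random_source()
            losses[stale] = loss(sources[stale])
            trials[stale] = 0
    return best_x, best_loss


# Toy stand-in for a network loss: fit y = 2x + 1 with parameters [w, b].
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [2.0 * x + 1.0 for x in XS]


def mse(params: Sequence[float]) -> float:
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


params, final_loss = abc_minimize(mse, dim=2, bounds=(-3.0, 3.0), iterations=300)
print(params, final_loss)
```

Applied to a CNN, `loss` would presumably map a flattened parameter vector to a training-set loss, so every candidate costs a forward pass. The abstract does not say how the authors encode the network's parameters or bound the search.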
Affiliation(s)
- Roohum Jegan
- Department of Electronics and Communication Engineering, BMS College of Engineering, Bengaluru, Karnataka, India
- R Jayagowri
- Department of Electronics and Communication Engineering, BMS College of Engineering, Bengaluru, Karnataka, India
2
Shibata Y, Victorino JN, Natsuyama T, Okamoto N, Yoshimura R, Shibata T. Estimation of subjective quality of life in schizophrenic patients using speech features. Front Rehabil Sci 2023; 4:1121034. [PMID: 36968213 PMCID: PMC10036834 DOI: 10.3389/fresc.2023.1121034] [Received: 12/11/2022] [Accepted: 02/13/2023] [Indexed: 03/12/2023]
Abstract
Introduction
Patients with schizophrenia have the longest hospital stays in Japan, and their high rehospitalization rate also affects their quality of life (QoL). Although QoL is an effective predictor for treatment, it has not been widely used because of time constraints and a lack of interest. This study therefore aimed to estimate the subjective QoL of patients with schizophrenia from speech features. Specifically, it used the patients' speech to estimate the subscale scores that measure their subjective QoL. The objectives were to (1) estimate the subscale scores across different patients (cross-sectional measurement) and (2) estimate the subscale scores of the same patient at different times (longitudinal measurement).

Methods
A conversational agent was built to record the responses of 18 patients with schizophrenia to the Japanese Schizophrenia Quality of Life Scale (JSQLS), which has three subscales: “Psychosocial,” “Motivation and Energy,” and “Symptoms and Side-effects.” These three subscale scores were used as the target variables, and the speech features recorded during measurement (chromagram, mel spectrogram, and mel-frequency cepstral coefficients) were used as the explanatory variables. For the first objective, a trained model estimated the subscale scores of the 18 subjects using nested cross-validation (CV). For the second objective, six of the 18 subjects were measured twice, and another model, trained on the 18 subjects' data, estimated their subscale scores at the second measurement. Ten different machine learning algorithms were used, and the errors of the learned models were compared.

Results and Discussion
The mean RMSE of the cross-sectional measurement was 13.433, with k-nearest neighbors as the best model. The mean RMSE of the longitudinal measurement was 13.301, with random forest as the best model.
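Nested cross-validation keeps the held-out subject out of both model fitting and hyperparameter selection. The abstract does not state the fold scheme, the feature dimensionality, or which hyperparameters were tuned, so the following is only a minimal sketch under assumed settings. An outer leave-one-subject-out loop reports the RMSE, and an inner leave-one-out loop picks k for a k-nearest-neighbors regressor (the best cross-sectional model reported). The data are synthetic placeholders, only one subscale is modeled, and the function names (`knn_predict`, `loo_rmse`, `nested_cv_knn`) and the grid `(1, 3, 5)` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def knn_predict(train_x: List[Sequence[float]], train_y: List[float],
                x: Sequence[float], k: int) -> float:
    """k-NN regression: mean target of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def loo_rmse(xs: List[Sequence[float]], ys: List[float], k: int) -> float:
    """Leave-one-out RMSE of k-NN; serves as the inner (model-selection) loop."""
    sq = [(knn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], k) - ys[i]) ** 2
          for i in range(len(xs))]
    return math.sqrt(sum(sq) / len(sq))


def nested_cv_knn(xs: List[Sequence[float]], ys: List[float],
                  k_grid: Tuple[int, ...] = (1, 3, 5)) -> Tuple[float, List[int]]:
    """Outer leave-one-subject-out loop estimates error; inner LOO loop picks k."""
    sq, chosen_k = [], []
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        # k is tuned on the outer-training subjects only; subject i stays unseen.
        best_k = min(k_grid, key=lambda k: loo_rmse(train_x, train_y, k))
        pred = knn_predict(train_x, train_y, xs[i], best_k)
        sq.append((pred - ys[i]) ** 2)
        chosen_k.append(best_k)
    return math.sqrt(sum(sq) / len(sq)), chosen_k


# Synthetic placeholder data: 18 "subjects", 3 made-up feature values each, and a
# made-up subscale score. These are NOT the study's features or scores.
rng = random.Random(42)
features = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(18)]
scores = [50.0 + 10.0 * f[0] - 5.0 * f[1] + rng.gauss(0.0, 3.0) for f in features]

cv_rmse, ks = nested_cv_knn(features, scores)
print(f"nested-CV RMSE: {cv_rmse:.3f}; k chosen per outer fold: {ks}")
```

This sketch pools squared errors over all outer folds. The study's "mean RMSE" may be aggregated differently (for example, per subscale or per subject), which the abstract does not specify.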
An RMSE below 10 suggests that the subscale scores estimated from speech features were close to the actual JSQLS subscale scores. In the cross-sectional measurement, 10 of the 18 subjects were estimated with an RMSE below 10; in the longitudinal measurement, 5 of the 6 subjects were. Future studies with more subjects, together with more personalized models based on longitudinal measurements, are needed before these results can be applied to telemedicine for continuous QoL monitoring.
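The per-subject counts above depend on computing an RMSE for each subject and comparing it with a threshold of 10. The abstract does not say how a single subject's RMSE is formed. The following minimal sketch therefore assumes it is taken across that subject's three subscale estimates. The subject IDs and scores are invented for illustration and are not from the study. The helper names `rmse` and `subjects_below` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length score sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def subjects_below(
    scores: Dict[str, Tuple[Sequence[float], Sequence[float]]], threshold: float = 10.0
) -> Dict[str, float]:
    """Per-subject RMSE, keeping only subjects whose RMSE is below `threshold`."""
    per_subject = {sid: rmse(actual, pred) for sid, (actual, pred) in scores.items()}
    return {sid: err for sid, err in per_subject.items() if err < threshold}


# Hypothetical (actual, estimated) scores on three subscales for two made-up subjects.
example = {
    "S01": ((50.0, 60.0, 40.0), (55.0, 58.0, 44.0)),  # RMSE = sqrt(15), about 3.87
    "S02": ((30.0, 70.0, 20.0), (45.0, 50.0, 35.0)),  # RMSE = sqrt(850/3), about 16.83
}
close = subjects_below(example)
print(f"{len(close)} of {len(example)} subjects below RMSE 10: {sorted(close)}")
# prints "1 of 2 subjects below RMSE 10: ['S01']"
```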
Affiliation(s)
- Yuko Shibata
- Department of Life Science and System Engineering, Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Kitakyushu, Japan
- Correspondence: Yuko Shibata
- John Noel Victorino
- Department of Life Science and System Engineering, Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Kitakyushu, Japan
- Tomoya Natsuyama
- Department of Psychiatry, University of Occupational and Environmental Health, Kitakyushu, Japan
- Naomichi Okamoto
- Department of Psychiatry, University of Occupational and Environmental Health, Kitakyushu, Japan
- Reiji Yoshimura
- Department of Psychiatry, University of Occupational and Environmental Health, Kitakyushu, Japan
- Tomohiro Shibata
- Department of Life Science and System Engineering, Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Kitakyushu, Japan