1. Carrillo-Larco RM. Recognition of Patient Gender: A Machine Learning Preliminary Analysis Using Heart Sounds from Children and Adolescents. Pediatr Cardiol 2024. PMID: 38937337. DOI: 10.1007/s00246-024-03561-2.
Abstract
Research has shown that models trained on X-rays and fundus images can classify gender, age group, and race, raising concerns about bias and fairness in medical AI applications. However, the potential for physiological sounds to reveal sociodemographic traits has not been investigated; exploring this gap is crucial for understanding the implications and ensuring fairness in medical sound analysis. We aimed to develop machine learning (ML) classifiers to determine gender (men/women) from heart sound recordings. In this data-driven ML analysis, we used the open-access CirCor DigiScope Phonocardiogram Dataset, collected during cardiac screening programs in Brazil from volunteers under 21 years of age. Each participant completed a questionnaire and underwent a clinical examination, including electronic auscultation at four cardiac points: aortic (AV), mitral (MV), pulmonary (PV), and tricuspid (TV). From each auscultation recording we extracted 10 Mel-frequency cepstral coefficients (MFCCs) to develop the ML classifiers; in sensitivity analyses we additionally extracted 20, 30, 40, and 50 MFCCs. The most effective gender classifier was built from PV recordings (AUC ROC = 70.3%), followed by MV recordings (AUC ROC = 58.8%); AV and TV recordings produced classifiers with AUC ROC values of 56.4% and 56.1%, respectively. Using more MFCCs did not substantially improve the classifiers. It is therefore possible to distinguish males from females using phonocardiogram data. As health-related audio recordings become more prominent in ML applications, research is needed to explore whether these recordings contain signals that could reveal sociodemographic features.
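For readers who want to reproduce the feature pipeline, a minimal sketch of per-recording MFCC extraction follows, assuming librosa and a local WAV file; the file name and frame-pooling choice are illustrative, not taken from the paper.

```python
import librosa
import numpy as np

def mfcc_vector(path: str, n_mfcc: int = 10) -> np.ndarray:
    """Return one fixed-length MFCC feature vector per heart-sound recording."""
    y, sr = librosa.load(path, sr=None)                      # keep the native sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                 # pool frames into a single vector

features = mfcc_vector("pv_recording.wav")  # hypothetical file name
```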
Affiliation(s)
- Rodrigo M Carrillo-Larco
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA.
2. Hernández-Nava G, Salazar-Colores S, Cabal-Yepez E, Ramos-Arreguín JM. Parallel Ictal-Net, a Parallel CNN Architecture with Efficient Channel Attention for Seizure Detection. Sensors (Basel) 2024; 24:716. PMID: 38339433. PMCID: PMC10856983. DOI: 10.3390/s24030716.
Abstract
Around 70 million people worldwide are affected by epilepsy, a neurological disorder characterized by unprovoked seizures that occur at irregular, unpredictable intervals. During an epileptic seizure, transient symptoms emerge as a result of extreme abnormal neural activity. Epilepsy imposes limitations on individuals and significantly affects the lives of their families, so reliable diagnostic tools for early detection of the condition can help alleviate the social and emotional distress experienced by patients. While the Bonn University dataset contains five collections of EEG data, few studies focus specifically on subsets D and E, which correspond to EEG recordings from the epileptogenic zone during ictal and interictal events. In this work, the parallel ictal-net (PIN) neural network architecture is introduced; it uses scalograms obtained through a continuous wavelet transform to classify EEG signals into ictal or interictal states with high accuracy. The results demonstrate the effectiveness of the proposed PIN model in distinguishing between ictal and interictal events with a high degree of confidence, as validated by accuracy, precision, recall, and F1 scores that consistently reach around 99%, surpassing previous approaches in the related literature.
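As a rough illustration of the scalogram front end described above, the following sketch converts an EEG segment into a continuous-wavelet-transform image; the Morlet wavelet and scale range are assumptions, since the abstract does not specify them.

```python
import numpy as np
import pywt

def eeg_scalogram(segment: np.ndarray, fs: float, n_scales: int = 64) -> np.ndarray:
    """Continuous wavelet transform of one EEG segment, returned as a 2-D image."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, "morl", sampling_period=1.0 / fs)
    return np.abs(coeffs)  # (n_scales, n_samples), suitable as CNN input

image = eeg_scalogram(np.random.randn(4097), fs=173.61)  # Bonn segments: 4097 samples at 173.61 Hz
```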
Affiliation(s)
- Gerardo Hernández-Nava
- Faculty of Engineering, Autonomous University of Querétaro, Queretaro 76140, Mexico; (G.H.-N.); (J.-M.R.-A.)
- Eduardo Cabal-Yepez
- Multidisciplinary Studies Department, Campus Yuriria, University of Guanajuato, Guanajuato 38954, Mexico;
- Juan-Manuel Ramos-Arreguín
- Faculty of Engineering, Autonomous University of Querétaro, Queretaro 76140, Mexico; (G.H.-N.); (J.-M.R.-A.)
3. Akinpelu S, Viriri S. Speech emotion classification using attention based network and regularized feature selection. Sci Rep 2023; 13:11990. PMID: 37491423. PMCID: PMC10368662. DOI: 10.1038/s41598-023-38868-2.
Abstract
Speech emotion classification (SEC) has attracted considerable attention and occupies a conspicuous position within the research community. Its vital role in human-computer interaction (HCI) and affective computing cannot be overemphasized. Many classical algorithms and deep neural network (DNN) models have been proposed for recognizing emotion from speech; however, the suitability of these methods for accurately classifying emotion in speech with a multilingual background, among other factors that impede efficient classification, still demands critical consideration. This study proposes an attention-based network with a pre-trained convolutional neural network and a regularized neighbourhood component analysis (RNCA) feature selection technique for improved classification of speech emotion. Attention models have proven successful in many sequence-based and time-series tasks. An extensive experiment was carried out using three major classifiers (SVM, MLP, and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. The proposed model (attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement in performance that outperforms state-of-the-art SEC approaches. The model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
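A hedged sketch of the feature-selection-plus-classifier stage: scikit-learn's NeighborhoodComponentsAnalysis stands in for the paper's regularized variant (RNCA), and the data is synthetic, so this illustrates the pipeline shape rather than the authors' exact method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for deep CNN embeddings of utterances and emotion labels.
X, y = make_classification(n_samples=500, n_features=128, n_classes=4,
                           n_informative=32, random_state=0)

clf = make_pipeline(
    NeighborhoodComponentsAnalysis(n_components=32, random_state=0),  # learn a discriminative projection
    RandomForestClassifier(n_estimators=300, random_state=0),         # classify in the reduced space
)
clf.fit(X, y)
print(clf.score(X, y))
```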
Affiliation(s)
- Samson Akinpelu
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa
- Serestina Viriri
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa.
4. Mustaqeem, El Saddik A, Alotaibi FS, Pham NT. AAD-Net: Advanced end-to-end speech signal system for human emotion detection & recognition using attention-based deep echo state network. Knowl Based Syst 2023. DOI: 10.1016/j.knosys.2023.110525.
5. Lai T, Guan Y, Men S, Shang H, Zhang H. ResNet for recognition of Qi-deficiency constitution and balanced constitution based on voice. Front Psychol 2022; 13:1043955. PMID: 36544461. PMCID: PMC9762153. DOI: 10.3389/fpsyg.2022.1043955.
Abstract
Background: According to traditional Chinese medicine (TCM) theory, a Qi-deficiency constitution is characterized by a lower voice frequency, shortness of breath, reluctance to speak, an introverted personality, emotional instability, and timidity. People with a Qi-deficiency constitution are prone to repeated colds and have a higher probability of chronic diseases and depression, whereas a person with a Balanced constitution is relatively healthy in all physical and psychological aspects. At present, the determination of whether one has a Qi-deficiency or a Balanced constitution is mostly based on a scale, which is easily affected by subjective factors; the human voice, as an objective basis for diagnosis, is worthy of research. The purpose of this study is therefore to improve the objectivity of determining Qi-deficiency and Balanced constitutions from the voice and to explore the feasibility of deep learning for TCM constitution recognition. Methods: The voices of 48 subjects were collected, and constitution labels were obtained from the classification and determination of TCM constitutions. The constitutions were then classified with a ResNet residual neural network model. Results: A total of 720 voice recordings were collected from the 48 subjects. The ResNet model classified Qi-deficiency versus Balanced constitution with an accuracy of 81.5%. The loss values of the training and test sets gradually decreased toward 0 while the accuracy values tended to increase, with training accuracy approaching 1; the ROC curve shows an AUC of 0.85. Conclusion: The Qi-deficiency and Balanced constitution determination method based on the ResNet residual neural network proposed in this study can improve the efficiency of constitution recognition and provide decision support for clinical practice.
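A minimal sketch of a ResNet-style voice classifier, assuming PyTorch and single-channel spectrogram inputs; the specific depth (resnet18) and input shape are assumptions, as the abstract does not give them.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Two-class constitution classifier over 1-channel spectrogram "images".
model = resnet18(weights=None)                               # train from scratch
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                        padding=3, bias=False)               # accept mono spectrograms
model.fc = nn.Linear(model.fc.in_features, 2)                # Qi-deficiency vs. Balanced

logits = model(torch.randn(8, 1, 128, 128))                  # batch of 8 dummy spectrograms
print(logits.shape)                                          # torch.Size([8, 2])
```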
Affiliation(s)
- Tong Lai
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Yutong Guan
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Shaoyang Men
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Hongcai Shang
- Key Laboratory of Chinese Internal Medicine of Ministry of Education and Beijing, Dongzhimen Hospital Affiliated to Beijing University of Chinese Medicine, Beijing, China
- Honglai Zhang
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
6. Wei Y, Zhang X, Zeng A, Huang H. Iris Recognition Method Based on Parallel Iris Localization Algorithm and Deep Learning Iris Verification. Sensors (Basel) 2022; 22:7723. PMID: 36298074. PMCID: PMC9611168. DOI: 10.3390/s22207723.
Abstract
Biometric recognition technology has been widely used in many fields of society, and iris recognition, as a stable and convenient biometric modality, is common in security applications. However, iris images collected in real, non-cooperative environments contain various kinds of noise. Although mainstream deep-learning iris recognition methods achieve good recognition accuracy, they tend to do so at the cost of increased model complexity; moreover, actual optical systems capture raw iris images that have not been normalized, and mainstream deep-learning schemes do not consider this iris localization stage. To solve these problems, this paper proposes an effective iris recognition scheme consisting of an iris localization stage and an iris verification stage. For localization, a parallel Hough circle transform extracts the inner circle of the iris and the Daugman algorithm extracts the outer circle; for verification, a new lightweight convolutional neural network is developed, consisting of a deep residual network module and a residual pooling layer introduced to effectively improve verification accuracy. Iris localization experiments were conducted on 400 iris images collected in a non-cooperative environment; compared with the processing time of a pure central-processing-unit implementation, the parallel implementation was 26, 32, 36, and 21 times faster on four different iris datasets, respectively, while achieving effective localization accuracy. For the verification experiments, four representative non-cooperative iris datasets were chosen; the results demonstrate that the network achieves high-precision iris verification with fewer parameters, with equal error rates of 1.08%, 1.01%, 1.71%, and 1.11% on the four test databases, respectively.
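The inner-boundary step can be approximated with OpenCV's Hough circle transform; this sketch is a single-threaded stand-in (the paper's version is parallelized), and all parameter values are assumptions.

```python
import cv2
import numpy as np

def locate_pupil(gray: np.ndarray):
    """Detect the inner (pupillary) iris boundary as a circle (x, y, r)."""
    blurred = cv2.medianBlur(gray, 5)                  # suppress noise before edge detection
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1,
                               minDist=gray.shape[0] // 2,
                               param1=100, param2=30,  # Canny / accumulator thresholds (assumed)
                               minRadius=20, maxRadius=80)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)
    return x, y, r
```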
Affiliation(s)
- Yinyin Wei
- Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Aijun Zeng
- Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
- Huijie Huang
- Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
7. Age group prediction with panoramic radiomorphometric parameters using machine learning algorithms. Sci Rep 2022; 12:11703. PMID: 35810213. PMCID: PMC9271070. DOI: 10.1038/s41598-022-15691-9.
Abstract
The aim of this study is to investigate the relationship of 18 radiomorphometric parameters of panoramic radiographs with age, and to estimate the age group of people with permanent dentition in a non-invasive, comprehensive, and accurate manner using five machine learning algorithms. The study population comprised 471 digital panoramic radiographs of Korean individuals (209 men and 262 women; mean age, 32.12 ± 18.71 years). The participants were divided into three groups (with 20-year age gaps) and six groups (with 10-year age gaps), and each age group was estimated with five machine learning models: linear discriminant analysis (LDA), logistic regression, kernelized support vector machines, a multilayer perceptron, and extreme gradient boosting. Finally, Fisher discriminant analysis was used to visualize the data configuration. In the three-age-group classification, the areas under the curve (AUCs) for classifying young ages (10-19 years) ranged from 0.85 to 0.88 across the five models, AUC values for the older group (50-69 years) ranged from 0.82 to 0.88, and those for adults (20-49 years) were approximately 0.73. In the six-age-group classification, the best scores were likewise found in groups 1 (10-19 years) and 6 (60-69 years), with mean AUCs ranging from 0.85 to 0.87 and 0.80 to 0.90, respectively. A feature analysis based on LDA weights showed that the L-Pulp Area was important for discriminating younger ages (10-49 years), while L-Crown, U-Crown, L-Implant, U-Implant, and Periodontitis served as predictors for older ages (50-69 years). We established acceptable linear and nonlinear machine learning models for dental age-group estimation using multiple maxillary and mandibular radiomorphometric parameters. Because certain radiomorphological characteristics of the young and the elderly were linearly related to age, these groups could be easily distinguished from the other age groups with automated machine learning models.
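A sketch of the five-model comparison, using scikit-learn on synthetic feature vectors; GradientBoostingClassifier stands in for extreme gradient boosting, and all hyperparameters are assumed defaults rather than the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in: 18 radiomorphometric parameters, 3 age groups.
X, y = make_classification(n_samples=471, n_features=18, n_classes=3,
                           n_informative=10, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM (RBF)": SVC(kernel="rbf", probability=True),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "GBoost": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc_ovr").mean()
    print(f"{name}: mean one-vs-rest AUC = {auc:.3f}")
```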
8. Multi-Label Extreme Learning Machine (MLELMs) for Bangla Regional Speech Recognition. Appl Sci (Basel) 2022. DOI: 10.3390/app12115463.
Abstract
Extensive research has been conducted on determining age, gender, and the words spoken in Bangla speech, but no work has addressed identifying the regional dialect spoken in Bangla speech. Hence, in this study, we create a dataset containing 30 h of Bangla speech covering seven regional Bangla dialects, with the goals of categorizing the dialect and detecting synthesized Bangla speech. The proposed model combines a Stacked Convolutional Autoencoder (SCAE) with a sequence of Multi-Label Extreme Learning Machines (MLELMs). The SCAE creates a detailed feature map by identifying the spatially and temporally salient qualities in MFEC input data; the feature map is then passed to the MLELM networks to generate soft labels and, from them, hard labels. Because aging produces physiological changes in the brain that alter the processing of auditory information, the model takes age class into account when generating dialect class labels, which increases classification accuracy from 85% (without age information) to 95% (with it). The classification accuracy for synthesized Bangla speech labels is 95%. The proposed methodology also performs well on English-language audio datasets.
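MFEC features are commonly understood as log Mel filterbank energies (MFCCs without the final DCT); under that assumption, a minimal extraction sketch with librosa looks like this. The sampling rate and filter count are illustrative.

```python
import librosa
import numpy as np

def mfec(path: str, n_mels: int = 40) -> np.ndarray:
    """Log Mel filterbank energies: like MFCCs, but without the DCT step."""
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)  # shape: (n_mels, n_frames)
```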
9. Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism with Conv-Caps and Bi-GRU Features. Electronics 2022. DOI: 10.3390/electronics11091328.
Abstract
Recognizing the speaker's emotional state from speech signals plays a crucial role in human–computer interaction (HCI). Numerous linguistic resources are now available, but most contain samples of a fixed, discrete length; the leading challenge in Speech Emotion Recognition (SER) addressed in this article is how to extract the essential emotional features from utterances of variable length. To obtain better emotional information from speech signals and increase the diversity of that information, we present an advanced fusion-based dual-channel self-attention mechanism using convolutional capsule (Conv-Cap) and bi-directional gated recurrent unit (Bi-GRU) networks. We extracted six spectral features: Mel-spectrograms, Mel-frequency cepstral coefficients, chromagrams, spectral contrast, the zero-crossing rate, and root-mean-square energy. The Conv-Cap module processes the Mel-spectrograms, while the Bi-GRU processes the remaining spectral features from the input tensor. A self-attention layer in each module selectively focuses on optimal cues and determines the attention weights to yield high-level features. Finally, a confidence-based fusion method fuses all high-level features and passes them through fully connected layers to classify the emotional state. The proposed model was evaluated on the Berlin (EMO-DB), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and Odia (SITB-OSED) datasets. It achieved high weighted accuracy (WA) and unweighted accuracy (UA) values of 90.31% and 87.61%, 76.84% and 70.34%, and 87.52% and 86.19%, respectively, outperforming state-of-the-art models on the same datasets.
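The six-feature front end maps directly onto standard librosa calls; here is a hedged sketch, with the MFCC count as an assumption.

```python
import librosa
import numpy as np

def spectral_features(y: np.ndarray, sr: int) -> dict:
    """Extract the six spectral features named in the abstract."""
    return {
        "mel": librosa.feature.melspectrogram(y=y, sr=sr),
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
        "chroma": librosa.feature.chroma_stft(y=y, sr=sr),
        "contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
        "zcr": librosa.feature.zero_crossing_rate(y),
        "rms": librosa.feature.rms(y=y),
    }

feats = spectral_features(np.random.randn(16000), sr=16000)  # 1 s of dummy audio
```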
10. Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech. Entropy (Basel) 2022; 24:414. PMID: 35327924. PMCID: PMC8947568. DOI: 10.3390/e24030414.
Abstract
Speaker recognition is an important classification task that can be solved with several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is well researched, with solutions that provide excellent performance, the classification accuracy of such models decreases significantly when they are applied to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable for deployment on edge devices in Internet of Things systems for real-time detection. This paper proposes a simple, constrained convolutional neural network for speaker recognition and examines its robustness under emotional speech conditions. Three quantization methods for constraining the network are examined: an eight-bit floating-point format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
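For intuition about the last two quantization schemes, here is one common formulation (XNOR-Net-style binary scaling and threshold-based ternarization); the paper's exact schemes may differ, and the threshold factor is an assumption.

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """Binary scalar quantization: sign(w) scaled by the mean magnitude."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def ternarize(w: np.ndarray, t: float = 0.7) -> np.ndarray:
    """Ternary scalar quantization: small weights snap to 0, the rest to +/- alpha."""
    delta = t * np.abs(w).mean()                       # magnitude threshold
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

w = np.random.randn(3, 3)
print(binarize(w), ternarize(w), sep="\n")
```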
11. Bekmanova G, Yergesh B, Sharipbay A, Mukanova A. Emotional Speech Recognition Method Based on Word Transcription. Sensors (Basel) 2022; 22:1937. PMID: 35271083. PMCID: PMC8915129. DOI: 10.3390/s22051937.
Abstract
The emotional speech recognition method presented in this article was applied to recognize the emotions of students during online exams held remotely because of COVID-19. The method recognizes emotions in spoken speech by matching against a knowledge base of emotionally charged words, stored as a code book, and analyzes human speech for the presence of emotion. To assess its quality, an experiment was conducted on 420 audio recordings; the accuracy of the proposed method is 79.7% for the Kazakh language. The method can be used for different languages and consists of the following tasks: capturing a signal, detecting speech in it, recognizing the speech as words in a simplified transcription, determining word boundaries, comparing the simplified transcription against the code book, and forming a hypothesis about the degree of emotionality of the speech. If emotion is present, full word recognition is performed and the emotions in the speech are identified. The advantage of this method is that it is undemanding of computational resources, making widespread use possible: it can be applied wherever positive and negative emotions need to be recognized in a crowd, on public transport, in schools, universities, and so on. The experiment demonstrated the effectiveness of the method, and the results will make it possible to develop devices that record and recognize a speech signal upon detecting, for example, negative emotions in the sounding speech and, if necessary, transmit a message about potential threats or riots.
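The code-book comparison step essentially reduces to a dictionary lookup over simplified transcriptions; this sketch is purely illustrative, with hypothetical entries rather than the authors' actual Kazakh code book.

```python
# Hypothetical code-book entries mapping simplified transcriptions to polarities.
EMOTION_CODEBOOK = {
    "zhaqsy": "positive",   # illustrative entry, not from the paper
    "zhaman": "negative",
}

def detect_emotion(words: list[str]) -> str:
    """Hypothesize the utterance's emotionality from code-book hits."""
    hits = [EMOTION_CODEBOOK[w] for w in words if w in EMOTION_CODEBOOK]
    return max(set(hits), key=hits.count) if hits else "neutral"

print(detect_emotion(["bul", "zhaqsy", "kun"]))  # -> "positive"
```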
Affiliation(s)
- Gulmira Bekmanova
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Banu Yergesh
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Altynbek Sharipbay
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Assel Mukanova
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Higher School of Information Technology and Engineering, Astana International University, Nur-Sultan 010000, Kazakhstan