1. Carrillo-Larco RM. Recognition of Patient Gender: A Machine Learning Preliminary Analysis Using Heart Sounds from Children and Adolescents. Pediatr Cardiol 2024. PMID: 38937337. DOI: 10.1007/s00246-024-03561-2.
Abstract
Research has shown that models trained on X-rays and fundus images can classify gender, age group, and race, raising concerns about bias and fairness in medical AI applications. However, the potential for physiological sounds to reveal sociodemographic traits has not been investigated; exploring this gap is crucial for understanding the implications and ensuring fairness in medical sound analysis. We aimed to develop machine learning (ML) classifiers to determine gender (men/women) from heart sound recordings. In this data-driven ML analysis, we used the open-access CirCor DigiScope Phonocardiogram Dataset, collected during cardiac screening programs in Brazil from volunteers under 21 years of age. Each participant completed a questionnaire and underwent a clinical examination, including electronic auscultation at four cardiac points: aortic (AV), mitral (MV), pulmonary (PV), and tricuspid (TV). From each auscultation recording we extracted 10 Mel-frequency cepstral coefficients (MFCCs) to develop the ML classifiers; in sensitivity analyses we additionally extracted 20, 30, 40, and 50 MFCCs. The most effective gender classifier was built from PV recordings (AUC ROC = 70.3%), followed by MV recordings (AUC ROC = 58.8%); AV and TV recordings produced classifiers with AUC ROC values of 56.4% and 56.1%, respectively. Using more MFCCs did not substantially improve the classifiers. It is therefore possible to distinguish males from females using phonocardiogram data. As health-related audio recordings become more prominent in ML applications, research is needed to explore whether these recordings contain signals that could reveal sociodemographic features.
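For readers who want to reproduce the feature pipeline, a minimal sketch of per-recording MFCC extraction follows, assuming librosa and a local WAV file; the file name and frame-pooling choice are illustrative, not taken from the paper.

```python
import librosa
import numpy as np

def mfcc_vector(path: str, n_mfcc: int = 10) -> np.ndarray:
    """Return one fixed-length MFCC feature vector per heart-sound recording."""
    y, sr = librosa.load(path, sr=None)                      # keep the native sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                 # pool frames into a single vector

features = mfcc_vector("pv_recording.wav")  # hypothetical file name
```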
Affiliation(s)
- Rodrigo M Carrillo-Larco
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA.
2. Hernández-Nava G, Salazar-Colores S, Cabal-Yepez E, Ramos-Arreguín JM. Parallel Ictal-Net, a Parallel CNN Architecture with Efficient Channel Attention for Seizure Detection. Sensors (Basel) 2024; 24:716. PMID: 38339433. PMCID: PMC10856983. DOI: 10.3390/s24030716.
Abstract
Around 70 million people worldwide are affected by epilepsy, a neurological disorder characterized by unprovoked seizures that occur at irregular, unpredictable intervals. During an epileptic seizure, transient symptoms emerge as a result of extreme abnormal neural activity. Epilepsy imposes limitations on individuals and significantly affects the lives of their families, so reliable diagnostic tools for early detection of the condition can help alleviate the social and emotional distress experienced by patients. While the Bonn University dataset contains five collections of EEG data, few studies focus specifically on subsets D and E, which correspond to EEG recordings from the epileptogenic zone during ictal and interictal events. In this work, the parallel ictal-net (PIN) neural network architecture is introduced; it uses scalograms obtained through a continuous wavelet transform to classify EEG signals into ictal or interictal states with high accuracy. The results demonstrate the effectiveness of the proposed PIN model in distinguishing between ictal and interictal events with a high degree of confidence, as validated by accuracy, precision, recall, and F1 scores that consistently reach around 99%, surpassing previous approaches in the related literature.
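As a rough illustration of the scalogram front end described above, the following sketch converts an EEG segment into a continuous-wavelet-transform image; the Morlet wavelet and scale range are assumptions, since the abstract does not specify them.

```python
import numpy as np
import pywt

def eeg_scalogram(segment: np.ndarray, fs: float, n_scales: int = 64) -> np.ndarray:
    """Continuous wavelet transform of one EEG segment, returned as a 2-D image."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, "morl", sampling_period=1.0 / fs)
    return np.abs(coeffs)  # (n_scales, n_samples), suitable as CNN input

image = eeg_scalogram(np.random.randn(4097), fs=173.61)  # Bonn segments: 4097 samples at 173.61 Hz
```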
Affiliation(s)
- Gerardo Hernández-Nava
- Faculty of Engineering, Autonomous University of Querétaro, Queretaro 76140, Mexico; (G.H.-N.); (J.-M.R.-A.)
- Eduardo Cabal-Yepez
- Multidisciplinary Studies Department, Campus Yuriria, University of Guanajuato, Guanajuato 38954, Mexico;
- Juan-Manuel Ramos-Arreguín
- Faculty of Engineering, Autonomous University of Querétaro, Queretaro 76140, Mexico; (G.H.-N.); (J.-M.R.-A.)
3. Akinpelu S, Viriri S. Speech emotion classification using attention based network and regularized feature selection. Sci Rep 2023; 13:11990. PMID: 37491423. PMCID: PMC10368662. DOI: 10.1038/s41598-023-38868-2.
Abstract
Speech emotion classification (SEC) has attracted considerable attention and occupies a conspicuous position within the research community. Its vital role in human-computer interaction (HCI) and affective computing cannot be overemphasized. Many classical algorithms and deep neural network (DNN) models have been proposed for recognizing emotion from speech; however, the suitability of these methods for accurately classifying emotion in speech with a multilingual background, among other factors that impede efficient classification, still demands critical consideration. This study proposes an attention-based network with a pre-trained convolutional neural network and a regularized neighbourhood component analysis (RNCA) feature selection technique for improved classification of speech emotion. Attention models have proven successful in many sequence-based and time-series tasks. An extensive experiment was carried out using three major classifiers (SVM, MLP, and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. The proposed model (attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement in performance that outperforms state-of-the-art SEC approaches. The model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
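A hedged sketch of the feature-selection-plus-classifier stage: scikit-learn's NeighborhoodComponentsAnalysis stands in for the paper's regularized variant (RNCA), and the data is synthetic, so this illustrates the pipeline shape rather than the authors' exact method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for deep CNN embeddings of utterances and emotion labels.
X, y = make_classification(n_samples=500, n_features=128, n_classes=4,
                           n_informative=32, random_state=0)

clf = make_pipeline(
    NeighborhoodComponentsAnalysis(n_components=32, random_state=0),  # learn a discriminative projection
    RandomForestClassifier(n_estimators=300, random_state=0),         # classify in the reduced space
)
clf.fit(X, y)
print(clf.score(X, y))
```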
Affiliation(s)
- Samson Akinpelu
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa
- Serestina Viriri
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa.
4. Mustaqeem, El Saddik A, Alotaibi FS, Pham NT. AAD-Net: Advanced end-to-end speech signal system for human emotion detection & recognition using attention-based deep echo state network. Knowl Based Syst 2023. DOI: 10.1016/j.knosys.2023.110525.
5. Lai T, Guan Y, Men S, Shang H, Zhang H. ResNet for recognition of Qi-deficiency constitution and balanced constitution based on voice. Front Psychol 2022; 13:1043955. PMID: 36544461. PMCID: PMC9762153. DOI: 10.3389/fpsyg.2022.1043955.
Abstract
Background: According to traditional Chinese medicine (TCM) theory, a Qi-deficiency constitution is characterized by a lower voice frequency, shortness of breath, reluctance to speak, an introverted personality, emotional instability, and timidity. People with a Qi-deficiency constitution are prone to repeated colds and have a higher probability of chronic diseases and depression, whereas a person with a Balanced constitution is relatively healthy in all physical and psychological aspects. At present, the determination of whether one has a Qi-deficiency or a Balanced constitution is mostly based on a scale, which is easily affected by subjective factors; the human voice, as an objective basis for diagnosis, is worthy of research. The purpose of this study is therefore to improve the objectivity of determining Qi-deficiency and Balanced constitutions from the voice and to explore the feasibility of deep learning for TCM constitution recognition. Methods: The voices of 48 subjects were collected, and constitution labels were obtained from the classification and determination of TCM constitutions. The constitutions were then classified with a ResNet residual neural network model. Results: A total of 720 voice recordings were collected from the 48 subjects. The ResNet model classified Qi-deficiency versus Balanced constitution with an accuracy of 81.5%. The loss values of the training and test sets gradually decreased toward 0 while the accuracy values tended to increase, with training accuracy approaching 1; the ROC curve shows an AUC of 0.85. Conclusion: The Qi-deficiency and Balanced constitution determination method based on the ResNet residual neural network proposed in this study can improve the efficiency of constitution recognition and provide decision support for clinical practice.
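A minimal sketch of a ResNet-style voice classifier, assuming PyTorch and single-channel spectrogram inputs; the specific depth (resnet18) and input shape are assumptions, as the abstract does not give them.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Two-class constitution classifier over 1-channel spectrogram "images".
model = resnet18(weights=None)                               # train from scratch
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                        padding=3, bias=False)               # accept mono spectrograms
model.fc = nn.Linear(model.fc.in_features, 2)                # Qi-deficiency vs. Balanced

logits = model(torch.randn(8, 1, 128, 128))                  # batch of 8 dummy spectrograms
print(logits.shape)                                          # torch.Size([8, 2])
```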
Affiliation(s)
- Tong Lai
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Yutong Guan
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Shaoyang Men
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Hongcai Shang
- Key Laboratory of Chinese Internal Medicine of Ministry of Education and Beijing, Dongzhimen Hospital Affiliated to Beijing University of Chinese Medicine, Beijing, China
- Honglai Zhang
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
6. Wei Y, Zhang X, Zeng A, Huang H. Iris Recognition Method Based on Parallel Iris Localization Algorithm and Deep Learning Iris Verification. Sensors (Basel) 2022; 22:7723. PMID: 36298074. PMCID: PMC9611168. DOI: 10.3390/s22207723.
Abstract
Biometric recognition technology has been widely used in many fields of society, and iris recognition, as a stable and convenient biometric modality, is common in security applications. However, iris images collected in real, non-cooperative environments contain various kinds of noise. Although mainstream deep-learning iris recognition methods achieve good recognition accuracy, they tend to do so at the cost of increased model complexity; moreover, actual optical systems capture raw iris images that have not been normalized, and mainstream deep-learning schemes do not consider this iris localization stage. To solve these problems, this paper proposes an effective iris recognition scheme consisting of an iris localization stage and an iris verification stage. For localization, a parallel Hough circle transform extracts the inner circle of the iris and the Daugman algorithm extracts the outer circle; for verification, a new lightweight convolutional neural network is developed, consisting of a deep residual network module and a residual pooling layer introduced to effectively improve verification accuracy. Iris localization experiments were conducted on 400 iris images collected in a non-cooperative environment; compared with the processing time of a pure central-processing-unit implementation, the parallel implementation was 26, 32, 36, and 21 times faster on four different iris datasets, respectively, while achieving effective localization accuracy. For the verification experiments, four representative non-cooperative iris datasets were chosen; the results demonstrate that the network achieves high-precision iris verification with fewer parameters, with equal error rates of 1.08%, 1.01%, 1.71%, and 1.11% on the four test databases, respectively.
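The inner-boundary step can be approximated with OpenCV's Hough circle transform; this sketch is a single-threaded stand-in (the paper's version is parallelized), and all parameter values are assumptions.

```python
import cv2
import numpy as np

def locate_pupil(gray: np.ndarray):
    """Detect the inner (pupillary) iris boundary as a circle (x, y, r)."""
    blurred = cv2.medianBlur(gray, 5)                  # suppress noise before edge detection
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1,
                               minDist=gray.shape[0] // 2,
                               param1=100, param2=30,  # Canny / accumulator thresholds (assumed)
                               minRadius=20, maxRadius=80)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)
    return x, y, r
```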
Affiliation(s)
- Yinyin Wei
- Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Aijun Zeng
- Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
- Huijie Huang
- Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
7. Age group prediction with panoramic radiomorphometric parameters using machine learning algorithms. Sci Rep 2022; 12:11703. PMID: 35810213. PMCID: PMC9271070. DOI: 10.1038/s41598-022-15691-9.
Abstract
The aim of this study is to investigate the relationship of 18 radiomorphometric parameters of panoramic radiographs with age, and to estimate the age group of people with permanent dentition in a non-invasive, comprehensive, and accurate manner using five machine learning algorithms. The study population comprised 471 digital panoramic radiographs of Korean individuals (209 men and 262 women; mean age, 32.12 ± 18.71 years). The participants were divided into three groups (with 20-year age gaps) and six groups (with 10-year age gaps), and each age group was estimated with five machine learning models: linear discriminant analysis (LDA), logistic regression, kernelized support vector machines, a multilayer perceptron, and extreme gradient boosting. Finally, Fisher discriminant analysis was used to visualize the data configuration. In the three-age-group classification, the areas under the curve (AUCs) for classifying young ages (10-19 years) ranged from 0.85 to 0.88 across the five models, AUC values for the older group (50-69 years) ranged from 0.82 to 0.88, and those for adults (20-49 years) were approximately 0.73. In the six-age-group classification, the best scores were likewise found in groups 1 (10-19 years) and 6 (60-69 years), with mean AUCs ranging from 0.85 to 0.87 and 0.80 to 0.90, respectively. A feature analysis based on LDA weights showed that the L-Pulp Area was important for discriminating younger ages (10-49 years), while L-Crown, U-Crown, L-Implant, U-Implant, and Periodontitis served as predictors for older ages (50-69 years). We established acceptable linear and nonlinear machine learning models for dental age-group estimation using multiple maxillary and mandibular radiomorphometric parameters. Because certain radiomorphological characteristics of the young and the elderly were linearly related to age, these groups could be easily distinguished from the other age groups with automated machine learning models.
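A sketch of the five-model comparison, using scikit-learn on synthetic feature vectors; GradientBoostingClassifier stands in for extreme gradient boosting, and all hyperparameters are assumed defaults rather than the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in: 18 radiomorphometric parameters, 3 age groups.
X, y = make_classification(n_samples=471, n_features=18, n_classes=3,
                           n_informative=10, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM (RBF)": SVC(kernel="rbf", probability=True),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "GBoost": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc_ovr").mean()
    print(f"{name}: mean one-vs-rest AUC = {auc:.3f}")
```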
8. Multi-Label Extreme Learning Machine (MLELMs) for Bangla Regional Speech Recognition. Appl Sci (Basel) 2022. DOI: 10.3390/app12115463.
Abstract
Extensive research has been conducted on determining age, gender, and the words spoken in Bangla speech, but no work has addressed identifying the regional dialect spoken in Bangla speech. Hence, in this study, we create a dataset containing 30 h of Bangla speech covering seven regional Bangla dialects, with the goals of categorizing the dialect and detecting synthesized Bangla speech. The proposed model combines a Stacked Convolutional Autoencoder (SCAE) with a sequence of Multi-Label Extreme Learning Machines (MLELMs). The SCAE creates a detailed feature map by identifying the spatially and temporally salient qualities in MFEC input data; the feature map is then passed to the MLELM networks to generate soft labels and, from them, hard labels. Because aging produces physiological changes in the brain that alter the processing of auditory information, the model takes age class into account when generating dialect class labels, which increases classification accuracy from 85% (without age information) to 95% (with it). The classification accuracy for synthesized Bangla speech labels is 95%. The proposed methodology also performs well on English-language audio datasets.
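MFEC features are commonly understood as log Mel filterbank energies (MFCCs without the final DCT); under that assumption, a minimal extraction sketch with librosa looks like this. The sampling rate and filter count are illustrative.

```python
import librosa
import numpy as np

def mfec(path: str, n_mels: int = 40) -> np.ndarray:
    """Log Mel filterbank energies: like MFCCs, but without the DCT step."""
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)  # shape: (n_mels, n_frames)
```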
9. Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism with Conv-Caps and Bi-GRU Features. Electronics 2022. DOI: 10.3390/electronics11091328.
Abstract
Recognizing the speaker's emotional state from speech signals plays a crucial role in human–computer interaction (HCI). Numerous linguistic resources are now available, but most contain samples of a fixed, discrete length; the leading challenge in Speech Emotion Recognition (SER) addressed in this article is how to extract the essential emotional features from utterances of variable length. To obtain better emotional information from speech signals and increase the diversity of that information, we present an advanced fusion-based dual-channel self-attention mechanism using convolutional capsule (Conv-Cap) and bi-directional gated recurrent unit (Bi-GRU) networks. We extracted six spectral features: Mel-spectrograms, Mel-frequency cepstral coefficients, chromagrams, spectral contrast, the zero-crossing rate, and root-mean-square energy. The Conv-Cap module processes the Mel-spectrograms, while the Bi-GRU processes the remaining spectral features from the input tensor. A self-attention layer in each module selectively focuses on optimal cues and determines the attention weights to yield high-level features. Finally, a confidence-based fusion method fuses all high-level features and passes them through fully connected layers to classify the emotional state. The proposed model was evaluated on the Berlin (EMO-DB), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and Odia (SITB-OSED) datasets. It achieved high weighted accuracy (WA) and unweighted accuracy (UA) values of 90.31% and 87.61%, 76.84% and 70.34%, and 87.52% and 86.19%, respectively, outperforming state-of-the-art models on the same datasets.
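The six-feature front end maps directly onto standard librosa calls; here is a hedged sketch, with the MFCC count as an assumption.

```python
import librosa
import numpy as np

def spectral_features(y: np.ndarray, sr: int) -> dict:
    """Extract the six spectral features named in the abstract."""
    return {
        "mel": librosa.feature.melspectrogram(y=y, sr=sr),
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
        "chroma": librosa.feature.chroma_stft(y=y, sr=sr),
        "contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
        "zcr": librosa.feature.zero_crossing_rate(y),
        "rms": librosa.feature.rms(y=y),
    }

feats = spectral_features(np.random.randn(16000), sr=16000)  # 1 s of dummy audio
```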
10. Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech. Entropy (Basel) 2022; 24:414. PMID: 35327924. PMCID: PMC8947568. DOI: 10.3390/e24030414.
Abstract
Speaker recognition is an important classification task that can be solved with several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is well researched, with solutions that provide excellent performance, the classification accuracy of such models decreases significantly when they are applied to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable for deployment on edge devices in Internet of Things systems for real-time detection. This paper proposes a simple, constrained convolutional neural network for speaker recognition and examines its robustness under emotional speech conditions. Three quantization methods for constraining the network are examined: an eight-bit floating-point format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
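For intuition about the last two quantization schemes, here is one common formulation (XNOR-Net-style binary scaling and threshold-based ternarization); the paper's exact schemes may differ, and the threshold factor is an assumption.

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """Binary scalar quantization: sign(w) scaled by the mean magnitude."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def ternarize(w: np.ndarray, t: float = 0.7) -> np.ndarray:
    """Ternary scalar quantization: small weights snap to 0, the rest to +/- alpha."""
    delta = t * np.abs(w).mean()                       # magnitude threshold
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

w = np.random.randn(3, 3)
print(binarize(w), ternarize(w), sep="\n")
```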
11. Bekmanova G, Yergesh B, Sharipbay A, Mukanova A. Emotional Speech Recognition Method Based on Word Transcription. Sensors (Basel) 2022; 22:1937. PMID: 35271083. PMCID: PMC8915129. DOI: 10.3390/s22051937.
Abstract
The emotional speech recognition method presented in this article was applied to recognize the emotions of students during online exams held remotely because of COVID-19. The method recognizes emotions in spoken speech by matching against a knowledge base of emotionally charged words, stored as a code book, and analyzes human speech for the presence of emotion. To assess its quality, an experiment was conducted on 420 audio recordings; the accuracy of the proposed method is 79.7% for the Kazakh language. The method can be used for different languages and consists of the following tasks: capturing a signal, detecting speech in it, recognizing the speech as words in a simplified transcription, determining word boundaries, comparing the simplified transcription against the code book, and forming a hypothesis about the degree of emotionality of the speech. If emotion is present, full word recognition is performed and the emotions in the speech are identified. The advantage of this method is that it is undemanding of computational resources, making widespread use possible: it can be applied wherever positive and negative emotions need to be recognized in a crowd, on public transport, in schools, universities, and so on. The experiment demonstrated the effectiveness of the method, and the results will make it possible to develop devices that record and recognize a speech signal upon detecting, for example, negative emotions in the sounding speech and, if necessary, transmit a message about potential threats or riots.
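The code-book comparison step essentially reduces to a dictionary lookup over simplified transcriptions; this sketch is purely illustrative, with hypothetical entries rather than the authors' actual Kazakh code book.

```python
# Hypothetical code-book entries mapping simplified transcriptions to polarities.
EMOTION_CODEBOOK = {
    "zhaqsy": "positive",   # illustrative entry, not from the paper
    "zhaman": "negative",
}

def detect_emotion(words: list[str]) -> str:
    """Hypothesize the utterance's emotionality from code-book hits."""
    hits = [EMOTION_CODEBOOK[w] for w in words if w in EMOTION_CODEBOOK]
    return max(set(hits), key=hits.count) if hits else "neutral"

print(detect_emotion(["bul", "zhaqsy", "kun"]))  # -> "positive"
```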
Affiliation(s)
- Gulmira Bekmanova
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Banu Yergesh
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Altynbek Sharipbay
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Assel Mukanova
- Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan; (G.B.); (A.S.); (A.M.)
- Higher School of Information Technology and Engineering, Astana International University, Nur-Sultan 010000, Kazakhstan