1
Lin K, Washington PY. Multimodal deep learning for dementia classification using text and audio. Sci Rep 2024; 14:13887. [PMID: 38880810 PMCID: PMC11180654 DOI: 10.1038/s41598-024-64438-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 06/10/2024] [Indexed: 06/18/2024] Open
Abstract
Dementia is a progressive neurological disorder that affects the daily lives of older adults, impacting their verbal communication and cognitive function. Early diagnosis is important to enhance the lifespan and quality of life for affected individuals. Despite its importance, diagnosing dementia is a complex process. Automated machine learning solutions involving multiple types of data have the potential to improve the process of automated dementia screening. In this study, we build deep learning models to classify dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC scores, and with data augmentation, the models achieve around 80% accuracy and 90% AUROC scores. We do not observe significant improvements in performance with the addition of audio or timestamp information into the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
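The synonym-based text augmentation mentioned in this abstract can be illustrated with a minimal sketch. The synonym table, the function name `augment`, and the replace-every-known-word policy are all assumptions made for illustration; the abstract does not state which lexicon or replacement strategy the authors used.

```python
import random

# Hypothetical toy synonym table (an assumption; not the authors' lexicon).
SYNONYMS = {
    "boy": ["lad", "kid"],
    "cookie": ["biscuit"],
    "falling": ["tumbling"],
}


def augment(sentence, synonyms=SYNONYMS, rng=None):
    """Return a copy of `sentence` with every known word swapped for a synonym.

    Words that are not in the table are kept unchanged, so the augmented
    sentence always has the same number of tokens as the input.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    out = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)


if __name__ == "__main__":
    print(augment("the boy took a cookie"))
```

Each augmented sentence would be added to the training set with the label of its source sentence, which is how augmentation of this kind enlarges a small corpus.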
Affiliation(s)
- Kaiying Lin
  - Department of Information and Computer Science, University of Hawai'i, Honolulu, 96822, USA.
  - Department of Linguistics, University of Hawai'i, Honolulu, 96822, USA.
- Peter Y Washington
  - Department of Information and Computer Science, University of Hawai'i, Honolulu, 96822, USA.
2
S S, V S. FACNN: fuzzy-based adaptive convolution neural network for classifying COVID-19 in noisy CXR images. Med Biol Eng Comput 2024:10.1007/s11517-024-03107-x. [PMID: 38710960 DOI: 10.1007/s11517-024-03107-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 04/22/2024] [Indexed: 05/08/2024]
Abstract
COVID-19 detection using chest X-rays (CXR) has evolved as a significant method for early diagnosis of the pandemic disease. Clinical trials and methods utilize X-ray images with computer and intelligent algorithms to improve detection and classification precision. This article thus proposes a fuzzy-based adaptive convolution neural network (FACNN) model to improve the detection precision by confining the false rates. The feature extraction process between the successive regions is validated using a fuzzy process that classifies labeled and unknown pixels. The membership functions are derived based on high precision features for detection and false rate suppression process. The convolution neural network process is responsible for increasing detection precision through recurrent training based on feature availability. This availability analysis is verified using fuzzy derivatives under local variances. Based on variance-reduced features, the appropriate regions with labeled and unknown features are used for normal or infected classification. Thus, the proposed FACNN improves accuracy, precision, and feature extraction by 14.36%, 8.74%, and 12.35%, respectively. This model reduces the false rate and extraction time by 10.35% and 10.66%, respectively.
Affiliation(s)
- Suganyadevi S
  - Department of ECE, KPR Institute of Engineering and Technology, Coimbatore, 641 407, India.
- Seethalakshmi V
  - Department of ECE, KPR Institute of Engineering and Technology, Coimbatore, 641 407, India.
3
Dong C, Hayashi S. Deep learning applications in vascular dementia using neuroimaging. Curr Opin Psychiatry 2024; 37:101-106. [PMID: 38226547 DOI: 10.1097/yco.0000000000000920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
PURPOSE OF REVIEW: Vascular dementia (VaD) is the second most common cause of dementia after Alzheimer's disease, and deep learning has emerged as a critical tool in dementia research. The aim of this article is to highlight the current deep learning applications in VaD-related imaging biomarkers and diagnosis. RECENT FINDINGS: The main deep learning technology applied in VaD using neuroimaging data is convolutional neural networks (CNN). CNN models have been widely used for lesion detection and segmentation, such as white matter hyperintensities (WMH), cerebral microbleeds (CMBs), perivascular spaces (PVS), lacunes, cortical superficial siderosis, and brain atrophy. Applications in VaD subtype classification also showed excellent results. CNN-based deep learning models have potential for further diagnosis and prognosis of VaD. SUMMARY: Deep learning neural networks with neuroimaging data in VaD research represent significant promise for advancing early diagnosis and treatment strategies. Ongoing research and collaboration between clinicians, data scientists, and neuroimaging experts are essential to address challenges and unlock the full potential of deep learning in VaD diagnosis and management.
Affiliation(s)
- Chao Dong
  - Centre for Healthy Brain Ageing (CHeBA), Discipline of Psychiatry & Mental Health, School of Clinical Medicine, UNSW Sydney, NSW, Australia
4
Aldarraji M, Vega-Márquez B, Pontes B, Mahmood B, Riquelme JC. Addressing energy challenges in Iraq: Forecasting power supply and demand using artificial intelligence models. Heliyon 2024; 10:e25821. [PMID: 38375305 PMCID: PMC10875426 DOI: 10.1016/j.heliyon.2024.e25821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 01/09/2024] [Accepted: 02/02/2024] [Indexed: 02/21/2024] Open
Abstract
The global surge in energy demand, driven by technological advances and population growth, underscores the critical need for effective management of electricity supply and demand. In certain developing nations, a significant challenge arises because the energy demand of their population exceeds their capacity to generate, as is the case in Iraq. This study focuses on energy forecasting in Iraq, using a previously unstudied dataset from 2019 to 2021, sourced from the Iraqi Ministry of Electricity. The study employs a diverse set of advanced forecasting models, including Linear Regression, XGBoost, Random Forest, Long Short-Term Memory, Temporal Convolutional Networks, and Multi-Layer Perceptron, evaluating their performance across four distinct forecast horizons (24, 48, 72, and 168 hours ahead). Key findings reveal that Linear Regression is a consistent top performer in demand forecasting, while XGBoost excels in supply forecasting. Statistical analysis detects differences in model performance for both datasets, although no significant differences are found in pairwise comparisons for the supply dataset. This study emphasizes the importance of accurate energy forecasting for energy security, resource allocation, and policy-making in Iraq. It provides tools for decision-makers to address energy challenges, mitigate power shortages, and stimulate economic growth. It also encourages innovative forecasting methods, the use of external variables like weather and economic data, and region-specific models tailored to Iraq's energy landscape. The research contributes valuable insights into the dynamics of electricity supply and demand in Iraq and offers performance evaluations for better energy planning and management, ultimately promoting sustainable development and improving the quality of life for the Iraqi population.
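Forecasting at a fixed horizon (for example 24 or 168 hours ahead) is commonly framed as supervised learning over lagged values. The sketch below shows one way such training pairs might be built; the function name `make_supervised`, the use of raw lags as features, and the hourly sampling are assumptions for illustration, since the abstract does not describe the authors' feature construction.

```python
def make_supervised(series, n_lags, horizon):
    """Build (features, target) pairs for direct h-step-ahead forecasting.

    Each feature row holds the `n_lags` most recent observations; the target
    is the value `horizon` steps after the last observation in that row.
    """
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])      # observations t-n_lags .. t-1
        y.append(series[t + horizon - 1])   # value `horizon` steps ahead
    return X, y


if __name__ == "__main__":
    hourly_load = list(range(400))  # placeholder data, not the Iraqi dataset
    for h in (24, 48, 72, 168):     # horizons named in the abstract
        X, y = make_supervised(hourly_load, n_lags=24, horizon=h)
        print(h, len(X))
```

Any of the regressors listed in the abstract could then be fitted to `X` and `y`, with one model trained per horizon.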
Affiliation(s)
- Morteza Aldarraji
  - Dept. Computer Languages & Systems, University of Seville, Seville, 41012, Spain
- Belén Vega-Márquez
  - Dept. Computer Languages & Systems, University of Seville, Seville, 41012, Spain
- Beatriz Pontes
  - Dept. Computer Languages & Systems, University of Seville, Seville, 41012, Spain
- Basim Mahmood
  - Dept. Computer Science, University of Mosul, Mosul, 41002, Iraq
- José C. Riquelme
  - Dept. Computer Languages & Systems, University of Seville, Seville, 41012, Spain
5
Gupta NS, Kumar P. Perspective of artificial intelligence in healthcare data management: A journey towards precision medicine. Comput Biol Med 2023; 162:107051. [PMID: 37271113 DOI: 10.1016/j.compbiomed.2023.107051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 05/06/2023] [Accepted: 05/20/2023] [Indexed: 06/06/2023]
Abstract
Mounting evidence has highlighted the implementation of big data handling and management in the healthcare industry to improve clinical services. Various private and public companies have generated, stored, and analyzed different types of big healthcare data, such as omics data, clinical data, electronic health records, personal health records, and sensing data, with the aim of moving in the direction of precision medicine. Additionally, with the advancement in technologies, researchers are eager to explore the potential of artificial intelligence and machine learning on big healthcare data to enhance the quality of patients' lives. However, seeking solutions from big healthcare data requires proper management, storage, and analysis, which imposes hindrances associated with big data handling. Herein, we briefly discuss the implications of big data handling and the role of artificial intelligence in precision medicine. Further, we highlight the potential of artificial intelligence in integrating and analyzing the big data that offer personalized treatment. In addition, we briefly discuss the applications of artificial intelligence in personalized treatment, especially in neurological diseases. Lastly, we discuss the challenges and limitations of artificial intelligence in big data management and analysis that hinder precision medicine.
Affiliation(s)
- Nancy Sanjay Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India.
6
Song W, Xin L, Wang J. A grading method for Kayser Fleischer ring images based on ResNet. Heliyon 2023; 9:e16149. [PMID: 37234668 PMCID: PMC10205591 DOI: 10.1016/j.heliyon.2023.e16149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 05/04/2023] [Accepted: 05/07/2023] [Indexed: 05/28/2023] Open
Abstract
The corneal K-F ring is the most common ophthalmic manifestation in WD patients. Early diagnosis and treatment have an important impact on the patient's condition. The K-F ring is one of the gold standards for the diagnosis of WD. Therefore, this paper mainly focuses on the detection and grading of the K-F ring. The aim of this study is three-fold. Firstly, to create a meaningful database, K-F ring images were collected, comprising 1850 images from 399 different WD patients, and the chi-square test and Friedman test were used to analyze statistical significance. Subsequently, all the collected images were graded and labeled with an appropriate treatment approach, so that they could be used to detect the cornea with YOLO. After detection of the cornea, image segmentation was performed in batches. Finally, different deep convolutional neural networks (VGG, ResNet, and DenseNet) were used to grade the K-F ring images in the KFID. Experimental results reveal that all of the pre-trained models obtain excellent performance. The global accuracies achieved by the six models, i.e., VGG-16, VGG-19, ResNet18, ResNet34, ResNet50, and DenseNet, are 89.88%, 91.89%, 94.18%, 95.31%, 93.59%, and 94.58%, respectively. ResNet34 displayed the highest recall, specificity, and F1-score of 95.23%, 96.99%, and 95.23%. DenseNet showed the best precision of 95.66%. As such, the findings are encouraging, demonstrating the effectiveness of ResNet in the automatic grading of the K-F ring. Moreover, it provides effective help for the clinical diagnosis of HLD.
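The recall, specificity, precision, and F1-score reported in this abstract all derive from confusion-matrix counts. The sketch below computes them for a single class treated one-vs-rest; the function name and this binary framing are assumptions, as the abstract does not say how the per-grade scores were averaged.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from one-vs-rest confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the class of interest.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0              # harmonic mean of P and R
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from the paper.
    print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```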
Affiliation(s)
- Wei Song
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, 230031, China
- Ling Xin
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, 230031, China
- Jiemei Wang
  - Department of Otolaryngology, The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, 230031, China
7
Alsabhan W. Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention. Sensors (Basel) 2023; 23:1386. [PMID: 36772427 PMCID: PMC9921095 DOI: 10.3390/s23031386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 01/19/2023] [Accepted: 01/20/2023] [Indexed: 06/18/2023]
Abstract
Emotions have a crucial function in the mental existence of humans. They are vital for identifying a person's behaviour and mental condition. Speech Emotion Recognition (SER) is the task of extracting a speaker's emotional state from their speech signal. SER is a growing discipline in human-computer interaction, and it has recently attracted more significant interest. This is because there are not so many universal emotions; therefore, any intelligent system with enough computational capacity can educate itself to recognise them. However, the issue is that human speech is immensely diverse, making it difficult to create a single, standardised recipe for detecting hidden emotions. This work attempted to solve this research difficulty by combining a multilingual emotional dataset with building a more generalised and effective model for recognising human emotions. A two-step process was used to develop the model. The first stage involved the extraction of features, and the second stage involved the classification of the extracted features. ZCR, RMSE, and the renowned MFC coefficients were extracted as features. Two proposed models, a 1D CNN combined with LSTM and attention and a proprietary 2D CNN architecture, were used for classification. The outcomes demonstrated that the suggested 1D CNN with LSTM and attention performed better than the 2D CNN. For the EMO-DB, SAVEE, ANAD, and BAVED datasets, the model's accuracy was 96.72%, 97.13%, 96.72%, and 88.39%, respectively. The model beat several earlier efforts on the same datasets, demonstrating its generality and efficacy in recognising multiple emotions from various languages.
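Two of the features named in this abstract are simple enough to sketch directly. The code below computes the zero-crossing rate (ZCR) and the root-mean-square energy of one audio frame, assuming that "RMSE" in the abstract denotes RMS energy; the function names and the plain-list frame representation are illustrative assumptions, not the author's implementation.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of the samples in the frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


if __name__ == "__main__":
    frame = [3, -3, 3, -3]  # toy frame that alternates sign on every sample
    print(zero_crossing_rate(frame), rms_energy(frame))
```

In a typical pipeline these values would be computed per short frame and stacked with the cepstral coefficients to form the classifier input.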
Affiliation(s)
- Waleed Alsabhan
  - College of Engineering, Al Faisal University, P.O. Box 50927, Riyadh 11533, Saudi Arabia
8
A Proposed Framework for Early Prediction of Schistosomiasis. Diagnostics (Basel) 2022; 12:diagnostics12123138. [PMID: 36553145 PMCID: PMC9777618 DOI: 10.3390/diagnostics12123138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/08/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022] Open
Abstract
Schistosomiasis is a neglected tropical disease that continues to be a leading cause of illness and mortality around the globe. The causative parasites attach to the skin through contaminated water and enter the human body. Failure to diagnose Schistosomiasis can result in various medical complications, such as ascites, portal hypertension, esophageal varices, splenomegaly, and growth retardation. Early prediction and identification of risk factors may aid in treating the disease before it becomes incurable. We aimed to create a framework by incorporating the most significant features to predict Schistosomiasis using machine learning techniques. A dataset of advanced Schistosomiasis was employed, with data from a total of 4316 individuals covering recovery and death cases. The dataset contains demographic, socioeconomic, and clinical factors with lab reports. Data preprocessing techniques (missing value imputation, outlier removal, data normalisation, and data transformation) were employed for better results. Feature selection techniques, including correlation-based feature selection, information gain, gain ratio, ReliefF, and OneR, were used to reduce the large number of features. Data resampling algorithms, including random undersampling, random oversampling, Cluster Centroid, Near Miss, and SMOTE, were applied to address the data imbalance problem. We applied four machine learning algorithms to construct the model: Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting, and CatBoost. The performance of the proposed framework was evaluated based on Accuracy, Precision, Recall, and F1-Score. The results showed that the CatBoost model performed best, with the highest accuracy (87.1%) compared with Gradient Boosting (86%), Light Gradient Boosting (86.7%), and Extreme Gradient Boosting (86.9%). Our proposed framework will assist doctors and healthcare professionals in the early diagnosis of Schistosomiasis.
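Of the resampling methods listed in this abstract, random oversampling is the simplest to illustrate. The sketch below duplicates randomly chosen minority-class rows until every class matches the size of the largest one; the function name, the seed handling, and the toy data are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == label]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy data only; the class names echo the abstract's recovery/death cases.
    rows = [[1], [2], [3], [4]]
    labels = ["recovery", "recovery", "recovery", "death"]
    new_rows, new_labels = random_oversample(rows, labels)
    print(Counter(new_labels))
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.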