1
Binson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng 2024; 52:1159-1183. [PMID: 38383870 DOI: 10.1007/s10439-024-03459-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 01/24/2024] [Indexed: 02/23/2024]
Abstract
As the amount and complexity of biomedical data continue to increase, machine learning methods are becoming a popular tool for building prediction models of the underlying biomedical processes. Although all machine learning methods aim to fit models to data, their methodologies vary greatly and may seem daunting at first. This article presents a comprehensive review of machine learning algorithms for biomedical applications. It covers the key concepts of machine learning: supervised and unsupervised learning, feature selection, and evaluation metrics. Technical insights into the major machine learning methods, such as decision trees, random forests, support vector machines, and k-nearest neighbors, are analyzed. Next, dimensionality reduction methods such as principal component analysis and t-distributed stochastic neighbor embedding, and their applications in biomedical data analysis, are reviewed. Feedforward, convolutional, and recurrent neural networks are the predominant neural network architectures in biomedical applications. Finally, the identified emerging directions in machine learning methodology are intended to serve as a useful reference for individuals in biomedical research, clinical practice, and related professions who wish to understand and apply machine learning algorithms in their research or practice.
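As an illustration of one of the methods named above, the following is a minimal k-nearest-neighbors classifier in plain Python. It is a sketch, not code from the review: the function name `knn_predict`, the Euclidean distance, the default k = 3, and the toy two-feature data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest training samples.

    Hypothetical helper for illustration; uses Euclidean distance.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort (distance, label) pairs so the closest samples come first
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count labels among the k closest; on a tie, most_common keeps first-seen order
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up data: two features per sample, two classes
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["healthy", "healthy", "disease", "disease", "disease"]
print(knn_predict(train_X, train_y, (1.1, 1.0)))  # healthy
```

Because kNN relies on raw distances, features are usually standardized first so that no single variable dominates the vote.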
Collapse
Affiliation(s)
- V A Binson
- Department of Electronics Engineering, Saintgits College of Engineering, Kottayam, India
- Sania Thomas
- Department of Computer Science and Engineering, Saintgits College of Engineering, Kottayam, India
- M Subramoniam
- Department of Electronics Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
- J Arun
- Centre for Waste Management-International Research Centre, Sathyabama Institute of Science and Technology, Chennai, 600119, India
- S Naveen
- Department of Automobile Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- S Madhu
- Department of Automobile Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
2
Sancar N, Tabrizi SS. Machine learning approach for the detection of vitamin D level: a comparative study. BMC Med Inform Decis Mak 2023; 23:219. [PMID: 37845674 PMCID: PMC10580577 DOI: 10.1186/s12911-023-02323-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Accepted: 10/03/2023] [Indexed: 10/18/2023]
Abstract
BACKGROUND After the World Health Organization declared the COVID-19 pandemic, the role of vitamin D has become even more critical for people worldwide. The most accurate way to determine vitamin D level is the 25-hydroxyvitamin D (25-OH-D) blood test; however, this test is not always feasible. Data sets used in health science research often contain highly correlated features, a condition known as the multicollinearity problem, which can lead to misleading results and overfitting during ML training. The proposed study therefore aims to identify a clinically acceptable ML model that accurately detects the vitamin D status of adult participants in North Cyprus without measuring the 25-OH-D level, while taking the multicollinearity problem into account. METHOD The study included 481 observations from participants who voluntarily applied to the Internal Medicine Department at NEU Hospital. The classification performance of four conventional supervised ML models, namely ordinal logistic regression (OLR), elastic-net ordinal regression (ENOR), support vector machine (SVM), and random forest (RF), was compared. The comparative analysis examined each model's sensitivity to the participants' metabolic syndrome (MtS)-positive status, hyper-parameter tuning, sensitivity to the size of the training data, and classification performance. RESULTS The findings showed that multicollinearity clearly degraded the test performance of the SVM with an RBF kernel (SVM(RBF)). RF was more robust than the other models when the size of the training data was varied, and RF and ENOR outperformed the other two models when the number of training samples was reduced.
Since multicollinearity is more severe in small samples, it can be concluded that RF and ENOR are robust to the multicollinearity problem. The comparative analysis revealed that the RF classifier performed better and was more robust than the other models, with accuracy (0.94), specificity (0.96), sensitivity or recall (0.94), precision (0.95), F1-score (0.95), and Cohen's kappa (0.90). CONCLUSION RF outperformed SVM(RBF), ENOR, and OLR. These findings will be used to develop an intelligent vitamin D level detection system, based on routine clinical and biochemical tests and individuals' lifestyle characteristics, to reduce the cost and time of vitamin D level detection.
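The evaluation metrics reported above can all be derived from a confusion matrix. The sketch below shows the standard formulas for a binary problem; the study itself classified ordinal vitamin D status, so its multi-class figures would be obtained per class or by averaging, which this simplified example does not reproduce. The function name `binary_metrics` and the counts in the example are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from the four cells of a 2x2 confusion matrix (hypothetical helper)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement p_o corrected for chance agreement p_e,
    # where p_e is built from the marginal totals of predicted and actual classes
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Made-up counts: 50 positives and 50 negatives, 5 errors in each class
m = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print({name: round(value, 3) for name, value in m.items()})
```

With these balanced counts, accuracy, sensitivity, specificity, precision, and F1 are all 0.9, while kappa is 0.8 because chance agreement is already 0.5; this is why kappa is reported alongside accuracy.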
Affiliation(s)
- Nuriye Sancar
- Department of Mathematics, Near East University, Nicosia, 99138, Turkey
- Sahar S Tabrizi
- Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
3
Qu Z, Yao T, Liu X, Wang G. A Graph Convolutional Network Based on Univariate Neurodegeneration Biomarker for Alzheimer's Disease Diagnosis. IEEE J Transl Eng Health Med 2023; 11:405-416. [PMID: 37492469 PMCID: PMC10365071 DOI: 10.1109/jtehm.2023.3285723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 01/20/2023] [Accepted: 06/05/2023] [Indexed: 07/27/2023]
Abstract
OBJECTIVE Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disease that is difficult to detect at an early stage. This study proposes an efficient method that applies a graph convolutional network (GCN) to the early prediction of AD. METHODS We proposed a semi-supervised GCN classification framework based on a univariate neurodegeneration biomarker (UNB). The UNB was generated by comparing each individual's morphological atrophy pattern with the atrophy pattern of the [Formula: see text] AD group, according to the brain morphological abnormalities induced by AD. In the semi-supervised GCN classification model, the UNBs of individuals served as node features and the edge weights were constructed from the similarity of phenotypic information between individuals; spectral graph convolution was then used to extract the essential features of individuals. An attention module was constructed and embedded into the GCN framework to refine the input morphological features, highlighting the main impact of AD on the cerebral cortex and reducing the instability caused by individual differences, thereby identifying the significant ROIs affected by AD and improving classification accuracy. RESULTS We tested the UNB-GCN framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The estimated minimum sample sizes were 156, 349 and 423 for the longitudinal [Formula: see text] AD, [Formula: see text] mild cognitive impairment (MCI) and [Formula: see text] cognitively unimpaired (CU) groups, respectively. The proposed UNB-GCN framework combined with the attention module improved classification performance, reaching 93.90% accuracy for AD vs. CU and 82.05% for AD vs. MCI on the validation set. CONCLUSION The proposed UNB measures outperformed conventional volume measures in describing AD-induced morphological changes of the cerebral cortex.
The UNB-GCN framework combined with the attention module may effectively improve classification between MCI subjects and AD patients. Clinical and Translational Impact Statement: This study aims to identify patients with early AD, helping clinicians develop effective interventions that delay the worsening of AD symptoms.
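The node and edge construction described above can be illustrated with a single graph convolution layer. The sketch below uses the widely used first-order renormalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) as a stand-in; the abstract does not state which spectral approximation the authors used, and the attention module is omitted. The edge weight here (a count of matching phenotypic attributes), all function names, and the data are hypothetical.

```python
import math


def phenotype_adjacency(phenotypes):
    """Hypothetical edge weights: number of phenotypic attributes two subjects share."""
    n = len(phenotypes)
    return [
        [0.0 if i == j else float(sum(a == b for a, b in zip(phenotypes[i], phenotypes[j])))
         for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 @ features @ weights)."""
    n = len(adj)
    # Add self-loops so each node keeps its own feature in the aggregation
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric degree normalisation
    deg = [sum(row) for row in a_tilde]
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, apply the (normally learned) projection, then ReLU
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Three made-up subjects: (sex, age band) as phenotype, one scalar biomarker per node
phenos = [("F", "70s"), ("F", "70s"), ("M", "60s")]
adj = phenotype_adjacency(phenos)  # subjects 0 and 1 share 2 attributes, subject 2 shares none
features = [[0.8], [0.6], [0.1]]   # univariate node feature, e.g. a UNB-like score
weights = [[1.0]]                  # 1x1 weight matrix, normally learned
print(gcn_layer(adj, features, weights))
```

Adding self-loops also guarantees that every degree is at least 1 (for non-negative weights), so an isolated subject such as node 2 simply keeps its own feature instead of causing a division by zero.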
Affiliation(s)
- Zongshuai Qu
- School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
- Tao Yao
- School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
- Xinghui Liu
- Shandong Vheng Data Technology Company Ltd., Yantai 264003, China
- Gang Wang
- School of Ulsan Ship and Ocean College, Ludong University, Yantai 264025, China
4
Qin Y, Wu J, Xiao W, Wang K, Huang A, Liu B, Yu J, Li C, Yu F, Ren Z. Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type. Int J Environ Res Public Health 2022; 19:ijerph192215027. [PMID: 36429751 PMCID: PMC9690067 DOI: 10.3390/ijerph192215027] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/06/2022] [Revised: 11/04/2022] [Accepted: 11/10/2022] [Indexed: 06/01/2023]
Abstract
The prevalence of diabetes has been increasing in recent years, and previous research has found machine-learning models to be good diabetes prediction tools. The purpose of this study was to compare the efficacy of five machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999-2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. Forward selection based on the Akaike Information Criterion (AIC) was used to screen the variables used to train the models. Five machine-learning models (CatBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed to predict diabetes. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five models, the dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes. In terms of model performance, CatBoost ranked above RF, LR, XGBoost, and SVM; it was the best-performing model, with an accuracy of 82.1% and an AUC of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
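The AUC reported above condenses the ROC curve into a single number. One standard way to compute it, sketched below, uses its equivalence to the Mann-Whitney statistic: the AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as one half. This is not the study's code; `roc_auc` and the example scores are hypothetical, and the quadratic pairwise loop is meant for clarity rather than for data sets the size of NHANES.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney formulation); labels are 1 (diabetes) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # tie counts half
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities for four people
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for interpreting the 0.83 reported for CatBoost.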
Affiliation(s)
- Yifan Qin
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jinlong Wu
- College of Physical Education, Southwest University, Chongqing 400715, China
- Wen Xiao
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Kun Wang
- Physical Education College, Yanching Institute of Technology, Langfang 065201, China
- Anbing Huang
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Bowen Liu
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jingxuan Yu
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Chuhao Li
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Fengyu Yu
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Zhanbing Ren
- College of Physical Education, Shenzhen University, Shenzhen 518000, China
5
Teji JS, Jain S, Gupta SK, Suri JS. NeoAI 1.0: Machine learning-based paradigm for prediction of neonatal and infant risk of death. Comput Biol Med 2022; 147:105639. [DOI: 10.1016/j.compbiomed.2022.105639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Revised: 05/01/2022] [Accepted: 05/01/2022] [Indexed: 11/29/2022]
6
Research on Students' Mental Health Based on Data Mining Algorithms. J Healthc Eng 2021; 2021:1382559. [PMID: 34733450 PMCID: PMC8560244 DOI: 10.1155/2021/1382559] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2021] [Revised: 09/14/2021] [Accepted: 10/08/2021] [Indexed: 11/24/2022]
Abstract
With the diversification and rapid development of society, people face increasing pressure in their living conditions, studies, friendships, and employment, which severely tests their psychological endurance. Strengthening the mental health education of students has therefore become an urgent social need and a widely shared concern. To address the high misjudgment rate and low efficiency of current intelligent mental health evaluation, an intelligent mental health evaluation system based on a joint optimization algorithm is proposed. The joint optimization algorithm combines an improved decision tree algorithm with an improved ANN algorithm. The study first analyzes the current state of research on intelligent mental health evaluation and constructs a framework for the evaluation system; it then collects evaluation data through data mining and uses the joint learning algorithm to analyze and classify these data, yielding the evaluation results. Finally, simulation experiments are used to analyze the feasibility and superiority of the system. The results show that the proposed system overcomes shortcomings of existing intelligent mental health evaluation systems and improves both the accuracy and the efficiency of the evaluation. The system is also stable and meets the current practical requirements for intelligent mental health evaluation.
7
Tan MS, Cheah PL, Chin AV, Looi LM, Chang SW. A review on omics-based biomarkers discovery for Alzheimer's disease from the bioinformatics perspectives: Statistical approach vs machine learning approach. Comput Biol Med 2021; 139:104947. [PMID: 34678481 DOI: 10.1016/j.compbiomed.2021.104947] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/21/2021] [Revised: 10/12/2021] [Accepted: 10/12/2021] [Indexed: 12/26/2022]
Abstract
Alzheimer's Disease (AD) is a neurodegenerative disease that affects cognition and is the most common cause of dementia in the elderly. As the number of elderly individuals increases globally, the incidence and prevalence of AD are expected to increase. At present, AD is diagnosed clinically, according to accepted criteria. The essential elements in the diagnosis of AD include a patient's history, a physical examination and neuropsychological testing, in addition to appropriate investigations such as neuroimaging. The omics-based approach is an emerging field of study that may not only aid in the diagnosis of AD but also facilitate the exploration of factors that influence the development of the disease. Omics techniques, including genomics, transcriptomics, proteomics and metabolomics, may reveal the pathways that lead to neuronal death and identify biomolecular markers associated with AD, further improving the understanding of AD neuropathology. In this review, omics-based approaches implemented in studies on AD were assessed from a bioinformatics perspective. Current state-of-the-art statistical and machine learning approaches used in the single-omics analysis of AD were compared with respect to correlations of variants, differential expression, functional analysis and network analysis. This was followed by a review of the approaches used to integrate and analyze multi-omics data in AD. The strengths and limitations of multi-omics analysis methods were explored, and the issues and challenges associated with omics studies of AD were highlighted. Lastly, the need for future studies in this area of research was justified.
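As an example of the statistical side of the comparison described above, differential-expression analysis commonly tests each molecular feature with a two-sample test. The sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom for a single feature; it is a generic illustration, not a method attributed to the review, and the function name and expression values are hypothetical. Turning the statistic into a p-value requires the t distribution (SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` performs the same test), and a multiple-testing correction is needed when thousands of features are tested.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance uses n - 1)
    sq_se_a = variance(group_a) / na
    sq_se_b = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (sq_se_a ** 2 / (na - 1) + sq_se_b ** 2 / (nb - 1))
    return t, df


# Made-up log-expression values of one gene in AD and control samples
ad = [5.0, 6.0, 7.0]
control = [2.0, 3.0, 4.0]
t, df = welch_t(ad, control)
print(round(t, 3), round(df, 1))
```

Welch's variant is used here because it does not assume equal variances in the two groups, which is often unrealistic for expression data.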
Affiliation(s)
- Mei Sze Tan
- Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Phaik-Leng Cheah
- Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Ai-Vyrn Chin
- Division of Geriatric Medicine, Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Lai-Meng Looi
- Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Siow-Wee Chang
- Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
8
Veluppal A, Sadhukhan D, Gopinath V, Swaminathan R. Detection of Mild Cognitive Impairment using Kernel Density Estimation based texture analysis of the Corpus Callosum in brain MR images. Ing Rech Biomed 2021. [DOI: 10.1016/j.irbm.2021.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
9
Sun H, Wang A, Wang W, Liu C. An Improved Deep Residual Network Prediction Model for the Early Diagnosis of Alzheimer's Disease. Sensors (Basel) 2021; 21:4182. [PMID: 34207145 PMCID: PMC8235495 DOI: 10.3390/s21124182] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/21/2021] [Revised: 06/14/2021] [Accepted: 06/16/2021] [Indexed: 12/16/2022]
Abstract
The early diagnosis of Alzheimer's disease (AD) allows patients to take preventive measures before irreversible brain damage occurs. Cross-sectional imaging studies of AD show that the features of lesion areas in AD patients, as observed by magnetic resonance imaging (MRI), vary considerably and are distributed throughout the image space. Because the convolutional layers of a standard convolutional neural network (CNN) cannot adequately capture long-range correlations in feature space, this study proposes a deep residual network (ResNet) model based on spatial transformer networks (STN) and a non-local attention mechanism for the early diagnosis of AD. In this model, the Mish activation function replaces the ReLU function in the ResNet-50 backbone, an STN is introduced between the input layer and the improved ResNet-50 backbone, and a non-local attention mechanism is introduced between the fourth and fifth stages of the improved backbone. The deep residual structure allows the network to be deepened so that it extracts more information across layers. The STN transforms the spatial information in MRI images of Alzheimer's patients into another space while retaining the key information, and the non-local attention mechanism captures the relationships between lesion areas and normal areas in feature space. The model thereby addresses the loss of local information in traditional CNNs and captures long-range correlations in feature space. The proposed method was validated on the ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset and compared with several models. The experimental results show a classification accuracy of 97.1%, a macro precision of 95.5%, a macro recall of 95.3%, and a macro F1 score of 95.4%.
The proposed model was more effective than the compared algorithms.
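Two quantities from this abstract can be made concrete. Mish is defined as Mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x); unlike ReLU it is smooth and lets small negative values through. Macro-averaged precision, recall, and F1 average per-class scores with equal weight per class. The scalar sketch below is only an illustration: the function names and class labels are hypothetical, and macro F1 is computed here as the mean of per-class F1 scores, which is one common convention; the abstract does not say which convention was used.

```python
import math


def relu(x):
    return max(0.0, x)


def softplus(x):
    # Numerically stable ln(1 + e^x): exp() is only ever applied to a non-positive number
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def macro_prf(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F1 (hypothetical helper)."""
    classes = sorted(set(y_true) | set(y_pred))
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    k = len(classes)
    return sum(precs) / k, sum(recs) / k, sum(f1s) / k


# Compare the two activations on a few inputs
for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(mish(x), 4))

# Made-up labels for four scans
print(macro_prf(["AD", "AD", "CN", "CN"], ["AD", "CN", "CN", "CN"]))
```

Macro averaging gives a small class the same weight as a large one, so it can sit below overall accuracy when the minority classes are harder, as in the figures reported above.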
Affiliation(s)
- Haijing Sun
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
- College of Information Engineering, Shenyang University, Shenyang 110044, China
- Anna Wang
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
- Wenhui Wang
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
- Chen Liu
- College of Information Science and Engineering, Northeastern University, Shenyang 110819, China