1
Chellappan D, Rajaguru H. Machine Learning Meets Meta-Heuristics: Bald Eagle Search Optimization and Red Deer Optimization for Feature Selection in Type II Diabetes Diagnosis. Bioengineering (Basel) 2024; 11:766. [PMID: 39199724; PMCID: PMC11351847; DOI: 10.3390/bioengineering11080766] [Received: 06/17/2024] [Revised: 07/10/2024] [Accepted: 07/22/2024] [Indexed: 09/01/2024]
Abstract
This article investigates the effectiveness of feature extraction and selection techniques in improving classifier accuracy for Type II Diabetes Mellitus (DM) detection using microarray gene data. To address the inherent high dimensionality of the data, three feature extraction (FE) methods are used: Short-Time Fourier Transform (STFT), Ridge Regression (RR), and Pearson's Correlation Coefficient (PCC). To further refine the data, the meta-heuristic algorithms Bald Eagle Search Optimization (BESO) and Red Deer Optimization (RDO) are used for feature selection. Seven classifiers, Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Models (GMMs), Expectation Maximization (EM), Logistic Regression (LoR), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF), are evaluated with and without feature selection. The analysis reveals that PCC combined with SVM-RBF achieved an accuracy of 92.85% even without feature selection. Employing BESO with PCC and SVM-RBF maintained this accuracy. However, the highest overall accuracy, 97.14%, was achieved when RDO was used for feature selection alongside PCC and SVM-RBF. These findings highlight the potential of feature extraction and selection techniques, particularly RDO with PCC, to improve the accuracy of DM detection using microarray gene data.
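The abstract names Pearson's Correlation Coefficient (PCC) as a feature-extraction step but does not say how it is applied to the gene matrix. As a minimal sketch, assuming PCC is used as a filter that scores each gene by the absolute correlation between its expression and the binary class label (a common approach, not necessarily the authors' procedure), the computation could look like this; `pearson_r` and `top_k_genes` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return 0.0  # a constant vector carries no linear association
    return cov / math.sqrt(ss_x * ss_y)


def top_k_genes(expr: List[List[float]], labels: Sequence[int], k: int) -> List[int]:
    """Hypothetical filter: rank genes (columns of expr) by |r| with the class label.

    expr[i][g] is the expression of gene g in sample i; labels[i] is 0 or 1.
    Ties are broken by the lower gene index.
    """
    n_genes = len(expr[0])
    scored = []
    for g in range(n_genes):
        column = [row[g] for row in expr]
        scored.append((abs(pearson_r(column, labels)), g))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [g for _, g in scored[:k]]


if __name__ == "__main__":
    # Toy matrix: 4 samples x 3 genes; gene 0 tracks the label, gene 1 is constant.
    expr = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
    labels = [0, 0, 1, 1]
    print(top_k_genes(expr, labels, 1))
```

Because a filter like this scores each gene independently, it ignores interactions between genes; meta-heuristic selectors such as BESO and RDO typically search over feature subsets instead.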
Affiliation(s)
- Dinesh Chellappan
- Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
- Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
2
Wang Y, Zhang J, Yuan J, Li Q, Zhang S, Wang C, Wang H, Wang L, Zhang B, Wang C, Sun Y, Lu X. Application of a novel nested ensemble algorithm in predicting motor function recovery in patients with traumatic cervical spinal cord injury. Sci Rep 2024; 14:17403. [PMID: 39075134; PMCID: PMC11286788; DOI: 10.1038/s41598-024-65755-1] [Received: 03/11/2024] [Accepted: 06/24/2024] [Indexed: 07/31/2024]
Abstract
Traumatic cervical spinal cord injury (TCSCI) often causes varying degrees of motor dysfunction, commonly assessed by the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) together with the American Spinal Injury Association (ASIA) Impairment Scale. Accurate prediction of motor function recovery is essential for formulating effective diagnostic, therapeutic, and rehabilitation programs. The aim of this study is to investigate the validity of a novel nested ensemble algorithm that uses the very early ASIA motor score (AMS) from the ISNCSCI examination to predict motor function recovery 6 months after injury in TCSCI patients. This retrospective study included complete data from 315 TCSCI patients. The dataset, consisting of the first AMS at ≤ 24 h post-injury and the follow-up AMS at 6 months post-injury, was divided into a training set (80%) and a test set (20%). The nested ensemble algorithm was built in two stages. Support Vector Classification (SVC), AdaBoost, Weak-learner, and Dummy were used in the first stage, and AdaBoost was selected as the second-stage model. The predictions of the first-stage models were fed into the second-stage model to obtain the final predictions. Model performance was evaluated using precision, recall, accuracy, F1 score, and the confusion matrix. Applied to predicting motor function recovery after TCSCI, the nested ensemble algorithm achieved an accuracy of 80.6% and an F1 score of 80.6%, with balanced sensitivity and specificity. The confusion matrix showed a low false-negative rate, which has important practical implications for prognostic prediction in TCSCI.
This novel nested ensemble algorithm, based only on the very early AMS, provides a useful tool for predicting motor function recovery 6 months after TCSCI. Its staged design progressively improves the accuracy and reliability of the prediction, demonstrating the strong potential of ensemble learning to personalize and optimize the rehabilitation and care of TCSCI patients.
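The two-stage design described above is a form of stacking: first-stage models make predictions, and those predictions become the input features of a second-stage model. The sketch below shows only that data flow, under loud assumptions: the first-stage models are hypothetical threshold rules standing in for SVC, AdaBoost, Weak-learner, and Dummy, and the second stage is a single decision stump (the weak learner AdaBoost usually boosts) rather than a full AdaBoost. All names and values are illustrative, not study data.

```python
from typing import Callable, List, Sequence, Tuple

# A fitted first-stage model maps one feature vector to a 0/1 prediction.
Model = Callable[[Sequence[float]], int]


def first_stage_outputs(models: List[Model], X: List[Sequence[float]]) -> List[List[int]]:
    """Stage 1: every base model predicts every sample; each row becomes a meta-feature vector."""
    return [[model(x) for model in models] for x in X]


def fit_stump(meta_X: List[List[int]], y: List[int]) -> Tuple[int, bool]:
    """Stage 2 stand-in: pick the meta-feature (and whether to invert it) with the best training accuracy."""
    best_acc, best = -1.0, (0, False)
    for j in range(len(meta_X[0])):
        for invert in (False, True):
            preds = [1 - row[j] if invert else row[j] for row in meta_X]
            acc = sum(p == t for p, t in zip(preds, y)) / len(y)
            if acc > best_acc:
                best_acc, best = acc, (j, invert)
    return best


def predict(stump: Tuple[int, bool], models: List[Model], x: Sequence[float]) -> int:
    """Run a new sample through stage 1, then through the stage-2 stump."""
    j, invert = stump
    meta_row = [model(x) for model in models]
    return 1 - meta_row[j] if invert else meta_row[j]


if __name__ == "__main__":
    # Hypothetical base models on a single early motor score (toy values).
    models: List[Model] = [
        lambda x: int(x[0] > 25),  # threshold rule
        lambda x: int(x[0] > 5),   # overly permissive rule
        lambda x: 0,               # "dummy" model: always predicts class 0
    ]
    X = [[10.0], [20.0], [30.0], [40.0]]
    y = [0, 0, 1, 1]
    stump = fit_stump(first_stage_outputs(models, X), y)
    print(stump, predict(stump, models, [35.0]))
```

In a real stacking setup the second-stage model is usually trained on out-of-fold first-stage predictions (for example, from cross-validation on the training set), so that it does not learn from predictions the base models made on their own training data.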
Affiliation(s)
- Yijin Wang
- North Sichuan Medical College, No. 234 Fuljiang Road, Shunqing District, Nanchong, 637100, Sichuan, People's Republic of China
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Jianjun Zhang
- North Sichuan Medical College, No. 234 Fuljiang Road, Shunqing District, Nanchong, 637100, Sichuan, People's Republic of China
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Jincan Yuan
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Qingyuan Li
- North Sichuan Medical College, No. 234 Fuljiang Road, Shunqing District, Nanchong, 637100, Sichuan, People's Republic of China
- Shiyu Zhang
- UCSI University, No. 1, Jalan UCSI, UCSI Heights, 56000, Cheras, Kuala Lumpur, Malaysia
- Chenfeng Wang
- Zhejiang University, No. 866 Yuhangtang Road, Xihu District, Hangzhou, 310058, Zhejiang, People's Republic of China
- Haibing Wang
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Liang Wang
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Bangke Zhang
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Can Wang
- North Sichuan Medical College, No. 234 Fuljiang Road, Shunqing District, Nanchong, 637100, Sichuan, People's Republic of China
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Yuling Sun
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
- Xuhua Lu
- Department of Orthopedic Surgery, Changzheng Hospital, Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, People's Republic of China
3
Soares Dias Portela A, Saxena V, Rosenn E, Wang SH, Masieri S, Palmieri J, Pasinetti GM. Role of Artificial Intelligence in Multinomial Decisions and Preventative Nutrition in Alzheimer's Disease. Mol Nutr Food Res 2024; 68:e2300605. [PMID: 38175857; DOI: 10.1002/mnfr.202300605] [Received: 08/23/2023] [Revised: 10/04/2023] [Indexed: 01/06/2024]
Abstract
Alzheimer's disease (AD) affects 50 million people worldwide, an increase of 35 million since 2015, and is characterized by memory loss and cognitive decline. Given the morbidity associated with AD, it is important to explore lifestyle factors that influence the risk of developing AD, with special emphasis on nutrition. This review first discusses how dietary factors affect AD development and the possible role of Artificial Intelligence (AI) and Machine Learning (ML) in nutrition-based preventative care of AD patients. The Mediterranean-DASH diets provide many nutrient benefits that help prevent neurodegeneration through their neuroprotective roles. Deficiencies in micronutrients, protein-energy, and polyunsaturated fatty acids increase the risk of cognitive decline, memory loss, and synaptic dysfunction, among other effects. ML software can build algorithmic models from input data to provide practical solutions that are accessible and easy to use. It can generate predictions for a precision-medicine approach that evaluates individuals as a whole. There is no doubt that the future of nutritional science lies in customizing diets for individuals to reduce dementia risk factors and maintain overall health and brain function.
Affiliation(s)
- Vrinda Saxena
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Eric Rosenn
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Shu-Han Wang
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Sibilla Masieri
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Joshua Palmieri
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Giulio Maria Pasinetti
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10019, USA
- Geriatrics Research, Education and Clinical Center, JJ Peters VA Medical Center, Bronx, NY, 10468, USA
4
Grudza M, Salinel B, Zeien S, Murphy M, Adkins J, Jensen CT, Bay C, Kodibagkar V, Koo P, Dragovich T, Choti MA, Kundranda M, Syeda-Mahmood T, Wang HZ, Chang J. Methods for improving colorectal cancer annotation efficiency for artificial intelligence-observer training. World J Radiol 2023; 15:359-369. [PMID: 38179201; PMCID: PMC10762523; DOI: 10.4329/wjr.v15.i12.359] [Received: 10/03/2023] [Revised: 11/13/2023] [Accepted: 12/05/2023] [Indexed: 12/26/2023]
Abstract
BACKGROUND Missed occult cancer lesions account for the largest share of diagnostic errors in retrospective radiology reviews, because early cancers can be small or subtle and therefore difficult to detect. A second observer is the most effective technique for reducing these errors and can be implemented economically with the advent of artificial intelligence (AI). AIM Appropriate AI model training requires a large annotated dataset. Our goal in this research is to compare two methods for decreasing the annotation time needed to establish ground truth: skip-slice annotation and AI-initiated annotation. METHODS We developed a 2D U-Net as an AI second observer for detecting colorectal cancer (CRC), and an ensemble of five differently initialized 2D U-Nets for the ensemble technique. Each model was trained with 51 annotated CRC cases of abdominal and pelvic computed tomography, tested with 7 cases, and validated with 20 cases from The Cancer Imaging Archive. The sensitivity, false positives per case, and estimated Dice coefficient were obtained for each training method. We compared the two annotation methods and the time reduction associated with each. Time differences were tested using Friedman's two-way analysis of variance. RESULTS Sparse annotation significantly reduces annotation time, particularly when skipping 2 slices at a time (P < 0.001). Reduction of up to 2/3 of the annotation does not reduce AI model sensitivity or false positives per case. Although initializing human annotation with AI reduces the annotation time, the reduction is minimal, even when using an ensemble AI to decrease false positives. CONCLUSION Our data support sparse annotation as an efficient technique for reducing the time needed to establish ground truth.
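Two quantities in this abstract are easy to make concrete: the Dice coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B, and the fraction of slices annotated when skipping slices. The sketch below works on flattened binary masks and assumes that skipping k slices means annotating every (k+1)-th slice; the function names and the convention that two empty masks score 1.0 are assumptions for illustration.

```python
from typing import List, Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) on flattened binary masks (1 = lesion voxel)."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2 * intersection / total if total else 1.0  # both empty: treat as perfect agreement


def slices_to_annotate(n_slices: int, skip: int) -> List[int]:
    """Indices of slices a reader annotates when skipping `skip` slices between annotations."""
    return list(range(0, n_slices, skip + 1))


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
    annotated = slices_to_annotate(90, skip=2)
    print(len(annotated), len(annotated) / 90)
```

With `skip=2`, one slice in three is annotated, which matches the up-to-2/3 reduction in annotation reported above; how the unannotated slices were handled during training is not specified in the abstract.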
Affiliation(s)
- Matthew Grudza
- School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ 85287, United States
- Brandon Salinel
- Department of Radiology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Sarah Zeien
- School of Osteopathic Medicine, A.T. Still University, Mesa, AZ 85206, United States
- Matthew Murphy
- School of Osteopathic Medicine, A.T. Still University, Mesa, AZ 85206, United States
- Jake Adkins
- Department of Abdominal Imaging, MD Anderson Cancer Center, Houston, TX 77030, United States
- Corey T Jensen
- Department of Abdominal Imaging, University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Curtis Bay
- Department of Interdisciplinary Sciences, A.T. Still University, Mesa, AZ 85206, United States
- Vikram Kodibagkar
- School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ 85287, United States
- Phillip Koo
- Department of Radiology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Tomislav Dragovich
- Division of Cancer Medicine, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Michael A Choti
- Department of Surgical Oncology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Madappa Kundranda
- Division of Cancer Medicine, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Hong-Zhi Wang
- IBM Almaden Research Center, IBM, San Jose, CA 95120, United States
- John Chang
- Department of Radiology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
5
Chellappan D, Rajaguru H. Enhancement of Classifier Performance Using Swarm Intelligence in Detection of Diabetes from Pancreatic Microarray Gene Data. Biomimetics (Basel) 2023; 8:503. [PMID: 37887634; PMCID: PMC10604158; DOI: 10.3390/biomimetics8060503] [Received: 08/29/2023] [Revised: 10/08/2023] [Accepted: 10/20/2023] [Indexed: 10/28/2023]
Abstract
In this study, we focused on using microarray gene data from pancreatic sources to detect diabetes mellitus. Dimensionality reduction (DR) techniques were used to reduce the high-dimensional microarray gene data: the Bessel function, Discrete Cosine Transform (DCT), Least Squares Linear Regression (LSLR), and Artificial Algae Algorithm (AAA). Subsequently, we applied meta-heuristic algorithms, the Dragonfly Optimization Algorithm (DOA) and the Elephant Herding Optimization Algorithm (EHO), for feature selection. Classifiers such as Nonlinear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Classifier (BLDC), Logistic Regression (LoR), Softmax Discriminant Classifier (SDC), and Support Vector Machine (SVM) with three types of kernels (Linear, Polynomial, and Radial Basis Function (RBF)) were used to detect diabetes. The classifiers' performance was analyzed using accuracy, F1 score, MCC, error rate, FM metric, and Kappa. Without feature selection, the SVM (RBF) classifier achieved a high accuracy of 90% using the AAA DR method. With the AAA DR method and EHO feature selection, the SVM (RBF) classifier outperformed the other classifiers with an accuracy of 95.714%. This improvement in classifier accuracy emphasizes the role of feature selection methods.
Affiliation(s)
- Dinesh Chellappan
- Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
- Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
6
Chellappan D, Rajaguru H. Detection of Diabetes through Microarray Genes with Enhancement of Classifiers Performance. Diagnostics (Basel) 2023; 13:2654. [PMID: 37627916; PMCID: PMC10453776; DOI: 10.3390/diagnostics13162654] [Received: 07/30/2023] [Revised: 08/06/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023]
Abstract
Diabetes is a life-threatening, non-communicable disease. Diabetes mellitus is a prevalent chronic disease with a significant global impact. Timely detection of diabetes is necessary for effective treatment. The primary objective of this study is to propose a novel approach for identifying type II diabetes mellitus using microarray gene data. Specifically, our research focuses on enhancing the performance of methods for detecting diabetes. Four Dimensionality Reduction techniques, Detrended Fluctuation Analysis (DFA), the Chi-square probability density function (Chi2pdf), the Firefly algorithm, and Cuckoo Search, are used to reduce the high-dimensional data. Metaheuristic algorithms such as Particle Swarm Optimization (PSO) and Harmony Search (HS) are used for feature selection. Seven classifiers, Non-Linear Regression (NLR), Linear Regression (LR), Logistic Regression (LoR), Gaussian Mixture Model (GMM), Bayesian Linear Discriminant Classifier (BLDC), Softmax Discriminant Classifier (SDC), and Support Vector Machine-Radial Basis Function (SVM-RBF), are used to classify the diabetic and non-diabetic classes. The classifiers' performances are analyzed through parameters such as accuracy, recall, precision, F1 score, error rate, Matthews Correlation Coefficient (MCC), Jaccard metric, and kappa. The SVM (RBF) classifier with the Chi2pdf Dimensionality Reduction technique and the PSO feature selection method attained a high accuracy of 91% with a Kappa of 0.7961, outperforming all of the other classifiers.
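Several metrics named in these abstracts (accuracy, precision, recall, F1 score, error rate, Jaccard, MCC, and kappa) are simple functions of the binary confusion matrix. The sketch below uses the standard textbook definitions, with Cohen's kappa assumed for "kappa"; `binary_metrics` is a hypothetical helper name, and the counts in the example are made up, not results from any study listed here.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics from a 2x2 confusion matrix (positive class = diabetic)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    # Matthews correlation coefficient: correlation between predicted and true labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance,
    # where chance agreement is computed from the predicted and actual class totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "jaccard": jaccard,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.4f}")
```

MCC and kappa generally say more than accuracy alone when classes are imbalanced, which is one reason they are reported alongside it.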
Affiliation(s)
- Dinesh Chellappan
- Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
- Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
7
Hsu CT, Pai KC, Chen LC, Lin SH, Wu MJ. Machine Learning Models to Predict the Risk of Rapidly Progressive Kidney Disease and the Need for Nephrology Referral in Adult Patients with Type 2 Diabetes. Int J Environ Res Public Health 2023; 20:3396. [PMID: 36834088; PMCID: PMC9967274; DOI: 10.3390/ijerph20043396] [Received: 12/29/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
Early detection of rapidly progressive kidney disease is key to improving renal outcomes and reducing complications in adult patients with type 2 diabetes mellitus (T2DM). We aimed to construct a 6-month machine learning (ML) model predicting the risk of rapidly progressive kidney disease and the need for nephrology referral in adult patients with T2DM and an initial estimated glomerular filtration rate (eGFR) ≥ 60 mL/min/1.73 m². We extracted patient data and medical features from electronic medical records (EMR), and the cohort was divided into a training/validation set and a testing set to develop and validate models based on three algorithms: logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). We also applied an ensemble approach using a soft voting classifier to classify the referral group. We used the area under the receiver operating characteristic curve (AUROC), precision, recall, and accuracy to evaluate performance. Shapley additive explanations (SHAP) values were used to evaluate feature importance. The XGBoost model had higher accuracy and relatively higher precision in the referral group than the LR and RF models, but the LR and RF models had higher recall in the referral group. Overall, the ensemble voting classifier had relatively higher accuracy, higher AUROC, and higher recall in the referral group than the other three models. In addition, we found that a more specific definition of the target improved model performance. In conclusion, we built a 6-month ML model predicting the risk of rapidly progressive kidney disease. Early detection followed by nephrology referral may facilitate appropriate management.
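A soft voting classifier, as mentioned above, averages the class probabilities predicted by its member models and picks the class with the highest average, whereas hard voting counts each model's top class. The minimal sketch below uses hypothetical function names and made-up probabilities (imagine three models scoring one patient for "no referral" vs "referral") to show a case where the two rules disagree; equal weights are assumed unless `weights` is given.

```python
from collections import Counter
from typing import List, Optional, Sequence


def soft_vote(probas: List[Sequence[float]], weights: Optional[List[float]] = None) -> int:
    """probas[m][c] is the probability model m assigns to class c for one patient."""
    if weights is None:
        weights = [1.0] * len(probas)
    n_classes = len(probas[0])
    total_w = sum(weights)
    # Weighted average probability per class across all models.
    avg = [sum(w * p[c] for w, p in zip(weights, probas)) / total_w for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def hard_vote(probas: List[Sequence[float]]) -> int:
    """Each model votes for its most probable class; the majority wins."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in probas]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # One confident model for class 0, two lukewarm models for class 1 (toy numbers).
    probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    print(soft_vote(probas), hard_vote(probas))
```

Here hard voting picks class 1 (two votes to one), while soft voting picks class 0 because the single confident model outweighs the two lukewarm ones; this sensitivity to confidence is the main design difference between the two rules.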
Affiliation(s)
- Chia-Tien Hsu
- Division of Nephrology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung 40705, Taiwan
- School of Medicine, National Yang Ming Chiao Tung University, Taipei 112304, Taiwan
- Kai-Chih Pai
- College of Engineering, Tunghai University, Taichung 407224, Taiwan
- Lun-Chi Chen
- College of Engineering, Tunghai University, Taichung 407224, Taiwan
- Shau-Hung Lin
- DDS-THU AI Center, Tunghai University, Taichung 407224, Taiwan
- Ming-Ju Wu
- Division of Nephrology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung 40705, Taiwan
- Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung 40227, Taiwan
- RongHsing Research Center for Translational Medicine, College of Life Sciences, National Chung Hsing University, Taichung 40227, Taiwan
- Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung 40227, Taiwan
- School of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- Graduate Institute of Biomedical Sciences, College of Medicine, China Medical University, Taichung 404333, Taiwan
8
Liu Q, Zhang M, He Y, Zhang L, Zou J, Yan Y, Guo Y. Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques. J Pers Med 2022; 12:jpm12060905. [PMID: 35743691; PMCID: PMC9224915; DOI: 10.3390/jpm12060905] [Received: 04/09/2022] [Revised: 05/21/2022] [Accepted: 05/27/2022] [Indexed: 02/04/2023]
Abstract
Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective machine learning (ML) prediction models for the risk of type 2 diabetes mellitus (T2DM) in elderly Chinese adults. A retrospective cohort study was conducted using health screening data from adults older than 65 years in Wuhan, China, from 2018 to 2020. After strict data filtering, 127,031 records from eligible participants were used. Overall, 8298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019–2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Using LASSO regression, 21 prediction features were selected. Random under-sampling (RUS) was applied to address the class imbalance, and Shapley Additive Explanations (SHAP) were used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the top five predictors. This study showed that the XGBoost model can be applied to screen individuals at high risk of T2DM at an early phase, with strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions.
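Random under-sampling (RUS), mentioned above, balances classes by randomly discarding majority-class records until every class has as many records as the smallest one. In this cohort the imbalance is substantial: 8298 incident cases out of 127,031 records is roughly 6.5%. Below is a minimal stdlib sketch with a hypothetical function name and a fixed seed for reproducibility; in general, resampling is applied to the training set only, so that test metrics reflect the real class distribution.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def random_undersample(
    X: Sequence[Row], y: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Keep every minority-class row and an equally sized random subset of each larger class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Row]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out: List[Row] = []
    y_out: List[int] = []
    for label in sorted(by_class):
        # Sample without replacement; for the minority class this keeps all rows.
        for row in rng.sample(by_class[label], n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = list(range(10))      # toy record IDs
    y = [0] * 8 + [1] * 2    # 8 non-incident, 2 incident
    Xb, yb = random_undersample(X, y)
    print(yb.count(0), yb.count(1), sorted(x for x, lab in zip(Xb, yb) if lab == 1))
```

The trade-off is that undersampling discards majority-class information; with a training set as large as the one described above, even a balanced subset still leaves thousands of records per class.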
Affiliation(s)
- Qing Liu
- Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Miao Zhang
- Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Yifeng He
- School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Lei Zhang
- School of Mathematics and Statistics, Wuhan University, Wuhan 430070, China
- Jingui Zou
- School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Yaqiong Yan
- Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
- Yan Guo
- Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
9
Wu C, Zhou T, Tian Y, Wu J, Li J, Liu Z. A method for the early prediction of chronic diseases based on short sequential medical data. Artif Intell Med 2022; 127:102262. [DOI: 10.1016/j.artmed.2022.102262] [Received: 09/09/2020] [Revised: 02/18/2022] [Accepted: 02/23/2022] [Indexed: 11/30/2022]
10
Haneef R, Tijhuis M, Thiébaut R, Májek O, Pristaš I, Tolenan H, Gallay A. Methodological guidelines to estimate population-based health indicators using linked data and/or machine learning techniques. Arch Public Health 2022; 80:9. [PMID: 34983651; PMCID: PMC8725299; DOI: 10.1186/s13690-021-00770-6] [Received: 05/27/2021] [Accepted: 12/17/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. Estimating health indicators from linked administrative data is challenging for several reasons: variability in data sources and data collection methods, which reduces interoperability at various levels and timeliness; the availability of a large number of variables; and a lack of skills and capacity to link and analyze big data. The main objective of this study is to develop methodological guidelines for calculating population-based health indicators, to guide European countries in using linked data and/or machine learning (ML) techniques with new methods. METHOD We systematically followed this step-wise approach to develop the methodological guidelines: i. scientific literature review, ii. identification of inspiring examples from European countries, and iii. development of the checklist of guideline contents. RESULTS We have developed methodological guidelines that provide a systematic approach for studies using linked data and/or ML techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations. CONCLUSIONS This is the first study to develop methodological guidelines for studies focused on population health using linked data and/or machine learning techniques. These guidelines would help researchers adopt and develop a systematic approach to high-quality research methods.
There is a need for high-quality research methodologies using more linked data and ML-techniques to develop a structured cross-disciplinary approach for improving the population health information and thereby the population health.
Affiliation(s)
- Romana Haneef
- Department of Non-Communicable Diseases and Injuries, Santé Publique France, Saint-Maurice, France
- Mariken Tijhuis
- National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Rodolphe Thiébaut
- Bordeaux University, Bordeaux School of Public Health, Bordeaux, France
- INSERM / INRIA SISTM team, Bordeaux Population Health, Bordeaux, France
- Medical Information Department, Bordeaux University Hospital, Bordeaux, France
- Ondřej Májek
- Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Ivan Pristaš
- National Institute of Public Health, Division of Health Informatics and Biostatistics, Zagreb, Croatia
- Hanna Tolenan
- Finnish Institute for Health and Welfare (THL), Helsinki, Finland
- Anne Gallay
- Department of Non-Communicable Diseases and Injuries, Santé Publique France, Saint-Maurice, France
11
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452; PMCID: PMC8686642; DOI: 10.1186/s13098-021-00767-9] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Diabetes Mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in the techniques used by previous studies, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, enriched with the methodology proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and reported performance parameters were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performance. Deep Neural Networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
- School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- Luis Montesinos
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- José A. García-García
- Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico