1
Kivrak M, Avci U, Uzun H, Ardic C. The Impact of the SMOTE Method on Machine Learning and Ensemble Learning Performance Results in Addressing Class Imbalance in Data Used for Predicting Total Testosterone Deficiency in Type 2 Diabetes Patients. Diagnostics (Basel) 2024; 14:2634. [PMID: 39682541 DOI: 10.3390/diagnostics14232634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 11/13/2024] [Accepted: 11/18/2024] [Indexed: 12/18/2024] Open
Abstract
BACKGROUND AND OBJECTIVE Diabetes Mellitus is a long-term, multifaceted metabolic condition that necessitates ongoing medical management. Hypogonadism is a syndrome that is a clinical and/or biochemical indicator of testosterone deficiency. Cross-sectional studies have reported that 20-80.4% of all men with Type 2 diabetes have hypogonadism, and Type 2 diabetes is related to low testosterone. This study presents an analysis of the use of ML and EL classifiers in predicting testosterone deficiency. In our study, we compared optimized traditional ML classifiers and three EL classifiers using grid search and stratified k-fold cross-validation. We used the SMOTE method for the class imbalance problem. METHODS The database contains 3397 patients for the assessment of testosterone deficiency. Among these patients, 1886 patients with Type 2 diabetes were included in the study. In the data preprocessing stage, outlier (extreme observation) analyses were first performed with LOF, and missing value analyses were performed with random forest. SMOTE is a method for generating synthetic samples of the minority class. Four basic classifiers, namely MLP, RF, ELM and LR, were used as first-level classifiers. Tree ensemble classifiers, namely ADA, XGBoost and SGB, were used as second-level classifiers. RESULTS After SMOTE, while the diagnostic accuracy decreased in all base classifiers except ELM, sensitivity values increased in all classifiers. Similarly, while the specificity values decreased in all classifiers, the F1 score increased. The RF classifier gave more successful results on the base-training dataset. The most successful ensemble classifier in the training dataset was the ADA classifier, both in the original data and in the SMOTE data. On the testing data, XGBoost was the most suitable model. XGBoost, which exhibits a balanced performance especially when SMOTE is used, can be preferred to correct class imbalance. CONCLUSIONS SMOTE is used to correct the class imbalance in the original data. However, as seen in this study, when SMOTE was applied, the diagnostic accuracy decreased in some models but the sensitivity increased significantly. This shows the positive effects of SMOTE in terms of better predicting the minority class.
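The abstract describes SMOTE as generating synthetic minority-class samples. A minimal sketch of that idea follows, assuming Euclidean distance and plain Python lists; the function name `smote`, the parameter `k`, and the toy data are illustrative assumptions and are not taken from the study. Each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only: assumes at least two minority samples and
    numeric features of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base` (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

When oversampling is combined with cross-validation, a common precaution (not something the abstract states) is to oversample only the training folds, so that synthetic points never appear in the data used for evaluation.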
Affiliation(s)
- Mehmet Kivrak
- Faculty of Medicine, Biostatistics and Medical Informatics, Recep Tayyip Erdogan University, Rize 53100, Türkiye
- Ugur Avci
- Faculty of Medicine, Endocrinology and Metabolism, Recep Tayyip Erdogan University, Rize 53100, Türkiye
- Hakki Uzun
- Faculty of Medicine, Urology, Recep Tayyip Erdogan University, Rize 53100, Türkiye
- Cuneyt Ardic
- Faculty of Medicine, Primary Care Physician, Recep Tayyip Erdogan University, Rize 53100, Türkiye
2
Kwon H, Lee S, Georgoulis H, Beauregard E, Sea J. Understanding sexual homicide in Korea using machine learning algorithms. Behavioral Sciences & the Law 2024; 42:495-510. [PMID: 38857247 DOI: 10.1002/bsl.2676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 05/22/2024] [Accepted: 05/23/2024] [Indexed: 06/12/2024]
Abstract
The current study was conducted to confirm the characteristics in sexual homicide and to explore variables that effectively differentiate sexual homicide and nonsexual homicide. Further, newer methods that have received attention in criminology, such as the machine learning method, were used to explore the ideal algorithm for classifying sexual homicide and patterns for sexual homicide in Korea. To do this, 542 homicide cases were analyzed utilizing eight algorithms, and the classification performance of each algorithm was analyzed along with the importance of variables. The results of the analysis revealed that the Naive Bayes, K-Nearest Neighbors, and RF algorithms demonstrate good classification accuracy, and generally, factors such as relationships, marriage, planning, personal weapons, and overkill were identified as crucial variables that distinguish sexual homicide in Korea. In addition, the crime scene information of the crime occurring in the dark (at night) and body disposal were found to have high importance. The current study proposes ways to enhance the efficacy of crime investigation and advance the research on sexual homicides in Korea through a more scientific understanding of sexual homicide that has not been thoroughly explored domestically.
Affiliation(s)
- Hyeokjun Kwon
- Department of Psychology, Yeungnam University, Gyeongsan-si, Republic of Korea
- Sanggyung Lee
- Seoul Metropolitan Police Agency, Seoul, Republic of Korea
- Hana Georgoulis
- School of Criminology, Simon Fraser University, Burnaby, British Columbia, Canada
- Eric Beauregard
- School of Criminology, Simon Fraser University, Burnaby, British Columbia, Canada
- Jonghan Sea
- Department of Psychology, Yeungnam University, Gyeongsan-si, Republic of Korea
3
Tehrani SSM, Zarvani M, Amiri P, Ghods Z, Raoufi M, Safavi-Naini SAA, Soheili A, Gharib M, Abbasi H. Visual transformer and deep CNN prediction of high-risk COVID-19 infected patients using fusion of CT images and clinical data. BMC Med Inform Decis Mak 2023; 23:265. [PMID: 37978393 PMCID: PMC10656999 DOI: 10.1186/s12911-023-02344-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 10/16/2023] [Indexed: 11/19/2023] Open
Abstract
BACKGROUND Despite globally falling hospitalization rates and much lower risks of Covid-19 mortality, accurate diagnosis of the infection stage and prediction of outcomes remain of clinical interest. Current technology can help automate the process and identify those at higher risk of developing severe illness. This work explores and presents deep-learning-based schemes for predicting clinical outcomes in Covid-19-infected patients, using Visual Transformer and Convolutional Neural Networks (CNNs) fed with a 3D data fusion of CT scan images and patients' clinical data. METHODS We report on the efficiency of Video Swin Transformers and several CNN models fed with fusion datasets and CT scans only vs. a set of conventional classifiers fed with patients' clinical data only. A relatively large clinical dataset from 380 Covid-19 diagnosed patients was used to train/test the models. RESULTS Results show that the 3D Video Swin Transformers fed with the fusion datasets of 64 sectional CT scans + 67 clinical labels outperformed all other approaches for predicting outcomes in Covid-19-infected patients (TPR = 0.95, FPR = 0.40, F0.5 score = 0.82, AUC = 0.77, Kappa = 0.6). CONCLUSIONS We demonstrate that our proposed novel 3D data fusion approach, which concatenates CT scan images with patients' clinical data, can remarkably improve the performance of the models in predicting Covid-19 infection outcomes. SIGNIFICANCE Findings indicate the possibility of predicting the severity of outcome using patients' CT images and clinical data collected at the time of admission to hospital.
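The abstract reports an F0.5 score alongside TPR and FPR. As a reminder of what that metric weighs, here is a small sketch of the general F-beta formula; the helper name `f_beta` and the precision/recall values are hypothetical and are not the study's numbers. With beta = 0.5, precision counts more heavily than recall.

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values chosen only to illustrate the weighting.
precision, recall = 0.75, 0.90
f_half = f_beta(precision, recall, beta=0.5)
f_one = f_beta(precision, recall, beta=1.0)
```

Because precision is the lower of the two values in this illustration, the F0.5 score comes out below the F1 score.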
Affiliation(s)
- Maral Zarvani
- Faculty of Engineering, Alzahra University, Tehran, Iran
- Paria Amiri
- University of Erlangen-Nuremberg, Bavaria, Germany
- Zahra Ghods
- Faculty of Engineering, Alzahra University, Tehran, Iran
- Masoomeh Raoufi
- Department of Radiology, School of Medicine, Imam Hossein Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Amir Ahmad Safavi-Naini
- Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amirali Soheili
- School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hamid Abbasi
- Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand.
4
Abbas YM, Khan MI. Robust Machine Learning Framework for Modeling the Compressive Strength of SFRC: Database Compilation, Predictive Analysis, and Empirical Verification. Materials (Basel, Switzerland) 2023; 16:7178. [PMID: 38005107 PMCID: PMC10673118 DOI: 10.3390/ma16227178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 11/05/2023] [Accepted: 11/13/2023] [Indexed: 11/26/2023]
Abstract
In recent years, the field of construction engineering has experienced a significant paradigm shift, embracing the integration of machine learning (ML) methodologies, with a particular emphasis on forecasting the characteristics of steel-fiber-reinforced concrete (SFRC). Despite the theoretical sophistication of existing models, persistent challenges remain: their opacity, lack of transparency, and limited real-world relevance for practitioners. To address this gap and advance our current understanding, this study employs the extreme gradient boosting (XGBoost) algorithm, crafting a comprehensive approach. Grounded in a meticulously curated database drawn from 43 seminal publications, encompassing 420 distinct records, this research focuses predominantly on three primary fiber types: crimped, hooked, and mill-cut. Complemented by hands-on experimentation involving 20 diverse SFRC mixtures, this empirical campaign is further illuminated through the strategic use of partial dependence plots (PDPs), revealing intricate relationships between input parameters and consequent compressive strength. A pivotal revelation of this research lies in the identification of optimal SFRC formulations, offering tangible insights for real-world applications. The developed ML model stands out not only for its sophistication but also for its tangible accuracy, evidenced by exemplary performance against independent datasets, boasting a commendable mean target-prediction ratio of 99%. To bridge the theory-practice gap, we introduce a user-friendly digital interface, thoroughly designed to guide professionals in optimizing and accurately predicting the compressive strength of SFRC. This research thus contributes to the construction and civil engineering sectors by enhancing predictive capabilities and refining mix designs, fostering innovation, and addressing the evolving needs of the industry.
Affiliation(s)
- Mohammad Iqbal Khan
- Department of Civil Engineering, College of Engineering, King Saud University, Riyadh 800-11421, Saudi Arabia
5
Yang C. Prediction of hearing preservation after acoustic neuroma surgery based on SMOTE-XGBoost. Mathematical Biosciences and Engineering: MBE 2023; 20:10757-10772. [PMID: 37322959 DOI: 10.3934/mbe.2023477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Prior to the surgical removal of an acoustic neuroma, the majority of patients anticipate that their hearing will be preserved to the greatest possible extent following surgery. This paper proposes a postoperative hearing preservation prediction model for the characteristics of class-imbalanced hospital real data based on the extreme gradient boost tree (XGBoost). In order to eliminate sample imbalance, the synthetic minority oversampling technique (SMOTE) is applied to increase the number of underclass samples in the data. Multiple machine learning models are also used for the accurate prediction of surgical hearing preservation in acoustic neuroma patients. In comparison to research results from existing literature, the experimental results found the model proposed in this paper to be superior. In summary, the method this paper proposes can make a significant contribution to the development of personalized preoperative diagnosis and treatment plans for patients, leading to effective judgment for the hearing retention of patients with acoustic neuroma following surgery, a simplified long medical treatment process and saved medical resources.
Affiliation(s)
- Cenyi Yang
- School of Mathematics and Statistics, Central South University, Changsha 410083, China
6
Guinsburg AM, Jiao Y, Bessone MID, Monaghan CK, Magalhães B, Kraus MA, Kotanko P, Hymes JL, Kossmann RJ, Berbessi JC, Maddux FW, Usvyat LA, Larkin JW. Predictors of shorter- and longer-term mortality after COVID-19 presentation among dialysis patients: parallel use of machine learning models in Latin and North American countries. BMC Nephrol 2022; 23:340. [PMID: 36273142 PMCID: PMC9587666 DOI: 10.1186/s12882-022-02961-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 10/04/2022] [Indexed: 11/21/2022] Open
Abstract
Background We developed machine learning models to understand the predictors of shorter-, intermediate-, and longer-term mortality among hemodialysis (HD) patients affected by COVID-19 in four countries in the Americas. Methods We used data from adult HD patients treated at regional institutions of a global provider in Latin America (LatAm) and North America who contracted COVID-19 in 2020 before SARS-CoV-2 vaccines were available. Using 93 commonly captured variables, we developed machine learning models that predicted the likelihood of death overall, as well as during 0–14, 15–30, and > 30 days after COVID-19 presentation, and identified the importance of predictors. XGBoost models were built in parallel using the same programming with a 60%:20%:20% random split for training, validation, and testing data for the datasets from LatAm (Argentina, Colombia, Ecuador) and North America (United States) countries. Results Among HD patients with COVID-19, 28.8% (1,001/3,473) died in LatAm and 20.5% (4,426/21,624) died in North America. Mortality occurred earlier in LatAm versus North America; 15.0% and 7.3% of patients died within 0–14 days, 7.9% and 4.6% of patients died within 15–30 days, and 5.9% and 8.6% of patients died > 30 days after COVID-19 presentation, respectively. Area under the curve ranged from 0.73 to 0.83 across prediction models in both regions. Top predictors of death after COVID-19 consistently included older age, longer vintage, markers of poor nutrition and more inflammation in both regions at all timepoints. Unique patient attributes (higher BMI, male sex) were top predictors of mortality during 0–14 and 15–30 days after COVID-19, yet not mortality > 30 days after presentation. Conclusions Findings showed distinct profiles of mortality in COVID-19 in LatAm and North America throughout 2020. Mortality rate was higher within 0–14 and 15–30 days after COVID-19 in LatAm, while mortality rate was higher in North America > 30 days after presentation.
Nonetheless, a remarkable proportion of HD patients died > 30 days after COVID-19 presentation in both regions. We were able to develop a series of suitable prognostic prediction models and establish the top predictors of death in COVID-19 during shorter-, intermediate-, and longer-term follow up periods. Supplementary Information The online version contains supplementary material available at 10.1186/s12882-022-02961-x.
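The Methods mention a 60%:20%:20% random split into training, validation, and testing data. A minimal sketch of such a split follows, using only the Python standard library; the function name `split_60_20_20`, the seed, and the use of integer record IDs are assumptions for illustration and do not reproduce the authors' code. The cohort size 3,473 is the LatAm figure quoted in the abstract.

```python
import random


def split_60_20_20(records, seed=42):
    """Shuffle records and cut them into 60% / 20% / 20% partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10  # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Integer IDs stand in for patient records.
train, val, test = split_60_20_20(range(3473))
```

Any rounding remainder lands in the test partition here; that is one arbitrary convention among several.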
Affiliation(s)
- Yue Jiao
- Fresenius Medical Care, Global Medical Office, 920 Winter Street, Waltham, MA, 02451, USA
- Caitlin K Monaghan
- Fresenius Medical Care, Global Medical Office, 920 Winter Street, Waltham, MA, 02451, USA
- Peter Kotanko
- Renal Research Institute, New York, USA; Icahn School of Medicine at Mount Sinai, New York, USA
- Jeffrey L Hymes
- Fresenius Medical Care, Global Medical Office, 920 Winter Street, Waltham, MA, 02451, USA
- Franklin W Maddux
- Fresenius Medical Care AG & Co. KGaA, Global Medical Office, Bad Homburg, Germany
- Len A Usvyat
- Fresenius Medical Care, Global Medical Office, 920 Winter Street, Waltham, MA, 02451, USA
- John W Larkin
- Fresenius Medical Care, Global Medical Office, 920 Winter Street, Waltham, MA, 02451, USA.
7
Tirandi A, Ramoni D, Montecucco F, Liberale L. Predicting mortality in hospitalized COVID-19 patients. Intern Emerg Med 2022; 17:1571-1574. [PMID: 35704169 PMCID: PMC9198615 DOI: 10.1007/s11739-022-03017-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 05/23/2022] [Indexed: 11/25/2022]
Affiliation(s)
- Amedeo Tirandi
- First Clinic of Internal Medicine, Department of Internal Medicine, University of Genoa, 6 viale Benedetto XV, 16132, Genoa, Italy
- Davide Ramoni
- First Clinic of Internal Medicine, Department of Internal Medicine, University of Genoa, 6 viale Benedetto XV, 16132, Genoa, Italy
- Fabrizio Montecucco
- First Clinic of Internal Medicine, Department of Internal Medicine, University of Genoa, 6 viale Benedetto XV, 16132, Genoa, Italy
- IRCCS Ospedale Policlinico San Martino Genova-Italian Cardiovascular Network, Genoa, Italy
- Luca Liberale
- First Clinic of Internal Medicine, Department of Internal Medicine, University of Genoa, 6 viale Benedetto XV, 16132, Genoa, Italy.
- IRCCS Ospedale Policlinico San Martino Genova-Italian Cardiovascular Network, Genoa, Italy.
8
Nagpal S, Pinna NK, Pant N, Singh R, Srivastava D, Mande SS. Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? J Mol Biol 2022; 434:167684. [PMID: 35700770 PMCID: PMC9188262 DOI: 10.1016/j.jmb.2022.167684] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/25/2021] [Revised: 06/05/2022] [Accepted: 06/08/2022] [Indexed: 11/30/2022]
Abstract
MOTIVATION Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants in particular have exerted tremendous pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenges of tracing the mutations/variants and their relationship to infection severity (if any). RESULTS We explored the suitability of virus-genotype guided machine-learning in infection prognosis and identification of features/mutations-of-interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide-mutations, were employed. Among these, post data-cleaning, Low and High severity genomes were classified using an integrated model (employing virus genotype, epitopic-influence and patient-age) with consistently high ROC-AUC (Asia: 0.97 ± 0.01, Europe: 0.94 ± 0.01, N.America: 0.92 ± 0.02, Africa: 0.94 ± 0.07, S.America: 0.93 ± 03). Although virus-genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not found to be consistent, and the models for a few geographies displayed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus-genotype (Wilcoxon p_BH < 0.05). Neither age, epitopic-influence, nor clade information could out-perform the integrated features. A sparse model (6 features), developed using patient-age and epitopic-influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed for inferring the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed.
Notably, we also describe the significance of a 'temporal-modeling approach' to benchmark the models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving the severity, caution should be exercised in using the genotypic signatures for predictive prognosis.
Affiliation(s)
- Sunil Nagpal
- Tata Consultancy Services Ltd, Pune 411013, India; CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India. https://twitter.com/NagpalSun
- Nishal Kumar Pinna
- Tata Consultancy Services Ltd, Pune 411013, India. https://twitter.com/nishal_pinna
- Namrata Pant
- Tata Consultancy Services Ltd, Pune 411013, India
- Rohan Singh
- Tata Consultancy Services Ltd, Pune 411013, India
9
Rathakrishnan V, Bt Beddu S, Ahmed AN. Predicting compressive strength of high-performance concrete with high volume ground granulated blast-furnace slag replacement using boosting machine learning algorithms. Sci Rep 2022; 12:9539. [PMID: 35680937 PMCID: PMC9184605 DOI: 10.1038/s41598-022-12890-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 05/18/2022] [Indexed: 11/30/2022] Open
Abstract
Predicting the compressive strength of concrete is a complicated process due to the heterogeneous mixture of concrete and highly variable materials. Researchers have predicted the compressive strength of concrete for various mixes using machine learning and deep learning models. In this research, the compressive strength of high-performance concrete with high volume ground granulated blast-furnace slag replacement is predicted using boosting machine learning (BML) algorithms, namely, Light Gradient Boosting Machine, CatBoost Regressor, Gradient Boosting Regressor (GBR), Adaboost Regressor, and Extreme Gradient Boosting. In this study, the BML models' performance is evaluated based on prediction accuracy and prediction error rates, i.e., R2, MSE, RMSE, MAE, RMSLE, and MAPE. Additionally, the BML models were further optimised with Random Search algorithms and compared to BML models with default hyperparameters. Comparing all 5 BML models, the GBR model shows the highest prediction accuracy, with R2 of 0.96, and the lowest model error, with MAE and RMSE of 2.73 and 3.40, respectively, for the test dataset. In conclusion, the GBR model is the best-performing BML for predicting the compressive strength of concrete, with the highest prediction accuracy and lowest modelling error.
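The abstract evaluates models with R2, MSE, RMSE, MAE, RMSLE, and MAPE. For readers who want the definitions in executable form, here is a small standard-library sketch; the function name `regression_metrics` and the strength values are hypothetical and unrelated to the study's data. RMSLE and MAPE as written assume strictly positive targets.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute common regression error metrics from paired value lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    log_sq = sum((math.log1p(t) - math.log1p(p)) ** 2
                 for t, p in zip(y_true, y_pred))
    return {
        "R2": 1 - ss_res / ss_tot,       # share of variance explained
        "MSE": ss_res / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSLE": math.sqrt(log_sq / n),  # error on the log(1 + y) scale
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical compressive strengths in MPa.
metrics = regression_metrics([40.0, 50.0, 60.0], [42.0, 49.0, 57.0])
```

MAE and RMSE are in the units of the target, while R2 and MAPE are unitless, which is why reports such as this one usually quote them side by side.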
Affiliation(s)
- Vimal Rathakrishnan
- Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN), 43000, Selangor, Malaysia.
- Salmia Bt Beddu
- Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN), 43000, Selangor, Malaysia
- Ali Najah Ahmed
- Institute of Energy Infrastructure (IEI), Universiti Tenaga Nasional (UNITEN), 43000, Selangor, Malaysia
10
Towards an Approach for Filtration Efficiency Estimation of Consumer-Grade Face Masks Using Thermography. Applied Sciences-Basel 2022. [DOI: 10.3390/app12042071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
Due to the increasing need for continuous use of face masks caused by COVID-19, it is essential to evaluate the filtration quality that each face mask provides. In this research, an estimation method based on thermal image processing was developed; the main objective was to evaluate the effectiveness of different face masks while being used during breathing. For the acquisition of heat distribution images, a thermographic imaging system was built; moreover, a deep learning model detected the leakage percentage of each face mask with a mAP of 0.9345, recall of 0.842 and F1-score of 0.82. The results obtained from this research revealed that the filtration effectiveness depended on heat loss through the manufacturing material; the proposed estimation method is simple, fast, and can be replicated and operated by people who are not experts in the computer field.
11
Nwanosike EM, Conway BR, Merchant HA, Hasan SS. Potential applications and performance of machine learning techniques and algorithms in clinical practice: A systematic review. Int J Med Inform 2021; 159:104679. [PMID: 34990939 DOI: 10.1016/j.ijmedinf.2021.104679] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 05/29/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/11/2022]
Abstract
PURPOSE The advent of clinically adapted machine learning algorithms can solve numerous problems ranging from disease diagnosis and prognosis to therapy recommendations. This systematic review examines the performance of machine learning (ML) algorithms and evaluates the progress made to date towards their implementation in clinical practice. METHODS Databases (PubMed, MEDLINE, Scopus, Google Scholar, Cochrane Library and WHO Covid-19 database) were systematically searched to identify original articles published between January 2011 and October 2021. Studies reporting ML techniques in clinical practice involving humans and ML algorithms with a performance metric were considered. RESULTS Of 873 unique articles identified, 36 studies were eligible for inclusion. The XGBoost (extreme gradient boosting) algorithm showed the highest potential for clinical applications (n = 7 studies); this was followed jointly by the random forest algorithm, logistic regression, and the support vector machine (n = 5 studies each). Prediction of outcomes (n = 33), in particular for inflammatory diseases (n = 7), received the most attention, followed by cancer and neuropsychiatric disorders (n = 5 for each) and Covid-19 (n = 4). Thirty-three out of the thirty-six included studies passed more than 50% of the selected quality assessment criteria in the TRIPOD checklist. In contrast, none of the studies could achieve an ideal overall bias rating of 'low' based on the PROBAST checklist. Moreover, only three studies showed evidence of the deployment of ML algorithm(s) in clinical practice. CONCLUSIONS ML is potentially a reliable tool for clinical decision support. Although advocated widely in clinical practice, work is still in progress to validate clinically adapted ML algorithms. Improving quality standards, transparency, and interpretability of ML models will further lower the barriers to acceptability.
Affiliation(s)
- Ezekwesiri Michael Nwanosike
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
- Barbara R Conway
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
- Hamid A Merchant
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
- Syed Shahzad Hasan
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom; School of Biomedical Sciences & Pharmacy, University of Newcastle, Callaghan, Australia.
12
Giancotti M, Lopreite M, Mauro M, Puliga M. The role of European health system characteristics in affecting Covid 19 lethality during the early days of the pandemic. Sci Rep 2021; 11:23739. [PMID: 34887452 PMCID: PMC8660820 DOI: 10.1038/s41598-021-03120-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022] Open
Abstract
This article examines the main factors affecting COVID-19 lethality across 16 European Countries with a focus on the role of health system characteristics during the first phase of the diffusion of the virus. Specifically, we investigate the leading causes of lethality at 10, 20, 30, 40 days in the first hit of the pandemic. Using a random forest regression (ML), with lethality as outcome variable, we show that the percentage of people older than 65 years (with two or more chronic diseases) is the main predictor variable of lethality by COVID-19, followed by the number of hospital intensive care unit beds, investments in healthcare spending compared to GDP, number of nurses and doctors. Moreover, the variable of general practitioners has little but significant predicting quality. These findings contribute to provide evidence for the prediction of lethality caused by COVID-19 in Europe and open the discussion on health policy and management of health care and ICU beds during a severe epidemic.
Affiliation(s)
- Monica Giancotti
- Department of Clinical and Experimental Medicine, Magna Graecia University, Viale Europa, Catanzaro, Italy
- Milena Lopreite
- Department of Economics, Statistics and Finance, University of Calabria, Calabria, Italy.
- Marianna Mauro
- Department of Clinical and Experimental Medicine, Magna Graecia University, Catanzaro, Italy
- Michelangelo Puliga
- Institute of Management, Sant'Anna School of Advanced Studies, Pisa, Italy
- Linkalab Computational Laboratory, Cagliari, Italy
13
Shams MY, Elzeki OM, Abouelmagd LM, Hassanien AE, Elfattah MA, Salem H. HANA: A Healthy Artificial Nutrition Analysis model during COVID-19 pandemic. Comput Biol Med 2021; 135:104606. [PMID: 34247134 PMCID: PMC8241585 DOI: 10.1016/j.compbiomed.2021.104606] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/24/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND AND OBJECTIVE The impact of diet on COVID-19 patients has been a global concern since the pandemic began. Choosing different types of food affects people's mental and physical health and, with persistent consumption of certain types of food and frequent eating, there may be an increased likelihood of death. In this paper, a regression system is employed to evaluate the prediction of death status based on food categories. METHODS A Healthy Artificial Nutrition Analysis (HANA) model is proposed. The proposed model is used to generate a food recommendation system and track individual habits during the COVID-19 pandemic to ensure healthy foods are recommended. To collect information about the different types of foods that most of the world's population eat, the COVID-19 Healthy Diet Dataset was used. This dataset includes different types of foods from 170 countries around the world as well as obesity, undernutrition, death, and COVID-19 data as percentages of the total population. The dataset was used to predict the status of death using different machine learning regression models, i.e., linear regression (ridge regression, simple linear regularization, and elastic net regression), and AdaBoost models. RESULTS The death status was predicted with high accuracy, and the food categories related to death were identified with promising accuracy. The Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R2 metrics and 20-fold cross-validation were used to evaluate the accuracy of the prediction models for the COVID-19 Healthy Diet Dataset. The evaluations demonstrated that elastic net regression was the most efficient prediction model. Based on an in-depth analysis of recent nutrition recommendations by WHO, we confirm the same advice already introduced in the WHO report.
Overall, the outcomes also indicate that the remedying effects for COVID-19 patients are strongest among people who eat more vegetal products, oilcrops grains, beverages, and cereals - excluding beer. Moreover, people consuming more animal products, animal fats, meat, milk, sugar and sweetened foods, and sugar crops were associated with a higher number of deaths and fewer patient recoveries. The outcome of sugar consumption was important and the rates of death and recovery were influenced by obesity. CONCLUSIONS Based on evaluation metrics, the proposed HANA model may outperform other algorithms used to predict death status. The results of this study may direct patients to eat particular types of food to reduce the possibility of becoming infected with the COVID-19 virus.
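The evaluation scheme named in this abstract (MSE, RMSE, MAE, and R2 under 20-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `regression_metrics` and `k_fold_indices` are hypothetical, the folds are contiguous and unshuffled for simplicity, and the abstract does not say how the folds were actually drawn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when the observed values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size.

    Assumption for illustration only: no shuffling or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    start = 0
    for i in range(k):
        # The first (n_samples % k) folds receive one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

With 170 rows (one per country, as in the dataset described above) and k = 20, each held-out fold contains only 8 or 9 countries, so per-fold estimates of R2 in particular can be noisy; averaging the metrics across folds is the usual way to summarize them.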
Affiliation(s)
- Mahmoud Y Shams
  - Faculty of Artificial Intelligence, Kafrelsheikh University, 33511, Egypt
- Omar M Elzeki
  - Faculty of Computers and Information, Mansoura University, 35516, Mansoura, Egypt
- Aboul Ella Hassanien
  - Faculty of Computers and Artificial Intelligence, Cairo University, Egypt; Scientific Research Group in Egypt (SRGE), Cairo, Egypt
- Hanaa Salem
  - Faculty of Engineering, Delta University for Science and Technology, Gamasa, Egypt
|
14
|
Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, Bonten MMJ, Dahly DL, Damen JAA, Debray TPA, de Jong VMT, De Vos M, Dhiman P, Haller MC, Harhay MO, Henckaerts L, Heus P, Kammer M, Kreuzberger N, Lohmann A, Luijken K, Ma J, Martin GP, McLernon DJ, Andaur Navarro CL, Reitsma JB, Sergeant JC, Shi C, Skoetz N, Smits LJM, Snell KIE, Sperrin M, Spijker R, Steyerberg EW, Takada T, Tzoulaki I, van Kuijk SMJ, van Bussel B, van der Horst ICC, van Royen FS, Verbakel JY, Wallisch C, Wilkinson J, Wolff R, Hooft L, Moons KGM, van Smeden M. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020; 369:m1328. [PMID: 32265220 PMCID: PMC7222643 DOI: 10.1136/bmj.m1328] [Citation(s) in RCA: 1732] [Impact Index Per Article: 346.4] [Accepted: 03/31/2020] [Indexed: 12/12/2022]
Abstract
OBJECTIVE To review and appraise the validity and usefulness of published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at increased risk of covid-19 infection or being admitted to hospital with the disease. DESIGN Living systematic review and critical appraisal by the COVID-PRECISE (Precise Risk Estimation to optimise covid-19 Care for Infected or Suspected patients in diverse sEttings) group. DATA SOURCES PubMed and Embase through Ovid, up to 1 July 2020, supplemented with arXiv, medRxiv, and bioRxiv up to 5 May 2020. STUDY SELECTION Studies that developed or validated a multivariable covid-19 related prediction model. DATA EXTRACTION At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool). RESULTS 37 421 titles were screened, and 169 studies describing 232 prediction models were included. The review identified seven models for identifying people at risk in the general population; 118 diagnostic models for detecting covid-19 (75 were based on medical imaging, 10 to diagnose disease severity); and 107 prognostic models for predicting mortality risk, progression to severe disease, intensive care unit admission, ventilation, intubation, or length of hospital stay. The most frequent types of predictors included in the covid-19 prediction models are vital signs, age, comorbidities, and image features. Flu-like symptoms are frequently predictive in diagnostic models, while sex, C reactive protein, and lymphocyte counts are frequent prognostic factors. 
Reported C index estimates from the strongest form of validation available per model ranged from 0.71 to 0.99 in prediction models for the general population, from 0.65 to more than 0.99 in diagnostic models, and from 0.54 to 0.99 in prognostic models. All models were rated at high or unclear risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, high risk of model overfitting, and unclear reporting. Many models did not include a description of the target population (n=27, 12%) or care setting (n=75, 32%), and only 11 (5%) were externally validated by a calibration plot. The Jehi diagnostic model and the 4C mortality score were identified as promising models. CONCLUSION Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that almost all published prediction models are poorly reported, and at high risk of bias such that their reported predictive performance is probably optimistic. However, we have identified two (one diagnostic and one prognostic) promising models that should soon be validated in multiple cohorts, preferably through collaborative efforts and data sharing to also allow an investigation of the stability and heterogeneity in their performance across populations and settings. Details on all reviewed models are publicly available at https://www.covprecise.org/. Methodological guidance as provided in this paper should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, prediction model authors should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline. SYSTEMATIC REVIEW REGISTRATION Protocol https://osf.io/ehc47/, registration https://osf.io/wy245.
READERS' NOTE This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is update 3 of the original article published on 7 April 2020 (BMJ 2020;369:m1328). Previous updates can be found as data supplements (https://www.bmj.com/content/369/bmj.m1328/related#datasupp). When citing this paper please consider adding the update number and date of access for clarity.
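The C index reported throughout this abstract can be illustrated for the simplest case, a binary outcome, where it equals the area under the ROC curve: the proportion of (event, non-event) pairs in which the patient with the event received the higher predicted risk. The Python sketch below is illustrative only and is not taken from the review; the function name `c_index_binary` is hypothetical, tied predictions are counted as half-concordant by assumption, and censored time-to-event outcomes (which some prognostic models use) are not handled.

```python
def c_index_binary(outcomes, risks):
    """Concordance (C index) for a binary outcome.

    outcomes: iterable of 0/1 labels (1 = event occurred).
    risks:    predicted risks or scores, higher meaning more likely.
    Ties in predicted risk count as 0.5 (an assumption of this sketch).
    """
    outcomes = list(outcomes)
    risks = list(risks)
    if len(outcomes) != len(risks):
        raise ValueError("outcomes and risks must have equal length")
    events = [r for o, r in zip(outcomes, risks) if o == 1]
    non_events = [r for o, r in zip(outcomes, risks) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    # Compare every event with every non-event (O(n_events * n_non_events)).
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The statistic describes discrimination only and says nothing about calibration, which is why the abstract reports calibration plots as a separate concern.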
Affiliation(s)
- Laure Wynants
  - Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Peter Debyeplein 1, 6229 HA Maastricht, Netherlands
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
- Gary S Collins
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Musculoskeletal Sciences, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
- Richard D Riley
  - Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Keele, UK
- Georg Heinze
  - Section for Clinical Biometrics, Centre for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Ewoud Schuit
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Marc M J Bonten
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Department of Medical Microbiology, University Medical Centre Utrecht, Utrecht, Netherlands
- Darren L Dahly
  - HRB Clinical Research Facility, Cork, Ireland
  - School of Public Health, University College Cork, Cork, Ireland
- Johanna A A Damen
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Thomas P A Debray
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Valentijn M T de Jong
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten De Vos
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Electrical Engineering, ESAT Stadius, KU Leuven, Leuven, Belgium
- Paul Dhiman
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Musculoskeletal Sciences, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
- Maria C Haller
  - Section for Clinical Biometrics, Centre for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
  - Ordensklinikum Linz, Hospital Elisabethinen, Department of Nephrology, Linz, Austria
- Michael O Harhay
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Palliative and Advanced Illness Research Center and Division of Pulmonary and Critical Care Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Liesbet Henckaerts
  - Department of Microbiology, Immunology and Transplantation, KU Leuven-University of Leuven, Leuven, Belgium
  - Department of General Internal Medicine, KU Leuven-University Hospitals Leuven, Leuven, Belgium
- Pauline Heus
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Michael Kammer
  - Section for Clinical Biometrics, Centre for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
  - Department of Nephrology, Medical University of Vienna, Vienna, Austria
- Nina Kreuzberger
  - Evidence-Based Oncology, Department I of Internal Medicine and Centre for Integrated Oncology Aachen Bonn Cologne Dusseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Anna Lohmann
  - Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, Netherlands
- Kim Luijken
  - Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, Netherlands
- Jie Ma
  - NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
- Glen P Martin
  - Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
- David J McLernon
  - Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
- Constanza L Andaur Navarro
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Johannes B Reitsma
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Jamie C Sergeant
  - Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
  - Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Chunhu Shi
  - Division of Nursing, Midwifery and Social Work, School of Health Sciences, University of Manchester, Manchester, UK
- Nicole Skoetz
  - Department of Nephrology, Medical University of Vienna, Vienna, Austria
- Luc J M Smits
  - Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Peter Debyeplein 1, 6229 HA Maastricht, Netherlands
- Kym I E Snell
  - Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Keele, UK
- Matthew Sperrin
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- René Spijker
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Amsterdam UMC, University of Amsterdam, Amsterdam Public Health, Medical Library, Netherlands
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
- Toshihiko Takada
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Ioanna Tzoulaki
  - Department of Epidemiology and Biostatistics, Imperial College London School of Public Health, London, UK
  - Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina, Greece
- Sander M J van Kuijk
  - Department of Clinical Epidemiology and Medical Technology Assessment, Maastricht University Medical Centre+, Maastricht, Netherlands
- Bas van Bussel
  - Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Peter Debyeplein 1, 6229 HA Maastricht, Netherlands
  - Department of Intensive Care, Maastricht University Medical Centre+, Maastricht University, Maastricht, Netherlands
- Iwan C C van der Horst
  - Department of Intensive Care, Maastricht University Medical Centre+, Maastricht University, Maastricht, Netherlands
- Florien S van Royen
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Jan Y Verbakel
  - EPI-Centre, Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
  - Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Christine Wallisch
  - Section for Clinical Biometrics, Centre for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria
  - Charité Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
  - Berlin Institute of Health, Berlin, Germany
- Jack Wilkinson
  - Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
- Lotty Hooft
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Karel G M Moons
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
  - Cochrane Netherlands, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
|