1
Kuo PH, Li YH, Yau HT. Development of feline infectious peritonitis diagnosis system by using CatBoost algorithm. Comput Biol Chem 2024; 113:108227. [PMID: 39342699 DOI: 10.1016/j.compbiolchem.2024.108227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 08/29/2024] [Accepted: 09/25/2024] [Indexed: 10/01/2024]
Abstract
This study employed machine learning techniques to predict the rate of feline infectious peritonitis (FIP) diagnoses, with a specific focus on mutations in the spike protein gene of the feline coronavirus (FCoV). FIP is a fatal viral disease affecting the peritoneum of cats and is primarily caused by mutations in FCoV. Its diagnosis largely relies on evaluations of various biomarkers and clinical symptoms. The current analysis of FCoV spike protein gene mutations exhibits certain limitations. To address this problem, the present study employed a large dataset (comprising information on FCoV copy numbers, spike protein mutation outcomes, and related clinical data) and used machine learning models to analyze the association between spike protein gene mutations and FIP diagnosis. Various algorithms were used to establish highly accurate predictive models, namely logistic regression, random forest, decision tree, neural network, support vector machine, gradient boosting tree, and categorical boosting (CatBoost) algorithms. The model obtained using the CatBoost algorithm achieved an accuracy of 0.9541. Accordingly, a highly accurate predictive model was developed to enable early diagnosis of FIP and improve the rate of survival in cats. The application of machine learning technology in this study yielded research findings that provide veterinarians with effective tools for managing and preventing FIP, a painful and deadly disease for cats. This study is a pioneering work in the systematic application of multiple machine learning models to the prediction of FIP and in the comparison of their performance to improve diagnostic accuracy and efficiency. This study is the first of its kind in the field of FIP.
Affiliation(s)
- Ping-Huan Kuo
  - Department of Mechanical Engineering, National Chung Cheng University, Chiayi 62102, Taiwan; Advanced Institute of Manufacturing with High-Tech Innovations (AIM-HI), National Chung Cheng University, Chiayi 62102, Taiwan.
- Yu-Hsiang Li
  - Department of Mechanical Engineering, National Chung Cheng University, Chiayi 62102, Taiwan.
- Her-Terng Yau
  - Department of Mechanical Engineering, National Chung Cheng University, Chiayi 62102, Taiwan; Advanced Institute of Manufacturing with High-Tech Innovations (AIM-HI), National Chung Cheng University, Chiayi 62102, Taiwan.
2
Asadi S, Tartibian B, Moni MA, Eslami R. Prediction of white blood cell count during exercise: a comparison between standalone and hybrid intelligent algorithms. Sci Rep 2024; 14:20683. [PMID: 39237538 PMCID: PMC11377723 DOI: 10.1038/s41598-024-71576-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 08/29/2024] [Indexed: 09/07/2024] Open
Abstract
Decades of research in exercise immunology have demonstrated the profound impact of exercise on the immune response, influencing an individual's disease susceptibility. Accurate prediction of white blood cells (WBCs) count during exercise can help to design effective training programs to maintain optimal immune system function and prevent its suppression. In this regard, this study aimed to develop an easy-to-use and efficient modelling tool for predicting WBCs count during exercise. To achieve this goal, the predictive power of a range of machine-learning algorithms, including six standalone models (M5 prime (M5P), random forest (RF), alternating model trees (AMT), reduced error pruning tree (REPT), locally weighted learning (LWL), and support vector regression (SVR)), was assessed along with six types of hybrid models trained with a bagging (BA) algorithm (BA-M5P, BA-RF, BA-AMT, BA-REPT, BA-LWL, and BA-SVR). A comprehensive database was constructed from 200 eligible people. The models employed post-exercise training WBCs counts as the output parameter and seven WBCs-influencing factors, including intensity and duration of exercise, pre-exercise training WBCs counts, age, body fat percentage, maximal aerobic capacity, and muscle mass, as input parameters. Comparing the prediction results of the models to the observed WBCs using standard statistics indicated that the BA-M5P model had the greatest potential to produce a robust prediction of the number of lymphocytes, neutrophils, monocytes, and WBC compared to other models. Moreover, pre-exercise training WBCs counts, intensity and duration of exercise, and body fat percentage were the most important features in predicting WBCs counts. These findings hold significant implications for the advancement of exercise immunology and the promotion of public health.
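The hybrid "BA-" models in this abstract wrap a base learner in bagging (bootstrap aggregating). As a rough illustration only, the sketch below bags a one-feature least-squares line; the base learner is a stand-in chosen for brevity (the study used M5P, RF, AMT, REPT, LWL, and SVR), and the function names and toy numbers are hypothetical rather than taken from the paper.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: every x identical
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit one base model per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Aggregate by averaging the predictions of all base models."""
    return mean(a + b * x for a, b in models)


# Hypothetical toy data: pre-exercise count (x) vs post-exercise count (y).
pre = [5.1, 6.0, 6.8, 7.4, 8.2, 9.0]
post = [6.0, 7.2, 7.9, 8.8, 9.5, 10.6]
ensemble = bagged_fit(pre, post)
estimate = bagged_predict(ensemble, 7.0)
```

Averaging over resampled fits mainly reduces the variance of unstable base learners, which is one plausible reason a bagged tree model could outperform its standalone version.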
Affiliation(s)
- Shirin Asadi
  - Department of Exercise Physiology, Faculty of Physical Education and Sports Sciences, Allameh Tabataba'i University, Tehran, Iran.
- Bakhtyar Tartibian
  - Department of Exercise Physiology, Faculty of Physical Education and Sports Sciences, Allameh Tabataba'i University, Tehran, Iran
- Mohammad Ali Moni
  - AI & Digital Health Technology, Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Bathurst, NSW, 2795, Australia
- Rasoul Eslami
  - Department of Exercise Physiology, Faculty of Physical Education and Sports Sciences, Allameh Tabataba'i University, Tehran, Iran
3
Hong R, Li Q, Ma J, Lu C, Zhong Z. Computed tomography-based radiomics machine learning models for differentiating enchondroma and atypical cartilaginous tumor in long bones. ROFO-FORTSCHR RONTG 2024. [PMID: 39074797 DOI: 10.1055/a-2344-5398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/31/2024]
Abstract
To explore the value of CT-based radiomics machine learning models for differentiating enchondroma from atypical cartilaginous tumor (ACT) in long bones and methods to improve model performance. A total of 59 enchondromas and 53 ACTs in long bones confirmed by pathology were collected retrospectively. The features were extracted from preoperative CT images of these patients, and least absolute shrinkage and selection operator (LASSO) regression was used for feature selection and dimensionality reduction. The selected features were used to construct classification models by thirteen machine learning algorithms. The data set was randomly divided into a training set and a test set at a proportion of 7:3 by ten-fold cross-validation to evaluate the performance of these models. A total of 1199 features were extracted, 9 features were selected, and 13 radiomics machine learning models were constructed. The area under the curve (AUC) of 11 models was more than 0.8, and that of 3 models was more than 0.9. The Extremely Randomized Trees model achieved the best performance (AUC = 0.9375 ± 0.0884), followed by the Adaptive Boosting model (AUC = 0.9188 ± 0.1010) and the Linear Discriminant Analysis model (AUC = 0.9062 ± 0.1459). CT-based radiomics machine learning models had great ability to distinguish enchondroma and ACT in long bones. By using filters to deeply mine high-order features in the original image and selecting appropriate machine learning algorithms, the performance of the model can be improved.
- CT-based radiomics machine learning models can distinguish enchondroma and ACT in long bones.
- Using filters and selecting advanced machine learning algorithms can improve model performance.
- Clinical features have limited utility in distinguishing enchondroma and ACT in long bones.
- Hong R, Li Q, Ma J et al. Computed tomography-based radiomics machine learning models for differentiating enchondroma and atypical cartilaginous tumor in long bones. Fortschr Röntgenstr 2024; DOI 10.1055/a-2344-5398.
Affiliation(s)
- Rui Hong
  - Radiology, The Third Hospital of Hebei Medical University, Shijiazhuang, China
- Qian Li
  - Radiology, The Third Hospital of Hebei Medical University, Shijiazhuang, China
- Jielin Ma
  - Oncology, The Third Hospital of Hebei Medical University, Shijiazhuang, China
- Chunmiao Lu
  - Radiology, The Third Hospital of Hebei Medical University, Shijiazhuang, China
- Zhiwei Zhong
  - Radiology, The Third Hospital of Hebei Medical University, Shijiazhuang, China
4
Zhang M, Zheng Y, Maidaiti X, Liang B, Wei Y, Sun F. Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review. HEALTH DATA SCIENCE 2024; 4:0165. [PMID: 39050273 PMCID: PMC11266123 DOI: 10.34133/hds.0165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
Background: Disease prediction models often use statistical methods or machine learning, both with their own corresponding application scenarios, raising the risk of errors when used alone. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess the current development of global disease prediction integration models. Methods: PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched to collect studies on prediction models integrating machine learning into statistical methods from database inception to May 1, 2023. Information including basic characteristics of studies, integrating approaches, application scenarios, modeling details, and model performance was extracted. Results: A total of 20 eligible studies in English and 1 in Chinese were included. Five studies concentrated on diagnostic models, while 16 studies concentrated on predicting disease occurrence or prognosis. Integrating strategies of classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. The AUROC of integration models surpassed 0.75, and integration models performed better than statistical methods and machine learning alone in most studies. Stacking was used for situations with >100 predictors and needed a relatively larger amount of training data. Conclusion: Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies have shown that integration models can outperform single models. This study provides insights for the selection of integration methods for different scenarios. Future research could emphasize the improvement and validation of integrating strategies.
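Two of the integration strategies named in this abstract, majority voting and weighted voting, can be sketched in a few lines. The functions below are an illustrative assumption about how such combiners are commonly written, not the implementation of any reviewed study; the weights, threshold, and tie rule are hypothetical choices.

```python
def majority_vote(labels):
    """Return 1 if a strict majority of models predict class 1, else 0.

    Ties resolve to 0 here; that tie rule is an arbitrary choice.
    """
    return int(sum(labels) * 2 > len(labels))


def weighted_vote(probabilities, weights, threshold=0.5):
    """Weighted average of predicted probabilities, then thresholding.

    Returns (combined_probability, predicted_label).
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total
    return score, int(score >= threshold)


# Hypothetical example: a logistic regression says 0.9, a tree ensemble 0.4,
# and the ensemble is trusted three times as much.
combined, label = weighted_vote([0.9, 0.4], [1, 3])
```

Stacking, the third strategy mentioned, replaces the fixed weights with a second-level model trained on the base models' outputs, which is consistent with the abstract's remark that it needs more training data.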
Affiliation(s)
- Meng Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  - Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Yongqi Zheng
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  - Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Baosheng Liang
  - Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Yongyue Wei
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  - Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
- Feng Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  - Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
5
Li Y, Cao Y, Wang M, Wang L, Wu Y, Fang Y, Zhao Y, Fan Y, Liu X, Liang H, Yang M, Yuan R, Zhou F, Zhang Z, Kang H. Development and validation of machine learning models to predict MDRO colonization or infection on ICU admission by using electronic health record data. Antimicrob Resist Infect Control 2024; 13:74. [PMID: 38971777 PMCID: PMC11227715 DOI: 10.1186/s13756-024-01428-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 06/24/2024] [Indexed: 07/08/2024] Open
Abstract
BACKGROUND Multidrug-resistant organisms (MDRO) pose a significant threat to public health. Intensive Care Units (ICU), characterized by the extensive use of antimicrobial agents and a high prevalence of bacterial resistance, are hotspots for MDRO proliferation. Timely identification of patients at high risk for MDRO can aid in curbing transmission, enhancing patient outcomes, and maintaining the cleanliness of the ICU environment. This study focused on developing a machine learning (ML) model to identify patients at risk of MDRO during the initial phase of their ICU stay. METHODS Utilizing patient data from the First Medical Center of the People's Liberation Army General Hospital (PLAGH-ICU) and the Medical Information Mart for Intensive Care (MIMIC-IV), the study analyzed variables within 24 h of ICU admission. Machine learning algorithms were applied to these datasets, emphasizing the early detection of MDRO colonization or infection. Model efficacy was evaluated by the area under the receiver operating characteristics curve (AUROC), alongside internal and external validation sets. RESULTS The study evaluated 3,536 patients in PLAGH-ICU and 34,923 in MIMIC-IV, revealing MDRO prevalence of 11.96% and 8.81%, respectively. Significant differences in ICU and hospital stays, along with mortality rates, were observed between MDRO positive and negative patients. In the temporal validation, the PLAGH-ICU model achieved an AUROC of 0.786 [0.748, 0.825], while the MIMIC-IV model reached 0.744 [0.723, 0.766]. External validation demonstrated reduced model performance across different datasets. Key predictors included biochemical markers and the duration of pre-ICU hospital stay. CONCLUSIONS The ML models developed in this study demonstrated their capability in early identification of MDRO risks in ICU patients. Continuous refinement and validation in varied clinical contexts remain essential for future applications.
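The results above report AUROC values with bracketed intervals. The abstract does not say how those intervals were obtained; the sketch below assumes a percentile bootstrap purely for illustration, and computes AUROC through its rank interpretation (the probability that a random positive case scores above a random negative one). All names and the toy data are hypothetical.

```python
import random


def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy labels (1 = MDRO positive) and model scores.
labels = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
preds = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.3, 0.6, 0.15]
point = auroc(labels, preds)
interval = bootstrap_ci(labels, preds)
```

With prevalences of roughly 9 to 12%, as reported here, resamples from small cohorts can occasionally contain no positives, which is why the sketch discards single-class resamples.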
Affiliation(s)
- Yun Li
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Yuan Cao
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Min Wang
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Lu Wang
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Yiqi Wu
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Yuan Fang
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Yan Zhao
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Yong Fan
  - Center for Artificial Intelligence in Medicine, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Xiaoli Liu
  - Center for Artificial Intelligence in Medicine, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Hong Liang
  - Center for Artificial Intelligence in Medicine, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Mengmeng Yang
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Rui Yuan
  - Medical School of Chinese PLA, Beijing, 100853, China
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Feihu Zhou
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China
- Zhengbo Zhang
  - Center for Artificial Intelligence in Medicine, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China.
- Hongjun Kang
  - Department of Critical Care Medicine, The First Medical Centre, Chinese PLA General Hospital, No. 28, Fuxing Road, Haidian District, Beijing, 100853, China.
6
Giacobbe DR, Marelli C, Mora S, Cappello A, Signori A, Vena A, Guastavino S, Rosso N, Campi C, Giacomini M, Bassetti M. Prediction of candidemia with machine learning techniques: state of the art. Future Microbiol 2024; 19:931-940. [PMID: 39072500 PMCID: PMC11290752 DOI: 10.2217/fmb-2023-0269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 03/06/2024] [Indexed: 07/30/2024] Open
Abstract
In this narrative review, we discuss studies assessing the use of machine learning (ML) models for the early diagnosis of candidemia, focusing on employed models and the related implications. There are currently few studies evaluating ML techniques for the early diagnosis of candidemia as a prediction task based on clinical and laboratory features. The use of ML tools holds promise to provide highly accurate and real-time support to clinicians for relevant therapeutic decisions at the bedside of patients with suspected candidemia. However, further research is needed in terms of sample size, data quality, recognition of biases and interpretation of model outputs by clinicians to better understand if and how these techniques could be safely adopted in daily clinical practice.
Affiliation(s)
- Daniele Roberto Giacobbe
  - Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
  - UO Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Cristina Marelli
  - UO Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Sara Mora
  - UO Information & Communication Technologies (ICT), IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Alice Cappello
  - UO Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Alessio Signori
  - Section of Biostatistics, Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
- Antonio Vena
  - Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
  - UO Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Nicola Rosso
  - UO Information & Communication Technologies (ICT), IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Cristina Campi
  - Department of Mathematics (DIMA), University of Genoa, Genoa, Italy
  - Life Science Computational Laboratory (LISCOMP), IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Mauro Giacomini
  - Department of Informatics, Bioengineering, Robotics & System Engineering (DIBRIS), University of Genoa, Genoa, Italy
- Matteo Bassetti
  - Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
  - UO Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
7
Ganie SM, Dutta Pramanik PK, Zhao Z. Improved liver disease prediction from clinical data through an evaluation of ensemble learning approaches. BMC Med Inform Decis Mak 2024; 24:160. [PMID: 38849815 PMCID: PMC11157956 DOI: 10.1186/s12911-024-02550-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 05/21/2024] [Indexed: 06/09/2024] Open
Abstract
PURPOSE Liver disease causes two million deaths annually, accounting for 4% of all deaths globally. Prediction or early detection of the disease via machine learning algorithms on large clinical data has become promising and potentially powerful, but such methods often have some limitations due to the complexity of the data. In this regard, ensemble learning has shown promising results. There is an urgent need to evaluate different algorithms and then suggest a robust ensemble algorithm for liver disease prediction. METHOD Three ensemble approaches with nine algorithms are evaluated on a large dataset of liver patients comprising 30,691 samples with 11 features. Various preprocessing procedures are utilized to feed the proposed model with better quality data, in addition to the appropriate tuning of hyperparameters and selection of features. RESULTS The models' performances with each algorithm are extensively evaluated with several positive and negative performance metrics along with runtime. Gradient boosting is found to have the overall best performance, with 98.80% accuracy and 98.50% for each of precision, recall, and F1-score. CONCLUSIONS The proposed model with gradient boosting outperformed several recent similar works on most metrics, suggesting its efficacy in predicting liver disease. It can be further applied to predict other diseases with the commonality of predicate indicators.
Affiliation(s)
- Shahid Mohammad Ganie
  - AI Research Centre, Department of Analytics, School of Business, Woxsen University, Hyderabad, Telangana, 502345, India
- Pijush Kanti Dutta Pramanik
  - School of Computer Applications and Technology, Galgotias University, Greater Noida, Uttar Pradesh, 203201, India.
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
8
Hagen M, Dass R, Westhues C, Blom J, Schultheiss SJ, Patz S. Interpretable machine learning decodes soil microbiome's response to drought stress. ENVIRONMENTAL MICROBIOME 2024; 19:35. [PMID: 38812054 PMCID: PMC11138018 DOI: 10.1186/s40793-024-00578-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 05/10/2024] [Indexed: 05/31/2024]
Abstract
BACKGROUND Extreme weather events induced by climate change, particularly droughts, have detrimental consequences for crop yields and food security. Concurrently, these conditions provoke substantial changes in the soil bacterial microbiota and affect plant health. Early recognition of soil affected by drought enables farmers to implement appropriate agricultural management practices. In this context, interpretable machine learning holds immense potential for drought stress classification of soil based on marker taxa. RESULTS This study demonstrates that the 16S rRNA-based metagenomic approach of Differential Abundance Analysis methods and machine learning-based Shapley Additive Explanation values provide similar information. They exhibit their potential as complementary approaches for identifying marker taxa and investigating their enrichment or depletion under drought stress in grass lineages. Additionally, the Random Forest Classifier trained on a diverse range of relative abundance data from the soil bacterial microbiome of various plant species achieves a high accuracy of 92.3% at the genus rank for drought stress prediction. It demonstrates its generalization capacity for the lineages tested. CONCLUSIONS In the detection of drought stress in soil bacterial microbiota, this study emphasizes the potential of an optimized and generalized location-based ML classifier. By identifying marker taxa, this approach holds promising implications for microbe-assisted plant breeding programs and contributes to the development of sustainable agriculture practices. These findings are crucial for preserving global food security in the face of climate change.
Affiliation(s)
- Michelle Hagen
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
- Rupashree Dass
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
- Cathy Westhues
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany
- Jochen Blom
  - Bioinformatics and Systems Biology, Justus Liebig University Gießen, Heinrich-Buff-Ring 58, 35390, Gießen, Hesse, Germany
- Sascha Patz
  - Computomics GmbH, Eisenbahnstraße 1, 72072, Tübingen, Baden-Württemberg, Germany.
9
Kasim S, Amir Rudin PNF, Malek S, Ibrahim KS, Wan Ahmad WA, Fong AYY, Lin WY, Aziz F, Ibrahim N. Ensemble machine learning for predicting in-hospital mortality in Asian women with ST-elevation myocardial infarction (STEMI). Sci Rep 2024; 14:12378. [PMID: 38811643 PMCID: PMC11137033 DOI: 10.1038/s41598-024-61151-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 05/02/2024] [Indexed: 05/31/2024] Open
Abstract
The accurate prediction of in-hospital mortality in Asian women after ST-Elevation Myocardial Infarction (STEMI) remains a crucial issue in medical research. Existing models frequently neglect this demographic's particular attributes, resulting in poor treatment outcomes. This study aims to improve the prediction of in-hospital mortality in multi-ethnic Asian women with STEMI by employing both base and ensemble machine learning (ML) models. We centred on the development of demographic-specific models using data from the Malaysian National Cardiovascular Disease Database spanning 2006 to 2016. Through a careful iterative feature selection approach that included feature importance and sequential backward elimination, significant variables such as systolic blood pressure, Killip class, fasting blood glucose, beta-blockers, angiotensin-converting enzyme inhibitors (ACE), and oral hypoglycemic medications were identified. The findings of our study revealed that ML models with selected features outperformed the conventional Thrombolysis in Myocardial Infarction (TIMI) Risk score, with area under the curve (AUC) ranging from 0.60 to 0.93 versus TIMI's AUC of 0.81. Remarkably, our best-performing ensemble ML model was surpassed by the base ML model, support vector machine (SVM) Linear with SVM selected features (AUC: 0.93, CI: 0.89-0.98 versus AUC: 0.91, CI: 0.87-0.96). Furthermore, the women-specific model outperformed a non-gender-specific STEMI model (AUC: 0.92, CI: 0.87-0.97). Our findings demonstrate the value of women-specific ML models over standard approaches, emphasizing the importance of continued testing and validation to improve clinical care for women with STEMI.
Affiliation(s)
- Sazzli Kasim
  - Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
  - Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
  - National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
  - Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia
- Sorayya Malek
  - Institute of Biological Sciences, Faculty of Science, University Malaya, Kuala Lumpur, Malaysia.
- Khairul Shafiq Ibrahim
  - Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
  - Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia
  - National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
- Wan Azman Wan Ahmad
  - National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
  - Division of Cardiology, University Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia
- Alan Yean Yip Fong
  - National Heart Association of Malaysia, Heart House, Kuala Lumpur, Malaysia
  - Department of Cardiology, Sarawak General Hospital, Kuching, Sarawak, Malaysia
- Wan Yin Lin
  - Institute of Biological Sciences, Faculty of Science, University Malaya, Kuala Lumpur, Malaysia
- Firdaus Aziz
  - School of Liberal Studies, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Nurulain Ibrahim
  - Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia
10
Chaudhary R, Nourelahi M, Thoma FW, Gellad WF, Lo-Ciganic WH, Bliden KP, Gurbel PA, Neal MD, Jain SK, Bhonsale A, Mulukutla SR, Wang Y, Harinstein ME, Saba S, Visweswaran S. Machine Learning - Based Bleeding Risk Predictions in Atrial Fibrillation Patients on Direct Oral Anticoagulants. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.27.24307985. [PMID: 38854094 PMCID: PMC11160827 DOI: 10.1101/2024.05.27.24307985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Importance: Accurately predicting major bleeding events in non-valvular atrial fibrillation (AF) patients on direct oral anticoagulants (DOACs) is crucial for personalized treatment and improving patient outcomes, especially with emerging alternatives like left atrial appendage closure devices. These devices reduce stroke risk comparably but with significantly fewer non-procedural bleeding events. Objective: To evaluate the performance of machine learning (ML) risk models in predicting clinically significant bleeding events requiring hospitalization and hemorrhagic stroke in non-valvular AF patients on DOACs compared to conventional bleeding risk scores (HAS-BLED, ORBIT, and ATRIA) at the index visit to a cardiologist for AF management. Design: Prognostic modeling with a retrospective cohort study design using electronic health record (EHR) data, with clinical follow-up at one, two, and five years. Setting: University of Pittsburgh Medical Center (UPMC) system. Participants: 24,468 non-valvular AF patients aged ≥18 years treated with DOACs, excluding those with a prior history of significant bleeding, other indications for DOACs, warfarin use, or contraindications to DOACs. Exposures: DOAC therapy for non-valvular AF. Main Outcomes and Measures: The primary endpoint was clinically significant bleeding requiring hospitalization within one year of the index visit. The models incorporated demographic, clinical, and laboratory variables available in the EHR at the index visit. Results: Among 24,468 patients, 553 (2.3%) had bleeding events within one year, 829 (3.5%) within two years, and 1,292 (5.8%) within five years of the index visit. We evaluated multivariate logistic regression and ML models including random forest, classification trees, k-nearest neighbor, naive Bayes, and extreme gradient boosting (XGBoost), which modestly outperformed the HAS-BLED, ATRIA, and ORBIT scores in predicting clinically significant bleeding at 1-year follow-up. The best-performing model (random forest) showed an area under the curve (AUC-ROC) of 0.76 (0.70-0.81), a G-Mean score of 0.67, and a net reclassification index of 0.14, compared with an AUC-ROC of 0.57 (0.50-0.63) and a G-Mean score of 0.57 for the HAS-BLED score (p-value for difference <0.001). The ML models also performed better than conventional risk scores at the 2-year and 5-year time points and within the subgroup of hemorrhagic stroke. SHAP analysis identified novel risk factors, including measures of body mass index, cholesterol profile, and insurance type, beyond those used in conventional risk scores. Conclusions and Relevance: Our findings demonstrate the superior performance of ML models compared to conventional bleeding risk scores and identify novel risk factors, highlighting the potential for personalized bleeding risk assessment in AF patients on DOACs.
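The G-Mean score reported above can be illustrated with a minimal sketch. It assumes the usual definition for imbalanced classification, the geometric mean of sensitivity and specificity; the abstract does not state the formula, and the function name `g_mean` and the confusion-matrix counts below are hypothetical, not taken from the study.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one sample")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for illustration only (not the study's data).
score = g_mean(tp=60, fn=40, tn=750, fp=250)
print(round(score, 3))
```

Because the score is a product of the two rates, a model that ignores the minority class (sensitivity near zero) scores near zero even when its overall accuracy is high, which is why the metric suits rare outcomes such as bleeding events.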
|
11
|
Hassan A, Gulzar Ahmad S, Ullah Munir E, Ali Khan I, Ramzan N. Predictive modelling and identification of key risk factors for stroke using machine learning. Sci Rep 2024; 14:11498. [PMID: 38769427 PMCID: PMC11106277 DOI: 10.1038/s41598-024-61665-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 05/08/2024] [Indexed: 05/22/2024] Open
Abstract
Strokes are a leading global cause of mortality, underscoring the need for early detection and prevention strategies. However, addressing hidden risk factors and achieving accurate prediction become particularly challenging in the presence of imbalanced and missing data. This study applies three imputation techniques to deal with missing data. To tackle data imbalance, it employs the synthetic minority oversampling technique (SMOTE). The study begins with a baseline model and subsequently employs an extensive range of advanced models. It evaluates the performance of these models by employing k-fold cross-validation on various imbalanced and balanced datasets. The findings reveal that age, body mass index (BMI), average glucose level, heart disease, hypertension, and marital status are the most influential features in predicting strokes. Furthermore, a Dense Stacking Ensemble (DSE) model is built upon the previous advanced models after fine-tuning, with the best-performing model as a meta-classifier. The DSE model demonstrated over 96% accuracy across diverse datasets, with an AUC score of 83.94% on the imbalanced imputed dataset and 98.92% on the balanced one. This research underscores the strong performance of the DSE model compared to previous research on the same dataset. It highlights the model's potential for early stroke detection to improve patient outcomes.
Affiliation(s)
- Ahmad Hassan
  - Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan
- Saima Gulzar Ahmad
  - Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan
- Ehsan Ullah Munir
  - Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan
- Imtiaz Ali Khan
  - Department of Computer Science, Cardiff School of Technologies, Llandaff Campus, Western Avenue, Cardiff, CF5 2YB, UK
- Naeem Ramzan
  - School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley, PA1 2BE, UK
|
12
|
Tao L, Zhou T, Wu Z, Hu F, Yang S, Kong X, Li C. ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots. J Chem Inf Model 2024; 64:3548-3557. [PMID: 38587997 DOI: 10.1021/acs.jcim.3c02011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
Protein-DNA interactions are pivotal to various cellular processes. Precise identification of the hotspot residues for protein-DNA interactions holds great significance for revealing the intricate mechanisms in protein-DNA recognition and for providing essential guidance for protein engineering. Targeting protein-DNA interaction hotspots, this work introduces an effective prediction method, ESPDHot, based on a stacked ensemble machine learning framework. Here, an interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To tackle the imbalanced data set issue, adaptive synthetic sampling (ADASYN), an oversampling technique, is adopted to synthetically generate new minority samples, thereby rectifying the data imbalance. As for molecular characteristics, besides traditional features, we introduce three new characteristic types: the residue interface preference proposed by us, residue fluctuation dynamics characteristics, and coevolutionary features. Combining the Boruta method with our previously developed Random Grouping strategy, we obtained an optimal set of features. Finally, a stacking classifier is constructed to output prediction results, which integrates three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN), as the first layer, and the Logistic Regression (LR) algorithm as the second one. Notably, ESPDHot outperforms the current state-of-the-art predictors on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
Affiliation(s)
- Lianci Tao
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Tong Zhou
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Xiaotian Kong
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
|
13
|
Uddin S, Lu H. Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data. PLoS One 2024; 19:e0301541. [PMID: 38635591 PMCID: PMC11025817 DOI: 10.1371/journal.pone.0301541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Accepted: 03/18/2024] [Indexed: 04/20/2024] Open
Abstract
Many individual studies in the literature have observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML algorithms (Decision tree and Random forest) and three non-tree-based ML algorithms (Support vector machine, Logistic regression and k-nearest neighbour). Results from paired-sample t-tests show that both tree-based ML algorithms perform better than each non-tree-based ML algorithm on the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at the p<0.001 significance level. This performance superiority is consistent across both the model development and test phases. This study also used paired-sample t-tests for the subsets of the research datasets from disease prediction (66) and university-ranking (50) research contexts for further validation. The observed superiority of the tree-based ML algorithms remains valid for these subsets: tree-based ML algorithms significantly outperformed non-tree-based algorithms in both research contexts for all four performance measures. We discuss the research implications of these findings in detail in this article.
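As a rough illustration of the paired-sample t-test this abstract relies on, the sketch below computes the t statistic from per-dataset scores of two algorithms. It is a minimal standard-library version, not the authors' analysis code; the function name `paired_t_statistic` and the score values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-dataset scores (not from the study): one value per
# dataset for a tree-based and a non-tree-based algorithm.
tree_scores = [3, 5, 7, 9]
other_scores = [1, 2, 3, 4]
t_stat, df = paired_t_statistic(tree_scores, other_scores)
print(t_stat, df)
```

The pairing matters here: each dataset contributes one difference, so variation between datasets cancels out and only the consistency of the gap between the two algorithms drives the statistic. The p-value would then come from a t distribution with `df` degrees of freedom, which this sketch leaves to a statistics library.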
Affiliation(s)
- Shahadat Uddin
  - School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, Australia
- Haohui Lu
  - School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, Australia
|
14
|
Zhang Y, Zhang L, Lv H, Zhang G. Ensemble machine learning prediction of hyperuricemia based on a prospective health checkup population. Front Physiol 2024; 15:1357404. [PMID: 38665596 PMCID: PMC11043598 DOI: 10.3389/fphys.2024.1357404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 03/11/2024] [Indexed: 04/28/2024] Open
Abstract
Objectives: An accurate prediction model for hyperuricemia (HUA) in adults remains unavailable. This study aimed to develop a stacking ensemble prediction model for HUA to identify high-risk groups and explore risk factors. Methods: A prospective health checkup cohort of 40899 subjects was examined and randomly divided into training and validation sets at a ratio of 7:3. LASSO regression was employed to screen out important features, and ROSE sampling was then used to handle the imbalanced classes. An ensemble model using a stacking strategy was constructed based on three individual models: support vector machine, decision tree C5.0, and eXtreme gradient boosting. Model validation was conducted using the area under the receiver operating characteristic curve (AUC) and the calibration curve, as well as metrics including accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. A model-agnostic, instance-level variable attribution technique (iBreakdown) was used to interpret our black-box ensemble model and to identify contributing risk factors. Results: Fifteen important features were screened out of 23 clinical variables. Our stacking ensemble model, with an AUC of 0.854, outperformed the three individual models, support vector machine, decision tree C5.0, and eXtreme gradient boosting, which had AUCs of 0.848, 0.851 and 0.849, respectively. Calibration accuracy as well as other metrics, including accuracy, specificity, negative predictive value, and F1 score, also demonstrated our ensemble model's superiority. The contributing risk factors were estimated using six randomly selected subjects, which showed that being female and relatively younger, together with having higher baseline uric acid, body mass index, γ-glutamyl transpeptidase, total protein, triglycerides, creatinine, and fasting blood glucose, can increase the risk of HUA. To further validate our model's applicability in the health checkup population, we used another cohort of 8559 subjects, in which our ensemble prediction model also performed favorably, with an AUC of 0.846. Conclusion: In this study, a stacking ensemble prediction model for HUA was developed, and it outperformed the three individual models that compose it (support vector machine, decision tree C5.0, and eXtreme gradient boosting). The contributing risk factors were also identified.
Affiliation(s)
- Yongsheng Zhang
  - Health Management Center, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Institute of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Shandong Engineering Laboratory of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- Li Zhang
  - Department of Pharmacology, Jinan Central Hospital Affiliated to Shandong First Medical University, Jinan, China
- Haoyue Lv
  - Health Management Center, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Institute of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Shandong Engineering Laboratory of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- Guang Zhang
  - Health Management Center, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Institute of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
  - Shandong Engineering Laboratory of Health Management, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
|
15
|
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2024:10.1007/s12033-024-01133-6. [PMID: 38565775 DOI: 10.1007/s12033-024-01133-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Affiliation(s)
- Arnab Mukherjee
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Suzanna Abraham
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Akshita Singh
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- S Balaji
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- K S Mukunthan
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
|
16
|
Nwadiugwu M, Onwuekwe I, Ezeanolue E, Deng H. Beyond Amyloid: A Machine Learning-Driven Approach Reveals Properties of Potent GSK-3β Inhibitors Targeting Neurofibrillary Tangles. Int J Mol Sci 2024; 25:2646. [PMID: 38473895 PMCID: PMC10931970 DOI: 10.3390/ijms25052646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 02/16/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024] Open
Abstract
Current treatments for Alzheimer's disease (AD) focus on slowing memory and cognitive decline, but none offer curative outcomes. This study aims to explore and curate the common properties of active, drug-like molecules that modulate glycogen synthase kinase 3β (GSK-3β), a well-documented kinase with increased activity in tau hyperphosphorylation and neurofibrillary tangles, both hallmarks of AD pathology. Leveraging quantitative structure-activity relationship (QSAR) data from the PubChem and ChEMBL databases, we employed seven machine learning models: logistic regression (LogR), k-nearest neighbors (KNN), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), neural networks (NNs), and ensemble majority voting. Our goal was to correctly predict active and inactive compounds that inhibit GSK-3β activity and identify their key properties. Among the six individual models, the NN demonstrated the highest performance with a 79% AUC-ROC on unbalanced external validation data, while the SVM model was superior in accurately classifying the compounds. The SVM and RF models surpassed the NN in terms of Kappa values, and the ensemble majority voting model demonstrated slightly better accuracy than the NN on the external validation data. Feature importance analysis revealed that hydrogen bonds, phenol groups, and specific electronic characteristics are important features of molecular descriptors that positively correlate with active GSK-3β inhibition. Conversely, structural features like imidazole rings, sulfides, and methoxy groups showed a negative correlation. Our study highlights the significance of structural, electronic, and physicochemical descriptors in screening active candidates against GSK-3β. These predictive features could prove useful in therapeutic strategies to understand the important properties of GSK-3β candidate inhibitors that may potentially benefit non-amyloid-based AD treatments targeting neurofibrillary tangles.
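The ensemble majority voting mentioned in this abstract can be sketched as follows. This is a minimal illustration of hard voting over class labels, not the authors' implementation; the function name `majority_vote`, the three hypothetical models, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by hard majority vote.

    `predictions` holds one list of labels per model, all the same length.
    Ties go to the label seen first among the tied ones (an assumption;
    the abstract does not say how ties were handled).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*predictions) yields, per compound, the labels from every model.
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical outputs of three classifiers (1 = active, 0 = inactive).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
voted = majority_vote([model_a, model_b, model_c])
print(voted)
```

With an even number of voters, as with the six individual models in the study, ties become possible, so a real pipeline would need an explicit rule such as averaging predicted probabilities instead of counting labels.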
Affiliation(s)
- Martin Nwadiugwu
  - Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Ikenna Onwuekwe
  - Neurology Unit, Department of Medicine, University of Nigeria Teaching Hospital, Ituku-Ozalla 400001, Enugu, Nigeria
  - Department of Medicine, College of Medicine, University of Nigeria, Enugu Campus, Nsukka 400001, Enugu, Nigeria
- Echezona Ezeanolue
  - Center for Translation and Implementation Research (CTAIR), University of Nigeria, Nsukka 410001, Enugu, Nigeria
  - Healthy Sunrise Foundation, Las Vegas, NV 89107, USA
- Hongwen Deng
  - Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, Tulane University, New Orleans, LA 70112, USA
|
17
|
Kim SH, Park SH, Lee H. Machine learning for predicting hepatitis B or C virus infection in diabetic patients. Sci Rep 2023; 13:21518. [PMID: 38057379 PMCID: PMC10700585 DOI: 10.1038/s41598-023-49046-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 12/04/2023] [Indexed: 12/08/2023] Open
Abstract
Highly prevalent hepatitis B and hepatitis C virus (HBV and HCV) infections have been reported among individuals with diabetes. Given the frequently asymptomatic nature of hepatitis and the challenges associated with screening in some vulnerable populations such as diabetes patients, we investigated the performance of various machine learning models for identifying hepatitis in diabetic patients while also evaluating the significance of features. Using NHANES data from 2013 to 2018, we evaluated the following machine learning models: random forest (RF), support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), and least absolute shrinkage and selection operator (LASSO), along with a stacked ensemble model. We performed hyperparameter tuning to improve model performance and selected important predictors using the best-performing model. LASSO showed the highest predictive performance among the models (AUC-ROC = 0.810). Illicit drug use, poverty, and race were highly ranked as predictive factors for developing hepatitis in diabetes patients. Our study demonstrated that a machine-learning-based model performed well in detecting hepatitis among diabetes patients. Furthermore, we expect that the models and predictors evaluated in the current study could provide supportive information for developing screening or treatment methods for hepatitis care in diabetes patients.
Affiliation(s)
- Sun-Hwa Kim
  - Department of Clinical Medicinal Sciences, Konyang University, Nonsan, Republic of Korea
- So-Hyeon Park
  - Department of Clinical Medicinal Sciences, Konyang University, Nonsan, Republic of Korea
- Heeyoung Lee
  - College of Pharmacy, Inje University, Gimhae, Republic of Korea
|
18
|
Mohsin SN, Gapizov A, Ekhator C, Ain NU, Ahmad S, Khan M, Barker C, Hussain M, Malineni J, Ramadhan A, Halappa Nagaraj R. The Role of Artificial Intelligence in Prediction, Risk Stratification, and Personalized Treatment Planning for Congenital Heart Diseases. Cureus 2023; 15:e44374. [PMID: 37664359 PMCID: PMC10469091 DOI: 10.7759/cureus.44374] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 08/30/2023] [Indexed: 09/05/2023] Open
Abstract
This narrative review delves into the potential of artificial intelligence (AI) in predicting, stratifying risk, and personalizing treatment planning for congenital heart disease (CHD). CHD is a complex condition that affects individuals across various age groups. The review highlights the challenges in predicting risks, planning treatments, and prognosticating long-term outcomes due to CHD's multifaceted nature, limited data, ethical concerns, and individual variabilities. AI, with its ability to analyze extensive data sets, presents a promising solution. The review emphasizes the need for larger, diverse datasets, the integration of various data sources, and the analysis of longitudinal data. Prospective validation in real-world clinical settings, interpretability, and the importance of human clinical expertise are also underscored. The ethical considerations surrounding privacy, consent, bias, monitoring, and human oversight are examined. AI's implications include improved patient outcomes, cost-effectiveness, and real-time decision support. The review aims to provide a comprehensive understanding of AI's potential for revolutionizing CHD management and highlights the significance of collaboration and transparency to address challenges and limitations.
Affiliation(s)
- Chukwuyem Ekhator
  - Neuro-Oncology, New York Institute of Technology, College of Osteopathic Medicine, Old Westbury, USA
- Noor U Ain
  - Medicine, Mayo Hospital, Lahore, PAK
  - Medicine, King Edward Medical University, Lahore, PAK
- Mavra Khan
  - Medicine and Surgery, Mayo Hospital, Lahore, PAK
- Chad Barker
  - Public Health, University of South Florida, Tampa, USA
- Jahnavi Malineni
  - Medicine and Surgery, Maharajah's Institute of Medical Sciences, Vizianagaram, IND
- Afif Ramadhan
  - Medicine, Universal Scientific Education and Research Network (USERN), Yogyakarta, IDN
  - Medicine, Faculty of Medicine, Public Health, and Nursing, Gadjah Mada University, Yogyakarta, IDN
|