Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 158 (from Reference Citation Analysis)

Article PDFs (28)

Cited by > 0 (70)

Searched Name: SHAP

Schoonemann J, Nagelkerke J, Seuntjens TG, Osinga N, van Liere D. Applying XGBoost and SHAP to Open Source Data to Identify Key Drivers and Predict Likelihood of Wolf Pair Presence. Environ Manage 2024;73:1072-1087. [PMID: 38372749 DOI: 10.1007/s00267-024-01941-1] [Received: 09/05/2023] [Accepted: 01/20/2024] [Indexed: 02/20/2024]

Tanwar N, Hasija Y. Explicate molecular landscape of combined pulmonary fibrosis and emphysema through explainable artificial intelligence: a comprehensive analysis of ILD and COPD interactions using RNA from whole lung homogenates. Med Biol Eng Comput 2024:10.1007/s11517-024-03099-8. [PMID: 38644448 DOI: 10.1007/s11517-024-03099-8] [Received: 02/09/2024] [Accepted: 04/14/2024] [Indexed: 04/23/2024]

Zhang L, Wang L, Ji D, Xia Z, Nan P, Zhang J, Li K, Qi B, Du R, Sun Y, Wang Y, Hu B. Explainable ensemble machine learning revealing the effect of meteorology and sources on ozone formation in megacity Hangzhou, China. Sci Total Environ 2024;922:171295. [PMID: 38417501 DOI: 10.1016/j.scitotenv.2024.171295] [Received: 01/09/2024] [Revised: 02/23/2024] [Accepted: 02/25/2024] [Indexed: 03/01/2024]

Abstract

Megacity Hangzhou, located in eastern China, has experienced severe O3 pollution in recent years, so clarifying the key drivers of O3 formation is essential to curbing further deterioration. In this study, an ensemble machine learning (EML) model coupled with Shapley additive explanations (SHAP), together with positive matrix factorization, was used to explore the impact of various factors (including meteorology, chemical components, and sources) on O3 formation during the whole period, pollution days, and typical persistent pollution events from April to October in 2021-2022. The EML model achieved better performance than a single model, with an R2 value of 0.91. SHAP analysis revealed that meteorological conditions had the greatest effect on O3 variability, contributing 57 %-60 % across the different pollution levels, and that the main drivers were relative humidity and radiation. Among chemical factors, O3 formation responded positively to volatile organic compounds (VOCs) and fine particulate matter (PM2.5), and negatively to nitrogen oxides (NOx). Among the VOC subgroups, oxygenated compounds (OVOCs), alkenes, and aromatics had the higher contributions; additionally, the effects of PM2.5 and NOx were also important and increased as O3 pollution worsened. The analysis of seven emission sources indicated that vehicle exhaust (35 %), biomass combustion (16 %), and biogenic emissions (12 %) were the dominant drivers of O3 formation in Hangzhou. On O3 pollution days, however, the effects of biomass combustion and biogenic emissions increased. In persistent pollution events with the highest O3 concentrations in particular, the magnitude of the biogenic emission effect increased by 156 % relative to the whole period. Our findings show that combining the EML model with SHAP analysis could provide a reliable method for rapid diagnosis of the causes of O3 pollution at different event scales, supporting the formulation of control measures.
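To make the attribution idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature coalition. It is an illustration only, not the authors' pipeline: the model `toy_model`, its coefficients, the feature values, and the choice of replacing "absent" features with a fixed baseline are all assumptions made here, and practical SHAP tooling uses faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition take their baseline value (an
    illustrative way to represent a 'missing' feature).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical response surface; the coefficients are invented."""
    humidity, radiation, nox = features
    return 50.0 - 0.3 * humidity + 0.05 * radiation - 0.2 * nox + 0.001 * radiation * nox


x = [40.0, 600.0, 20.0]          # hypothetical observation
baseline = [70.0, 300.0, 30.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["humidity", "radiation", "nox"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the additivity property that lets per-feature contributions be reported as shares of a prediction.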

Affiliation(s)

Lei Zhang: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China; University of the Chinese Academy of Sciences, Beijing 100049, China
Lili Wang: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China; Zhejiang Key Laboratory of Ecological and Environmental Big Data (2022P10005), Zhejiang Ecological and Environmental Monitoring Center, Hangzhou 310012, China; Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control, School of Environmental Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
Dan Ji: Suichang Meteorological Bureau, Suichang 323000, China
Zheng Xia: Zhejiang Key Laboratory of Ecological and Environmental Big Data (2022P10005), Zhejiang Ecological and Environmental Monitoring Center, Hangzhou 310012, China; Zhejiang Key Laboratory of Ecological and Environmental Monitoring, Forewarning and Quality Control, Hangzhou 310012, China
Peifan Nan: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China; College of Resources Environment and Tourism, Capital Normal University, Beijing 100048, China
Jiaxin Zhang: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China; University of the Chinese Academy of Sciences, Beijing 100049, China
Ke Li: Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control, School of Environmental Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
Bing Qi: Hangzhou Meteorological Bureau, Hangzhou 310051, China
Rongguang Du: Hangzhou Meteorological Bureau, Hangzhou 310051, China
Yang Sun: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
Yuesi Wang: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China; University of the Chinese Academy of Sciences, Beijing 100049, China
Bo Hu: Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China

Vimbi V, Shaffi N, Mahmud M. Interpreting artificial intelligence models: a systematic review on the application of LIME and SHAP in Alzheimer's disease detection. Brain Inform 2024;11:10. [PMID: 38578524 PMCID: PMC10997568 DOI: 10.1186/s40708-024-00222-1] [Received: 09/02/2023] [Accepted: 03/04/2024] [Indexed: 04/06/2024] Open

Shinohara I, Mifune Y, Inui A, Nishimoto H, Yoshikawa T, Kato T, Furukawa T, Tanaka S, Kusunose M, Hoshino Y, Matsushita T, Mitani M, Kuroda R. Re-tear after arthroscopic rotator cuff tear surgery: risk analysis using machine learning. J Shoulder Elbow Surg 2024;33:815-822. [PMID: 37625694 DOI: 10.1016/j.jse.2023.07.017] [Received: 04/12/2023] [Revised: 07/06/2023] [Accepted: 07/16/2023] [Indexed: 08/27/2023]

Abstract

BACKGROUND

Postoperative rotator cuff retear after arthroscopic rotator cuff repair (ARCR) is still a major problem. Various risk factors such as age, gender, and tear size have been reported. Recently, magnetic resonance imaging-based stump classification was reported as an index of rotator cuff fragility. Although stump type 3 is reported to have a high retear rate, there are few reports on the risk of postoperative retear based on this classification. Machine learning (ML), an artificial intelligence technique, allows for more flexible predictive models than conventional statistical methods and has been applied to predict clinical outcomes. In this study, we used ML to predict postoperative retear risk after ARCR.

METHODS

This retrospective case-control study included 353 patients who underwent surgical treatment for complete rotator cuff tear using the suture-bridge technique. Patients who initially presented with retears and traumatic tears were excluded. After the initial tear repair, rotator cuff retears were diagnosed by magnetic resonance imaging; Sugaya classification types IV and V were defined as retears. Age, gender, stump classification, tear size, Goutallier classification, presence of diabetes, and hyperlipidemia were used as input features for the ML models to predict the risk of retear. Using Python's Scikit-learn as an ML library, five different AI models (logistic regression, random forest, AdaBoost, CatBoost, LightGBM) were trained on the existing data, and the prediction models were applied to the test dataset. The performance of these ML models was measured by the area under the receiver operating characteristic curve. Additionally, key features affecting retear were evaluated.

RESULTS

The area under the receiver operating characteristic curve was 0.78 for logistic regression, 0.82 for random forest, 0.78 for AdaBoost, 0.83 for CatBoost, and 0.87 for LightGBM. LightGBM showed the highest score. The important factors for model prediction were age, stump classification, and tear size.

CONCLUSIONS

The ML classifier model predicted retears after ARCR with high accuracy, and the AI model showed that the most important characteristics affecting retears were age and imaging findings, including stump classification. This model may be able to predict postoperative rotator cuff retears based on clinical features.
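The model comparison above rests on the area under the receiver operating characteristic curve. As a minimal sketch of what that number measures, the function below computes AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores for `model_a` and `model_b` are invented for illustration and are not data from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is quadratic in the sample
    size, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical retear labels (1 = retear) and scores from two hypothetical models.
y_true = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.2, 0.3, 0.6, 0.9]

print("model_a AUC:", roc_auc(y_true, model_a))
print("model_b AUC:", roc_auc(y_true, model_b))
```

Because AUC depends only on how cases are ranked, it allows models with differently scaled outputs to be compared on the same test set.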

Huang F, Zhang X. A new interpretable streamflow prediction approach based on SWAT-BiLSTM and SHAP. Environ Sci Pollut Res Int 2024;31:23896-23908. [PMID: 38430443 DOI: 10.1007/s11356-024-32725-z] [Received: 10/22/2023] [Accepted: 02/27/2024] [Indexed: 03/03/2024]

Abstract

Streamflow is a crucial variable for assessing the available water resources for both human and environmental use. Accurate streamflow prediction plays a significant role in water resource management and in assessing the impacts of climate change. This study explores the potential of coupling conceptual hydrological models based on physical processes with machine learning algorithms to enhance the performance of streamflow simulations. Four coupled models, namely SWAT-Transformer, SWAT-LSTM, SWAT-GRU, and SWAT-BiLSTM, were constructed in this research. SWAT served as a transfer function to convert four meteorological features (precipitation, temperature, relative humidity, and wind speed) into six hydrological features: soil water content, lateral flow, percolation, groundwater discharge, surface runoff, and evapotranspiration. Machine learning algorithms were employed to capture the underlying relationships between these ten feature variables and the target variable (streamflow) to predict daily streamflow in the Sandu-River Basin (SRB). Among the four coupled models and the calibrated SWAT model, SWAT-BiLSTM exhibited the best streamflow simulation performance. During the calibration period (training period), it achieved R2 and NSE values of 0.92 and 0.91, respectively, and maintained them at 0.90 during the validation period (testing period). Additionally, the performance of all four coupled models surpassed that of the calibrated SWAT model. In contrast to the tendency of the SWAT model to underestimate streamflow, the absolute values of PBIAS for all coupled models are below 10%, which indicates no significant systematic bias. SHapley Additive exPlanations (SHAP) were used to analyze the impact of different feature variables on streamflow prediction. The results indicated that precipitation contributed the most to streamflow prediction, with a global importance of 29.7%. The hydrological feature variables output by the SWAT model played a dominant role in the Bi-LSTM's prediction process. Coupling conceptual hydrological models with machine learning algorithms can significantly enhance the predictive performance of streamflow simulation. The application of SHAP improves the interpretability of the coupled models and enhances researchers' confidence in the prediction results.
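The NSE and PBIAS figures quoted above are standard hydrological skill scores. The sketch below shows one common way to compute them; the sign convention for PBIAS (positive values meaning the simulation underestimates the observations) is an assumption here, since the abstract does not state which convention was used, and the observed and simulated flows are invented numbers.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of observations."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_term


def pbias(observed, simulated):
    """Percent bias; with this (assumed) convention, positive means underestimation."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Hypothetical daily streamflow values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [9.0, 18.0, 33.0, 38.0]

print("NSE:", nse(observed, simulated))
print("PBIAS (%):", pbias(observed, simulated))
```

With these toy values the simulation is slightly low overall, so PBIAS is small and positive, well inside the 10% band mentioned in the abstract.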

Wu J, Chen X, Li R, Wang A, Huang S, Li Q, Qi H, Liu M, Cheng H, Wang Z. A novel framework for high resolution air quality index prediction with interpretable artificial intelligence and uncertainties estimation. J Environ Manage 2024;357:120785. [PMID: 38583378 DOI: 10.1016/j.jenvman.2024.120785] [Received: 09/08/2023] [Revised: 02/02/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]

Abstract

Accurate air quality index (AQI) prediction is essential in environmental monitoring and management. Given that previous studies neglect the importance of uncertainty estimation and the necessity of constraining the output during prediction, we proposed a new hybrid model, namely TMSSICX, to forecast the AQI of multiple cities. First, time-varying filter-based empirical mode decomposition (TVFEMD) was adopted to decompose the AQI sequence into multiple intrinsic mode function (IMF) components. Second, multi-scale fuzzy entropy (MFE) was applied to evaluate the complexity of each IMF component and to cluster the components into high- and low-frequency portions. In addition, the high-frequency portion was decomposed a second time by successive variational mode decomposition (SVMD) to reduce volatility. Then, six air pollutant concentrations, namely CO, SO2, PM2.5, PM10, O3, and NO2, were used as inputs. The secondary decomposition and preliminary portion were employed as the outputs for the bidirectional long short-term memory network optimized by the snake optimization algorithm (SOABiLSTM) and improved Catboost (ICatboost), respectively. Furthermore, extreme gradient boosting (XGBoost) was applied to ensemble the predicted sub-models and obtain the final result. Finally, we introduced adaptive kernel density estimation (AKDE) for interval estimation. The empirical results indicated that the TMSSICX model achieved the best performance compared with the other 23 models across all datasets. Moreover, using XGBoost to ensemble the predicted sub-models led to 8.73%, 8.94%, and 0.19% reductions in RMSE compared to SVM. Additionally, using SHapley Additive exPlanations (SHAP) to assess the impact of the six pollutant concentrations on AQI revealed that PM2.5 and PM10 had the most notable positive effects on the long-term trend of AQI. We hope this model can provide guidance for air quality management.

Yilmaz R, Yagin FH, Colak C, Toprak K, Abdel Samee N, Mahmoud NF, Alshahrani AA. Analysis of hematological indicators via explainable artificial intelligence in the diagnosis of acute heart failure: a retrospective study. Front Med (Lausanne) 2024;11:1285067. [PMID: 38633310 PMCID: PMC11023638 DOI: 10.3389/fmed.2024.1285067] [Received: 08/29/2023] [Accepted: 03/14/2024] [Indexed: 04/19/2024] Open

Abstract

Introduction

Acute heart failure (AHF) is a serious medical problem that necessitates hospitalization and often results in death. Patients hospitalized in the emergency department (ED) should therefore receive an immediate diagnosis and treatment. Unfortunately, there is not yet a fast and accurate laboratory test for identifying AHF. The purpose of this research is to apply the principles of explainable artificial intelligence (XAI) to the analysis of hematological indicators for the diagnosis of AHF.

Methods

In this retrospective analysis, 425 patients with AHF and 430 healthy individuals were assessed. Patients' demographic and hematological information was analyzed to diagnose AHF. Important risk variables for AHF diagnosis were identified using Least Absolute Shrinkage and Selection Operator (LASSO) feature selection. To test the efficacy of the suggested prediction model, Extreme Gradient Boosting (XGBoost), a 10-fold cross-validation procedure was implemented. The area under the receiver operating characteristic curve (AUC), F1 score, Brier score, Positive Predictive Value (PPV), and Negative Predictive Value (NPV) were all computed to evaluate the model's efficacy. Permutation-based analysis and SHAP were used to assess the importance and influence of the model's incorporated risk factors.

Results

White blood cell (WBC), monocyte, neutrophil, neutrophil-lymphocyte ratio (NLR), red cell distribution width-standard deviation (RDW-SD), RDW-coefficient of variation (RDW-CV), and platelet distribution width (PDW) values were significantly higher in AHF patients than in the healthy group (p < 0.05). On the other hand, erythrocyte, hemoglobin, basophil, lymphocyte, mean platelet volume (MPV), platelet, hematocrit, mean erythrocyte hemoglobin (MCH), and procalcitonin (PCT) values were found to be significantly lower in AHF patients compared to healthy controls (p < 0.05). When XGBoost was used in conjunction with LASSO to diagnose AHF, the resulting model had an AUC of 87.9%, an F1 score of 87.4%, and a Brier score of 0.036. PDW, age, RDW-SD, and PLT were identified as the most crucial risk factors in differentiating AHF.

Conclusion

The results of this study showed that XAI combined with ML could successfully diagnose AHF. SHAP explanations show that advanced age, low platelet count, high RDW-SD, and PDW are the primary hematological parameters for the diagnosis of AHF.
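Several of the evaluation metrics named in this abstract (F1 score, Brier score, PPV, NPV) can be derived from predicted probabilities and true labels in a few lines. The sketch below is a generic illustration under stated assumptions: the 0.5 decision threshold, the function name `binary_metrics`, and the labels and probabilities are all invented here and do not come from the study.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Return PPV, NPV, F1 and Brier score for binary labels and probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp)          # precision: how many predicted positives are real
    npv = tn / (tn + fn)          # how many predicted negatives are real
    recall = tp / (tp + fn)
    f1 = 2 * ppv * recall / (ppv + recall)
    # Brier score: mean squared error of the probabilities (lower is better).
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {"ppv": ppv, "npv": npv, "f1": f1, "brier": brier}


# Hypothetical labels (1 = AHF) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]

metrics = binary_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike AUC and F1, the Brier score is sensitive to how well calibrated the probabilities are, which is why it is often reported alongside the ranking and threshold metrics.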

Yimit Y, Yasin P, Tuersun A, Wang J, Wang X, Huang C, Abudoubari S, Chen X, Ibrahim I, Nijiati P, Wang Y, Zou X, Nijiati M. Multiparametric MRI-Based Interpretable Radiomics Machine Learning Model Differentiates Medulloblastoma and Ependymoma in Children: A Two-Center Study. Acad Radiol 2024:S1076-6332(24)00131-4. [PMID: 38508934 DOI: 10.1016/j.acra.2024.02.040] [Received: 02/08/2024] [Revised: 02/23/2024] [Accepted: 02/24/2024] [Indexed: 03/22/2024]

Abstract

RATIONALE AND OBJECTIVES

Medulloblastoma (MB) and ependymoma (EM) in children share similarities in age group, tumor location, and clinical presentation. Distinguishing between them through clinical diagnosis is challenging. This study aims to explore the effectiveness of using radiomics and machine learning on multiparametric magnetic resonance imaging (MRI) to differentiate between MB and EM and to validate its diagnostic ability with an external set.

MATERIALS AND METHODS

Axial T2-weighted image (T2WI) and contrast-enhanced T1-weighted image (CE-T1WI) MRI sequences of 135 patients from two centers were collected as train/test sets. The volume of interest (VOI) was manually delineated by an experienced neuroradiologist, supervised by a senior. Feature selection analysis and the least absolute shrinkage and selection operator (LASSO) algorithm identified valuable features, and Shapley additive explanations (SHAP) evaluated their significance. Five machine-learning classifiers, namely extreme gradient boosting (XGBoost), Bernoulli naive Bayes (Bernoulli NB), logistic regression (LR), support vector machine (SVM), and linear support vector machine (Linear SVC), were built based on T2WI (T2 model), CE-T1WI (T1 model), and T1 + T2WI (T1 + T2 model). A human expert diagnosis was developed and corrected by senior radiologists. External validation was performed at Sun Yat-Sen University Cancer Center.

RESULTS

Thirty-one valuable features were extracted from T2WI and CE-T1WI. XGBoost demonstrated the highest performance with an area under the curve (AUC) of 0.92 on the test set and maintained an AUC of 0.80 during external validation. For the T1 model, XGBoost achieved the highest AUC of 0.85 on the test set and the highest accuracy of 0.71 on the external validation set. In the T2 model, XGBoost achieved the highest AUC of 0.86 on the test set and the highest accuracy of 0.82 on the external validation set. The human expert diagnosis had an AUC of 0.66 on the test set and 0.69 on the external validation set. The integrated T1 + T2 model achieved the best performance, with an AUC of 0.92 on the test set and 0.80 on the external validation set. Overall, XGBoost consistently outperformed the other classifiers across the different models.

CONCLUSION

The combination of radiomics and machine learning on multiparametric MRI effectively distinguishes between MB and EM in childhood, surpassing human expert diagnosis in training and testing sets.

Affiliation(s)

Yasen Yimit: Department of Radiology, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000; Xinjiang Key Laboratory of Artificial Intelligence assisted Imaging Diagnosis, Kashi (Kashgar), China, 844000
Parhat Yasin: Department of Spine Surgery, First Affiliated Hospital of Xinjiang Medical University, Urumqi, China, 830054
Abudouresuli Tuersun: Department of Radiology, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000; Xinjiang Key Laboratory of Artificial Intelligence assisted Imaging Diagnosis, Kashi (Kashgar), China, 844000
Jingru Wang: Department of Research Collaboration, R&D center, Beijing Deepwise & League of PHD Technology Co., Ltd, Beijing, PR China, 100080
Xiaohong Wang: Department of Radiology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China, 510630
Chencui Huang: Department of Research Collaboration, R&D center, Beijing Deepwise & League of PHD Technology Co., Ltd, Beijing, PR China, 100080
Saimaitikari Abudoubari: Department of Radiology, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000; Xinjiang Key Laboratory of Artificial Intelligence assisted Imaging Diagnosis, Kashi (Kashgar), China, 844000
Xingzhi Chen: Department of Research Collaboration, R&D center, Beijing Deepwise & League of PHD Technology Co., Ltd, Beijing, PR China, 100080
Irshat Ibrahim: Department of General Surgery, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000
Pahatijiang Nijiati: Department of Radiology, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000; Xinjiang Key Laboratory of Artificial Intelligence assisted Imaging Diagnosis, Kashi (Kashgar), China, 844000
Yunling Wang: Department of Imaging Center, First Affiliated Hospital of Xinjiang Medical University, Urumqi, China, 830054
Xiaoguang Zou: Xinjiang Key Laboratory of Artificial Intelligence assisted Imaging Diagnosis, Kashi (Kashgar), China, 844000; Clinical Medical Research Center, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000
Mayidili Nijiati: Department of Radiology, The First People's Hospital of Kashi (Kashgar) Prefecture, Xinjiang, China, 844000; Xinjiang Key Laboratory of Artificial Intelligence assisted Imaging Diagnosis, Kashi (Kashgar), China, 844000

Liu L, Zhang P, Liu Z, Sun T, Qiao H. Joint global and local interpretation method for CIN status classification in breast cancer. Heliyon 2024;10:e27054. [PMID: 38562500 PMCID: PMC10982965 DOI: 10.1016/j.heliyon.2024.e27054] [Received: 06/14/2023] [Revised: 12/10/2023] [Accepted: 02/22/2024] [Indexed: 04/04/2024] Open

Sylvester S, Sagehorn M, Gruber T, Atzmueller M, Schöne B. SHAP value-based ERP analysis (SHERPA): Increasing the sensitivity of EEG signals with explainable AI methods. Behav Res Methods 2024:10.3758/s13428-023-02335-7. [PMID: 38453828 DOI: 10.3758/s13428-023-02335-7] [Accepted: 12/27/2023] [Indexed: 03/09/2024]

Kodipalli A, Fernandes SL, Dasar S. An Empirical Evaluation of a Novel Ensemble Deep Neural Network Model and Explainable AI for Accurate Segmentation and Classification of Ovarian Tumors Using CT Images. Diagnostics (Basel) 2024;14:543. [PMID: 38473015 DOI: 10.3390/diagnostics14050543] [Received: 01/16/2024] [Revised: 02/18/2024] [Accepted: 02/29/2024] [Indexed: 03/14/2024] Open

Lhamo P, Mahanty B. Impact of Acetic Acid Supplementation in Polyhydroxyalkanoates Production by Cupriavidus necator Using Mixture-Process Design and Artificial Neural Network. Appl Biochem Biotechnol 2024;196:1155-1174. [PMID: 37166651 DOI: 10.1007/s12010-023-04567-x] [Accepted: 05/03/2023] [Indexed: 05/12/2023]

Bloch L, Friedrich CM. Systematic comparison of 3D Deep learning and classical machine learning explanations for Alzheimer's Disease detection. Comput Biol Med 2024;170:108029. [PMID: 38308870 DOI: 10.1016/j.compbiomed.2024.108029] [Received: 07/09/2023] [Revised: 01/25/2024] [Accepted: 01/25/2024] [Indexed: 02/05/2024]

Abstract

Black-box deep learning (DL) models trained for the early detection of Alzheimer's Disease (AD) often lack systematic model interpretation. This work computes the activated brain regions during DL and compares those with classical Machine Learning (ML) explanations. The architectures used for DL were 3D DenseNets, EfficientNets, and Squeeze-and-Excitation (SE) networks. The classical models include Random Forests (RFs), Support Vector Machines (SVMs), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting (LightGBM), Decision Trees (DTs), and Logistic Regression (LR). For explanations, SHapley Additive exPlanations (SHAP) values, Local Interpretable Model-agnostic Explanations (LIME), Gradient-weighted Class Activation Mapping (GradCAM), GradCAM++ and permutation-based feature importance were implemented. During interpretation, correlated features were consolidated into aspects. All models were trained on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The validation includes internal and external validation on the Australian Imaging and Lifestyle flagship study of Ageing (AIBL) and the Open Access Series of Imaging Studies (OASIS). DL and ML models reached similar classification performances. The two model types, however, focus on different brain regions. The ML models focus on the inferior and middle temporal gyri, the hippocampus, and the amygdala, regions previously associated with AD. The DL models focus on a wider range of regions including the optic chiasm, the entorhinal cortices, the left and right vessels, and the 4th ventricle, which were partially associated with AD. One explanation for the differences is the input features (textures vs. volumes). Both types show reasonable similarity to a ground truth Voxel-Based Morphometry (VBM) analysis. Slightly higher similarities were measured for ML models.
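Of the explanation methods listed in this abstract, permutation-based feature importance is the simplest to sketch: shuffle one feature column, re-score the model, and record how much performance drops. The example below is a generic illustration, not the study's implementation; the threshold classifier `toy_model`, the synthetic data, and the parameter names `n_repeats` and `seed` are assumptions introduced here.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(1 for row, target in zip(X, y) if model(row) == target) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that only looks at the first feature."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data: the label depends on feature 0; feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

importances = permutation_importance(toy_model, X, y)
print("importance of feature 0:", round(importances[0], 3))
print("importance of feature 1:", round(importances[1], 3))
```

Shuffling the noise feature leaves accuracy unchanged, while shuffling the informative one roughly halves it. One known caveat, consistent with the abstract's remark about consolidating correlated features, is that permuting one of several correlated features can understate its importance.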

Rahmani P, Gholami H, Golzari S. An interpretable deep learning model to map land subsidence hazard. Environ Sci Pollut Res Int 2024;31:17448-17460. [PMID: 38340298 DOI: 10.1007/s11356-024-32280-7] [Received: 10/16/2023] [Accepted: 01/27/2024] [Indexed: 02/12/2024]

Niu M, Wang C, Chen Y, Zou Q, Qi R, Xu L. CircRNA identification and feature interpretability analysis. BMC Biol 2024;22:44. [PMID: 38408987 PMCID: PMC10898045 DOI: 10.1186/s12915-023-01804-x] [Received: 07/29/2023] [Accepted: 12/18/2023] [Indexed: 02/28/2024] Open

Liu K, Zhang J, Liu J, Wang M, Yue Q. Projection of land susceptibility to subsidence hazard in China using an interpretable CNN deep learning model. Sci Total Environ 2024;913:169502. [PMID: 38145687 DOI: 10.1016/j.scitotenv.2023.169502] [Received: 08/08/2023] [Revised: 12/05/2023] [Accepted: 12/17/2023] [Indexed: 12/27/2023]

Abstract

Land subsidence is a worldwide geo-environmental hazard. Clarifying the factors that influence land subsidence hazard susceptibility (LSHS) and their spatial distribution is critical to the prevention and control of subsidence disasters. In this study, we selected natural and anthropogenic variables affecting LSHS and used an interpretable convolutional neural network (CNN) method to construct an LSHS model for China. The model performed well, with AUC and F1-score on the testing set reaching 0.9939 and 0.9566, respectively. The interpretable method of SHapley Additive exPlanations (SHAP) was used to elucidate the individual contributions of input features to the predictions of the CNN model. The importance ranking of model variables showed that population, gross domestic product (GDP) and groundwater storage (GWS) change are the three major factors that affect China's land subsidence. During 2004-2016, an area of 237.6 thousand km2 was classified as high and very high LSHS, mainly concentrated in the North China Plain, central Shanxi, southern Shaanxi, Shanghai and the junction of Jiangsu and Zhejiang. An area of 333.82-343.12 thousand km2 is projected to fall in the high and very high LSHS classes in the mid-21st century (2030-2059), and 361.9-385.92 thousand km2 in the late-21st century (2070-2099). Future population exposure to high and very high LSHS will be 252.12-270.19 million people (mid-21st century) and 196.14-274.50 million people (late-21st century), respectively, compared with the historical exposure of 210.99 million people. The proportion of future railway and road exposure will reach 14.63 %-14.89 % and 11.51 %-11.82 % in the mid-21st century, and 15.46 %-17.12 % and 12.35 %-13.11 % in the late-21st century, respectively. Our findings provide important information for creating regional adaptation policies and strategies to mitigate damage induced by subsidence.
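An importance ranking like the one described here is commonly obtained by averaging the absolute per-sample SHAP values of each feature and normalizing to percentages. The abstract does not say which aggregation the authors used, so the sketch below should be read as one conventional choice; the feature names echo the abstract, but every number in `shap_matrix` is invented.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute attribution, expressed as percentage shares."""
    n_samples = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    shares = [(name, 100.0 * value / total) for name, value in zip(feature_names, mean_abs)]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical per-sample attributions: rows are samples, columns are features.
feature_names = ["population", "gdp", "gws_change"]
shap_matrix = [
    [0.4, -0.1, 0.1],
    [-0.6, 0.3, -0.1],
    [0.5, -0.2, 0.1],
]

ranking = global_importance(shap_matrix, feature_names)
for name, share in ranking:
    print(f"{name}: {share:.1f}%")
```

Taking absolute values before averaging matters: a feature that pushes predictions strongly up for some samples and strongly down for others would otherwise appear unimportant.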

Hussain I, Jany R. Interpreting Stroke-Impaired Electromyography Patterns through Explainable Artificial Intelligence. Sensors (Basel) 2024;24:1392. [PMID: 38474928 DOI: 10.3390/s24051392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/17/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024]

Abstract

Electromyography (EMG) offers an invaluable myoelectric measure for identifying neuromuscular alterations resulting from ischemic strokes, serving as a potential marker for diagnosing gait impairments caused by ischemia. This study aims to develop an interpretable machine learning (ML) framework capable of distinguishing between the myoelectric patterns of stroke patients and those of healthy individuals through Explainable Artificial Intelligence (XAI) techniques. The research included 48 stroke patients (average age 70.6 years, 65% male) undergoing treatment at a rehabilitation center, alongside 75 healthy adults (average age 76.3 years, 32% male) as the control group. EMG signals were recorded from wearable devices positioned on the biceps femoris and lateral gastrocnemius muscles of both lower limbs during indoor ground walking in a gait laboratory. Boosting ML techniques were deployed to identify stroke-related gait impairments using EMG gait features. Furthermore, we employed XAI techniques, such as Shapley Additive Explanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), and Anchors, to interpret the role of EMG variables in the stroke-prediction models. Among the ML models assessed, the GBoost model demonstrated the highest classification performance (AUROC: 0.94) during cross-validation with the training dataset, and it also performed strongly (AUROC: 0.92, accuracy: 85.26%) when evaluated using the testing EMG dataset. Through SHAP and LIME analyses, the study identified that the EMG spectral features contributing to distinguishing the stroke group from the control group were associated with the right biceps femoris and lateral gastrocnemius muscles. This interpretable EMG-based stroke prediction model holds promise as an objective tool for predicting post-stroke gait impairments. Its potential application could greatly assist in managing post-stroke rehabilitation by providing reliable EMG biomarkers and addressing potential gait impairment in individuals recovering from ischemic stroke.

Cao C, Zhang T, Xin T. The effect of reading engagement on scientific literacy - an analysis based on the XGBoost method. Front Psychol 2024;15:1329724. [PMID: 38420178 PMCID: PMC10899671 DOI: 10.3389/fpsyg.2024.1329724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 01/22/2024] [Indexed: 03/02/2024] Open

Abstract

Scientific literacy is a key factor in personal competitiveness. Reading is the most common activity in daily learning, and harnessing its day-to-day influence on individuals is the most convenient way to improve the scientific literacy of the whole population. Reading engagement is an important student characteristic related to reading literacy; it is highly malleable and is jointly reflected by behavioral, cognitive, and affective engagement. Exploring the relationship between reading engagement and scientific literacy, with reading engagement as the entry point, is therefore of theoretical and practical significance. In this study, we used PISA2018 data from China to explore this relationship with a sample of 15-year-old students in mainland China. Thirty-six variables related to reading engagement and background variables (gender, grade, and socioeconomic and cultural status of the family) were selected from the questionnaire as the independent variables, the score of the Scientific Literacy Assessment (SLA) was taken as the outcome variable, and a supervised machine learning method, the XGBoost algorithm, was used to construct the model. The dataset was randomly divided into a training set and a test set to optimize the model and to verify that the resulting model has good fit and generalization ability. Global and local personalized interpretation was then performed using SHAP values, a machine learning model interpretation method. Among the three major components of reading engagement, cognitive engagement was found to be the most influential factor in the model of this study, and students with a high level of reading cognitive engagement are more likely to get high scores in the scientific literacy assessment. In addition, this study verifies the feasibility of a currently popular machine learning model, XGBoost, in a large-scale international education assessment program, with good model adaptability and suitable conditions for global and local interpretation.

Choi JH, Choi Y, Lee KS, Ahn KH, Jang WY. Explainable Model Using Shapley Additive Explanations Approach on Wound Infection after Wide Soft Tissue Sarcoma Resection: "Big Data" Analysis Based on Health Insurance Review and Assessment Service Hub. Medicina (Kaunas) 2024;60:327. [PMID: 38399614 PMCID: PMC10890019 DOI: 10.3390/medicina60020327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/04/2024] [Accepted: 02/12/2024] [Indexed: 02/25/2024]

Abstract

Background and Objectives: Soft tissue sarcomas represent a heterogeneous group of malignant mesenchymal tumors. Despite their low prevalence, soft tissue sarcomas present clinical challenges for orthopedic surgeons owing to their aggressive nature and perioperative wound infections. However, the low prevalence of soft tissue sarcomas has hindered the availability of large-scale studies. This study aimed to analyze wound infections after wide resection in patients with soft tissue sarcomas by employing big data analytics from the Hub of the Health Insurance Review and Assessment Service (HIRA). Materials and Methods: Patients who underwent wide excision of soft tissue sarcomas between 2010 and 2021 were included. Data were collected from the HIRA database, which holds information on approximately 50 million individuals in the Republic of Korea. The data collected included demographic information, diagnoses, prescribed medications, and surgical procedures. Random forest was used to analyze the major associated determinants. A total of 10,906 observations with complete data were divided into training and validation sets in an 80:20 ratio (8773 vs. 2193 cases). Random forest permutation importance was employed to identify the major predictors of infection, and Shapley Additive Explanations (SHAP) values were derived to analyze the directions of associations with predictors. Results: A total of 10,969 patients who underwent wide excision of soft tissue sarcomas were included. Among the study population, 886 (8.08%) patients had post-operative infections requiring surgery. The overall transfusion rate for wide excision was 20.67% (2267 patients). Risk factors among the comorbidities of each patient with wound infection were analyzed, and dependence plots of individual features were visualized. The transfusion dependence plot reveals a distinctive pattern, with SHAP values displaying a negative trend for individuals without blood transfusions and a positive trend for those who received blood transfusions, emphasizing the substantial impact of blood transfusions on the likelihood of wound infection. Conclusions: Using a random forest model and SHAP values, perioperative transfusion, male sex, old age, and low socioeconomic status (SES) were identified as important features of wound infection in soft tissue sarcoma patients.

Karim T, Shaon MSH, Sultan MF, Hasan MZ, Kafy AA. ANNprob-ACPs: A novel anticancer peptide identifier based on probabilistic feature fusion approach. Comput Biol Med 2024;169:107915. [PMID: 38171261 DOI: 10.1016/j.compbiomed.2023.107915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/05/2024]

Hu J, Xu J, Li M, Jiang Z, Mao J, Feng L, Miao K, Li H, Chen J, Bai Z, Li X, Lu G, Li Y. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: a prospective multicenter cohort study. EClinicalMedicine 2024;68:102409. [PMID: 38273888 PMCID: PMC10809096 DOI: 10.1016/j.eclinm.2023.102409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/19/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024] Open

Abstract

Background

Acute kidney injury (AKI) is a common and serious organ dysfunction in critically ill children. Early identification and prediction of AKI are of great significance. However, current AKI criteria are insufficiently sensitive and specific, and AKI heterogeneity limits the clinical value of AKI biomarkers. This study aimed to establish and validate an explainable prediction model based on the machine learning (ML) approach for AKI, and assess its prognostic implications in children admitted to the pediatric intensive care unit (PICU).

Methods

This multicenter prospective study in China was conducted on critically ill children for the derivation and validation of the prediction model. The derivation cohort, consisting of 957 children admitted to four independent PICUs from September 2020 to January 2021, was separated for training and internal validation, and an external data set of 866 children admitted from February 2021 to February 2022 was employed for external validation. AKI was defined based on serum creatinine and urine output using the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. With 33 medical characteristics easily obtained or evaluated during the first 24 h after PICU admission, 11 ML algorithms were used to construct prediction models. Several evaluation indexes, including the area under the receiver-operating-characteristic curve (AUC), were used to compare the predictive performance. The SHapley Additive exPlanation method was used to rank the feature importance and explain the final model. A probability threshold for the final model was identified for AKI prediction and subgrouping. Clinical outcomes were evaluated in various subgroups determined by a combination of the final model and KDIGO criteria.

Findings

The random forest (RF) model performed best in discriminative ability among the 11 ML models. After reducing features according to feature importance rank, an explainable final RF model was established with 8 features. The final model could accurately predict AKI in both internal (AUC = 0.929) and external (AUC = 0.910) validations, and has been translated into a convenient tool to facilitate its utility in clinical settings. Critically ill children with a probability exceeding or equal to the threshold in the final model had a higher risk of death and multiple organ dysfunctions, regardless of whether they met the KDIGO criteria for AKI.

Interpretation

Our explainable ML model not only accurately predicted AKI but was also highly relevant to adverse outcomes in individual children at an early stage of PICU admission, and it mitigated the "black-box" concern through an indirect interpretation of the ML technique.

Funding

The National Natural Science Foundation of China, Jiangsu Province Science and Technology Support Program, Key talent of women's and children's health of Jiangsu Province, and Postgraduate Research & Practice Innovation Program of Jiangsu Province.

Fung PL, Savadkoohi M, Zaidan MA, Niemi JV, Timonen H, Pandolfi M, Alastuey A, Querol X, Hussein T, Petäjä T. Constructing transferable and interpretable machine learning models for black carbon concentrations. Environ Int 2024;184:108449. [PMID: 38286044 DOI: 10.1016/j.envint.2024.108449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/12/2024] [Accepted: 01/17/2024] [Indexed: 01/31/2024]

Abstract

Black carbon (BC) has received increasing attention from researchers due to its adverse health effects. However, in-situ BC measurements are often not included as a regulated variable in air quality monitoring networks. Machine learning (ML) models have been studied extensively to serve as virtual sensors that complement the reference instruments. This study evaluates and compares three white-box (WB) and four black-box (BB) ML models for estimating BC concentrations, with a focus on their transferability and interpretability. We train the models with long-term air pollutant and weather measurements from an urban background site in Barcelona, and test them at other European urban and traffic sites. Despite the differences in geographical locations and measurement sites, BC correlates most strongly with the particle number concentration of the accumulation mode (PNacc, r = 0.73-0.85) and nitrogen dioxide (NO2, r = 0.68-0.85), and most weakly with meteorological parameters. Owing to this similarity in correlation behaviour, the ML models trained in Barcelona perform well at the traffic site in Helsinki (R² = 0.80-0.86; mean absolute error MAE = 3.90-4.73 %) and at the urban background site in Dresden (R² = 0.79-0.84; MAE = 4.23-4.82 %). WB models appear to explain less of the variability of BC than BB models, among which the long short-term memory (LSTM) model outperforms the rest. In terms of interpretability, we adopt several methods for each model to quantify and normalize the relative importance of each input feature. The overall static relative importance commonly used for WB models yields results that differ from the dynamic values used to show local contributions for BB models. PNacc and NO2 on average have the strongest absolute static contribution; however, they simultaneously impact the estimation positively and negatively at different sites. This comprehensive analysis demonstrates the possibility of transferring these interpretable air pollutant ML models across space and time.

Affiliation(s)

Pak Lun Fung: Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland.
Marjan Savadkoohi: Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain; Department of Mining, Industrial and ICT Engineering (EMIT), Manresa School of Engineering (EPSEM), Universitat Politècnica de Catalunya (UPC), Manresa 08242, Spain.
Martha Arbayani Zaidan: Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Department of Computer Science, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland.
Jarkko V Niemi: Helsinki Region Environmental Services Authority (HSY), Helsinki FI-00066, Finland.
Hilkka Timonen: Atmospheric Composition Research, Finnish Meteorological Institute, Helsinki FI-00560, Finland.
Marco Pandolfi: Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain.
Andrés Alastuey: Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain.
Xavier Querol: Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, Spain.
Tareq Hussein: Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland; Environmental and Atmospheric Research Laboratory (EARL), Department of Physics, School of Science, Amman 11942, Jordan.
Tuukka Petäjä: Institute for Atmospheric and Earth System Research / Physics, Faculty of Science, University of Helsinki, Helsinki FI-00560, Finland.

Liu Y, Fu Y, Peng Y, Ming J. Clinical decision support tool for breast cancer recurrence prediction using SHAP value in cooperative game theory. Heliyon 2024;10:e24876. [PMID: 38312672 PMCID: PMC10835316 DOI: 10.1016/j.heliyon.2024.e24876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 01/15/2024] [Accepted: 01/16/2024] [Indexed: 02/06/2024] Open

Abstract

Background

Recurrence remains the primary cause of death in patients with breast cancer. Although machine learning can efficiently predict the prognosis of breast cancer patients, the black-box nature of the model may result in a lack of evidence for clinicians when making critical decisions.

Methods

In this study, our main objective was twofold: (1) to develop a clinical decision support tool for predicting the prognosis of breast cancer and (2) to identify and explore the key factors that influence breast cancer recurrence. To achieve this, we employed an explainable ensemble learning method called Shapley additive explanation (SHAP), which leverages cooperative game theory. Using real-world data from 1629 breast cancer patients, we analyzed and uncovered the key factors associated with breast cancer recurrence. Subsequently, we used these identified factors to create a recurrence prediction model and establish a decision mechanism for the tool. The proposed method not only provides accurate recurrence predictions but also offers transparent explanations for these predictions.

Results

By utilizing four key factors, namely, tumor size, clinical stage III, number of lymph node metastases, and age, our decision support tool for predicting breast cancer recurrence achieved significant improvements. The extra-tree model exhibited an increased area under the receiver operating characteristic curve (AUC) of 0.97, while the Random Forest model demonstrated an improved AUC of 0.96. We also offer a decision mechanism for a recurrence prediction model based on the identified key factors. This transparent and interpretable decision-making process facilitated by our explainable ensemble learning model enhances trust and promotes its applicability in clinical settings.

Conclusions

The proposed explainable ensemble learning method shows promising results in predicting breast cancer recurrence, outperforming existing methods with high accuracy and transparency. This advancement has the potential to significantly improve clinical decision-making and patient outcomes in breast cancer treatment.
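
The Methods above describe SHAP as building on cooperative game theory. As a minimal sketch of that underlying idea (not the authors' pipeline and not the `shap` library), the following Python computes exact Shapley values for a toy two-feature game by averaging each feature's marginal contribution over all orderings. The feature names echo two of the key factors named in the Results, but the payoff numbers in `TOY_PAYOFF` are invented purely for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical payoff (e.g. model output) for each feature coalition.
# These numbers are made up; they do not come from the study.
TOY_PAYOFF = {
    frozenset(): 0.0,
    frozenset({"tumor_size"}): 0.30,
    frozenset({"age"}): 0.10,
    frozenset({"tumor_size", "age"}): 0.50,
}


def toy_value(coalition):
    return TOY_PAYOFF[frozenset(coalition)]


phi = shapley_values(["tumor_size", "age"], toy_value)
print(phi)
```

The attributions sum to the payoff of the full coalition minus the payoff of the empty one, which is the additivity property that gives SHAP its name. Exact enumeration grows factorially with the number of features, so practical tools rely on approximations or model-specific algorithms.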

Tang Y, Zhang Y, Li J. A time series driven model for early sepsis prediction based on transformer module. BMC Med Res Methodol 2024;24:23. [PMID: 38273257 PMCID: PMC10809699 DOI: 10.1186/s12874-023-02138-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 12/27/2023] [Indexed: 01/27/2024] Open

Yasin P, Yimit Y, Abliz D, Mardan M, Xu T, Yusufu A, Cai X, Sheng W, Mamat M. MRI-based interpretable radiomics nomogram for discrimination between Brucella spondylitis and Pyogenic spondylitis. Heliyon 2024;10:e23584. [PMID: 38173524 PMCID: PMC10761805 DOI: 10.1016/j.heliyon.2023.e23584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 01/05/2024] Open

Abstract

Background

Pyogenic spondylitis (PS) and Brucella spondylitis (BS) are commonly seen spinal infectious diseases. Both types can lead to vertebral destruction, kyphosis, and long-term neurological deficits if not promptly diagnosed and treated. Therefore, accurate diagnosis is crucial for personalized therapy. Distinguishing between PS and BS in everyday clinical settings is challenging due to the similarity of their clinical symptoms and imaging features. Hence, this study aims to evaluate the effectiveness of a radiomics nomogram using magnetic resonance imaging (MRI) to accurately differentiate between the two types of spondylitis.

Methods

Clinical and MRI data from 133 patients (2017-2022) with pathologically confirmed PS and BS (68 and 65 patients, respectively) were collected. Patients were divided into training and testing cohorts. To develop a clinical diagnostic model, logistic regression was used to fit a conventional clinical model (M1). Radiomics features were extracted from the sagittal fat-suppressed T2-weighted imaging (FS-T2WI) sequence. The radiomics features were preprocessed by Z-score scaling and univariate analysis to eliminate redundant features. Furthermore, the Least Absolute Shrinkage and Selection Operator (LASSO) was employed to develop a radiomics score (M2). A composite model (M3) was created by combining M1 and M2. Subsequently, calibration and decision curves were generated to evaluate the nomogram's performance in both training and testing groups. The diagnostic performance of each model and the indication was assessed using the receiver operating characteristic (ROC) curve with its area under the curve (AUC). Finally, we used the SHapley Additive exPlanations (SHAP) model explanation technique to interpret the model result.

Results

Nine significant features were finally selected from the sagittal FS-T2WI sequences. In the differential diagnosis of PS and BS, the AUC values of M1, M2, and M3 in the testing set were 0.795, 0.859, and 0.868, respectively. The composite model exhibited a high degree of concurrence with the ideal outcomes, as evidenced by the calibration curves. The decision curve analysis indicated the nomogram's potential clinical application value. Using SHAP values to represent prediction outcomes makes our model's predictions more understandable.

Conclusions

The implementation of a nomogram that integrates MRI and clinical data has the potential to significantly enhance the accuracy of discriminating between PS and BS within clinical settings.
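
The models above are compared by the area under the ROC curve. As a rough illustration of what that number measures, the following Python sketch computes AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented for illustration and are not data from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive class) and model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_from_scores(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of this size; sorting-based implementations are the usual choice for large data sets.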

Baek S, Jeong YJ, Kim YH, Kim JY, Kim JH, Kim EY, Lim JK, Kim J, Kim Z, Kim K, Chung MJ. Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation Study. J Med Internet Res 2024;26:e52134. [PMID: 38206673 PMCID: PMC10811577 DOI: 10.2196/52134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 11/03/2023] [Accepted: 12/25/2023] [Indexed: 01/12/2024] Open

Abstract

BACKGROUND

Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also suffered from a lack of clinical interpretability and were developed based on early pandemic period data. Hence, there has been a compelling need for advancements in prediction models for better clinical applicability.

OBJECTIVE

The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (involving any of the following events: intensive care unit admission, in-hospital death, mechanical ventilation required, or extracorporeal membrane oxygenation required) within 15 days upon hospitalization based on routinely available clinical and laboratory biomarkers.

METHODS

We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through a cross-validation technique on the development cohort. They were externally validated using a bootstrapped sampling technique on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods.

RESULTS

Our final model, RIETS, was developed based on a deep neural network of 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, c-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods with its sustainable prediction on Omicron cases (AUROC=0.903, 95% CI 0.897-0.910).

CONCLUSIONS

RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk supports reliable prediction. The use of a nationwide multicenter cohort in the model development and validation implies generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patients can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice.

Affiliation(s)

Sangwon Baek: Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea; Center for Data Science, New York University, New York, NY, United States
Yeon Joo Jeong: Department of Radiology, Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Republic of Korea
Yun-Hyeon Kim: Department of Radiology, Chonnam National University Hospital, Gwangju, Republic of Korea
Jin Young Kim: Department of Radiology, Keimyung University Dongsan Hospital, Daegu, Republic of Korea
Jin Hwan Kim: Department of Radiology, Chungnam National University Hospital, Daejeon, Republic of Korea
Eun Young Kim: Department of Radiology, Gachon University Gil Medical Center, Incheon, Republic of Korea
Jae-Kwang Lim: Department of Radiology, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
Jungok Kim: Department of Infectious Diseases, Chungnam National University Sejong Hospital, Sejong, Republic of Korea
Zero Kim: Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea; Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
Kyunga Kim: Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea; Biomedical Statistics Center, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea; Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea
Myung Jin Chung: Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea; Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea; Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea; Department of Radiology, Samsung Medical Center, Seoul, Republic of Korea

Maeda K, Hirano M, Hayashi T, Iida M, Kurata H, Ishibashi H. Elucidating Key Characteristics of PFAS Binding to Human Peroxisome Proliferator-Activated Receptor Alpha: An Explainable Machine Learning Approach. Environ Sci Technol 2024;58:488-497. [PMID: 38134352 DOI: 10.1021/acs.est.3c06561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]

Wei T, Zhu T, Lin M, Liu H. Predicting and factor analysis of rider injury severity in two-wheeled motorcycle and vehicle crash accidents based on an interpretable machine learning framework. Traffic Inj Prev 2024;25:194-201. [PMID: 38019553 DOI: 10.1080/15389588.2023.2284111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 11/13/2023] [Indexed: 11/30/2023]

Abstract

OBJECTIVE

Two-wheeled motorcyclists are among the vulnerable road users in accidents, so improving their driving safety and reducing accident injury is a public health issue. Accurate identification of the factors influencing the severity of accidents is an important prerequisite for mitigating injury from crashes.

METHODS

Using data on crashes between vehicles and two-wheeled motorcycles from the China In-Depth Accident Study (CIDAS) database, this study compares the classification and prediction performance of six machine learning methods, with accuracy, precision, recall, F1-score, AUC, and the ROC curve as the evaluation indicators. The LightGBM algorithm, which performed best, is selected to model the accident injury severity of the motorcyclists. The SHAP method is used to extend the interpretability of the LightGBM model results. Based on the SHAP method, the importance, main effect, and interaction effect of factors under each accident injury severity are quantitatively analyzed.

RESULTS

The model prediction accuracy is 92.6%, the F1-Score is 92.8%, and the AUC value is 0.986. The importance of factors varies with the accident injury severity of motorcyclists. The kilometers traveled per year by the driver, the throwing distance of the motorcyclist, and the road speed limit are the three most important factors. The motorcyclist is more likely to suffer fatal injuries when the throwing distance is >1,000 cm.

CONCLUSIONS

The LightGBM-based injury severity prediction model performs well. Combined with the SHAP method, it can be used to analyze the factors that influence injury severity in two-wheeled motorcycle accidents. These results could help traffic management departments take measures to reduce motorcyclist injuries.
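To make the SHAP step in the methods above concrete, here is a minimal sketch of how Shapley values split a model output into per-feature contributions, including a share of an interaction effect. It enumerates every feature coalition by brute force, which is only feasible for a handful of features, and it does not reproduce the study's LightGBM analysis. The function names, feature names, and payoff numbers are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of f when it joins coalition s
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: two main effects plus one pairwise interaction."""
    v = 0.0
    if "speed_limit" in coalition:
        v += 2.0
    if "throw_distance" in coalition:
        v += 3.0
    if "speed_limit" in coalition and "throw_distance" in coalition:
        v += 1.0  # interaction term
    return v


features = ["speed_limit", "throw_distance", "annual_km"]
phi = shapley_values(toy_value, features)
for name in features:
    print(name, round(phi[name], 3))
```

In this toy game the interaction payoff of 1.0 is shared equally between the two interacting features, the feature that never changes the payoff receives zero, and the contributions add up to the payoff of the full coalition.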


Zhu J, Huang Y, Yi Q, Bu L, Zhou S, Shi Z. Predicting reactivity dynamics of halogen species and trace organic contaminants using machine learning models. Chemosphere 2024;346:140659. [PMID: 37949193 DOI: 10.1016/j.chemosphere.2023.140659] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/22/2023] [Revised: 11/04/2023] [Accepted: 11/06/2023] [Indexed: 11/12/2023]

Wani NA, Kumar R, Bedi J. DeepXplainer: An interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. Comput Methods Programs Biomed 2024;243:107879. [PMID: 37897989 DOI: 10.1016/j.cmpb.2023.107879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 10/17/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]

Abstract

BACKGROUND AND OBJECTIVE

Artificial intelligence (AI) has several uses in the healthcare industry, including healthcare management, medical forecasting, practical decision-making, and diagnosis. AI technologies have reached human-like performance, but their use is limited because they are still largely viewed as opaque black boxes. This distrust remains the primary reason for their limited real-world application, particularly in healthcare. As a result, there is a need for interpretable predictors that both predict well and explain their predictions.

METHODS

This study introduces "DeepXplainer", a new interpretable hybrid deep learning-based technique for detecting lung cancer and explaining its predictions. The technique combines a convolutional neural network with XGBoost: the convolutional layers automatically learn features from the input, and XGBoost then predicts the class label. To explain the predictions, an explainable artificial intelligence method known as "SHAP" is applied.

RESULTS

The open-source "Survey Lung Cancer" dataset was processed using this method. On multiple parameters, including accuracy, sensitivity, F1-score, etc., the proposed method outperformed the existing methods. The proposed method obtained an accuracy of 97.43%, a sensitivity of 98.71%, and an F1-score of 98.08. After the model has made predictions with this high degree of accuracy, each prediction is explained by implementing an explainable artificial intelligence method at both the local and global levels.

CONCLUSIONS

A deep learning-based classification model for lung cancer is proposed with three primary components: one for feature learning, another for classification, and a third for explaining the predictions made by the proposed hybrid (ConvXGB) model. The proposed "DeepXplainer" has been evaluated using a variety of metrics, and the results demonstrate that it outperforms the current benchmarks. By explaining its predictions, the proposed approach may help doctors detect and treat lung cancer more effectively.


Chen C, Zhang W, Yan G, Tang C. Identifying metabolic dysfunction-associated steatotic liver disease in patients with hypertension and pre-hypertension: An interpretable machine learning approach. Digit Health 2024;10:20552076241233135. [PMID: 38389508 PMCID: PMC10883118 DOI: 10.1177/20552076241233135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 01/30/2024] [Indexed: 02/24/2024] Open

Abstract

Objective

Metabolic dysfunction-associated steatotic liver disease (MASLD) is one of the most prevalent liver diseases and is associated with pre-hypertension and hypertension. Our research aims to develop interpretable machine learning (ML) models to accurately identify MASLD in hypertensive and pre-hypertensive populations.

Methods

Data on 4722 hypertensive and pre-hypertensive patients were drawn from the NAGALA study. Six ML models were used in this study: decision tree, K-nearest neighbor, gradient boosting, naive Bayes, support vector machine, and random forest (RF). The optimal model was chosen according to performance under K-fold cross-validation (k = 5), measured by the area under the receiver operating characteristic curve (AUC), average precision (AP), accuracy, sensitivity, specificity, and F1. Shapley additive explanation (SHAP) values were employed for both global and local interpretation of the model results.

Results

The prevalence of MASLD in hypertensive and pre-hypertensive patients was 44.3% (362 cases) and 28.3% (1107 cases), respectively. The RF model outperformed the other five models with an AUC of 0.889, AP of 0.800, accuracy of 0.819, sensitivity of 0.816, specificity of 0.821, and F1 of 0.729. According to the SHAP analysis, the top five important features were alanine aminotransferase, body mass index, waist circumference, high-density lipoprotein cholesterol, and total cholesterol. Further analysis of the feature selection in the RF model revealed that incorporating all features leads to optimal model performance.

Conclusions

ML algorithms, especially the RF algorithm, improve the accuracy of MASLD identification, and the global and local interpretation of the RF model results makes it possible to understand intuitively how various features affect the chances of MASLD in patients with hypertension and pre-hypertension.
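The evaluation protocol in the methods above (5-fold cross-validation scored by AUC) can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names are hypothetical, the folds are contiguous and unshuffled for simplicity, and the study's own splitting and scoring code is not described in the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds, without shuffling."""
    # Spread the remainder over the first folds so sizes differ by at most one
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical labels and predicted scores, not data from the study
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
for train_idx, test_idx in k_fold_indices(12, k=5):
    print(len(train_idx), test_idx)
```

In practice each fold's model would be fit on the training indices and scored on the held-out indices, and the per-fold AUC values averaged before comparing models.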


Gozdzialski L, Hutchison A, Wallace B, Gill C, Hore D. Toward automated infrared spectral analysis in community drug checking. Drug Test Anal 2024;16:83-92. [PMID: 37248686 DOI: 10.1002/dta.3520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/13/2023] [Accepted: 05/16/2023] [Indexed: 05/31/2023]

Susnjak T. Applying BERT and ChatGPT for Sentiment Analysis of Lyme Disease in Scientific Literature. Methods Mol Biol 2024;2742:173-183. [PMID: 38165624 DOI: 10.1007/978-1-0716-3561-2_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2024]

Sharma K, Saini N, Hasija Y. Identifying the mitochondrial metabolism network by integration of machine learning and explainable artificial intelligence in skeletal muscle in type 2 diabetes. Mitochondrion 2024;74:101821. [PMID: 38040172 DOI: 10.1016/j.mito.2023.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/04/2023] [Accepted: 11/26/2023] [Indexed: 12/03/2023]

Liu R, Ma Z, Gasparrini A, de la Cruz A, Bi J, Chen K. Integrating Augmented In Situ Measurements and a Spatiotemporal Machine Learning Model To Back Extrapolate Historical Particulate Matter Pollution over the United Kingdom: 1980-2019. Environ Sci Technol 2023;57:21605-21615. [PMID: 38085698 DOI: 10.1021/acs.est.3c05424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2023]

Niu X, Lu C, Zhang Y, Zhang Y, Wu C, Saidy E, Liu B, Shu L. Hysteresis response of groundwater depth on the influencing factors using an explainable learning model framework with Shapley values. Sci Total Environ 2023;904:166662. [PMID: 37657541 DOI: 10.1016/j.scitotenv.2023.166662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 08/26/2023] [Accepted: 08/26/2023] [Indexed: 09/03/2023]

Boitor O, Stoica F, Mihăilă R, Stoica LF, Stef L. Automated Machine Learning to Develop Predictive Models of Metabolic Syndrome in Patients with Periodontal Disease. Diagnostics (Basel) 2023;13:3631. [PMID: 38132215 PMCID: PMC10743072 DOI: 10.3390/diagnostics13243631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/04/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open

Abstract

The prevalence of metabolic syndrome is rising at a concerning rate. The link between metabolic syndrome and periodontal disease is a highly relevant area of research. Some studies have suggested a bidirectional relationship between metabolic syndrome and periodontal disease, where one condition may exacerbate the other. Furthermore, the existence of periodontal disease among these individuals significantly impacts overall health management. This research focuses on the relationship between periodontal disease and metabolic syndrome, while also incorporating data on general health status and overall well-being. We aimed to develop advanced machine learning models that efficiently identify key predictors of metabolic syndrome, with significant emphasis placed on thoroughly explaining the predictions generated by the models. We studied a group of 296 patients hospitalized in SCJU Sibiu, aged 45-79 years, of whom 57% had metabolic syndrome. The patients underwent dental consultations and subsequently responded to a dedicated questionnaire, along with a standard EuroQol 5-Dimensions 5-Levels (EQ-5D-5L) questionnaire. The following data were recorded: DMFT (Decayed, Missing due to caries, and Filled Teeth), CPI (Community Periodontal Index), periodontal pockets depth, loss of epithelial insertion, bleeding after probing, frequency of tooth brushing, regular dental control, cardiovascular risk, carotid atherosclerosis, and EQ-5D-5L score. We used Automated Machine Learning (AutoML) frameworks to build predictive models in order to determine which of these risk factors exhibits the most robust association with metabolic syndrome. To gain confidence in the results of the machine learning models produced by the AutoML pipelines, we used SHapley Additive exPlanations (SHAP) values to interpret these models from a global and local perspective. The obtained results confirm that the severity of periodontal disease, high cardiovascular risk, and low EQ-5D-5L score have the greatest impact on the occurrence of metabolic syndrome.


Dutta P, Jain D, Gupta R, Rai B. Classification of tastants: A deep learning based approach. Mol Inform 2023;42:e202300146. [PMID: 37885360 DOI: 10.1002/minf.202300146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 09/26/2023] [Accepted: 10/26/2023] [Indexed: 10/28/2023]

Huang AA, Huang SY. Shapely additive values can effectively visualize pertinent covariates in machine learning when predicting hypertension. J Clin Hypertens (Greenwich) 2023;25:1135-1144. [PMID: 37971610 PMCID: PMC10710553 DOI: 10.1111/jch.14745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]

Abstract

Machine learning methods are widely used within the medical field to enhance prediction. However, little is known about the reliability and efficacy of these models in predicting long-term medical outcomes such as blood pressure from lifestyle factors such as diet. The authors assessed whether machine-learning techniques could accurately predict hypertension risk using nutritional information. A cross-sectional study was conducted using data from the National Health and Nutrition Examination Survey (NHANES) collected between January 2017 and March 2020. XGBoost was chosen as the machine-learning model due to its increased performance relative to other common methods within medical studies. Model prediction metrics (e.g., AUROC, Balanced Accuracy) were used to measure overall model efficacy; covariate Gain statistics (the percentage each covariate contributes to the overall prediction) and SHapley Additive exPlanations (SHAP, a method to visualize each covariate) were used to explain the machine-learning output and increase the transparency of this otherwise cryptic method. Among a total of 9650 eligible patients, the mean age was 41.02 (SD = 22.16); 4792 (50%) were male and 4858 (50%) female; 3407 (35%) were White, 2567 (27%) Black, 2108 (22%) Hispanic, and 981 (10%) Asian. From evaluation of model gain statistics, age was found to be the single strongest predictor of hypertension, with a gain of 53.1%. Additionally, demographic factors such as poverty and Black race were also strong predictors of hypertension, with gains of 4.33% and 4.18%, respectively. Nutritional covariates contributed 37% to the overall prediction, with sodium, caffeine, potassium, and alcohol intake significantly represented within the model. Machine learning can be used to predict hypertension.


Sun Y, Zhao Z, Tong H, Sun B, Liu Y, Ren N, You S. Machine Learning Models for Inverse Design of the Electrochemical Oxidation Process for Water Purification. Environ Sci Technol 2023;57:17990-18000. [PMID: 37189261 DOI: 10.1021/acs.est.2c08771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]

Li T, Zhang Q, Wang X, Peng Y, Guan X, Mu J, Li L, Chen J, Wang H, Wang Q. Characteristics of secondary inorganic aerosols and contributions to PM2.5 pollution based on machine learning approach in Shandong Province. Environ Pollut 2023;337:122612. [PMID: 37757930 DOI: 10.1016/j.envpol.2023.122612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 09/21/2023] [Accepted: 09/22/2023] [Indexed: 09/29/2023]

Ji W, Wang C, Chen H, Liang Y, Wang S. Predicting post-stroke cognitive impairment using machine learning: A prospective cohort study. J Stroke Cerebrovasc Dis 2023;32:107354. [PMID: 37716104 DOI: 10.1016/j.jstrokecerebrovasdis.2023.107354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/27/2023] [Accepted: 09/11/2023] [Indexed: 09/18/2023] Open

Jeong S, Yun SB, Park SY, Mun S. Understanding cross-data dynamics of individual and social/environmental factors through a public health lens: explainable machine learning approaches. Front Public Health 2023;11:1257861. [PMID: 37954048 PMCID: PMC10639162 DOI: 10.3389/fpubh.2023.1257861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 10/09/2023] [Indexed: 11/14/2023] Open

Abstract

Introduction

The rising prevalence of obesity has become a public health concern, requiring efficient and comprehensive prevention strategies.

Methods

This study innovatively investigated the combined influence of individual and social/environmental factors on obesity within the urban landscape of Seoul, by employing advanced machine learning approaches. We collected 'Community Health Surveys' and credit card usage data to represent individual factors. In parallel, we utilized 'Seoul Open Data' to encapsulate social/environmental factors contributing to obesity. A Random Forest model was used to predict obesity based on individual factors. The model was further subjected to Shapley Additive Explanations (SHAP) algorithms to determine each factor's relative importance in obesity prediction. For social/environmental factors, we used the Geographically Weighted Least Absolute Shrinkage and Selection Operator (GWLASSO) to calculate the regression coefficients.

Results

The Random Forest model predicted obesity with an accuracy of >90%. The SHAP analysis revealed that the influential individual obesity-related factors differed across Gu districts, although 'self-awareness of obesity', 'weight control experience', and 'high blood pressure experience' were among the top five influential factors in every Gu district. The GWLASSO indicated variations in the regression coefficients of social/environmental factors across districts.

Conclusion

Our findings provide valuable insights for designing targeted obesity prevention programs that integrate different individual and social/environmental factors within the context of urban design, even within the same city. This study enhances the efficient development and application of explainable machine learning in devising urban health strategies. We recommend that each autonomous district consider these differential influential factors in designing their budget plans to tackle obesity effectively.


Yagin B, Yagin FH, Colak C, Inceoglu F, Kadry S, Kim J. Cancer Metastasis Prediction and Genomic Biomarker Identification through Machine Learning and eXplainable Artificial Intelligence in Breast Cancer Research. Diagnostics (Basel) 2023;13:3314. [PMID: 37958210 PMCID: PMC10650093 DOI: 10.3390/diagnostics13213314] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/04/2023] [Revised: 10/17/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open

Abstract

AIM

This research presents a model combining machine learning (ML) techniques and eXplainable artificial intelligence (XAI) to predict breast cancer (BC) metastasis and reveal important genomic biomarkers in metastasis patients.

METHOD

A total of 98 primary BC samples were analyzed, comprising 34 samples from patients who developed distant metastases within a 5-year follow-up period and 44 samples from patients who remained disease-free for at least 5 years after diagnosis. Genomic data were then subjected to biostatistical analysis, followed by the application of the elastic net feature selection method. This technique identified a restricted number of genomic biomarkers associated with BC metastasis. Light gradient boosting machine (LightGBM), categorical boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Gradient Boosting Trees (GBT), and Ada boosting (AdaBoost) algorithms were utilized for prediction. To assess the models' predictive abilities, the accuracy, F1 score, precision, recall, area under the ROC curve (AUC), and Brier score were calculated as performance evaluation metrics. To promote interpretability and overcome the "black box" problem of ML models, the SHapley Additive exPlanations (SHAP) method was employed.

RESULTS

The LightGBM model outperformed other models, yielding remarkable accuracy of 96% and an AUC of 99.3%. In addition to biostatistical evaluation, in XAI-based SHAP results, increased expression levels of TSPYL5, ATP5E, CA9, NUP210, SLC37A1, ARIH1, PSMD7, UBQLN1, PRAME, and UBE2T (p ≤ 0.05) were found to be associated with an increased incidence of BC metastasis. Finally, decreased levels of expression of CACTIN, TGFB3, SCUBE2, ARL4D, OR1F1, ALDH4A1, PHF1, and CROCC (p ≤ 0.05) genes were also determined to increase the risk of metastasis in BC.

CONCLUSION

The findings of this study may prevent disease progression and metastases and potentially improve clinical outcomes by recommending customized treatment approaches for BC patients.
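Several of the performance metrics named in the methods above (accuracy, precision, recall, F1 score, and Brier score) can be computed directly from labels and predicted probabilities. The following is a minimal sketch with hypothetical helper names and made-up data; the 0.5 decision threshold is an assumption of the sketch, not something stated in the abstract.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = metastasis) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6, 0.4, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
brier = brier_score(y_true, y_prob)
print(metrics, round(brier, 4))
```

Unlike the threshold-based metrics, the Brier score uses the probabilities themselves, so it also penalizes a confident prediction that turns out wrong; lower values are better.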


Yang X, Qiu H, Wang L, Wang X. Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study. J Med Internet Res 2023;25:e44417. [PMID: 37883174 PMCID: PMC10636616 DOI: 10.2196/44417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 03/22/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023] Open

Abstract

BACKGROUND

Machine learning (ML) methods have shown great potential in predicting colorectal cancer (CRC) survival. However, the ML models introduced thus far have mainly focused on binary outcomes and have not considered the time-to-event nature of this type of modeling.

OBJECTIVE

This study aims to evaluate the performance of ML approaches for modeling time-to-event survival data and develop transparent models for predicting CRC-specific survival.

METHODS

The data set used in this retrospective cohort study contains information on patients who were newly diagnosed with CRC between December 28, 2012, and December 27, 2019, at West China Hospital, Sichuan University. We assessed the performance of 6 representative ML models, including random survival forest (RSF), gradient boosting machine (GBM), DeepSurv, DeepHit, neural net-extended time-dependent Cox (or Cox-Time), and neural multitask logistic regression (N-MTLR) in predicting CRC-specific survival. Multiple imputation by chained equations method was applied to handle missing values in variables. Multivariable analysis and clinical experience were used to select significant features associated with CRC survival. Model performance was evaluated in stratified 5-fold cross-validation repeated 5 times by using the time-dependent concordance index, integrated Brier score, calibration curves, and decision curves. The SHapley Additive exPlanations method was applied to calculate feature importance.

RESULTS

A total of 2157 patients with CRC were included in this study. Among the 6 time-to-event ML models, the DeepHit model exhibited the best discriminative ability (time-dependent concordance index 0.789, 95% CI 0.779-0.799) and the RSF model produced better-calibrated survival estimates (integrated Brier score 0.096, 95% CI 0.094-0.099), but these differences were not statistically significant. Additionally, the RSF, GBM, DeepSurv, Cox-Time, and N-MTLR models had predictive accuracy comparable to that of the Cox Proportional Hazards model in terms of discrimination and calibration. The calibration curves showed that all the ML models exhibited good 5-year survival calibration. The decision curves for CRC-specific survival at 5 years showed that all the ML models, especially RSF, had higher net benefits than the default strategies of treating all or no patients at a range of clinically reasonable risk thresholds. The SHapley Additive exPlanations method revealed that R0 resection, tumor-node-metastasis staging, and the number of positive lymph nodes were important factors for 5-year CRC-specific survival.

CONCLUSIONS

This study showed the potential of applying time-to-event ML predictive algorithms to help predict CRC-specific survival. The RSF, GBM, Cox-Time, and N-MTLR algorithms could provide nonparametric alternatives to the Cox Proportional Hazards model in estimating the survival probability of patients with CRC. The transparent time-to-event ML models help clinicians to more accurately predict the survival rate for these patients and improve patient outcomes by enabling personalized treatment plans that are informed by explainable ML models.
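The discrimination results above are reported as a time-dependent concordance index. As a hedged illustration of the underlying idea, the sketch below computes the simpler, time-independent Harrell's concordance index, which is a different statistic from the one used in the study. The function name and the four-patient data set are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event strictly
    before subject j's follow-up time. It is concordant when the subject who
    failed earlier was given the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (years), event flags (1 = death observed,
# 0 = censored) and model risk scores
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.5, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; in this toy data four of the five comparable pairs are ordered correctly.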


Zaki FR, Monroy GL, Shi J, Sudhir K, Boppart SA. Texture-based speciation of otitis media-related bacterial biofilms from optical coherence tomography images using supervised classification. Res Sq 2023:rs.3.rs-3466690. [PMID: 37961282 PMCID: PMC10635317 DOI: 10.21203/rs.3.rs-3466690/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]

Tayyebi A, Alshami AS, Rabiei Z, Yu X, Ismail N, Talukder MJ, Power J. Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models. J Cheminform 2023;15:99. [PMID: 37853492 PMCID: PMC10583449 DOI: 10.1186/s13321-023-00752-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 08/25/2023] [Indexed: 10/20/2023] Open

Liu L, Na R, Yang L, Liu J, Tan Y, Zhao X, Huang X, Chen X. A Workflow Combining Machine Learning with Molecular Simulations Uncovers Potential Dual-Target Inhibitors against BTK and JAK3. Molecules 2023;28:7140. [PMID: 37894618 PMCID: PMC10608827 DOI: 10.3390/molecules28207140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/08/2023] [Accepted: 10/15/2023] [Indexed: 10/29/2023] Open

Minosse S, Picchi E, Conti A, di Giuliano F, di Ciò F, Sarmati L, Teti E, de Santis S, Andreoni M, Floris R, Guerrisi M, Garaci F, Toschi N. Multishell diffusion MRI reveals whole-brain white matter changes in HIV. Hum Brain Mapp 2023;44:5113-5124. [PMID: 37647214 PMCID: PMC10502617 DOI: 10.1002/hbm.26448] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/26/2022] [Revised: 07/15/2023] [Accepted: 07/26/2023] [Indexed: 09/01/2023] Open

Abstract

Diffusion tensor imaging (DTI) and diffusion kurtosis imaging (DKI) have been previously used to explore white matter related to human immunodeficiency virus (HIV) infection. While DTI and DKI suffer from low specificity, the Combined Hindered and Restricted Model of Diffusion (CHARMED) provides additional microstructural specificity. We used these three models to evaluate microstructural differences between 35 HIV-positive patients without neurological impairment and 20 healthy controls who underwent diffusion-weighted imaging using three b-values. While significant group effects were found in all diffusion metrics, CHARMED and DKI analyses uncovered wider involvement (80% vs. 20%) of all white matter tracts in HIV infection compared with DTI. In restricted fraction (FR) analysis, we found significant differences in the left corticospinal tract, middle cerebellar peduncle, right inferior cerebellar peduncle, right corticospinal tract, splenium of the corpus callosum, left superior cerebellar peduncle, pontine crossing tract, left posterior limb of the internal capsule, and left/right medial lemniscus. These are involved in language, motor, equilibrium, behavior, and proprioception, supporting the functional integration that is frequently impaired in HIV-positivity. Additionally, we employed a machine learning algorithm (XGBoost) to discriminate HIV-positive patients from healthy controls using DTI and CHARMED metrics on an ROI-wise basis, and unique contributions to this discrimination were examined using Shapley Explanation values. The CHARMED and DKI estimates produced the best performance. Our results suggest that biophysical multishell imaging, combining additional sensitivity and built-in specificity, provides further information about the brain microstructural changes in multimodal areas involved in attentive, emotional and memory networks often impaired in HIV patients.
