151
Abdollahi A, Pradhan B. Urban Vegetation Mapping from Aerial Imagery Using Explainable AI (XAI). Sensors (Basel) 2021; 21:4738. [PMID: 34300478 PMCID: PMC8309506 DOI: 10.3390/s21144738] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 06/18/2021] [Revised: 07/02/2021] [Accepted: 07/09/2021] [Indexed: 11/17/2022]
Abstract
Urban vegetation mapping is critical in many applications, i.e., preserving biodiversity, maintaining ecological balance, and minimizing the urban heat island effect. It is still challenging to extract accurate vegetation covers from aerial imagery using traditional classification approaches, because urban vegetation categories have complex spatial structures and similar spectral properties. Deep neural networks (DNNs) have shown a significant improvement in remote sensing image classification outcomes during the last few years. These methods are promising in this domain, yet unreliable for various reasons, such as the use of irrelevant descriptor features in the building of the models and lack of quality in the labeled image. Explainable AI (XAI) can help us gain insight into these limits and, as a result, adjust the training dataset and model as needed. Thus, in this work, we explain how an explanation model called Shapley additive explanations (SHAP) can be utilized for interpreting the output of the DNN model that is designed for classifying vegetation covers. We want to not only produce high-quality vegetation maps, but also rank the input parameters and select appropriate features for classification. Therefore, we test our method on vegetation mapping from aerial imagery based on spectral and textural features. Texture features can help overcome the limitations of poor spectral resolution in aerial imagery for vegetation mapping. The model was capable of obtaining an overall accuracy (OA) of 94.44% for vegetation cover mapping. The conclusions derived from SHAP plots demonstrate the high contribution of features, such as Hue, Brightness, GLCM_Dissimilarity, GLCM_Homogeneity, and GLCM_Mean to the output of the proposed model for vegetation mapping. Therefore, the study indicates that existing vegetation mapping strategies based only on spectral characteristics are insufficient to appropriately classify vegetation covers.
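The feature ranking that SHAP produces rests on Shapley values. As a minimal sketch of the underlying definition (not the authors' pipeline and not the `shap` library, which uses approximations for real models), the block below computes exact Shapley values for a hypothetical three-feature scoring function. The feature names echo the abstract; the scores in `toy_value` are invented purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical model score when only the features in `subset` are known."""
    score = 0.0
    if "hue" in subset:
        score += 2.0
    if "brightness" in subset:
        score += 1.0
    if "hue" in subset and "glcm_mean" in subset:
        score += 1.0  # invented interaction term, shared between both features
    return score


features = ["hue", "brightness", "glcm_mean"]
phi = shapley_values(toy_value, features)
for name in sorted(phi, key=phi.get, reverse=True):
    print(f"{name}: {phi[name]:.3f}")
```

The attributions sum to the difference between the full-feature score and the empty-set score, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.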
Affiliation(s)
- Abolfazl Abdollahi
  - Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Biswajeet Pradhan
  - Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
  - Earth Observation Center, Institute of Climate Change, University Kebangsaan Malaysia, Bangi 43600 UKM, Selangor, Malaysia
152
Lama L, Wilhelmsson O, Norlander E, Gustafsson L, Lager A, Tynelius P, Wärvik L, Östenson CG. Machine learning for prediction of diabetes risk in middle-aged Swedish people. Heliyon 2021; 7:e07419. [PMID: 34296003 DOI: 10.1016/j.heliyon.2021.e07419] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/09/2020] [Revised: 02/07/2021] [Accepted: 06/23/2021] [Indexed: 11/23/2022] Open
Abstract
Aims To study whether machine learning methodology can be used to detect persons with increased type 2 diabetes or prediabetes risk among people without known abnormal glucose regulation. Methods Machine learning and interpretable machine learning models were applied to research data from the Stockholm Diabetes Preventive Program, including more than 8000 people initially with normal glucose tolerance or prediabetes, to determine high- and low-risk features for further impairment in glucose tolerance at follow-up 10 and 20 years later. Results The features with the highest importance on the outcome were body mass index, waist-hip ratio, age, systolic and diastolic blood pressure, and diabetes heredity. High values of these features as well as diabetes heredity conferred increased risk of type 2 diabetes. The machine learning model was used to generate individual, comprehensible risk profiles, where the diabetes risk was obtained for each person in the data set. Features with the largest increasing or decreasing effects on the risk were determined. Conclusions The primary application of this machine learning model is to predict individual type 2 diabetes risk in people without diagnosed diabetes, and to which features the risk relates. However, since most features affecting diabetes risk also play a role for metabolic control in diabetes, e.g. body mass index, diet composition, tobacco use, and stress, the tool can possibly also be used in diabetes care to develop more individualized, easily accessible health care plans to be utilized when encountering the patients.
153
Hasan MJ, Sohaib M, Kim JM. An Explainable AI-Based Fault Diagnosis Model for Bearings. Sensors (Basel) 2021; 21:4070. [PMID: 34199163 PMCID: PMC8231543 DOI: 10.3390/s21124070] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/25/2021] [Revised: 06/08/2021] [Accepted: 06/11/2021] [Indexed: 11/28/2022]
Abstract
In this paper, an explainable AI-based fault diagnosis model for bearings is proposed with five stages, i.e., (1) a data preprocessing method based on the Stockwell Transformation Coefficient (STC) is proposed to analyze the vibration signals for variable speed and load conditions, (2) a statistical feature extraction method is introduced to capture the significance from the invariant pattern of the analyzed data by STC, (3) an explainable feature selection process is proposed by introducing a wrapper-based feature selector, Boruta, (4) a feature filtration method is considered on top of the feature selector to avoid the multicollinearity problem, and finally, (5) an additive Shapley explanation followed by k-NN is proposed to diagnose and to explain the individual decision of the k-NN classifier for debugging the performance of the diagnosis model. Thus, the idea of explainability is introduced for the first time in the field of bearing fault diagnosis in two steps: (a) incorporating explainability to the feature selection process, and (b) interpretation of the classifier performance with respect to the selected features. The effectiveness of the proposed model is demonstrated on two different datasets obtained from separate bearing testbeds. Lastly, an assessment of several state-of-the-art fault diagnosis algorithms in rotating machinery is included.
Affiliation(s)
- Md Junayed Hasan
  - Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Korea
- Muhammad Sohaib
  - Department of Computer Science, Lahore Garrison University, Lahore 54000, Pakistan
- Jong-Myon Kim
  - Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Korea
154
Wang K, Tian J, Zheng C, Yang H, Ren J, Li C, Han Q, Zhang Y. Improving Risk Identification of Adverse Outcomes in Chronic Heart Failure Using SMOTE+ENN and Machine Learning. Risk Manag Healthc Policy 2021; 14:2453-2463. [PMID: 34149290 PMCID: PMC8206455 DOI: 10.2147/rmhp.s310295] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2021] [Accepted: 05/24/2021] [Indexed: 01/14/2023] Open
Abstract
PURPOSE This study sought to develop models with good identification for adverse outcomes in patients with heart failure (HF) and find strong factors that affect prognosis. PATIENTS AND METHODS A total of 5004 qualifying cases were selected, among which 498 cases had adverse outcomes and 4506 cases were discharged after improvement. The study subjects were hospitalized patients diagnosed with HF from a regional cardiovascular hospital and the cardiology department of a medical university hospital in Shanxi Province of China between January 2014 and June 2019. The synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE+ENN) was used to pre-process unbalanced data. Traditional logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost) were used to build risk identification models, and each model was repeated 100 times. Model discrimination and calibration were estimated using F1-score, the area under the receiver-operating characteristic curve (AUROC), and Brier score. The best performing of the five models was used to identify the risk of adverse outcomes and evaluate the influencing factors. RESULTS The SME-XGBoost was the best performing model with means of F1-score (0.3673, 95% confidence interval [CI]: 0.3633-0.3712), AUROC (0.8010, CI: 0.7974-0.8046), and Brier score (0.1769, CI: 0.1748-0.1789). Age, N-terminal pronatriuretic peptide, pulmonary disease, etc. were the most significant factors of adverse outcomes in patients with HF. CONCLUSION The combination of SMOTE+ENN and advanced machine learning methods effectively improved the discrimination efficacy of adverse outcomes in HF patients, accurately stratified patients at risk of adverse outcomes, and found the top factors of adverse outcomes. These models and factors emphasize the importance of health status data in determining adverse outcomes in patients with HF.
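Two of the metrics named in this abstract, the Brier score and the F1-score, are easy to state in code. The sketch below uses the standard textbook definitions with a hypothetical four-patient example; the labels and probabilities are invented, and only the case counts (498 of 5004) come from the abstract. The 0.5 decision threshold is also an assumption, since the abstract does not state one.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (adverse) class."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outcomes (1 = adverse outcome) and predicted risks.
y_true = [1, 0, 0, 1]
p_pred = [0.9, 0.2, 0.4, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]  # assumed 0.5 threshold

brier = brier_score(y_true, p_pred)
f1 = f1_score(y_true, y_pred)

# Share of adverse outcomes in the cohort, using the counts from the abstract.
minority_share = 498 / 5004

print(f"Brier score: {brier:.4f}")
print(f"F1-score: {f1:.4f}")
print(f"Adverse-outcome share: {minority_share:.2%}")
```

With roughly one adverse outcome in ten cases, a resampling step such as SMOTE+ENN is a common way to keep the minority class from being ignored during training. F1 is informative in this setting because it does not reward correct predictions of the majority class.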
Affiliation(s)
- Ke Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Jing Tian
  - Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
- Chu Zheng
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Hong Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Jia Ren
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
- Chenhao Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Qinghua Han
  - Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
155
Johnsen PV, Riemer-Sørensen S, DeWan AT, Cahill ME, Langaas M. A new method for exploring gene-gene and gene-environment interactions in GWAS with tree ensemble methods and SHAP values. BMC Bioinformatics 2021; 22:230. [PMID: 33947323 PMCID: PMC8097909 DOI: 10.1186/s12859-021-04041-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/03/2020] [Accepted: 02/22/2021] [Indexed: 01/08/2023] Open
Abstract
Background The identification of gene–gene and gene–environment interactions in genome-wide association studies is challenging due to the unknown nature of the interactions and the overwhelmingly large number of possible combinations. Parametric regression models are suitable to look for prespecified interactions. Nonparametric models such as tree ensemble models, with the ability to detect any unspecified interaction, have previously been difficult to interpret. However, with the development of methods for model explainability, it is now possible to interpret tree ensemble models efficiently and with a strong theoretical basis. Results We propose a tree ensemble- and SHAP-based method for identifying as well as interpreting potential gene–gene and gene–environment interactions on large-scale biobank data. A set of independent cross-validation runs are used to implicitly investigate the whole genome. We apply and evaluate the method using data from the UK Biobank with obesity as the phenotype. The results are in line with previous research on obesity as we identify top SNPs previously associated with obesity. We further demonstrate how to interpret and visualize interaction candidates. Conclusions The new method identifies interaction candidates otherwise not detected with parametric regression models. However, further research is needed to evaluate the uncertainties of these candidates. The method can be applied to large-scale biobanks with high-dimensional data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04041-7.
Affiliation(s)
- Pål V Johnsen
  - SINTEF DIGITAL, Forskningsveien 1, 0373, Oslo, Norway
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, A. Getz vei 1, 7491, Trondheim, Norway
- Andrew Thomas DeWan
  - Department of Chronic Disease Epidemiology and Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, 1 Church Street, New Haven, CT, 06510, USA
  - Gemini Center for Sepsis Research, Department of Circulation and Medical Imaging, NTNU, Norwegian University of Science and Technology, Prinsesse Kristinas gate 3, 7030, Trondheim, Norway
- Megan E Cahill
  - Department of Chronic Disease Epidemiology and Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, 1 Church Street, New Haven, CT, 06510, USA
- Mette Langaas
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, A. Getz vei 1, 7491, Trondheim, Norway
156
Amparore E, Perotti A, Bajardi P. To trust or not to trust an explanation: using LEAF to evaluate local linear XAI methods. PeerJ Comput Sci 2021; 7:e479. [PMID: 33977131 PMCID: PMC8056245 DOI: 10.7717/peerj-cs.479] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/20/2020] [Accepted: 03/16/2021] [Indexed: 06/12/2023]
Abstract
The main objective of eXplainable Artificial Intelligence (XAI) is to provide effective explanations for black-box classifiers. The existing literature lists many desirable properties for explanations to be useful, but there is scarce consensus on how to quantitatively evaluate explanations in practice. Moreover, explanations are typically used only to inspect black-box models, and the proactive use of explanations as a decision support is generally overlooked. Among the many approaches to XAI, a widely adopted paradigm is Local Linear Explanations, with LIME and SHAP emerging as state-of-the-art methods. We show that these methods are plagued by many defects including unstable explanations, divergence of actual implementations from the promised theoretical properties, and explanations for the wrong label. This highlights the need to have standard and unbiased evaluation procedures for Local Linear Explanations in the XAI field. In this paper we address the problem of identifying a clear and unambiguous set of metrics for the evaluation of Local Linear Explanations. This set includes both existing and novel metrics defined specifically for this class of explanations. All metrics have been included in an open Python framework, named LEAF. The purpose of LEAF is to provide a reference for end users to evaluate explanations in a standardised and unbiased way, and to guide researchers towards developing improved explainable techniques.
Affiliation(s)
- Elvio Amparore
- Department of Computer Science, University of Turin, Turin, Italy
- ISI Foundation, Turin, Italy
157
Ayoub J, Yang XJ, Zhou F. Combat COVID-19 infodemic using explainable natural language processing models. Inf Process Manag 2021; 58:102569. [PMID: 33776192 PMCID: PMC7980090 DOI: 10.1016/j.ipm.2021.102569] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 11/05/2020] [Revised: 02/25/2021] [Accepted: 02/28/2021] [Indexed: 01/14/2023]
Abstract
Misinformation of COVID-19 is prevalent on social media as the pandemic unfolds, and the associated risks are extremely high. Thus, it is critical to detect and combat such misinformation. Recently, deep learning models using natural language processing techniques, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved great successes in detecting misinformation. In this paper, we proposed an explainable natural language processing model based on DistilBERT and SHAP (Shapley Additive exPlanations) to combat misinformation about COVID-19 due to their efficiency and effectiveness. First, we collected a dataset of 984 claims about COVID-19 with fact-checking. By augmenting the data using back-translation, we doubled the sample size of the dataset and the DistilBERT model was able to obtain good performance (accuracy: 0.972; areas under the curve: 0.993) in detecting misinformation about COVID-19. Our model was also tested on a larger dataset for AAAI2021 — COVID-19 Fake News Detection Shared Task and obtained good performance (accuracy: 0.938; areas under the curve: 0.985). The performance on both datasets was better than traditional machine learning models. Second, in order to boost public trust in model prediction, we employed SHAP to improve model explainability, which was further evaluated using a between-subjects experiment with three conditions, i.e., text (T), text+SHAP explanation (TSE), and text+SHAP explanation+source and evidence (TSESE). The participants were significantly more likely to trust and share information related to COVID-19 in the TSE and TSESE conditions than in the T condition. Our results provided good implications for detecting misinformation about COVID-19 and improving public trust.
Affiliation(s)
- Jackie Ayoub
  - Industrial and Manufacturing Systems Engineering, University of Michigan-Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, United States of America
- X Jessie Yang
  - Industrial and Operations Engineering, University of Michigan, 1205 Beal Avenue, Ann Arbor, MI 48015, United States of America
- Feng Zhou
  - Industrial and Manufacturing Systems Engineering, University of Michigan-Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, United States of America
158
Pan P, Li Y, Xiao Y, Han B, Su L, Su M, Li Y, Zhang S, Jiang D, Chen X, Zhou F, Ma L, Bao P, Xie L. Prognostic Assessment of COVID-19 in the Intensive Care Unit by Machine Learning Methods: Model Development and Validation. J Med Internet Res 2020; 22:e23128. [PMID: 33035175 PMCID: PMC7661105 DOI: 10.2196/23128] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 08/02/2020] [Revised: 09/06/2020] [Accepted: 10/08/2020] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Patients with COVID-19 in the intensive care unit (ICU) have a high mortality rate, and methods to assess patients' prognosis early and administer precise treatment are of great significance. OBJECTIVE The aim of this study was to use machine learning to construct a model for the analysis of risk factors and prediction of mortality among ICU patients with COVID-19. METHODS In this study, 123 patients with COVID-19 in the ICU of Vulcan Hill Hospital were retrospectively selected from the database, and the data were randomly divided into a training data set (n=98) and test data set (n=25) with a 4:1 ratio. Significance tests, correlation analysis, and factor analysis were used to screen 100 potential risk factors individually. Conventional logistic regression methods and four machine learning algorithms were used to construct the risk prediction model for the prognosis of patients with COVID-19 in the ICU. The performance of these machine learning models was measured by the area under the receiver operating characteristic curve (AUC). Interpretation and evaluation of the risk prediction model were performed using calibration curves, SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), etc, to ensure its stability and reliability. The outcome was based on the ICU deaths recorded from the database. RESULTS Layer-by-layer screening of 100 potential risk factors finally revealed 8 important risk factors that were included in the risk prediction model: lymphocyte percentage, prothrombin time, lactate dehydrogenase, total bilirubin, eosinophil percentage, creatinine, neutrophil percentage, and albumin level. Finally, an eXtreme Gradient Boosting (XGBoost) model established with the 8 important risk factors showed the best recognition ability in the training set of 5-fold cross validation (AUC=0.86) and the verification queue (AUC=0.92). 
The calibration curve showed that the risk predicted by the model was in good agreement with the actual risk. In addition, using the SHAP and LIME algorithms, feature interpretation and sample prediction interpretation algorithms of the XGBoost black box model were implemented. Additionally, the model was translated into a web-based risk calculator that is freely available for public usage. CONCLUSIONS The 8-factor XGBoost model predicts risk of death in ICU patients with COVID-19 well; it initially demonstrates stability and can be used effectively to predict COVID-19 prognosis in ICU patients.
Affiliation(s)
- Pan Pan
  - Chinese PLA General Hospital, Medical School Of Chinese PLA, College of Pulmonary and Critical Care Medicine, Beijing, China
- Yichao Li
  - DHC Mediway Technology Co Ltd, Beijing, China
- Yongjiu Xiao
  - The 940th Hospital of Joint Logistics Support Force of Chinese People's Liberation Army, Lanzhou, China
- Bingchao Han
  - The 980th Hospital of Joint Logistics Support Force of Chinese People's Liberation Army, Shijiazhuang, China
- Longxiang Su
  - Peking Union Medical College Hospital, Beijing, China
- Yansheng Li
  - DHC Mediway Technology Co Ltd, Beijing, China
- Siqi Zhang
  - DHC Mediway Technology Co Ltd, Beijing, China
- Xia Chen
  - DHC Mediway Technology Co Ltd, Beijing, China
- Fuquan Zhou
  - DHC Mediway Technology Co Ltd, Beijing, China
- Ling Ma
  - The 940th Hospital of Joint Logistics Support Force of Chinese People's Liberation Army, Lanzhou, China
- Pengtao Bao
  - College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital, Beijing, China
- Lixin Xie
  - College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital, Beijing, China
159
Bi Y, Xiang D, Ge Z, Li F, Jia C, Song J. An Interpretable Prediction Model for Identifying N7-Methylguanosine Sites Based on XGBoost and SHAP. Mol Ther Nucleic Acids 2020; 22:362-372. [PMID: 33230441 PMCID: PMC7533297 DOI: 10.1016/j.omtn.2020.08.022] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 07/14/2020] [Accepted: 08/20/2020] [Indexed: 12/19/2022]
Abstract
Recent studies have increasingly shown that the chemical modification of mRNA plays an important role in the regulation of gene expression. N7-methylguanosine (m7G) is a type of positively-charged mRNA modification that plays an essential role for efficient gene expression and cell viability. However, the research on m7G has received little attention to date. Bioinformatics tools can be applied as auxiliary methods to identify m7G sites in transcriptomes. In this study, we develop a novel interpretable machine learning-based approach termed XG-m7G for the differentiation of m7G sites using the XGBoost algorithm and six different types of sequence-encoding schemes. Both 10-fold and jackknife cross-validation tests indicate that XG-m7G outperforms iRNA-m7G. Moreover, using the powerful SHAP algorithm, this new framework also provides desirable interpretations of the model performance and highlights the most important features for identifying m7G sites. XG-m7G is anticipated to serve as a useful tool and guide for researchers in their future studies of mRNA modification sites.
Affiliation(s)
- Yue Bi
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Dongxu Xiang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Zongyuan Ge
  - Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
160
Parsa AB, Movahedi A, Taghipour H, Derrible S, Mohammadian AK. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid Anal Prev 2020; 136:105405. [PMID: 31864931 DOI: 10.1016/j.aap.2019.105405] [Citation(s) in RCA: 120] [Impact Index Per Article: 30.0] [Received: 08/09/2019] [Revised: 10/24/2019] [Accepted: 12/15/2019] [Indexed: 06/10/2023]
Abstract
Detecting traffic accidents as rapidly as possible is essential for traffic safety. In this study, we use eXtreme Gradient Boosting (XGBoost), a Machine Learning (ML) technique, to detect the occurrence of accidents using a set of real-time data comprising traffic, network, demographic, land use, and weather features. The data, from the Chicago metropolitan expressways, were collected between December 2016 and December 2017 and include 244 traffic accidents and 6073 non-accident cases. In addition, SHAP (SHapley Additive exPlanation) is employed to interpret the results and analyze the importance of individual features. The results show that XGBoost can detect accidents robustly, with an accuracy, detection rate, and false alarm rate of 99%, 79%, and 0.16%, respectively. Several traffic-related features, especially the difference in speed between 5 min before and 5 min after an accident, are found to have relatively more impact on the occurrence of accidents. Furthermore, a feature dependency analysis is conducted for three pairs of features. First, average daily traffic and speed after accidents/non-accidents time at the upstream location are interpreted jointly. Then, distance to Central Business District and residential density are analyzed. Finally, speed after accidents/non-accidents time at upstream location and speed after accidents/non-accidents time at downstream location are evaluated with respect to the model's output.
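The three headline metrics can be read off a confusion matrix. The sketch below assumes the usual definitions (detection rate as recall on accidents, false alarm rate as the share of non-accident cases flagged as accidents); the abstract does not spell these out, so treat them as an assumption. The class totals (244 and 6073) come from the abstract, while the split into true and false positives is hypothetical and is not the paper's confusion matrix.

```python
def detection_metrics(tp, fn, fp, tn):
    """Accuracy, detection rate (recall), and false alarm rate from counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
    }


N_ACCIDENTS = 244        # from the abstract
N_NON_ACCIDENTS = 6073   # from the abstract

# Hypothetical split of each class into correct and incorrect predictions.
tp, fp = 200, 12
fn = N_ACCIDENTS - tp
tn = N_NON_ACCIDENTS - fp

metrics = detection_metrics(tp, fn, fp, tn)

# Baseline: a detector that never raises an alarm.
baseline = detection_metrics(tp=0, fn=N_ACCIDENTS, fp=0, tn=N_NON_ACCIDENTS)

for name, value in metrics.items():
    print(f"{name}: {value:.4f} (never-alarm baseline: {baseline[name]:.4f})")
```

Because fewer than 4% of the cases are accidents, the never-alarm baseline already exceeds 96% accuracy while detecting nothing. That is why the detection rate and false alarm rate carry most of the information in a result like this one.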
Affiliation(s)
- Amir Bahador Parsa
  - Department of Civil and Materials Engineering, University of Illinois at Chicago, 842 W Taylor St, 2095 ERF, Chicago, IL 60607, United States
- Ali Movahedi
  - Department of Civil and Materials Engineering, University of Illinois at Chicago, 842 W Taylor St, 2095 ERF, Chicago, IL 60607, United States
- Homa Taghipour
  - Department of Civil and Materials Engineering, University of Illinois at Chicago, 842 W Taylor St, 2095 ERF, Chicago, IL 60607, United States
- Sybil Derrible
  - Department of Civil and Materials Engineering, Institute for Environmental Science and Policy, University of Illinois at Chicago, 842 W Taylor St, 2095 ERF, Chicago, IL 60607, United States
- Abolfazl Kouros Mohammadian
  - Department of Civil and Materials Engineering, University of Illinois at Chicago, 842 W Taylor St, 2095 ERF, Chicago, IL 60607, United States
161
Dai C, Fan Y, Li Y, Bao X, Li Y, Su M, Yao Y, Deng K, Xing B, Feng F, Feng M, Wang R. Development and Interpretation of Multiple Machine Learning Models for Predicting Postoperative Delayed Remission of Acromegaly Patients During Long-Term Follow-Up. Front Endocrinol (Lausanne) 2020; 11:643. [PMID: 33042013 PMCID: PMC7525125 DOI: 10.3389/fendo.2020.00643] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/03/2020] [Accepted: 08/07/2020] [Indexed: 12/11/2022] Open
Abstract
Background: Some patients with acromegaly do not reach the remission standard in the short term after surgery but achieve remission without additional postoperative treatment during long-term follow-up; this phenomenon is defined as postoperative delayed remission (DR). DR may complicate the interpretation of surgical outcomes in patients with acromegaly and interfere with decision-making regarding postoperative adjuvant therapy. Objective: We aimed to develop and validate machine learning (ML) models for predicting DR in acromegaly patients who have not achieved remission within 6 months of surgery. Methods: We enrolled 306 acromegaly patients and randomly divided them into training and test datasets. We used the recursive feature elimination (RFE) algorithm to select features and applied six ML algorithms to construct DR prediction models. The performance of these ML models was validated using receiver operating characteristics analysis. We used permutation importance, SHapley Additive exPlanations (SHAP), and local interpretable model-agnostic explanation (LIME) algorithms to determine the importance of the selected features and interpret the ML models. Results: Fifty-five (17.97%) acromegaly patients met the criteria for DR, and five features (post-1w rGH, post-1w nGH, post-6m rGH, post-6m IGF-1, and post-6m nGH) were significantly associated with DR in both the training and the test datasets. After the RFE feature selection, the XGboost model, which comprised the 15 important features, had the greatest discriminatory ability (area under the curve = 0.8349, sensitivity = 0.8889, Youden's index = 0.6842). The XGboost model showed good discrimination ability and provided significantly better estimates of DR of patients with acromegaly compared with using only the Knosp grade. 
The results obtained from permutation importance, SHAP, and LIME algorithms showed that post-6m IGF-1 is the most important feature in XGboost algorithm prediction and showed the reliability and the clinical practicability of the XGboost model in DR prediction. Conclusions: ML-based models can serve as an effective non-invasive approach to predicting DR and could aid in determining individual treatment and follow-up strategies for acromegaly patients who have not achieved remission within 6 months of surgery.
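The discrimination metrics quoted in this abstract are linked by simple arithmetic: Youden's index is sensitivity + specificity - 1, so the reported sensitivity of 0.8889 and index of 0.6842 imply a specificity of about 0.7953. The sketch below works through that relation and a rank-based area under the curve. The function names and the toy scores are hypothetical illustrations, not the study's code or data.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def implied_specificity(sensitivity: float, youden: float) -> float:
    """Solve J = sensitivity + specificity - 1 for specificity."""
    return youden - sensitivity + 1.0


def auc(scores_pos, scores_neg) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Values reported in the abstract for the XGboost model.
spec = implied_specificity(sensitivity=0.8889, youden=0.6842)

# Toy predicted probabilities, made up for illustration only.
toy_auc = auc(scores_pos=[0.9, 0.7, 0.6], scores_neg=[0.8, 0.4, 0.3, 0.2])
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients; a sort-based version would be the usual choice for larger data.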
Affiliation(s)
- Congxin Dai
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yanghua Fan
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yichao Li
- DHC Mediway Technology Co., Ltd., Beijing, China
| | - Xinjie Bao
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yansheng Li
- DHC Mediway Technology Co., Ltd., Beijing, China
| | - Mingliang Su
- DHC Mediway Technology Co., Ltd., Beijing, China
| | - Yong Yao
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Kan Deng
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Bing Xing
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Feng Feng
- Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Ming Feng
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- *Correspondence: Ming Feng
| | - Renzhi Wang
- Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Renzhi Wang
|
162
|
Hathaway QA, Roth SM, Pinti MV, Sprando DC, Kunovac A, Durr AJ, Cook CC, Fink GK, Cheuvront TB, Grossman JH, Aljahli GA, Taylor AD, Giromini AP, Allen JL, Hollander JM. Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics. Cardiovasc Diabetol 2019; 18:78. [PMID: 31185988 PMCID: PMC6560734 DOI: 10.1186/s12933-019-0879-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 03/13/2019] [Accepted: 05/29/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Diabetes mellitus is a chronic disease that impacts an increasing percentage of people each year. Among its comorbidities, diabetics are two to four times more likely to develop cardiovascular diseases. While HbA1c remains the primary diagnostic for diabetics, its ability to predict long-term health outcomes across diverse demographics and ethnic groups, and at a personalized level, is limited. The purpose of this study was to provide a model for precision medicine through the implementation of machine-learning algorithms using multiple cardiac biomarkers as a means for predicting diabetes mellitus development. METHODS Right atrial appendages from 50 patients, 30 non-diabetic and 20 type 2 diabetic, were procured from the WVU Ruby Memorial Hospital. Machine-learning was applied to physiological, biochemical, and sequencing data for each patient. Supervised learning implementing SHapley Additive exPlanations (SHAP) allowed binary (no diabetes or type 2 diabetes) and multiple classification (no diabetes, prediabetes, and type 2 diabetes) of the patient cohort with and without the inclusion of HbA1c levels. Findings were validated through Logistic Regression (LR), Linear Discriminant Analysis (LDA), Gaussian Naïve Bayes (NB), Support Vector Machine (SVM), and Classification and Regression Tree (CART) models with tenfold cross validation. RESULTS Total nuclear methylation and hydroxymethylation were highly correlated to diabetic status, with nuclear methylation and mitochondrial electron transport chain (ETC) activities achieving superior testing accuracies in the predictive model (~84% testing, binary). Mitochondrial DNA SNPs found in the D-Loop region (SNP-73G, -16126C, and -16362C) were highly associated with diabetes mellitus. The CpG island of transcription factor A, mitochondrial (TFAM) revealed CpG24 (chr10:58385262, P = 0.003) and CpG29 (chr10:58385324, P = 0.001) as markers correlating with diabetic progression.
When combining the most predictive factors from each set, total nuclear methylation and CpG24 methylation were the best diagnostic measures in both binary and multiple classification sets. CONCLUSIONS Using machine-learning, we were able to identify novel as well as the most relevant biomarkers associated with type 2 diabetes mellitus by integrating physiological, biochemical, and sequencing datasets. Ultimately, this approach may be used as a guideline for future investigations into disease pathogenesis and novel biomarker discovery.
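The tenfold cross-validation mentioned in the methods can be sketched with the standard library alone. The code below is a minimal illustration under stated assumptions: the splitter is unstratified, the `fit` and `predict` callables are hypothetical stand-ins for any of the listed classifiers (LR, LDA, NB, SVM, CART), and the data are synthetic, shaped only to match the reported cohort size of 30 non-diabetic and 20 diabetic patients.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute shuffled samples as evenly as possible across the k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(X, y, fit, predict, k: int = 10, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds for hypothetical fit/predict callables."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Synthetic cohort: 30 non-diabetic (0) and 20 diabetic (1) labels.
# The single feature is perfectly separable, purely for illustration.
y = [0] * 30 + [1] * 20
X = [[float(label)] for label in y]

# A trivial threshold rule standing in for a real classifier.
fit = lambda X_train, y_train: 0.5
predict = lambda threshold, x: int(x[0] > threshold)

splits = list(k_fold_indices(len(X), k=10))
mean_acc = cross_val_accuracy(X, y, fit, predict, k=10)
```

With 50 samples and ten folds, each model is trained on 45 patients and scored on 5, so per-fold accuracies move in steps of 20 percentage points; a stratified splitter would additionally keep the 30:20 class ratio within each fold.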
Affiliation(s)
- Quincy A Hathaway
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Skyler M Roth
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Mark V Pinti
- West Virginia University School of Pharmacy, Morgantown, WV, 26505, USA
| | - Daniel C Sprando
- West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Amina Kunovac
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Andrya J Durr
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Chris C Cook
- Cardiovascular and Thoracic Surgery, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Garrett K Fink
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
| | - Tristen B Cheuvront
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Jasmine H Grossman
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Ghadah A Aljahli
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Andrew D Taylor
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Andrew P Giromini
- West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Jessica L Allen
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - John M Hollander
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA.
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA.
|
163
|
Stojić A, Stanić N, Vuković G, Stanišić S, Perišić M, Šoštarić A, Lazić L. Explainable extreme gradient boosting tree-based prediction of toluene, ethylbenzene and xylene wet deposition. Sci Total Environ 2019; 653:140-147. [PMID: 30408662 DOI: 10.1016/j.scitotenv.2018.10.368] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/03/2018] [Revised: 10/25/2018] [Accepted: 10/27/2018] [Indexed: 05/20/2023]
Abstract
Current research suggests that, apart from photochemical reactions, toluene, ethylbenzene and xylene (TEX) removal from ambient air might be affected by atmospheric precipitation, depending on the concentrations and water solubility of the compounds, Henry's law, physico-chemical properties of the water, as well as the frequency and intensity of precipitation events. Nevertheless, existing knowledge of the role that wet deposition plays in biogeochemical cycles of volatile species remains insufficient, and this topic requires more scientific effort to be explored and understood. In this study, we employed the eXtreme Gradient Boosting tree ensemble for revealing TEX transfer from ambient air to rainwater, and applied a novel SHapley Additive exPlanations feature attribution framework to examine the relevance of the monitored parameters and identify key factors that govern wet deposition of TEX. According to the results, main impacts, including ambient air TEX concentrations, and rainwater and air temperatures, and occasional, but less important impacts, including wind speed, air pressure, turbidity, and total organic carbon, NO3-, Cl- and K+ rainwater concentration, shaped TEX partition between gaseous and aqueous phases during rain events.
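The SHapley Additive exPlanations framework named in this abstract attributes a prediction to its inputs using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating every feature coalition for a small toy model. It illustrates the underlying definition only: the model, the baseline, and all names are hypothetical, and this brute-force enumeration is not the algorithm the study or any SHAP software would apply to a tree ensemble.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features: int):
    """Exact Shapley values for a coalition value function over n_features players."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with three inputs and one interaction term.
baseline = [0.0, 0.0, 0.0]
x = [2.0, 1.0, 3.0]


def model(z):
    return 3.0 * z[0] + 2.0 * z[1] + z[0] * z[2]


def value(coalition):
    # Features outside the coalition are replaced by their baseline value.
    z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(z)


phi = shapley_values(value, n_features=3)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property the method's name refers to. Exact enumeration costs on the order of 2^n model evaluations per feature, so it is only practical for a handful of inputs.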
Affiliation(s)
- Andreja Stojić
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia.
| | - Nenad Stanić
- Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
| | - Gordana Vuković
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
| | | | - Mirjana Perišić
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia
| | - Andrej Šoštarić
- Institute of Public Health Belgrade, Despota Stefana 54, 11000 Belgrade, Serbia
| | - Lazar Lazić
- Faculty of Physics, University of Belgrade, Studentski trg 12-16, 11000 Belgrade, Serbia
|
164
|
Burgerhof JGM, Vasluian E, Dijkstra PU, Bongers RM, van der Sluis CK. The Southampton Hand Assessment Procedure revisited: A transparent linear scoring system, applied to data of experienced prosthetic users. J Hand Ther 2017; 30:49-57. [PMID: 27912919 DOI: 10.1016/j.jht.2016.05.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/16/2015] [Revised: 03/30/2016] [Accepted: 05/10/2016] [Indexed: 02/03/2023]
Abstract
STUDY DESIGN Cross-sectional. INTRODUCTION Southampton Hand Assessment Procedure (SHAP) provides function scores for hand grips (prehensile patterns) and an overall score, the index of function (IOF). The underlying equations of SHAP are not publicly available, which induces opacity. Furthermore, SHAP has been scarcely tested in prosthetic users. METHODS Issues with SHAP-IOF are discussed; an alternative scoring system, that is, linear index of function (LIF) and a weighted version (W-LIF) are presented. In LIF, task times are transformed linearly, relative to SHAP norms, and are computed into LIF-prehensile patterns (LIFPP). LIF and IOF were compared using data of 27 experienced prosthetic users. RESULTS High correlation and agreement between LIF and IOF were found: LIFPP vs IOFPP ranged from r = 0.880 to r = 0.988, and W-LIF vs IOF had a correlation coefficient of r = 0.984. DISCUSSION SHAP data of prosthetic users are valuable benchmarks for health care professionals. LIF calculations are a good and cost-free alternative for IOF scores. CONCLUSION(S) Measurements with LIF and IOF may be considered similar, but LIF is transparent and easier to use than IOF. LEVEL OF EVIDENCE Clinical measurement and cross-sectional.
Affiliation(s)
- Johannes G M Burgerhof
- Department of Epidemiology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands.
| | - Ecaterina Vasluian
- Department of Rehabilitation Medicine, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Pieter U Dijkstra
- Department of Rehabilitation Medicine, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands; Department of Oral and Maxillofacial Surgery, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Raoul M Bongers
- Department of Rehabilitation Medicine, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands; Center of Human Movement Sciences, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Corry K van der Sluis
- Department of Rehabilitation Medicine, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
|
165
|
Vujaklija I, Roche AD, Hasenoehrl T, Sturma A, Amsuess S, Farina D, Aszmann OC. Translating Research on Myoelectric Control into Clinics-Are the Performance Assessment Methods Adequate? Front Neurorobot 2017; 11:7. [PMID: 28261085 PMCID: PMC5306214 DOI: 10.3389/fnbot.2017.00007] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 07/17/2016] [Accepted: 02/01/2017] [Indexed: 11/23/2022] Open
Abstract
Missing an upper limb dramatically impairs daily-life activities. Efforts in overcoming the issues arising from this disability have been made in both academia and industry, although their clinical outcome is still limited. Translation of prosthetic research into clinics has been challenging because of the difficulties in meeting the necessary requirements of the market. In this perspective article, we suggest that one relevant factor determining the relatively small clinical impact of myocontrol algorithms for upper limb prostheses is the limit of commonly used laboratory performance metrics. The laboratory conditions, in which the majority of the solutions are being evaluated, fail to sufficiently replicate real-life challenges. We qualitatively support this argument with representative data from seven transradial amputees. Their ability to control a myoelectric prosthesis was tested by measuring the accuracy of offline EMG signal classification, a typical laboratory performance metric, as well as by clinical scores when performing standard tests of daily living. Despite all subjects reaching relatively high classification accuracy offline, their clinical scores varied greatly and were not strongly predicted by classification accuracy. We therefore support the suggestion to test myocontrol systems using clinical tests on amputees, fully fitted with sockets and prostheses highly resembling the systems they would use in daily living, as the evaluation benchmark. Agreement on this level of testing for systems developed in research laboratories would facilitate clinically relevant progress in this field.
Affiliation(s)
- Ivan Vujaklija
- Clinic for Trauma Surgery, Orthopaedic Surgery and Plastic Surgery, Research Department for Neurorehabilitation Systems, University Medical Centre Göttingen, Goettingen, Germany; Department of Bioengineering, Imperial College London, London, UK
| | - Aidan D Roche
- Christian Doppler Laboratory for Restoration of Extremity Function, Medical University of Vienna, Vienna, Austria
| | - Timothy Hasenoehrl
- Department of Physical Medicine, Rehabilitation and Occupational Medicine, Medical University of Vienna, Vienna, Austria
| | - Agnes Sturma
- Christian Doppler Laboratory for Restoration of Extremity Function, Medical University of Vienna, Vienna, Austria; Master Degree Program "Health Assisting Engineering", University of Applied Sciences FH Campus Wien, Vienna, Austria
| | | | - Dario Farina
- Department of Bioengineering, Imperial College London, London, UK
| | - Oskar C Aszmann
- Christian Doppler Laboratory for Restoration of Extremity Function, Medical University of Vienna, Vienna, Austria; Division of Plastic and Reconstructive Surgery, Department of Surgery, Medical University of Vienna, Vienna, Austria
|