1. Long Y, Xu X, Chen J, Liu S, Li J, Dong Y. An explainable predictive model of direct pulp capping in carious mature permanent teeth. J Dent 2024; 149:105269. [PMID: 39094974 DOI: 10.1016/j.jdent.2024.105269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 07/24/2024] [Accepted: 07/26/2024] [Indexed: 08/04/2024] Open
Abstract
OBJECTIVE To introduce a novel approach for predicting the personalized probability of success of direct pulp capping (DPC) treatment in carious mature permanent teeth using explainable machine learning (ML) models. METHODS Clinical data were obtained from our previous single-center retrospective study, comprising 393 carious mature permanent teeth from 372 patients who underwent DPC and attended 1-year follow-up between January 2015 and February 2021. Six ML models were derived based on 80 % of the cases in the cohort, with the remaining 20 % used for validation. Shapley additive explanation (SHAP) values were utilized to assess feature importance and the clinical relevance of the prediction models. RESULTS Within the cohort, 9.67 % (38 out of 393) of teeth experienced failure at the 1-year follow-up after DPC treatment. Among the six evaluated ML models, the XGBoost model exhibited the highest discriminative ability. By prioritizing features based on their importance, a streamlined and interpretable XGBoost model with 11 features was developed for 1-year prognostication post-DPC. The model achieved an area under the curve (AUC) of 0.86 for the 1-year prediction. The final model has been translated into a web application to facilitate clinical decision-making. CONCLUSION By incorporating demographic and clinical examination data, the XGBoost model offered a user-friendly tool for dentists to predict the personalized probability of success, thereby improving personalized dental care and patient counseling. The utilization of SHAP for model interpretation provided transparent insights into the decision-making process.
Affiliation(s)
- Yunzi Long
- Department of Cariology and Endodontology, Peking University School and Hospital of Stomatology & National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology & NMPA Key Laboratory for Dental Materials, Beijing 100081, PR China; Department of General Dentistry II, Peking University School and Hospital of Stomatology, Beijing 100081, PR China
- Xiaowei Xu
- Institute of Medical Information/ Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou 310058, China
- Jiaqi Chen
- Department of Cariology and Endodontology, Peking University School and Hospital of Stomatology & National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology & NMPA Key Laboratory for Dental Materials, Beijing 100081, PR China
- Siyi Liu
- Department of Cariology and Endodontology, Peking University School and Hospital of Stomatology & National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology & NMPA Key Laboratory for Dental Materials, Beijing 100081, PR China.
- Jiao Li
- Institute of Medical Information/ Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China.
- Yanmei Dong
- Department of Cariology and Endodontology, Peking University School and Hospital of Stomatology & National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology & NMPA Key Laboratory for Dental Materials, Beijing 100081, PR China
2. Samerei SA, Aghabayk K. Interpretable machine learning for evaluating risk factors of freeway crash severity. Int J Inj Contr Saf Promot 2024; 31:534-550. [PMID: 38768184 DOI: 10.1080/17457300.2024.2351972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 04/27/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024]
Abstract
Machine learning (ML) models are widely employed for crash severity modelling, yet their interpretability remains underexplored. Interpretation is crucial for comprehending ML results and aiding informed decision-making. This study aims to implement an interpretable ML approach to visualize the impacts of factors on crash severity using 5 years of freeway data from Iran. Methods including classification and regression trees (CART), K-nearest neighbours (KNN), random forest (RF), artificial neural network (ANN) and support vector machines (SVM) were applied, with RF demonstrating superior accuracy, recall, F1-score and ROC. Accumulated local effects (ALE) were utilized for interpretation. Findings suggest that light traffic conditions (volume/capacity < 0.5), with critical values around 0.05 or 0.38, and a higher proportion of large trucks and buses, particularly at 10% and 4%, are associated with severe crashes. Additionally, speeds exceeding 90 km/h, drivers younger than 30 years, rollover crashes, collisions with fixed objects and barriers, nighttime driving and driver fatigue elevate the likelihood of severe crashes.
Affiliation(s)
- Seyed Alireza Samerei
- School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Kayvan Aghabayk
- School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
3. Yue H. Investigating the influence of streetscape environmental characteristics on pedestrian crashes at intersections using street view images and explainable machine learning. Accid Anal Prev 2024; 205:107693. [PMID: 38955107 DOI: 10.1016/j.aap.2024.107693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 06/05/2024] [Accepted: 06/24/2024] [Indexed: 07/04/2024]
Abstract
Examining the relationship between streetscape features and road traffic accidents is pivotal for enhancing roadway safety. While previous studies have primarily focused on the influence of street design characteristics, sociodemographic features, and land use features on crash occurrence, the impact of streetscape features on pedestrian crashes has not been thoroughly investigated. Furthermore, while machine learning models demonstrate high accuracy in prediction and are increasingly utilized in traffic safety research, understanding the prediction results poses challenges. To address these gaps, this study extracts streetscape environment characteristics from street view images (SVIs) using a combination of semantic segmentation and object detection deep learning networks. These characteristics are then incorporated into the eXtreme Gradient Boosting (XGBoost) algorithm, along with a set of control variables, to model the occurrence of pedestrian crashes at intersections. Subsequently, the SHapley Additive exPlanations (SHAP) method is integrated with XGBoost to establish an interpretable framework for exploring the association between pedestrian crash occurrence and the surrounding streetscape built environment. The results are interpreted from global, local, and regional perspectives. The findings indicate that, from a global perspective, traffic volume and commercial land use are significant contributors to pedestrian-vehicle collisions at intersections, while road, person, and vehicle elements extracted from SVIs are associated with higher risks of pedestrian crash onset. At a local level, the XGBoost-SHAP framework enables quantification of features' local contributions for individual intersections, revealing spatial heterogeneity in factors influencing pedestrian crashes. 
From a regional perspective, similar intersections can be grouped to define geographical regions, facilitating the formulation of spatially responsive strategies for distinct regions to reduce traffic accidents. This approach can potentially enhance the quality and accuracy of local policy making. These findings underscore the underlying relationship between streetscape-level environmental characteristics and vehicle-pedestrian crashes. The integration of SVIs and deep learning techniques offers a visually descriptive, eye-level portrayal of the streetscape environment at locations where traffic crashes occur. The proposed framework not only achieves excellent prediction performance but also enhances understanding of traffic crash occurrences, offering guidance for optimizing traffic accident prevention and treatment programs.
Affiliation(s)
- Han Yue
- Center of GeoInformatics for Public Security, School of Geography and Remote Sensing, Guangzhou University, Guangzhou, 510006, China.
4. Wang S, Liu Y, Wang W, Zhao G, Liang H. Interpretable machine learning guided by physical mechanisms reveals drivers of runoff under dynamic land use changes. J Environ Manage 2024; 367:121978. [PMID: 39067339 DOI: 10.1016/j.jenvman.2024.121978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 06/14/2024] [Accepted: 07/17/2024] [Indexed: 07/30/2024]
Abstract
Human activities continuously impact water balances and cycling in watersheds, making it essential to accurately identify the responses of runoff to dynamic changes in land use types. Although machine learning models demonstrate promise in capturing the intricate interplay between hydrological factors, their "black box" nature makes it challenging to identify the dynamic drivers of runoff. To overcome this challenge, we employed an interpretable machine learning method to inversely deduce the dynamic determinants within hydrological processes. In this study, we analyzed land use changes in the Ningxia section of the middle Yellow River across four periods, laying the foundation for revealing how these changes affect runoff. The sub-watershed attributes and meteorological characteristics generated by the Soil and Water Assessment Tool (SWAT) model were used as input variables of the Extreme Gradient Boosting (XGBoost) model to simulate substantial sub-watershed rainfall runoff in the region. The XGBoost was interpreted using the SHapley Additive exPlanations (SHAP) to identify the dynamic responses of runoff to the land use changes over different periods. The results revealed increasingly frequent interchanges between the land use types in the study area. The XGBoost effectively captured the characteristics of the hydrological processes in the SWAT-derived sub-watersheds. The SHAP analysis results demonstrated that the promoting effect of agricultural land (AGRL) on runoff gradually weakens, while forests (FRST) continuously strengthen their restraining effect on runoff. Relevant land use policies provide empirical support for these findings. Furthermore, the interaction between meteorological variables and land use impacts the runoff generation mechanism and exhibits a threshold effect, with the thresholds for relative humidity (RH), maximum temperature (MaxT), and minimum temperature (MinT) determined to be 0.8, 25 °C, and 15 °C, respectively. 
This reverse deduction method can reveal hydrological patterns and the mechanisms of interaction between variables, helping to effectively address constantly changing human activities and meteorological conditions.
Affiliation(s)
- Shuli Wang
- School of Water and Environment, Chang'an University, Xi'an, 710061, China; Key Laboratory of Subsurface Hydrology and Ecological Effects in Arid Region, Ministry of Education, Chang'an University, Xi'an, 710061, China
- Yitian Liu
- School of Water and Environment, Chang'an University, Xi'an, 710061, China; Key Laboratory of Subsurface Hydrology and Ecological Effects in Arid Region, Ministry of Education, Chang'an University, Xi'an, 710061, China
- Wei Wang
- School of Water and Environment, Chang'an University, Xi'an, 710061, China; Key Laboratory of Subsurface Hydrology and Ecological Effects in Arid Region, Ministry of Education, Chang'an University, Xi'an, 710061, China.
- Guizhang Zhao
- College of Geosciences and Engineering, North China University of Water Resources and Electric Power, Zhengzhou, 450045, China
- Haotian Liang
- School of Water and Environment, Chang'an University, Xi'an, 710061, China; Key Laboratory of Subsurface Hydrology and Ecological Effects in Arid Region, Ministry of Education, Chang'an University, Xi'an, 710061, China
5. Li Y, Huang T, Lee HF, Heo Y, Ho KF, Yim SHL. Integrating Doppler LiDAR and machine learning into land-use regression model for assessing contribution of vertical atmospheric processes to urban PM2.5 pollution. Sci Total Environ 2024; 952:175632. [PMID: 39168320 DOI: 10.1016/j.scitotenv.2024.175632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 08/06/2024] [Accepted: 08/17/2024] [Indexed: 08/23/2024]
Abstract
Air pollution is recognized as a global issue because of its adverse effects on the environment and health. While vertical atmospheric processes substantially affect urban air pollution, traditional epidemiological research using land-use regression (LUR) modeling has usually focused on ground-level attributes without considering upper-level atmospheric conditions. This study aimed to integrate Doppler LiDAR and machine learning techniques into LUR models (LURF-LiDAR) to comprehensively evaluate urban air pollution in Hong Kong, and to assess complex interactions between vertical atmospheric processes and urban air pollution from long-term (i.e., annual) and short-term (i.e., two air pollution episodes) views in 2021. The results demonstrated significant improvements in model performance, achieving CV R² values of 0.81 (95 % CI: 0.75-0.86) for the long-term PM2.5 prediction model and 0.90 (95 % CI: 0.87-0.91) for the short-term models. Approximately 69 % of ground-level air pollution arose from the mixing of ground- and lower-level (105 m-225 m) particles, while 21 % was associated with upper-level (825 m-945 m) atmospheric processes. The identified transboundary air pollution (TAP) layer was located at ~900 m above the ground. Episode one (E1: 7 Jan-22 Jan) was induced by the accumulation of local emissions under stable atmospheric conditions, whereas Episode two (E2: 13 Dec-24 Dec) was regulated by TAP under unstable and turbulent conditions. Our improved air quality prediction model is accurate and comprehensive, with high interpretability for supporting urban planning and air quality policies.
Affiliation(s)
- Yue Li
- Department of Geography and Resource Management, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
- Tao Huang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore; Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore
- Harry Fung Lee
- Department of Geography and Resource Management, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
- Yeonsook Heo
- School of Civil, Environmental and Architectural Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Kin-Fai Ho
- The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong 999077, China
- Steve H L Yim
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 639798, Singapore; Earth Observatory of Singapore, Nanyang Technological University, Singapore 639798, Singapore; Asian School of the Environment, Nanyang Technological University, Singapore 639798, Singapore.
6. Zan J, Dong X, Yang H, Yan J, He Z, Tian J, Zhang Y. Application of the Unbalanced Ensemble Algorithm for Prognostic Prediction Outcomes of All-Cause Mortality in Coronary Heart Disease Patients Comorbid with Hypertension. Risk Manag Healthc Policy 2024; 17:1921-1936. [PMID: 39135612 PMCID: PMC11317517 DOI: 10.2147/rmhp.s472398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024] Open
Abstract
Purpose This study sought to develop an unbalanced-ensemble model that could accurately predict death outcomes of patients with comorbid coronary heart disease (CHD) and hypertension and evaluate the factors contributing to death. Patients and Methods Medical records of 1058 patients with coronary heart disease combined with hypertension, excluding those with acute coronary syndrome, were collected. Patients were followed up at the first, third, sixth, and twelfth months after discharge to record death events. Follow-up ended two years after discharge. Patients were divided into survival and nonsurvival groups. Based on the medical records, gender, smoking, drinking, COPD, cerebral stroke, diabetes, hyperhomocysteinemia, heart failure, renal insufficiency, and other influencing factors were sorted and compared between the two groups, and feature selection was carried out to construct models. Owing to data imbalance, we developed four unbalanced-ensemble prediction models based on Balanced Random Forest (BRF), EasyEnsemble, RUSBoost, and SMOTEBoost, and two base classification models based on AdaBoost and logistic regression. The hyperparameters of each model were optimised using GridSearchCV, and each model was evaluated using area under the curve (AUC), sensitivity, recall, Brier score, and geometric mean (G-mean). Additionally, to understand the influence of variables on model performance, we constructed a SHapley Additive exPlanations (SHAP) model based on the optimal model. Results There were significant differences in age, heart rate, COPD, cerebral stroke, heart failure and renal insufficiency in the nonsurvival group compared with the survival group. Among all models, BRF yielded the highest AUC (0.810; 95% CI, 0.778-0.839), sensitivity (0.990; 95% CI, 0.981-1.000), recall (0.990; 95% CI, 0.981-1.000), and G-mean (0.806; 95% CI, 0.778-0.827), and the lowest Brier score (0.181; 95% CI, 0.178-0.185). Therefore, we identified BRF as the optimal model. 
Furthermore, red blood cell count (RBC), body mass index (BMI), and lactate dehydrogenase were found to be important mortality-associated risk factors. Conclusion BRF combined with advanced machine learning methods and SHAP is highly effective and accurately predicts mortality in patients with CHD comorbid with hypertension. This model has the potential to assist clinicians in modifying treatment strategies to improve patient outcomes.
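As a rough illustration of two of the evaluation metrics named in this abstract, the geometric mean (G-mean) and the Brier score, the following is a minimal sketch in plain Python. The toy labels and probabilities, the 0.5 decision threshold, and the function names are assumptions made for illustration only; they are not data or code from the study.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical, imbalanced toy data (2 events among 8 cases).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.6, 0.1, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

toy_g_mean = g_mean(y_true, y_pred)
toy_brier = brier_score(y_true, y_prob)
print(round(toy_g_mean, 4), round(toy_brier, 4))
```

Sensitivity and recall are the same quantity, which is why the abstract reports identical values for both. A lower Brier score indicates better-calibrated probabilities, while a higher G-mean indicates a better balance between detecting events and non-events.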
Affiliation(s)
- Jiaxin Zan
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Xiaojing Dong
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Hong Yang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Jingjing Yan
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Zixuan He
- Department of Cardiology, The First Hospital of Shanxi Medical University, Taiyuan, People’s Republic of China
- Jing Tian
- Department of Cardiology, The First Hospital of Shanxi Medical University, Taiyuan, People’s Republic of China
- Yanbo Zhang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- School of Health Services and Management, Shanxi University of Chinese Medicine, Taiyuan, People’s Republic of China
7. Yasin P, Yimit Y, Cai X, Aimaiti A, Sheng W, Mamat M, Nijiati M. Machine learning-enabled prediction of prolonged length of stay in hospital after surgery for tuberculosis spondylitis patients with unbalanced data: a novel approach using explainable artificial intelligence (XAI). Eur J Med Res 2024; 29:383. [PMID: 39054495 PMCID: PMC11270948 DOI: 10.1186/s40001-024-01988-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 07/18/2024] [Indexed: 07/27/2024] Open
Abstract
BACKGROUND Tuberculosis spondylitis (TS), commonly known as Pott's disease, is a severe type of skeletal tuberculosis that typically requires surgical treatment. However, this treatment option has led to an increase in healthcare costs due to prolonged hospital stays (PLOS). Therefore, identifying risk factors associated with extended PLOS is necessary. In this research, we intended to develop an interpretable machine learning model that could predict extended PLOS and provide valuable insights for treatment, and to implement it as a web-based application. METHODS We obtained patient data from the spine surgery department at our hospital. Extended postoperative length of stay (PLOS) refers to a hospitalization duration equal to or exceeding the 75th percentile following spine surgery. To identify relevant variables, we employed several approaches, such as the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE) based on support vector machine classification (SVC), correlation analysis, and permutation importance value. Several models were implemented, and some of them were ensembled using soft voting. Models were constructed using grid search with nested cross-validation. The performance of each algorithm was assessed through various metrics, including the AUC value (area under the receiver operating characteristic curve) and the Brier Score. Model interpretation involved utilizing methods such as Shapley additive explanations (SHAP), the Gini Impurity Index, permutation importance, and local interpretable model-agnostic explanations (LIME). Furthermore, to facilitate the practical application of the model, a web-based interface was developed and deployed. 
RESULTS The study included a cohort of 580 patients, and 11 features (CRP, transfusions, infusion volume, blood loss, X-ray bone bridge, X-ray osteophyte, CT-vertebral destruction, CT-paravertebral abscess, MRI-paravertebral abscess, MRI-epidural abscess, postoperative drainage) were selected. Most of the classifiers performed well; the XGBoost model had a higher AUC value (0.86) and a lower Brier Score (0.126) and was chosen as the optimal model. The calibration and decision curve analysis (DCA) plots indicate that XGBoost achieved promising performance. In tenfold cross-validation, the XGBoost model demonstrated a mean AUC of 0.85 ± 0.09. SHAP and LIME were used to display the variables' contributions to the predicted value. The stacked bar plots indicated that infusion volume was the primary contributor, as determined by Gini, permutation importance (PFI), and the LIME algorithm. CONCLUSIONS Our methods not only effectively predicted extended PLOS but also identified risk factors that can be utilized for future treatments. The XGBoost model developed in this study is easily accessible through the deployed web application and can aid in clinical research.
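The outcome definition in this abstract (a stay equal to or exceeding the 75th percentile) can be sketched as a simple labelling step. The snippet below is an illustrative sketch only: the stay lengths are invented, and the "inclusive" interpolation method is an assumption, since the abstract does not state how the percentile was computed.

```python
import statistics

# Hypothetical postoperative lengths of stay in days (not study data).
stays = [5, 7, 8, 9, 10, 12, 14, 21]

# statistics.quantiles with n=4 returns the three quartile cut points;
# index 2 is the 75th percentile. The "inclusive" method is an assumption.
p75 = statistics.quantiles(stays, n=4, method="inclusive")[2]

# Label a stay as extended (1) when it is equal to or above the threshold.
extended = [1 if days >= p75 else 0 for days in stays]

print(p75, extended)
```

Because the threshold is taken from the cohort itself, roughly a quarter of patients are labelled positive by construction, which is the source of the class imbalance mentioned in the article title.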
Affiliation(s)
- Parhat Yasin
- Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Urumqi, 830000, Xinjiang, People's Republic of China
- Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Yasen Yimit
- Department of Radiology, The First People's Hospital of Kashi Prefecture, Kashi, 844000, Xinjiang, People's Republic of China
- Xiaoyu Cai
- Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Abasi Aimaiti
- Department of Anesthesiology, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Weibin Sheng
- Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Mardan Mamat
- Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China.
- Mayidili Nijiati
- Department of Radiology, The Fourth Affiliated Hospital of Xinjiang Medical University(Xinjiang Hospital of Traditional Chinese Medicine), Urumqi, 830002, Xinjiang, People's Republic of China.
- Xinjiang Key Laboratory of Artificial Intelligence Assisted Imaging Diagnosis, Kashi, 844000, Xinjiang, People's Republic of China.
8. Samerei SA, Aghabayk K. Analyzing the transition from two-vehicle collisions to chain reaction crashes: A hybrid approach using random parameters logit model, interpretable machine learning, and clustering. Accid Anal Prev 2024; 202:107603. [PMID: 38701559 DOI: 10.1016/j.aap.2024.107603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/02/2024] [Accepted: 04/27/2024] [Indexed: 05/05/2024]
Abstract
Chain reaction crashes (CRCs) begin with a two-vehicle collision and rapidly intensify as more vehicles become directly involved. CRCs result in more extensive damage than two-vehicle crashes, and understanding how a two-vehicle collision progresses into a CRC can reveal preventive strategies that have received less attention. In this study, to align with recent research directions and overcome the limitations of econometric and machine learning (ML) modelling, a hybrid approach is adopted. Moreover, a new approach is proposed to tackle existing challenges in crash analysis: addressing unobserved heterogeneity in ML and exploring random parameter effects and interactions more precisely. To achieve this, a hybrid of a random parameter logit model and interpretable ML, combined with prior latent class clustering, is implemented. Notably, this is the first attempt at combining clustering with hybrid modeling. The significant risk factors, their critical values, distinct effects, and interactions are interpreted using both marginal effects and the SHAP (SHapley Additive exPlanations) method across clusters. This study utilizes crash, traffic, and geometric data from eleven suburban freeways in Iran collected over a 5-year period. The overall results indicate an increased risk of CRC in congested traffic, with higher traffic variation, and on horizontal curves combined with longitudinal slopes. Some parameters exhibit distinct or fluctuating effects, which are discussed across different conditions or considering interactions. For instance, during nighttime, heightened congestion on 2-lane freeways, increased traffic variation in less congested conditions, and adverse weather combined with horizontal curves and slopes pose risks. During daytime, increased traffic variation within highly congested sections, a higher proportion of heavy vehicle traffic in moderately congested sections, and two lanes in each direction coupled with curves elevate the level of risk. 
The results of this study provide a better understanding of the impact of risk factors across different conditions, which policy makers can use.
Affiliation(s)
- Seyed Alireza Samerei
- School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran.
- Kayvan Aghabayk
- School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran.
9. Zhu Y, Qian Y, Xu J, Hu W. Young novice drivers' road crash injuries and contributing factors: A crash data investigation. Traffic Inj Prev 2024:1-8. [PMID: 38917367 DOI: 10.1080/15389588.2024.2367504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 06/10/2024] [Indexed: 06/27/2024]
Abstract
OBJECTIVE Collisions are a significant cause of injury and fatality among young novice drivers. Using real crash data, this study further explores the multifaceted and complex nature of young novice drivers' crash injury risk by synthesizing different driver attributes and crash scenarios, in order to update and validate previous research findings and provide more feasible recommendations for preventive measures. METHODS Detailed data on traffic crashes involving young novice drivers were extracted from the National Automobile Accident In-Depth Investigation System (NAIS) in China, and a mixed methodology combining Random Forest and multinomial logit modeling was used to explore the important influences on traffic crash injuries of young novice drivers in Songjiang District, Shanghai, from 2018 to 2022. RESULTS Human, vehicle, road and environmental characteristics contributed 36.83%, 22.65%, 17.07% and 23.45%, respectively, to the prediction of the crash injury level of novice drivers. Among the single factors, driver negligence was the most important factor affecting the crash injury level of novice drivers. Age of the vehicle, crash location, road signal condition and time of crash all had a significant effect on the crash injury level of young novice drivers (at the 95% confidence level). CONCLUSIONS The study comprehensively analyzed young novice driver crash data to reveal the crash injury risk and severity faced by young novice drivers in different contexts, and suggested targeted safety improvements. The results show both similarities to and differences from previous studies, and contribute new understanding of the driving risks of young novice drivers in daytime and nighttime.
Affiliation(s)
- Yansong Zhu
  - School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai, China
- Yubin Qian
  - School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai, China
- Jiejie Xu
  - Shanghai Intelligent Vehicle Fusion Innovation Center Co., Shanghai, China
- Wenhao Hu
  - Key Laboratory of Product Defect and Safety for State Market Regulation, Beijing, China

10
Chen H, Yang F, Duan Y, Yang L, Li J. A novel higher performance nomogram based on explainable machine learning for predicting mortality risk in stroke patients within 30 days based on clinical features on the first day ICU admission. BMC Med Inform Decis Mak 2024; 24:161. [PMID: 38849903 PMCID: PMC11161998 DOI: 10.1186/s12911-024-02547-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 05/21/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND This study aimed to develop a higher performance nomogram based on explainable machine learning methods, and to predict the risk of death of stroke patients within 30 days based on clinical characteristics on the first day of intensive care unit (ICU) admission. METHODS Data relating to stroke patients were extracted from the Medical Information Mart for Intensive Care (MIMIC) IV and III databases. The LightGBM machine learning approach together with Shapley additive explanations (termed explainable machine learning, EML) was used to select clinical features and define cut-off points for the selected features. These selected features and cut-off points were then evaluated using the Cox proportional hazards regression model and Kaplan-Meier survival curves. Finally, logistic regression-based nomograms for predicting 30-day mortality of stroke patients were constructed using original variables and variables dichotomized by cut-off points, respectively. The performance of the two nomograms was evaluated in the overall and individual dimensions. RESULTS A total of 2982 stroke patients and 64 clinical features were included, and the 30-day mortality rate was 23.6% in the MIMIC-IV dataset. Ten variables ("sofa (sepsis-related organ failure assessment)", "minimum glucose", "maximum sodium", "age", "mean spo2 (blood oxygen saturation)", "maximum temperature", "maximum heart rate", "minimum bun (blood urea nitrogen)", "minimum wbc (white blood cells)" and "charlson comorbidity index") and their respective cut-off points were defined from the EML. In the Cox proportional hazards regression model (Cox regression) and Kaplan-Meier survival curves, after grouping stroke patients according to the cut-off point of each variable, patients belonging to the high-risk subgroup were associated with higher 30-day mortality than those in the low-risk subgroup. The evaluation of nomograms found that the EML-based nomogram not only outperformed the conventional nomogram in NRI (net reclassification index), Brier score and clinical net benefits in the overall dimension, but also significantly improved in the individual dimension, especially for patients with low "maximum temperature". CONCLUSIONS The 10 selected first-day ICU admission clinical features require greater attention for stroke patients, and the nomogram based on explainable machine learning will have greater clinical applicability.
Affiliation(s)
- Haoran Chen
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100020, China.
  - Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing, 100020, China.
- Fengchun Yang
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100020, China
  - Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing, 100020, China
- Yifan Duan
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100020, China
- Lin Yang
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100020, China
  - Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing, 100020, China
- Jiao Li
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100020, China.
  - Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing, 100020, China.

11
Xue H, Guo P, Li Y, Ma J. Integrating visual factors in crash rate analysis at Intersections: An AutoML and SHAP approach towards cycling safety. ACCIDENT; ANALYSIS AND PREVENTION 2024; 200:107544. [PMID: 38493612 DOI: 10.1016/j.aap.2024.107544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 02/18/2024] [Accepted: 03/09/2024] [Indexed: 03/19/2024]
Abstract
Cycling crashes constitute a significant and rising share of traffic accidents. Consequently, exploring factors affecting cycling safety has become a priority for both governmental bodies and scholars. However, most existing studies have neglected the vision factors capable of quantitatively describing the city-level cycling environment. Moreover, they have relied on limited models that lack interpretability and fail to capture the spatial variations in the contribution of factors. To address these gaps, this research proposed a framework that used origin-destination-based cycling flow and vision factors generated from Google Street View images to identify the leading factors. It also employed the comparative Automatic Machine Learning and interpretable SHAP value-based geospatial analysis to explain each factor's contribution to the cycling crash risk, with a particular focus on the spatial variations in the influence of vision factors. The effectiveness of this framework was validated by a case study in Manhattan, which examined the leading risk factors of cycling crash rates at intersections. The results showed that the LightGBM model, with selected subsets of factors, outperformed other models. Through SHAP explanations of global feature importance, the study identified the proportion of road barriers, the proportion of open sky, and the number of visible trucks as the leading visual risk factors. Additionally, using SHAP-based geospatial analysis, the study revealed the local variations in the effects of these three factors and identified eight areas with higher cycling crash rates. Based on these findings, the study provided practical measures for a safer cycling environment in Manhattan.
Affiliation(s)
- Huiyuan Xue
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China.
- Peizhuo Guo
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China.
- Yiyan Li
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China; Department of Geography, The University of Hong Kong, Hong Kong, China.
- Jun Ma
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China; Urban Systems Institute, The University of Hong Kong, Hong Kong, China.

12
Yoo JW, Park J, Park H. Enhancing safety of construction workers in Korea: an integrated text mining and machine learning framework for predicting accident types. Int J Inj Contr Saf Promot 2024; 31:203-215. [PMID: 38164519 DOI: 10.1080/17457300.2023.2300424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Construction workers face a high risk of various occupational accidents, many of which can result in fatalities. This study aims to develop a prediction model for nine prevalent types of construction accidents, utilizing construction tasks, activities, and tools/materials as input features, through the application of machine learning-based multi-class classification algorithms. A total of 152,867 construction accident summary reports, composed of both structured data (construction task, construction activity, accident type) and unstructured data (tools/materials), were used for the study. The study employed several data processing techniques, including keyword extraction through text mining, Boruta feature selection, and SMOTE data resampling, to enhance model accuracy. Three performance metrics (multi-class area under the receiver operating characteristic curve (MAUC), multi-class Matthews correlation coefficient (MMCC), and geometric mean (G-mean)) were used to compare the predictive performance of four machine learning algorithms: decision tree, random forest, naïve Bayes, and XGBoost. Of the four algorithms, XGBoost showed the highest performance in predicting accident type (MAUC: 0.8603, MMCC: 0.3523, G-mean: 0.5009). Furthermore, a Shapley additive explanation (SHAP) analysis was conducted to visualize feature importance. The findings of this study make a valuable contribution to improving construction safety by presenting a prediction model for accident types derived from real-world big data.
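As an illustration of the G-mean metric reported in this abstract, the following is a minimal sketch that assumes the common multi-class definition (the geometric mean of per-class recalls). The abstract does not state the exact formula the authors used, and the function name and accident-type labels below are hypothetical.

```python
import math
from collections import Counter


def multiclass_g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    # Number of samples per true class.
    totals = Counter(y_true)
    # Number of correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[label] / totals[label] for label in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical accident-type labels, not data from the study.
y_true = ["fall", "fall", "struck", "struck", "caught", "caught"]
y_pred = ["fall", "fall", "struck", "fall", "caught", "caught"]
g_mean = multiclass_g_mean(y_true, y_pred)
```

Because the per-class recalls are multiplied, a single class that is never predicted correctly drives the score to zero, which is why the metric is often used alongside resampling on imbalanced data.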
Affiliation(s)
- Joon Woo Yoo
  - Department of Industrial Engineering, Yonsei University, Seoul, South Korea
- Junsung Park
  - Department of Industrial Engineering, Yonsei University, Seoul, South Korea
- Heejun Park
  - Department of Industrial Engineering, Yonsei University, Seoul, South Korea

13
Yu Q, Shi C, Bai Y, Zhang J, Lu Z, Xu Y, Li W, Liu C, Soomro SEH, Tian L, Hu C. Interpretable baseflow segmentation and prediction based on numerical experiments and deep learning. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 360:121089. [PMID: 38733842 DOI: 10.1016/j.jenvman.2024.121089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/11/2024] [Accepted: 05/03/2024] [Indexed: 05/13/2024]
Abstract
Baseflow is a crucial water source in the inland river basins of high-cold mountainous regions, playing a significant role in maintaining runoff stability. It is challenging to select the most suitable baseflow separation method in data-scarce high-cold mountainous regions and to evaluate the effects of climate factors and underlying surface changes on baseflow variability and seasonal distribution characteristics. Here we attempt to address how meteorological factors and underlying surface changes affect baseflow using the Grey Wolf Optimizer Digital Filter Method (GWO-DFM) for rapid baseflow separation and the Long Short-Term Memory (LSTM) neural network model for baseflow prediction, clarifying the interpretability of the LSTM model in baseflow forecasting. The proposed method was successfully implemented using a 63-year time series (1958-2020) of flow data from the Tai Lan River (TLR) basin in a high-cold mountainous region, along with 21 years of ERA5-Land meteorological data and MODIS data (2000-2020). The results indicate that: (1) GWO-DFM can rapidly identify the optimal filtering parameters. It employs the arithmetic average of three methods, namely the Chapman, Chapman-Maxwell and Eckhardt filters, as the best baseflow separation approach for the TLR basin. Additionally, the baseflow significantly increases after the second mutation of the baseflow rate. (2) Baseflow sources are mainly influenced by precipitation infiltration, glacier frozen soil layers, and seasonal ponding. (3) Solar radiation, temperature, precipitation, and NDVI are the primary factors influencing baseflow changes, with Nash-Sutcliffe efficiency coefficients exceeding 0.78 in both the LSTM model training and prediction periods. (4) Changes in baseflow are most influenced by solar radiation, temperature, and NDVI. This study systematically analyzes the changes in baseflow and response mechanisms in high-cold mountainous regions, contributing to the management of water resources in mountainous basins under changing environmental conditions.
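For reference, the Nash-Sutcliffe efficiency cited in result (3) is conventionally defined as one minus the ratio of the squared simulation error to the variance of the observations around their mean. The sketch below illustrates that standard definition only; the function name and flow values are hypothetical and are not taken from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation against the observations.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Spread of the observations around their own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual / spread


# Hypothetical baseflow values (arbitrary units), not data from the study.
observed_flow = [1.0, 2.0, 3.0, 4.0]
simulated_flow = [1.1, 1.9, 3.2, 3.8]
nse = nash_sutcliffe_efficiency(observed_flow, simulated_flow)
```

A value of 1 indicates a perfect match, while 0 means the simulation is no better than always predicting the observed mean, so the reported values above 0.78 sit well above that baseline.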
Affiliation(s)
- Qiying Yu
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China; Xinjiang Institute of Water Resources and Hydropower Research, Xinjiang, 830049, China
- Chen Shi
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China
- Yungang Bai
  - Xinjiang Institute of Water Resources and Hydropower Research, Xinjiang, 830049, China.
- Jianghui Zhang
  - Xinjiang Institute of Water Resources and Hydropower Research, Xinjiang, 830049, China
- Zhenlin Lu
  - Xinjiang Institute of Water Resources and Hydropower Research, Xinjiang, 830049, China
- Yingying Xu
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China
- Wenzhong Li
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China
- Chengshuai Liu
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China
- Shan-E-Hyder Soomro
  - College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang, 443002, China
- Lu Tian
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China
- Caihong Hu
  - School of Water Conservancy and Transportation, Zhengzhou University, Henan, China.

14
Khattak A, Chan PW, Chen F, Peng H. Interpretable ensemble imbalance learning strategies for the risk assessment of severe-low-level wind shear based on LiDAR and PIREPs. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2024; 44:1084-1102. [PMID: 37700727 DOI: 10.1111/risa.14215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 06/05/2023] [Accepted: 08/22/2023] [Indexed: 09/14/2023]
Abstract
The occurrence of severe low-level wind shear (S-LLWS) events in the vicinity of airport runways poses a significant threat to flight safety and exacerbates a burgeoning problem in civil aviation. Identifying the risk factors that contribute to occurrences of S-LLWS can facilitate the improvement of aviation safety. Despite the significant influence of S-LLWS on aviation safety, its occurrence is relatively infrequent in comparison to non-S-LLWS incidents. In this study, we develop an S-LLWS risk prediction model through the utilization of ensemble imbalance learning (EIL) strategies, namely, BalanceCascade, EasyEnsemble, and RUSBoost. The data for this study were obtained from PIREPs and LiDAR at Hong Kong International Airport. The analysis revealed that the BalanceCascade strategy outperforms EasyEnsemble and RUSBoost in terms of prediction performance. Afterward, the SHapley Additive exPlanations (SHAP) interpretation tool was used in conjunction with the BalanceCascade model for the risk assessment of various factors. The four most influential risk factors, according to the SHAP interpretation tool, were hourly temperature, runway 25LD, runway 25LA, and RWY (encounter location of LLWS). S-LLWS was likely to happen at Runway 25LD and Runway 25LA in temperatures ranging from low to moderate. Similarly, a high proportion of S-LLWS events occurred near the runway threshold, and a relatively small proportion occurred away from it. The EIL strategies in conjunction with the SHAP interpretation tool may accurately predict S-LLWS without the need for data augmentation in the data pre-processing phase.
Affiliation(s)
- Afaq Khattak
  - Key Laboratory of Infrastructure Durability and Operation Safety in Airfield of Civil Aviation Administration of China, College of Transportation Engineering, Tongji University, Jiading, Shanghai, China
- Pak-Wai Chan
  - Hong Kong Observatory, Kowloon, Hong Kong, China
- Feng Chen
  - Key Laboratory of Infrastructure Durability and Operation Safety in Airfield of Civil Aviation Administration of China, College of Transportation Engineering, Tongji University, Jiading, Shanghai, China
- Haorong Peng
  - Key Laboratory of Infrastructure Durability and Operation Safety in Airfield of Civil Aviation Administration of China, College of Transportation Engineering, Tongji University, Jiading, Shanghai, China

15
Matin M, Dehghanian A, Dastranj M, Darijani H. Explainable artificial intelligence modeling of internal arc in a medium voltage switchgear based on different CFD simulations. Heliyon 2024; 10:e29594. [PMID: 38665570 PMCID: PMC11044042 DOI: 10.1016/j.heliyon.2024.e29594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/17/2024] [Accepted: 04/10/2024] [Indexed: 04/28/2024] Open
Abstract
The internal arc represents an unintentional release of electrical energy within the switchgear industry. Manufacturers must address this electro-thermal issue in their switchgears. Over the past decades, various researchers and engineering groups have examined the internal arc pressure rise in switchgears to mitigate damages. The high variability in pressure rise among switchgears due to diverse factors such as design, manufacturing, and electrical parameters results in varying reported pressure increases. This issue motivates the application of artificial intelligence (AI) in interpreting internal arc modeling. The present paper explores the impact of manufacturing parameters such as total duct width (TDW), height (H), and ducts condition (DC), along with environmental parameters like initial pressure (IP) and initial temperature (IT), on the maximum pressure (MP) generated during an internal arc in a medium voltage (MV) switchgear. For this purpose, 54 different computational fluid dynamics (CFD) models were built using the parameters indicated. An extreme gradient boosting (XGBoost) machine learning (ML) model was trained on these CFD models, with MP serving as the target variable for the ML model. The obtained results reveal a variation in the MP of the internal arc under the mentioned parameters, ranging from 17835.45 Pa to 144423.2 Pa. SHAP analysis revealed that IP, TDW, and DC were the most significant factors affecting the pressure increase of the internal arc phenomenon.
Affiliation(s)
- Mahmood Matin
  - R&D Department, Kerman Tablo Corporation, Kerman, Iran
  - Mechanical Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
- Amir Dehghanian
  - Department of Mechanical Engineering, Shiraz University of Technology, Shiraz, Iran
- Mohammad Dastranj
  - R&D Department, Kerman Tablo Corporation, Kerman, Iran
  - Mechanical Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
- Hossein Darijani
  - R&D Department, Kerman Tablo Corporation, Kerman, Iran
  - Mechanical Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran

16
Yu X, Ma J, Tang Y, Yang T, Jiang F. Can we trust our eyes? Interpreting the misperception of road safety from street view images and deep learning. ACCIDENT; ANALYSIS AND PREVENTION 2024; 197:107455. [PMID: 38218132 DOI: 10.1016/j.aap.2023.107455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 12/20/2023] [Accepted: 12/31/2023] [Indexed: 01/15/2024]
Abstract
Road safety is a critical concern that impacts both human lives and urban development, drawing significant attention from city managers and researchers. The perception of road safety has gained increasing research interest due to its close connection with the behavior of road users. However, safety isn't always as it appears, and there is a scarcity of studies examining the association and mismatch between road traffic safety and road safety perceptions at the city scale, primarily due to the time-consuming nature of data acquisition. In this study, we applied an advanced deep learning model and street view images to predict and map human perception scores of road safety in Manhattan. We then explored the association and mismatch between these perception scores and traffic crash rates, while also interpreting the influence of the built environment on this disparity. The results showed that there was heterogeneity in the distribution of road safety perception scores. Furthermore, the study found a positive correlation between perception scores and crash rates, indicating that higher perception scores were associated with higher crash rates. We also identified four perception patterns: "Safer than it looks", "Safe as it looks", "More dangerous than it looks", and "Dangerous as it looks". Wall view index, tree view index, building view index, distance to the nearest traffic signals, and street width were found to significantly influence these perception patterns. Notably, our findings underscored the crucial role of traffic lights in the "More dangerous than it looks" pattern. While traffic lights may enhance people's perception of safety, areas in close proximity to traffic lights were identified as potentially accident-prone regions.
Affiliation(s)
- Xujing Yu
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China
- Jun Ma
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China; Urban Systems Institute, The University of Hong Kong, Hong Kong, China.
- Yihong Tang
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China
- Tianren Yang
  - Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China; Urban Systems Institute, The University of Hong Kong, Hong Kong, China
- Feifeng Jiang
  - Faculty of Architecture, The University of Hong Kong, Hong Kong, China

17
Yao S, Wu Q, Kang Q, Chen YW, Lu Y. An interpretable XGBoost-based approach for Arctic navigation risk assessment. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2024; 44:459-476. [PMID: 37330273 DOI: 10.1111/risa.14175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 03/14/2023] [Accepted: 05/07/2023] [Indexed: 06/19/2023]
Abstract
The Northern Sea Route (NSR) makes travel between Europe and Asia shorter and quicker than a southern transit via the Strait of Malacca and Suez Canal. It provides greater access to Arctic resources such as oil and gas. As global warming accelerates, melting Arctic ice caps are likely to increase traffic in the NSR and enhance its commercial viability. Because the harsh Arctic environment threatens the safety of ship navigation, it is necessary to assess Arctic navigation risk to maintain shipping safety. Currently, most studies focus on conventional risk assessment, which lacks validation based on actual data. In this study, actual data about the Arctic navigation environment and related expert judgments were used to generate a structured data set. Based on the structured data set, extreme gradient boosting (XGBoost) and alternative methods were used to establish models for the assessment of Arctic navigation risk, which were validated using cross-validation. The results show that, compared with alternative models, XGBoost models have the best performance in terms of mean absolute errors and root mean squared errors. The XGBoost models can learn and reproduce expert judgments and knowledge for the assessment of Arctic navigation risk. Feature importance (FI) and Shapley additive explanations (SHAP) are used to further interpret the relationship between input data and predictions. The application of XGBoost, FI, and SHAP aims to improve the safety of Arctic shipping using advanced artificial intelligence techniques. The validated assessment enhances the quality and robustness of assessment.
Affiliation(s)
- Shuaiyu Yao
  - Department of Control Science and Engineering, Tongji University, Shanghai, China
- Qinhao Wu
  - Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
- Qi Kang
  - Department of Control Science and Engineering, Tongji University, Shanghai, China
- Yu-Wang Chen
  - Alliance Manchester Business School (AMBS), The University of Manchester, Manchester, UK
- Yi Lu
  - COSCO Shipping Special Transportation Co., Ltd, Guangzhou, China

18
Wang X, Zhang X, Pei Y. A systematic approach to macro-level safety assessment and contributing factors analysis considering traffic crashes and violations. ACCIDENT; ANALYSIS AND PREVENTION 2024; 194:107323. [PMID: 37864889 DOI: 10.1016/j.aap.2023.107323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 09/03/2023] [Accepted: 09/17/2023] [Indexed: 10/23/2023]
Abstract
During rapid urbanization and increase in motorization, it becomes particularly important to understand the relationships between traffic safety and risk factors in order to provide targeted improvements and policy recommendations. Violations and police enforcement are key variables, but the endogenous relationship between crashes and violations has made these variables unreliable and has limited their use. To manage this problem, this study developed a systematic approach for the joint modeling of crashes and violations to identify crash and violation hotspots and examine the mechanisms underlying macro-level contributing factors. Socio-economic, road network, public facility, traffic enforcement, and land use intensity data from 115 towns in Suzhou, China, were collected as independent variables. A bivariate negative binomial spatial conditional autoregressive model (BNB-CAR) and the potential for safety improvement (PSI) method were adopted to identify crash-prone and violation-prone areas, and an interpretable machine learning framework was applied to explore the factors' effects by area. Results showed that the proposed framework was able to accurately identify problem areas and quantify the impact of key factors, which, in Suzhou, were the number of traffic police and their daily patrol time. Considering such enforcement-related information provided important insights into reducing crash and violation frequency; for example, keeping the number of traffic police and daily patrol time under certain thresholds (number of police lower than 11 and patrol time lower than 2.3 h in this sample) was as effective as increasing these numbers for reducing the probability of high-crash and high-violation areas. The proposed approach can help traffic administrators identify the key contributing factors, especially enforcement factors, in crash-prone and violation-prone areas and provide guidelines for improvement.
Affiliation(s)
- Xuesong Wang
  - School of Transportation Engineering, Tongji University, Shanghai 201804, China; The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Shanghai 201804, China.
- Xueyu Zhang
  - School of Transportation Engineering, Tongji University, Shanghai 201804, China; The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Shanghai 201804, China
- Yingying Pei
  - School of Transportation Engineering, Tongji University, Shanghai 201804, China; The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Shanghai 201804, China

19
Chen H, Wang M, Li J. Exploring the association between two groups of metals with potentially opposing renal effects and renal function in middle-aged and older adults: Evidence from an explainable machine learning method. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 269:115812. [PMID: 38091680 DOI: 10.1016/j.ecoenv.2023.115812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 11/12/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND Machine learning models have promising applications in capturing the complex relationship between mixtures of exposures and outcomes. OBJECTIVE Our study aimed to introduce an explainable machine learning (EML) model to assess the association between metal mixtures with potentially opposing renal effects and renal function in middle-aged and older adults. METHODS This study extracted data from two cycle years of the National Health and Nutrition Examination Survey (NHANES). Participants aged 45 years or older with complete data on six metals (lead, cadmium, manganese, arsenic, mercury, and selenium) and related covariates were enrolled. The EML model was developed by combining the optimized machine learning model with Shapley Additive exPlanations (SHAP) to assess the chronic kidney disease (CKD) risk associated with metal mixtures. The results from EML were further compared in detail with multiple logistic regression (MLR) and Bayesian kernel machine regression (BKMR). RESULTS After adjusting for the included covariates, MLR indicated that lead and arsenic were generally positively associated with CKD, whereas manganese had a negative association. In the BKMR analysis, each metal was found to have a non-linear association with the risk of CKD, and interactions can exist between metals, especially arsenic and lead. In the EML feature importance ranking, lead, manganese, arsenic and selenium followed closely behind gender, age and BMI for participants with CKD. Strong interactions between mercury and lead, manganese and cadmium, and arsenic and manganese were identified by the partial dependence plot (PDP) of SHAP and the bivariate exposure-response effect plots of BKMR. The EML model determined the "trigger point" at which the risk of CKD abruptly changed. CONCLUSION Co-exposure to metals with different nephrotoxicity could have different joint associations with renal function, and EML can be a powerful method for studying complex exposure mixtures.
Affiliation(s)
- Haoran Chen
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
- Min Wang
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
- Jiao Li
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China.

20
Sun Z, Wang D, Gu X, Abdel-Aty M, Xing Y, Wang J, Lu H, Chen Y. A hybrid approach of random forest and random parameters logit model of injury severity modeling of vulnerable road users involved crashes. ACCIDENT; ANALYSIS AND PREVENTION 2023; 192:107235. [PMID: 37557001 DOI: 10.1016/j.aap.2023.107235] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/09/2023] [Revised: 07/12/2023] [Accepted: 07/23/2023] [Indexed: 08/11/2023]
Abstract
Crashes involving vulnerable road users (VRUs) are a major road safety concern due to the high likelihood of fatal and severe injury. Used separately, data-driven methods and heterogeneity models each have limitations in crash data analysis. This study develops a hybrid approach that combines a Random Forest based SHAP algorithm (RF-SHAP) with a random parameters logit modeling framework to explore significant factors and identify the underlying interaction effects on injury severity in VRU-involved crashes in Shenyang (China) from 2015 to 2017. The results show that the hybrid approach can uncover more underlying causality: it not only quantifies the impact of individual factors on injury severity, but also finds the interaction effects between the factors with random parameters and those with fixed parameters. Seven factors are found to have a significant effect on crash injury severity. Two factors, primary roads and rural areas, produce random parameters. The interaction effects reveal interesting combination features. For example, even though rural areas and primary roads each increase the likelihood of a fatal crash individually, the interaction effect of the two factors decreases the likelihood of a crash being fatal. The findings form the foundation for developing safety countermeasures targeted at specific crash groups to reduce fatalities in future crashes.
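The interaction effect described in the abstract above (two factors that each raise risk on their own but lower it jointly) can be illustrated with exact Shapley values on a toy model. This is a minimal sketch and not the authors' RF-SHAP pipeline: the `toy_risk` function, its coefficients, and the feature names are hypothetical, and the attribution is computed by brute-force enumeration of feature subsets rather than with the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, features):
    """Exact Shapley values for a set function `value` over `features`.

    `value` takes a frozenset of "present" features and returns a number.
    The cost grows exponentially with the number of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Shapley weight of a coalition of size k that excludes f.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score: two positive main effects, one negative interaction."""
    rural = "rural" in present
    primary = "primary_road" in present
    return 0.3 * rural + 0.2 * primary - 0.4 * (rural and primary)


features = ["rural", "primary_road"]
phi = shapley_values(toy_risk, features)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy setting the negative interaction term is shared equally between the two features, so each attribution is smaller than the corresponding main effect, and the attributions still sum to the difference between the full-model score and the empty-model score.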
Affiliation(s)
- Zhiyuan Sun
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China
- Duo Wang
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China
- Xin Gu
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China
- Mohamed Abdel-Aty
  - Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32826-2450, United States
- Yuxuan Xing
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China
- Jianyu Wang
  - Beijing Key Laboratory of General Aviation Technology, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
- Huapu Lu
  - Institute of Transportation Engineering, Tsinghua University, Beijing 100084, China
- Yanyan Chen
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China
|
21
|
Zhang R, Zhu R, Jia M, Pang Y, Zhang B, Bao X, Wang Y. Improvement of a Rapid Method of Detecting Gasoline Detergency Based on the Image Recognition. ACS OMEGA 2023; 8:34134-34145. [PMID: 37744810 PMCID: PMC10515347 DOI: 10.1021/acsomega.3c05350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 08/29/2023] [Indexed: 09/26/2023]
Abstract
The detergency of motor gasoline is closely related to vehicle exhaust emissions and fuel economy. This paper proposed an improved method for the rapid detection of gasoline detergency based on the deposit images of test gasoline on aluminum plates produced by a multichannel gasoline detergency simulation test (MGST). The detection algorithm system was structured to recognize the deposit plate images by computer vision based on convolutional neural networks (CNNs). Compared with the traditional simulation test, the improved MGST method resulted in significant reductions in fuel consumption, cost, and test time. The performance of three transfer learning models (Inception-ResNet-V2, Inception-V3, and ResNet50-V2) and a customized CNN was evaluated in the detection algorithm system, and their detection accuracies reached 94, 94, 88, and 82%, respectively. Inception-ResNet-V2 was selected due to its higher accuracy and better robustness. The model interpretation indicates that the model extracts features from the sediment deposits on the deposit plate and then uses the acquired deposit features to accurately detect gasoline samples that fail to meet detergency standards. This approach proved effective in enhancing the detection process and ensuring reliable results for gasoline detergency evaluation. It is beneficial to environmental protection regulators for managing market gasoline detergency and urban mobile source pollution. In addition, a deposit plate image database should be established to further improve the detection model performance during environmental regulation.
Affiliation(s)
- Rongshuo Zhang
  - School of Ecology and Environment, Zhengzhou University, Zhengzhou 450001, China
- Rencheng Zhu
  - School of Ecology and Environment, Zhengzhou University, Zhengzhou 450001, China
  - State Environmental Protection Key Laboratory of Vehicle Emission Control and Simulation, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Ming Jia
  - State Environmental Protection Key Laboratory of Vehicle Emission Control and Simulation, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Yujie Pang
  - School of Ecology and Environment, Zhengzhou University, Zhengzhou 450001, China
- Bowen Zhang
  - School of Ecology and Environment, Zhengzhou University, Zhengzhou 450001, China
- Xiaofeng Bao
  - State Environmental Protection Key Laboratory of Vehicle Emission Control and Simulation, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
  - National Engineering Laboratory for Mobile Source Emission Control Technology, Tianjin 300399, China
- Yunjing Wang
  - State Environmental Protection Key Laboratory of Vehicle Emission Control and Simulation, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
|
22
|
Almannaa M, Zawad MN, Moshawah M, Alabduljabbar H. Investigating the effect of road condition and vacation on crash severity using machine learning algorithms. Int J Inj Contr Saf Promot 2023; 30:392-402. [PMID: 37079354 DOI: 10.1080/17457300.2023.2202660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 03/14/2023] [Accepted: 04/10/2023] [Indexed: 04/21/2023]
Abstract
Investigating the contributing factors to traffic crash severity is a demanding topic in research focusing on traffic safety and policies. This research investigates the impact of 16 roadway condition features and vacations (along with spatial and temporal factors and road geometry) on crash severity for major intra-city roads in Saudi Arabia. We used a crash dataset that covers four years (Oct. 2016 - Feb. 2021) with more than 59,000 crashes. Machine learning algorithms were utilized to predict the crash severity outcome (non-fatal/fatal) for three types of roads: single, multilane, and freeway. Furthermore, features that have a strong impact on crash severity were examined. Results show that only 4 out of 16 road condition variables contributed to crash severity, namely: paints, cat eyes, fence side, and metal cable. Additionally, vacation was found to be a contributing factor to crash severity, meaning that crashes occurring on vacation days are more severe than those on non-vacation days.
Affiliation(s)
- Mohammed Almannaa
  - Department of Civil Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
- Md Nabil Zawad
  - Department of Civil Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
- May Moshawah
  - Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Haifa Alabduljabbar
  - Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
|
23
|
Sun Z, Wang D, Gu X, Xing Y, Wang J, Lu H, Chen Y. A hybrid clustering and random forest model to analyse vulnerable road user to motor vehicle (VRU-MV) crashes. Int J Inj Contr Saf Promot 2023; 30:338-351. [PMID: 37643462 DOI: 10.1080/17457300.2023.2180804] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/31/2022] [Revised: 12/28/2022] [Accepted: 02/11/2023] [Indexed: 02/24/2023]
Abstract
The main goal of this study is to investigate the unobserved heterogeneity in VRU-MV crash data and to determine the relatively important contributing factors of injury severity. To this end, a latent class analysis (LCA) coupled with a random parameters logit model (LCA-RPL) is developed to segment the VRU-MV crashes into relatively homogeneous clusters and to explore the differences among clusters. The random-forest-based SHapley Additive exPlanation (RF-SHAP) approach is used to explore the relative importance of the contributing factors for injury severity in each cluster. The results show that vulnerable group (VG), intersection or not (ION) and road type (RT) clearly distinguish the crash clusters. Motor-vehicle type and functional zone have a significant impact on injury severity in all clusters. Several variables (e.g. ION, crash type [CT], season and RT) demonstrate a significant effect in a specific sub-cluster model. Results of this study provide specific and insightful countermeasures that target the contributing factors in each cluster for mitigating VRU-MV crash injury severity.
Affiliation(s)
- Zhiyuan Sun
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, PR China
- Duo Wang
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, PR China
- Xin Gu
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, PR China
- Yuxuan Xing
  - China Academy of Urban Planning and Design, Beijing, PR China
- Jianyu Wang
  - Beijing Key Laboratory of General Aviation Technology, Beijing University of Civil Engineering and Architecture, Beijing, PR China
- Huapu Lu
  - Institute of Transportation Engineering, Tsinghua University, Beijing, PR China
- Yanyan Chen
  - Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, PR China
|
24
|
Farzipour A, Elmi R, Nasiri H. Detection of Monkeypox Cases Based on Symptoms Using XGBoost and Shapley Additive Explanations Methods. Diagnostics (Basel) 2023; 13:2391. [PMID: 37510135 PMCID: PMC10378557 DOI: 10.3390/diagnostics13142391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/03/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023] Open
Abstract
The monkeypox virus poses a novel public health risk that might quickly escalate into a worldwide epidemic. Machine learning (ML) has recently shown much promise in diagnosing diseases like cancer, finding tumor cells, and finding COVID-19 patients. In this study, we have created a dataset based on the data both collected and published by Global Health and used by the World Health Organization (WHO). Being entirely textual, this dataset shows the relationship between the symptoms and the monkeypox disease. The data have been analyzed, using gradient boosting methods such as Extreme Gradient Boosting (XGBoost), CatBoost, and LightGBM along with other standard machine learning methods such as Support Vector Machine (SVM) and Random Forest. All these methods have been compared. The research aims to provide an ML model based on symptoms for the diagnosis of monkeypox. Previous studies have only examined disease diagnosis using images. The best performance has belonged to XGBoost, with an accuracy of 1.0 in reviews. To check the model's flexibility, k-fold cross-validation is used, reaching an average accuracy of 0.9 in 5 different splits of the test set. In addition, Shapley Additive Explanations (SHAP) helps in examining and explaining the output of the XGBoost model.
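The k-fold cross-validation step mentioned in the abstract above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the symptom-count data, the threshold classifier (`fit_threshold`, `predict_threshold`), and the helper names are all hypothetical stand-ins for the XGBoost model and dataset used in the paper.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Return (mean accuracy, per-fold accuracies) over k train/test splits."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k, scores


def fit_threshold(X, y):
    """Toy classifier: midpoint between the largest negative and smallest positive."""
    negatives = [x for x, label in zip(X, y) if label == 0]
    positives = [x for x, label in zip(X, y) if label == 1]
    return (max(negatives) + min(positives)) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Hypothetical data: number of reported symptoms, labelled positive when >= 2.
X = [0, 1, 2, 3] * 5
y = [int(x >= 2) for x in X]

mean_acc, fold_accs = cross_val_accuracy(X, y, fit_threshold, predict_threshold, k=5)
print(f"mean accuracy over {len(fold_accs)} folds: {mean_acc:.2f}")
```

Reporting the per-fold scores alongside the mean makes it easier to see whether a single favourable split is driving the headline accuracy.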
Affiliation(s)
- Alireza Farzipour
  - Department of Computer Science, Semnan University, Semnan 35131-19111, Iran
- Roya Elmi
  - Farzanegan Campus, Semnan University, Semnan 35197-34851, Iran
- Hamid Nasiri
  - Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran 15916-34311, Iran
|
25
|
Masello L, Castignani G, Sheehan B, Guillen M, Murphy F. Using contextual data to predict risky driving events: A novel methodology from explainable artificial intelligence. ACCIDENT; ANALYSIS AND PREVENTION 2023; 184:106997. [PMID: 36854225 DOI: 10.1016/j.aap.2023.106997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 01/07/2023] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
Usage-based insurance has allowed insurers to dynamically tailor insurance premiums by understanding when and how safe policyholders drive. However, telematics information can also be used to understand the driving contexts experienced by the driver within each trip (e.g., road types, weather, traffic). Since different combinations of these conditions affect exposure to accidents, this understanding introduces predictive opportunities in driving risk assessment. This paper investigates the relationships between driving context combinations and risk using a naturalistic driving dataset of 77,859 km. In particular, XGBoost and Random Forests are used to determine the predictive significance of driving contexts for near-misses, speeding and distraction events. Moreover, the most important contextual factors in predicting these risky events are identified and ranked through Shapley Additive Explanations. The results show that the driving context has significant power in predicting driving risk. Speed limit, weather temperature, wind speed, traffic conditions and road slope appear in the top ten most relevant features for most risky events. Analysing contextual feature variations and their influence on risky events showed that low-speed limits increase the predicted frequency of speeding and phone unlocking events, whereas high-speed limits decrease harsh accelerations. Low temperatures decrease the expected frequency of harsh manoeuvres, and precipitations increase harsh acceleration, harsh braking, and distraction events. Furthermore, road slope, intersections and pavement quality are the most critical factors among road layout attributes. The methodology presented in this study aims to support road safety stakeholders and insurers by providing insights to study the contextual risk factors that influence road accident frequency and driving risk.
Affiliation(s)
- Leandro Masello
  - University of Limerick, Limerick KB3-040, Ireland; Motion-S S.A., Mondorf-les-Bains L-5610, Luxembourg
- German Castignani
  - Motion-S S.A., Mondorf-les-Bains L-5610, Luxembourg; University of Luxembourg, Esch-sur-Alzette L-4365, Luxembourg
- Montserrat Guillen
  - Department of Econometrics, Statistics and Applied Economics, Universitat de Barcelona, Avinguda Diagonal, 690, Barcelona 08034, Catalonia, Spain
|
26
|
Modeling industrial hydrocyclone operational variables by SHAP-CatBoost - A “conscious lab” approach. POWDER TECHNOL 2023. [DOI: 10.1016/j.powtec.2023.118416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
|
27
|
Yi Z, Wu L. Identification of factors influencing net primary productivity of terrestrial ecosystems based on interpretable machine learning --evidence from the county-level administrative districts in China. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 326:116798. [PMID: 36435139 DOI: 10.1016/j.jenvman.2022.116798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Revised: 11/10/2022] [Accepted: 11/13/2022] [Indexed: 06/16/2023]
Abstract
Global climate change is rooted in the imbalance between carbon sources and sinks, and net-zero greenhouse gas emissions should focus not only on the source-side drivers but also on the sink-side influencing factors. Taking the county-level administrative districts in China as the sample, this study uses machine learning models to fit the relationship between socioeconomic development (SED) and net primary productivity (NPP) of terrestrial ecosystems. Moreover, it identifies key influencing factors and their effects based on the SHapley Additive exPlanations (SHAP) algorithm. The results show that the districts with low terrestrial NPP show the characteristics of agglomeration distribution. The eight key factors, in order, are as follows: agricultural development level, latitude, population size, longitude, animal husbandry development level, economic scale, time trend and industrialization level. In this study, via SHAP interaction plots, we found that the effects of population, economic growth, and industrialization on terrestrial NPP are regionally heterogeneous; via cluster analysis, we found the stage characteristics of the mode of SED affecting terrestrial NPP. Therefore, the conservation of terrestrial NPP needs to be combined with the stage changes of SED, as well as inter-regional differences, to develop a regionally coordinated and time-coherent ecological carbon sink conservation plan.
Affiliation(s)
- Zhaoqiang Yi
  - School of Economics and Management, Southeast University, Nanjing, 211189, China
- Lihua Wu
  - School of Economics and Management, Southeast University, Nanjing, 211189, China
|
28
|
Zou Z, Wu Q, Wang J, Xu L, Zhou M, Lu Z, He Y, Wang Y, Liu B, Zhao Y. Research on non-destructive testing of hotpot oil quality by fluorescence hyperspectral technology combined with machine learning. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2023; 284:121785. [PMID: 36058172 DOI: 10.1016/j.saa.2022.121785] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/04/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
Eating repeatedly used hotpot oil can cause serious harm to human health. To realize rapid non-destructive testing of hotpot oil quality, a modeling method combining fluorescence hyperspectral technology with machine learning algorithms was proposed. Five preprocessing algorithms were used to preprocess the original spectral data, which denoised the data and reduced the influence of baseline drift and tilt. The feature bands extracted from the spectral data showed that the best feature bands for the two-classification model and the six-classification model were concentrated at 469-962 nm and 534-809 nm, respectively. Visualizing the spectral data with the PCA algorithm showed the distribution of the six types of samples intuitively and indicated that the data could be classified. Based on the modeling analysis of the feature bands, the best two-classification and six-classification models were the MF-RF-RF and MF-XGBoost-LGB models, respectively, and the classification accuracy reached 100 %. Compared with the traditional model, the error was greatly reduced and the calculation time was shortened. This study confirmed that fluorescence hyperspectral technology combined with machine learning algorithms could effectively detect reused hotpot oil.
Affiliation(s)
- Zhiyong Zou
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Qingsong Wu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Jian Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Lijia Xu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Man Zhou
  - College of Food Sciences, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Zhiwei Lu
  - College of Science, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Yong He
  - College of Biosystems Engineering and Food Science, Zhejiang University, 866, Yuhangtang Road, Hangzhou 310058, PR China
- Yuchao Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Bi Liu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
- Yongpeng Zhao
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Xin Kang Road, Yucheng District, Ya'an 625014, PR China
|
29
|
Wang ZZ, Lu YN, Zou ZH, Ma YH, Wang T. Applying OHSA to Detect Road Accident Blackspots. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:16970. [PMID: 36554851 PMCID: PMC9779212 DOI: 10.3390/ijerph192416970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Revised: 12/12/2022] [Accepted: 12/14/2022] [Indexed: 06/17/2023]
Abstract
With increasing numbers of crashes and injuries, understanding traffic accident spatial patterns and identifying blackspots is critical to improving overall road safety. This study aims to detect blackspots using optimized hot spot analysis (OHSA). Traffic accidents were classified by their participants and severity to explore the relationship between blackspots and different types of accidents. Based on the outputs of incremental spatial autocorrelation, OHSA was then implemented on different types of accidents. Finally, the performance of OHSA in evaluating the road safety level of the proposed RBT index is examined using a binary correlation analysis (R² = 0.89). The results show that: (1) The optimal scale distance varies from 0.6 km to 2.8 km and is influenced by the distance of the travel mode. (2) Central cities, with 54.6% of the total accidents, experience more rigorous challenges regarding traffic safety than satellite cities. (3) There are many types of black spots in vulnerable communities, but in some specific areas, there are only black spots of non-motor vehicle accidents. Considering the practical significance of the above results, policy makers and traffic engineers are expected to give higher attention to central cities and vulnerable communities or prioritize the implementation of relevant optimization measures.
|
30
|
Iranmanesh M, Seyedabrishami S, Moridpour S. Identifying high crash risk segments in rural roads using ensemble decision tree-based models. Sci Rep 2022; 12:20024. [PMID: 36414672 PMCID: PMC9681741 DOI: 10.1038/s41598-022-24476-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/17/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022] Open
Abstract
Traffic safety forecast models are mainly used to rank road segments. While existing studies have primarily focused on identifying segments in urban networks, rural networks have received less attention. However, rural networks seem to have a higher risk of severe crashes. This paper aims to analyse traffic crashes on rural roads to identify the factors influencing crash frequency and to present a framework for developing a spatial-temporal crash risk map that prioritises high-risk segments on different days. The crash data of Khorasan Razavi province is used in this study. Crash frequency data with a temporal resolution of one day and a spatial resolution of 1500 m from loop detectors are analysed. Four groups of influential factors, including traffic parameters (e.g. traffic flow, speed, time headway), road characteristics (e.g. road type, number of lanes), weather data (e.g. daily rainfall, snow depth, temperature), and calendar variables (e.g. day of the week, public holidays, month, year) are used for model calibration. Three different decision tree algorithms, namely Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), have been employed to predict crash frequency. Results show that, based on the traditional evaluation measures, XGBoost is better for the explanation and interpretation of the factors affecting crash frequency, while the RF model is better for detecting trends and forecasting crash frequency. According to the results, the traffic flow rate, road type, year of the crash, and wind speed are the most influential variables in predicting crash frequency on rural roads. Forecasting the high and medium risk segment-days in the rural network can be essential to the safety management plan. This risk will be sensitive to real traffic data, weather forecasts and road geometric characteristics. Seventy percent of high and medium risk segment-days are predicted for the case study.
Affiliation(s)
- Maryam Iranmanesh
  - Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran (GRID: grid.412266.5; ISNI: 0000 0001 1781 3962)
- Seyedehsan Seyedabrishami
  - Faculty of Civil and Environmental Engineering, Tarbiat Modares University, Tehran, Iran (GRID: grid.412266.5; ISNI: 0000 0001 1781 3962)
- Sara Moridpour
  - Civil and Infrastructure Engineering Discipline, RMIT University, Melbourne, Australia (GRID: grid.1017.7; ISNI: 0000 0001 2163 3550)
|
31
|
Wu Q, Xu L, Zou Z, Wang J, Zeng Q, Wang Q, Zhen J, Wang Y, Zhao Y, Zhou M. Rapid nondestructive detection of peanut varieties and peanut mildew based on hyperspectral imaging and stacked machine learning models. FRONTIERS IN PLANT SCIENCE 2022; 13:1047479. [PMID: 36438117 PMCID: PMC9685660 DOI: 10.3389/fpls.2022.1047479] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/18/2022] [Accepted: 10/12/2022] [Indexed: 06/16/2023]
Abstract
Mold damage seriously affects the germination rate of peanut seeds. At the same time, the quality and variety purity of peanut seeds profoundly affect the final yield of peanuts and the economic benefits of farmers. In this study, hyperspectral imaging technology was used to achieve variety classification and mold detection of peanut seeds. In addition, this paper proposed using median filtering (MF) to preprocess the hyperspectral data, four variable selection methods to obtain characteristic wavelengths, and a stacked ensemble learning (SEL) model as a stable classification model. This paper compared the model performance of SEL with that of the extreme gradient boosting algorithm (XGBoost), the light gradient boosting algorithm (LightGBM), and the categorical boosting algorithm (CatBoost). The results showed that the MF-LightGBM-SEL model based on hyperspectral data achieved the best performance. Its prediction accuracy on the training and testing data reached 98.63% and 98.03%, respectively, and the modeling time was only 0.37 s, which demonstrated the model's potential for practical use. The approach of SEL combined with hyperspectral imaging techniques facilitates the development of a real-time detection system. It could perform fast, non-destructive, high-precision classification of peanut seed varieties and moldy peanuts, which is of great significance for improving crop yields.
Affiliation(s)
- Qingsong Wu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Lijia Xu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Zhiyong Zou
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Jian Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Qifeng Zeng
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Qianlong Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Jiangbo Zhen
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Yuchao Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Yongpeng Zhao
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, China
- Man Zhou
  - College of Food Sciences, Sichuan Agricultural University, Ya'an, China
|
32
|
Hou L, Liu Y, Xie W, Dai Z, Yang W, Zhao Y. Statistical neural network (SNN) for predicting signal-to-noise ratio (SNR) from static parameters and its validation in 16-bit, 125-MSPS analog-to-digital converters (ADCs). THE REVIEW OF SCIENTIFIC INSTRUMENTS 2022; 93:084701. [PMID: 36050066 DOI: 10.1063/5.0093709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 07/11/2022] [Indexed: 06/15/2023]
Abstract
In the analog-to-digital converter (ADC) test process, the static and dynamic performance parameters are the most important, and the tests for these parameters account for the bulk of the ADC test cost. These two types of parameters follow certain relationships, which are incorporated into the ADC test to reduce the cost. In this paper, we focus on the signal-to-noise ratio (SNR), a key indicator of the dynamic performances of ADCs. A statistical neural network (SNN) with two hidden layers was constructed to predict the SNR from the feature variables, which were extracted from the static parameters. A 16-bit, 125-MSPS ADC was used to evaluate the proposed prediction model. Compared to the measured SNR obtained by traditional fast Fourier transform based test methods, the predicted value had a mean average error of only 0.75 dB. In addition, the Shapley additive explanations interpreter was adopted to analyze the feature dependences of the SNN model, and the results demonstrated that the deterioration of the integral nonlinearity-curve-related features could significantly decrease the SNR, which is consistent with previous research results. The reported results demonstrated that, at the cost of a slight loss of accuracy, the proposed SNN can significantly reduce the test complexity, avoid dynamic parameter measurements, and reduce the total test time by about 4%.
Collapse
Affiliation(s)
- Linjie Hou
- Shenzhen Institute for Advanced Study, UESTC, Shenzhen, China
| | - Yvtao Liu
- Testing Center The 58th Research Institute of China Electronics Technology Corporation, Wuxi, China
| | - Weikun Xie
- Testing Center The 58th Research Institute of China Electronics Technology Corporation, Wuxi, China
| | - Zhijian Dai
- School of Automation Engineering of University of Electronic and Technology of China, Chengdu, China
| | - Wanyv Yang
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Yijiu Zhao
- Shenzhen Institute for Advanced Study, UESTC, Shenzhen, China
| |
|
33
|
An Explainable Machine Learning Framework for Forecasting Crude Oil Price during the COVID-19 Pandemic. AXIOMS 2022. [DOI: 10.3390/axioms11080374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
Financial institutions, investors, central banks and relevant corporations need an efficient and reliable forecasting approach for determining the future of crude oil price in an effort to reach optimal decisions under market volatility. This paper presents an innovative research framework for precisely predicting crude oil price movements and interpreting the predictions. First, it compares six advanced machine learning (ML) models, including two state-of-the-art methods: extreme gradient boosting (XGB) and the light gradient boosting machine (LGBM). Second, it selects novel data, including user search big data, digital currencies and data on the COVID-19 epidemic. The empirical results suggest that LGBM outperforms other alternative ML models. Finally, it proposes an interpretable framework for facilitating decision making to interpret the prediction results of complex ML models and for verifying the importance of various features affecting crude oil price. The results of this paper provide practical guidance for participants in the crude oil market.
|
34
|
Fatahi R, Nasiri H, Dadfar E, Chehreh Chelgani S. Modeling of energy consumption factors for an industrial cement vertical roller mill by SHAP-XGBoost: a "conscious lab" approach. Sci Rep 2022; 12:7543. [PMID: 35534588 PMCID: PMC9085744 DOI: 10.1038/s41598-022-11429-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/12/2022] [Accepted: 04/25/2022] [Indexed: 11/30/2022] Open
Abstract
Cement production is one of the most energy-intensive manufacturing industries, and the milling circuit of cement plants consumes around 4% of a year's global electrical energy production. It is well understood that modeling and digitalizing industrial-scale processes would help control production circuits better, improve efficiency, enhance personal training systems, and decrease plants' energy consumption. This tactical approach could be integrated using conscious lab (CL) as an innovative concept in the internet age. Surprisingly, no CL has been reported for the milling circuit of a cement plant. A robust CL interconnects datasets originating from monitoring operational variables in the plants and translates them into human-basis information using explainable artificial intelligence (EAI) models. By initiating a CL for an industrial cement vertical roller mill (VRM), this study conducted a novel strategy to explore relationships between VRM monitored operational variables and their representative energy consumption factors (output temperature and motor power). Using SHapley Additive exPlanations (SHAP), one of the most recent EAI models, helped accurately fill the lack of information about correlations within VRM variables. SHAP analyses highlighted that working pressure and input gas rate, with positive relationships, are the key factors influencing energy consumption. eXtreme Gradient Boosting (XGBoost), as a powerful predictive tool, could accurately model energy representative factors with an R-square over 0.80 in the testing phase. Comparison assessments indicated that SHAP-XGBoost could provide higher accuracy for the VRM-CL structure than conventional modeling tools (Pearson correlation, Random Forest, and Support Vector Regression).
|
35
|
Wen X, Xie Y, Jiang L, Li Y, Ge T. On the interpretability of machine learning methods in crash frequency modeling and crash modification factor development. ACCIDENT; ANALYSIS AND PREVENTION 2022; 168:106617. [PMID: 35202941 DOI: 10.1016/j.aap.2022.106617] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/13/2021] [Revised: 01/29/2022] [Accepted: 02/15/2022] [Indexed: 06/14/2023]
Abstract
Machine learning (ML) model interpretability has attracted much attention recently given the promising performance of ML methods in crash frequency studies. Extracting accurate relationships between risk factors and crash frequency is important for understanding the causal effects of risk factors and developing safety countermeasures. However, there is no study that comprehensively summarizes ML model interpretation methods and provides guidance for safety researchers and practitioners. This research aims to fill this gap. Model-based and post-hoc ML interpretation methods are critically evaluated and compared to study their suitability in crash frequency modeling. These methods include classification and regression tree (CART), multivariate adaptive regression splines (MARS), Local Interpretable Model-agnostic Explanations (LIME), Local Sensitivity Analysis (LSA), Partial Dependence Plots (PDP), Global Sensitivity Analysis (GSA), and SHapley Additive exPlanations (SHAP). Model-based interpretation methods cannot reveal the detailed interaction relationships among risk factors. LIME can only be used to analyze the effects of a risk factor at the prediction level. LSA and PDP assume that different risk factors are independently distributed. Both GSA and SHAP can account for the potential correlation among risk factors. However, only SHAP can visualize the detailed relationships between crash outcomes and risk factors. This study also demonstrates the potential and benefits of using ML and SHAP to derive Crash Modification Factors (CMF). Finally, it is emphasized that statistical and ML models may not directly differentiate causation from correlation. Understanding the differences between them is critical for developing reliable safety countermeasures.
Affiliation(s)
- Xiao Wen
- Department of Civil and Environmental Engineering, University of Massachusetts Lowell, United States
| | - Yuanchang Xie
- Department of Civil and Environmental Engineering, University of Massachusetts Lowell, United States.
| | - Liming Jiang
- Department of Civil and Environmental Engineering, University of Massachusetts Lowell, United States
| | - Yan Li
- Department of Computer Science, University of Massachusetts Lowell, United States
| | - Tingjian Ge
- Department of Computer Science, University of Massachusetts Lowell, United States
| |
|
36
|
Dong S, Khattak A, Ullah I, Zhou J, Hussain A. Predicting and Analyzing Road Traffic Injury Severity Using Boosting-Based Ensemble Learning Models with SHAPley Additive exPlanations. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19052925. [PMID: 35270617 PMCID: PMC8910532 DOI: 10.3390/ijerph19052925] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/03/2022] [Revised: 02/20/2022] [Accepted: 02/28/2022] [Indexed: 12/10/2022]
Abstract
Road traffic accidents are one of the world’s most serious problems, as they result in numerous fatalities and injuries, as well as economic losses each year. Assessing the factors that contribute to the severity of road traffic injuries has proven to be insightful. The findings may contribute to a better understanding of and potential mitigation of the risk of serious injuries associated with crashes. While ensemble learning approaches are capable of establishing complex and non-linear relationships between input risk variables and outcomes for the purpose of injury severity prediction and classification, most of them share a critical limitation: their “black-box” nature. To develop interpretable predictive models for road traffic injury severity, this paper proposes four boosting-based ensemble learning models, namely a novel Natural Gradient Boosting, Adaptive Gradient Boosting, Categorical Gradient Boosting, and Light Gradient Boosting Machine, and uses a recently developed SHapley Additive exPlanations analysis to rank the risk variables and explain the optimal model. Among the four models, LightGBM achieved the highest classification accuracy (73.63%), precision (72.61%), recall (70.09%), F1-score (70.81%), and AUC (0.71) when tested on 2015–2019 accident data from Pakistan’s National Highway N-5 (Peshawar to Rahim Yar Khan Section). By incorporating the SHapley Additive exPlanations approach, we were able to interpret the model’s estimation results from both global and local perspectives. Following interpretation, it was determined that Month_of_Year, Cause_of_Accident, Driver_Age and Collision_Type all played a significant role in the estimation process. According to the analysis, young drivers and pedestrians struck by a trailer have a higher risk of suffering fatal injuries. The combination of trailers and passenger vehicles, as well as driver at-fault, hitting pedestrians and rear-end collisions, significantly increases the risk of fatal injuries. This study suggests that combining LightGBM and SHAP has the potential to develop an interpretable model for predicting road traffic injury severity.
Affiliation(s)
- Sheng Dong
- School of Civil and Transportation Engineering, Ningbo University of Technology, Fenghua Road No. 201, Ningbo 315211, China;
| | - Afaq Khattak
- The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, 4800 Cao’an Road, Jiading, Shanghai 201804, China
- Correspondence:
| | - Irfan Ullah
- Department of Civil Engineering, International Islamic University, Sector H-10, Islamabad 1243, Pakistan;
| | - Jibiao Zhou
- College of Transportation Engineering, Tongji University, 4800 Cao’an Road, Jiading, Shanghai 201804, China;
| | - Arshad Hussain
- NUST Institute of Civil Engineering, National University of Sciences and Technology, Sector H-12, Islamabad 44000, Pakistan;
| |
|
37
|
Chang I, Park H, Hong E, Lee J, Kwon N. Predicting effects of built environment on fatal pedestrian accidents at location-specific level: Application of XGBoost and SHAP. ACCIDENT; ANALYSIS AND PREVENTION 2022; 166:106545. [PMID: 34995959 DOI: 10.1016/j.aap.2021.106545] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/26/2021] [Revised: 12/05/2021] [Accepted: 12/13/2021] [Indexed: 06/14/2023]
Abstract
Understanding locally heterogeneous physical contexts in the built environment is of great importance in developing preemptive countermeasures to mitigate pedestrian fatality risks. In this study, we aim to investigate the non-linear relationship between physical factors and pedestrian fatality at a location-specific level using a machine learning approach. The state-of-the-art machine learning algorithm, eXtreme Gradient Boosting (XGBoost), is employed for a binary classification problem, in which nationwide locations where fatal pedestrian accidents occurred from 2012 to 2019 in Korea serve as positive samples (np = 13,366). For negative samples, locations with no pedestrian accidents are selected randomly, at 10 times the number of positive samples (nn = 133,660). Fifteen features under the categories of road conditions, road facilities, road networks, and land uses are assigned to both the positive and negative sample locations using Geographic Information System (GIS). A method is proposed to avoid the class imbalance problem, and a final unbiased model is utilized to predict fatal pedestrian risks at the negative sample locations. In addition, Shapley Additive Explanations (SHAP) is introduced to provide a robust interpretation of the XGBoost prediction results. It is shown that 21.6% of the negative sample locations have a probability of fatal pedestrian accidents greater than 0.5 (or 78.4% accuracy). Generally, a road segment that lies in many of the shortest routes in a dense residential area with many lively activities from aligned buildings is a potential spot for fatal pedestrian accidents. However, based on the SHAP interpretation, the relationships between the features and pedestrian fatality are found to be nonlinear and locally heterogeneous. We discuss the implications this result has for drafting policy recommendations to reduce pedestrian fatalities.
Affiliation(s)
- Iljoon Chang
- Department of Urban Planning, Gacheon University, Seongnam, South Korea
| | | | - Eungi Hong
- MIM Institute Co. Ltd, Seoul, South Korea
| | - Jaeduk Lee
- Department of Urban Planning, Gacheon University, Seongnam, South Korea
| | - Namju Kwon
- Department of Urban Planning, Gacheon University, Seongnam, South Korea
| |
|
38
|
Mukhopadhyay A, Pettet G, Vazirizade SM, Lu D, Jaimes A, Said SE, Baroud H, Vorobeychik Y, Kochenderfer M, Dubey A. A Review of Incident Prediction, Resource Allocation, and Dispatch Models for Emergency Management. ACCIDENT; ANALYSIS AND PREVENTION 2022; 165:106501. [PMID: 34929574 DOI: 10.1016/j.aap.2021.106501] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Revised: 11/14/2021] [Accepted: 11/15/2021] [Indexed: 06/14/2023]
Abstract
In the last fifty years, researchers have developed statistical, data-driven, analytical, and algorithmic approaches for designing and improving emergency response management (ERM) systems. The problem has been noted as inherently difficult and constitutes spatio-temporal decision making under uncertainty, which has been addressed in the literature with varying assumptions and approaches. This survey provides a detailed review of these approaches, focusing on the key challenges and issues regarding four sub-processes: (a) incident prediction, (b) incident detection, (c) resource allocation, and (d) computer-aided dispatch for emergency response. We highlight the strengths and weaknesses of prior work in this domain and explore the similarities and differences between different modeling paradigms. We conclude by illustrating open challenges and opportunities for future research in this complex domain.
Affiliation(s)
- Ayan Mukhopadhyay
- Electrical Engineering and Computer Science, Vanderbilt University, USA
| | - Geoffrey Pettet
- Electrical Engineering and Computer Science, Vanderbilt University, USA
| | | | | | | | | | - Hiba Baroud
- Civil and Environmental Engineering, Vanderbilt University, USA
| | | | | | - Abhishek Dubey
- Electrical Engineering and Computer Science, Vanderbilt University, USA
| |
|
39
|
Wei N, Zhang Q, Zhang Y, Jin J, Chang J, Yang Z, Ma C, Jia Z, Ren C, Wu L, Peng J, Mao H. Super-learner model realizes the transient prediction of CO2 and NOx of diesel trucks: Model development, evaluation and interpretation. ENVIRONMENT INTERNATIONAL 2022; 158:106977. [PMID: 34775187 DOI: 10.1016/j.envint.2021.106977] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/27/2021] [Revised: 10/20/2021] [Accepted: 11/08/2021] [Indexed: 06/13/2023]
Abstract
The transient simulation of CO2 and NOx from motor vehicles has essential applications in evaluating vehicular greenhouse gas emissions and pollutant emissions. However, accurately estimating vehicular transient emissions is challenging due to the heterogeneity between different vehicles and the continuous upgrading of vehicle exhaust purification technology. To accurately characterize the transient emissions of motor vehicles, a Super-learner model is used to build CO2 and NOx transient emission models. The actual onboard test data of 9 China VI N2 vehicles were used to train the model, and the test data of another China VI N2 vehicle were selected for further robustness verification. There were significant differences in the emissions between the vehicles, but the constructed transient model could capture the common law of transient emissions from China VI N2 vehicles. The R² values of CO2 and NOx emission in the test data of the validation vehicle were 0.71 and 0.82, respectively. In addition, to further prove the model's robustness, the training data were synchronously modelled based on the Moves-method. The Super-learner model has a smaller RMSE on the validation set than the model based on the Moves-method, indicating that the Super-learner model has more transient simulation advantages. The marginal contributions of the model characteristics to the model results were analysed by SHapley Additive exPlanation (SHAP) value interpretation, and the marginal contributions of different pollutant characteristic parameters varied. Therefore, when establishing transient models of different pollutants, the selection of the model parameters demands considering the generation and purification process of different pollutants. The present work provides novel insights into the parameter selection, construction, and interpretation of the transient vehicle emission model.
Affiliation(s)
- Ning Wei
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Qijun Zhang
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China.
| | - Yanjie Zhang
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Jiaxin Jin
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Junyu Chang
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Zhiwen Yang
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Chao Ma
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Zhenyu Jia
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Chunzhe Ren
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Lin Wu
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Jianfei Peng
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China
| | - Hongjun Mao
- Tianjin Key Laboratory of Urban Transport Emission Research & State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China.
| |
|
40
|
Chen S, Shao H, Ji X. Insights into Factors Affecting Traffic Accident Severity of Novice and Experienced Drivers: A Machine Learning Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph182312725. [PMID: 34886451 PMCID: PMC8656871 DOI: 10.3390/ijerph182312725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/24/2021] [Accepted: 11/30/2021] [Indexed: 11/16/2022]
Abstract
Traffic accidents have significant financial and social impacts. Reducing the losses caused by traffic accidents has always been one of the most important issues. This paper presents an effort to investigate the factors affecting the accident severity of drivers with different driving experience. Special focus was placed on the combined effect of driving experience and age. Based on our dataset (traffic accidents that occurred between 2005 and 2021 in Shaanxi, China), a CatBoost model was applied to handle categorical features, and the SHAP (Shapley Additive exPlanations) method was used to interpret the output. Results show that accident cause, age, visibility, light condition, season, road alignment, and terrain are the key factors affecting accident severity for both novice and experienced drivers. Age has the opposite impact on fatal accidents for novice and experienced drivers. Novice drivers younger than 30 or older than 55 are prone to suffer fatal accidents, but for experienced drivers, the risk of a fatal accident decreases when they are young and increases when they are old. These findings fill the research gap on the combined effect of driving experience and age on accident severity. They can also provide useful insights for practitioners seeking to improve traffic safety for novice and experienced drivers.
|
41
|
A Cost-Sensitive Diagnosis Method Based on the Operation and Maintenance Data of UAV. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112311116] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
In the fault diagnosis of UAVs, extremely imbalanced data distribution and vast differences in effects of fault modes can drastically affect the application effect of a data-driven fault diagnosis model under the limitation of computing resources. At present, there is still no credible approach to determine the cost of the misdiagnosis of different fault modes that accounts for the interference of data distribution. The performance of the original cost-insensitive flight data-driven fault diagnosis models also needs to be improved. In response to this requirement, this paper proposes a two-step ensemble cost-sensitive diagnosis method based on the operation and maintenance data of UAV. According to the fault criticality from FMECA information, we defined a misdiagnosis hazard value and calculated the misdiagnosis cost. By using the misdiagnosis cost, a static cost matrix could be set to modify the diagnosis model and to evaluate the performance of the diagnosis results. A two-step ensemble cost-sensitive method based on the MetaCost framework was proposed using stratified bootstrapping, choosing LightGBM as meta-classifiers, and adjusting the ensemble form to enhance the overall performance of the diagnosis model and reduce the occupation of the computing resources while optimizing the total misdiagnosis cost. The experimental results based on the KPG component data of a large fixed-wing UAV show that the proposed cost-sensitive model can effectively reduce the total cost incurred by misdiagnosis, without putting forward excessive requirements on the computing equipment under the condition of ensuring a certain overall level of diagnosis performance.
|