1
Wang Z, Wang J, Yu D, Chen K. Groundwater potential assessment using GIS-based ensemble learning models in Guanzhong Basin, China. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:690. [PMID: 37199816 DOI: 10.1007/s10661-023-11388-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 05/11/2023] [Indexed: 05/19/2023]
Abstract
Groundwater plays a crucial role in sustaining industrial and agricultural production and meeting the water demands of the growing population in the semi-arid Guanzhong Basin of China. The objective of this study was to evaluate the groundwater potential of the region through the use of GIS-based ensemble learning models. Fourteen factors, including landform, slope, slope aspect, curvature, precipitation, evapotranspiration, distance to fault, distance to river, road density, topographic wetness index, soil type, lithology, land cover, and normalized difference vegetation index, were considered. Three ensemble learning models, namely random forest (RF), extreme gradient boosting (XGB), and local cascade ensemble (LCE), were trained and cross-validated using 205 sets of samples. The models were then applied to predict groundwater potential in the region. The XGB model was found to be the best, with an area under the curve (AUC) value of 0.874, followed by the RF model with an AUC of 0.859, and the LCE model with an AUC of 0.810. The XGB and LCE models were more effective than the RF model in discriminating between areas of high and low groundwater potential. This is because most of the RF model's prediction outcomes were concentrated in moderate groundwater potential areas, indicating that RF is less decisive when it comes to binary classification. In areas predicted to have very high and high groundwater potential, the proportions of samples with abundant groundwater were 33.6%, 69.31%, and 52.45% for RF, XGB, and LCE, respectively. In contrast, in areas predicted to have very low and low groundwater potential, the proportions of samples without groundwater were 57.14%, 66.67%, and 74.29% for RF, XGB, and LCE, respectively. The XGB model required the least amount of computational resources and achieved the highest accuracy, making it the most practical option for predicting groundwater potential. 
The results can be useful for policymakers and water resource managers in promoting the sustainable use of groundwater in the Guanzhong Basin and other similar regions.
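The AUC values quoted in this abstract (0.874, 0.859, 0.810) can be read as the probability that a randomly chosen sample with groundwater receives a higher predicted score than a randomly chosen sample without it. The sketch below illustrates that reading in plain Python; the labels and scores are hypothetical and are not data from the study, and the function name `roc_auc` is an illustrative choice.

```python
def roc_auc(labels, scores):
    """AUC in its pairwise (Mann-Whitney) form: the fraction of
    positive/negative pairs in which the positive sample scores higher,
    with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = groundwater present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for a few hundred samples such as the 205 used here; a rank-based implementation would be preferable for larger sets.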
Affiliation(s)
- Zitao Wang
  - Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
  - Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jianping Wang
  - Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
  - Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
- Dongmei Yu
  - Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
  - Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Kai Chen
  - School of Earth and Environment, Anhui University of Science and Technology, Huainan, 232001, China
2
Wang Z, Wang J, Yu D, Chen K. The potential evaluation of groundwater by integrating rank sum ratio (RSR) and machine learning algorithms in the Qaidam Basin. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:63991-64005. [PMID: 37059956 DOI: 10.1007/s11356-023-26961-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/08/2023] [Indexed: 04/16/2023]
Abstract
Groundwater is a vital resource in arid areas that sustains local industrial development and environmental preservation. Mapping groundwater potential zones and determining high-potential regions are essential for the responsible use of the local groundwater resource. When utilizing machine learning or deep learning algorithms to forecast groundwater potential in arid areas, difficulties such as inaccurate and overfitting predictions might occur due to a shortage of borehole samples. In this study, a database of groundwater conditioning factors with a size of 275,157 × 9 was created in the Qaidam Basin, and 85 known borehole samples were collected. The groundwater potential was evaluated using a combination of rank sum ratio (RSR), projection pursuit regression (PPR) and random forest (RF) algorithms, resulting in four models: PPR, RSR-PPR, RSR-RF, and RF. Results indicated that the groundwater potential was higher in mountainous regions surrounding the Qaidam Basin and decreased progressively towards the central and northwestern regions where most industries and facilities are located. The two primary factors, according to the PPR and RF models, were evapotranspiration (0.246, 0.225) and landform (0.176, 0.294). In terms of their ability to accurately forecast the borehole samples, the four models ranked as follows: RF > RSR-RF > RSR-PPR > PPR. The accuracy of the four models in the low-potential area was 0.73 (PPR), 0.60 (RSR-PPR), 0.87 (RSR-RF), and 0.80 (RF), respectively. However, the RF model showed overfitting due to a lack of samples, especially in high-potential regions, which limits its applicability. The RSR-RF method was applied directly to evaluate the entire factor database, avoiding the risk of overfitting caused by a limited number of training samples. The results demonstrate that the RSR-RF model is effective for classifying groundwater potential types in samples and mapping groundwater potential of the study area. 
This research presents a novel approach for groundwater potential predictions in areas with insufficient sample sizes, providing a reference for policymakers and researchers.
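The abstract does not spell out how the rank sum ratio is computed, so the following is only a sketch of the commonly used definition: each factor is ranked across the n evaluated cells, and the ranks of the m factors are summed and divided by m × n. The assumption that larger factor values are more favourable, the tie handling by average rank, and the function names are all illustrative choices, not the authors' implementation.

```python
def rank_ascending(values):
    """Average ranks (1 = smallest value); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_ratio(table):
    """table[i][j] is the value of factor j at cell i (higher = more favourable,
    an assumption of this sketch). Returns RSR_i = sum_j rank_ij / (m * n)."""
    n = len(table)
    m = len(table[0])
    col_ranks = [rank_ascending([row[j] for row in table]) for j in range(m)]
    return [sum(col_ranks[j][i] for j in range(m)) / (m * n) for i in range(n)]


# Hypothetical 3 cells x 2 factors.
table = [[1, 10], [2, 30], [3, 20]]
rsr = rank_sum_ratio(table)
```

Because the RSR depends only on ranks of the factor database, it needs no borehole labels, which is consistent with the abstract's point that the RSR-RF method sidesteps the shortage of training samples.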
Affiliation(s)
- Zitao Wang
  - Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
  - Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jianping Wang
  - Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
  - Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
- Dongmei Yu
  - Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
  - Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Kai Chen
  - School of Earth and Environment, Anhui University of Science and Technology, Huainan, 232001, China
3
Kulsoom I, Hua W, Hussain S, Chen Q, Khan G, Shihao D. SBAS-InSAR based validated landslide susceptibility mapping along the Karakoram Highway: a case study of Gilgit-Baltistan, Pakistan. Sci Rep 2023; 13:3344. [PMID: 36849465 PMCID: PMC9971256 DOI: 10.1038/s41598-023-30009-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/11/2022] [Accepted: 02/14/2023] [Indexed: 03/01/2023] Open
Abstract
The geological setting of the Karakoram Highway (KKH) increases the risk of natural disasters, threatening its regular operations. Predicting landslides along the KKH is challenging due to limitations in techniques, a difficult environment, and data availability issues. This study uses machine learning (ML) models and a landslide inventory to evaluate the relationship between landslide events and their causative factors. For this, Extreme Gradient Boosting (XGBoost), Random Forest (RF), Artificial Neural Network (ANN), Naive Bayes (NB), and K Nearest Neighbor (KNN) models were used. A total of 303 landslide points were used to create an inventory, with 70% for training and 30% for testing. Fourteen landslide causative factors were used for susceptibility mapping. The area under the curve (AUC) of the receiver operating characteristic (ROC) was employed to compare the accuracy of the models. The deformation in the regions that the models identified as susceptible was evaluated using the SBAS-InSAR (Small-Baseline Subset Interferometric Synthetic Aperture Radar) technique. The sensitive regions of the models showed elevated line-of-sight (LOS) deformation velocity. With the integration of the SBAS-InSAR findings, the XGBoost technique produced a superior landslide susceptibility map (LSM) for the region. This improved LSM offers predictive modeling for disaster mitigation and gives a theoretical direction for the regular management of the KKH.
Affiliation(s)
- Isma Kulsoom
  - School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Weihua Hua
  - School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Sadaqat Hussain
  - Department of Geological Engineering, University of Engineering and Technology (Lahore), Lahore, 54890, Pakistan
- Qihao Chen
  - School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Garee Khan
  - Department of Earth Sciences, Karakoram International University, Gilgit, 15100, Pakistan
- Dai Shihao
  - School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan, 430074, China
4
Derakhshan-Babaei F, Mirchooli F, Mohammadi M, Nosrati K, Egli M. Tracking the origin of trace metals in a watershed by identifying fingerprints of soils, landscape and river sediments. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 835:155583. [PMID: 35489478 DOI: 10.1016/j.scitotenv.2022.155583] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/23/2022] [Revised: 04/18/2022] [Accepted: 04/25/2022] [Indexed: 06/14/2023]
Abstract
Identifying the spatial distribution of soil trace-elements and the contribution of different sources to the sediment yield is necessary for better management of watersheds and river water quality. Until now, little attention has been paid to comprehensive assessments of sediment sources and soil trace-elements with respect to suspended sediment production. The present study aimed at modelling the spatial distribution of soil trace-elements, quantifying the sediment source apportionment, and relating the landforms to polluted soils. Different techniques and approaches, such as the Nemerow pollution index, machine learning algorithms (Random Forest (RF), generalised boosting methods (GBM), and generalised linear models (GLM)), and sediment fingerprinting, were applied to the Kan watershed. A total of 79 soil samples having different Nemerow index values were considered for spatial modelling. Using statistical methods (range test, Kruskal-Wallis, and discriminant function analysis), an optimal set of tracers was selected. An unmixing model was applied to calculate the relative contribution of landforms for eight rainfall events. The results of the soil trace-element mapping showed that RF had the best performance, with an accuracy of 83%. The evaluation of polluted soil areas showed that the landforms 'steep hills' and 'valley' contributed the most in the riparian zone, with 51% and 27%, respectively. In addition, these landforms contribute strongly to sediment production in late-winter-spring events (29%), with a GOF (goodness of fit) of 0.65. The landform 'plain' had the highest contribution (28%) to sediment yield in early-winter events, with a GOF of 0.72. This means that the valley and steep hill landforms accelerate the transport of trace-elements across the watershed. Interestingly, the contribution of landforms varies during the year. 
Overall, the newly proposed approach makes it possible to better trace the origin of suspended sediments and of trace-elements discharged into the river environment.
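The abstract names the Nemerow pollution index without giving its formula. As a sketch of the commonly used form, each element's single-factor index is its measured concentration divided by a reference (background or standard) value, and the Nemerow index combines the mean and the maximum of those ratios as sqrt((mean² + max²) / 2). The element names, concentrations, and reference values below are hypothetical and are not taken from the Kan watershed data.

```python
import math


def nemerow_index(concentrations, references):
    """Nemerow pollution index for one soil sample.

    concentrations, references: dicts keyed by element name (same units).
    P_i = C_i / S_i for each element; P_N = sqrt((mean(P_i)^2 + max(P_i)^2) / 2).
    """
    ratios = [concentrations[el] / references[el] for el in concentrations]
    mean_ratio = sum(ratios) / len(ratios)
    max_ratio = max(ratios)
    return math.sqrt((mean_ratio ** 2 + max_ratio ** 2) / 2)


# Hypothetical sample (mg/kg) and hypothetical reference values.
sample = {"Pb": 40.0, "Zn": 150.0, "Cu": 30.0}
reference = {"Pb": 20.0, "Zn": 100.0, "Cu": 30.0}
p_n = nemerow_index(sample, reference)  # ratios are 2.0, 1.5 and 1.0
```

Including the maximum ratio means a single strongly enriched element raises the index even when the average is modest, which is why the index is often used to flag polluted samples.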
Affiliation(s)
- Farzaneh Derakhshan-Babaei
  - Department of Physical Geography, Faculty of Earth Sciences, Shahid Beheshti University, 1983969411 Tehran, Iran
- Fahimeh Mirchooli
  - Department of Watershed Management and Engineering, Faculty of Natural Resources, Tarbiat Modares University, 46414-356 Tehran, Iran
- Maziar Mohammadi
  - Department of Watershed Management and Engineering, Faculty of Natural Resources, Tarbiat Modares University, 46414-356 Tehran, Iran
- Kazem Nosrati
  - Department of Physical Geography, Faculty of Earth Sciences, Shahid Beheshti University, 1983969411 Tehran, Iran
- Markus Egli
  - Department of Geography, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
5
A Comparative Assessment of Machine Learning Models for Landslide Susceptibility Mapping in the Rugged Terrain of Northern Pakistan. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12052280] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
This study investigated the performance of different techniques, including random forest (RF), support vector machine (SVM), maximum entropy (maxENT), gradient-boosting machine (GBM), and logistic regression (LR), for landslide susceptibility mapping (LSM) in the rugged terrain of northern Pakistan. Initially, a landslide inventory of 200 samples was produced, along with an additional 200 samples indicating nonlandslide areas, and divided into training (70%) and validation (30%) groups using a stratified loop-based random sampling approach. Then, a geospatial database of 12 possible landslide influencing factors (LIFs) was generated, including elevation, slope, aspect, topographic wetness index (TWI), topographic position index (TPI), distance to drainage, distance to fault, distance to road, normalized difference vegetation index (NDVI), rainfall, land cover/land use (LCLU), and a geological map of the study area. None of the LIFs were redundant for the modeling, as indicated by the multicollinearity test (tolerance > 0.1) and information gain ratio (IGR > 0). We extended the evaluation measures of each algorithm from area-under-the-curve (AUC) analysis to the calculation of performance overall (POA) with the help of precision, recall, F1 score, accuracy (ACC), and the Matthews correlation coefficient (MCC). The results showed that the SVM was the most promising model (AUC = 0.969, POA = 2669) for the LSM, followed by RF (AUC = 0.967, POA = 2656), GBM (AUC = 0.967, POA = 2623), maxENT (AUC = 0.872, POA = 1761), and LR (AUC = 0.836, POA = 1299). It is important to note that the SVM, RF, and GBM were the top performers, with nearly identical accuracy. Thus, each of these could be equally effective for LSM and can be used for risk reduction and mitigation measures in the rugged terrain of Pakistan and other regions with similar topography.
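The precision, recall, F1 score, accuracy, and Matthews correlation coefficient mentioned in this abstract all follow from the four cells of a binary confusion matrix. The sketch below computes them in plain Python; the counts are hypothetical (chosen to total 120, the size of a 30% validation split of 400 samples) and are not results from the study. The abstract does not define how POA is derived from these measures, so it is not computed here.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts.
    Assumes at least one predicted positive and one actual positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical validation counts: 60 landslide and 60 nonlandslide samples.
m = classification_metrics(tp=50, fp=10, fn=10, tn=50)
```

Unlike accuracy, the MCC uses all four cells symmetrically, so it stays informative when the two classes are not balanced.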
6
Research on the Application of GIS Technology Combined with RBFNN-GA Algorithm in the Delineation of Geological Hazard Prone Areas. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:2677453. [PMID: 34899888 PMCID: PMC8660228 DOI: 10.1155/2021/2677453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2021] [Revised: 11/09/2021] [Accepted: 11/15/2021] [Indexed: 12/02/2022]
Abstract
With the rapid development of the economy and society, geological disasters such as landslides, collapses, and mudslides have shown an intensifying trend, seriously endangering the safety of people's lives and property and affecting the sustainable development of the economy and society. To address the problems of merging different data layers and determining the weights for data stacking in GIS-based statistical analysis models for geological disaster risk evaluation, this study proposes a logistic regression model combined with the RBFNN-GA algorithm; that is, it fuses the coefficient for determining the occurrence of geological disasters (CF value) with the RBFNN-GA algorithm model and, with the help of SPSS statistical analysis software, solves the problems of factor selection, heterogeneous data merging, and weighting of each data layer in the risk assessment. In the experimental stage, this study adopts the method of geological hazard certainty coefficients to carry out the sensitivity analysis of the geological hazards in the study area. Using homogeneous grid division, the spatial quantitative evaluation of the risk of geological disasters is realized, and the results of this evaluation are tested against the latest landslide points in the region. The existing classification mainly depends on the acquisition of land use/cover information or the processing method of the acquired information, but the existing information acquisition is limited by temporal, spatial, and spectral resolution. The results show that the number of landslide points per unit area in the extremely unstable zone and the unstable zone is 0.0395 points/km² and 0.0251 points/km², respectively, which is much higher than the 0.0038 points/km² in the stable zone, indicating that the evaluation results are consistent with actual landslide conditions.
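To make the closing comparison concrete, the landslide densities reported in the abstract can be expressed as multiples of the stable-zone density. The short sketch below does only that arithmetic; the three density values come from the abstract, while the function name and dictionary layout are illustrative.

```python
# Landslide point densities (points per square kilometre) from the abstract.
densities = {
    "extremely unstable": 0.0395,
    "unstable": 0.0251,
    "stable": 0.0038,
}


def density_ratios(densities, baseline="stable"):
    """Express each zone's landslide density as a multiple of the baseline zone."""
    base = densities[baseline]
    return {zone: d / base for zone, d in densities.items() if zone != baseline}


ratios = density_ratios(densities)
# The extremely unstable zone has roughly ten times the stable-zone density,
# and the unstable zone between six and seven times.
```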
7
Suggestion for a new deterministic model coupled with machine learning techniques for landslide susceptibility mapping. Sci Rep 2021; 11:6594. [PMID: 33758272 PMCID: PMC7988100 DOI: 10.1038/s41598-021-86137-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2020] [Accepted: 03/10/2021] [Indexed: 01/31/2023] Open
Abstract
Deterministic models have been widely applied in landslide risk assessment (LRA), but they are limited by the difficulty of obtaining the various geotechnical and hydraulic properties they require. The objective of this study is to suggest a new deterministic method based on machine learning (ML) algorithms. Eight crucial variables of LRA are selected with reference to expert opinions, and the output value is set to the safety factor derived from Mohr-Coulomb failure theory for an infinite slope. Linear regression and a neural network based on ML are applied to find the best model between independent and dependent variables. To increase the reliability of linear regression and the neural network, the results of back propagation, including gradient descent, Levenberg-Marquardt (LM), and Bayesian regularization (BR) methods, are compared. An 1800-item dataset is constructed from measured data and artificial data by using a geostatistical technique, which can provide information about an unknown area based on measured data. The results of linear regression and the neural network show that the LM and BR back propagation methods in particular achieve a high coefficient of determination. The important variables are also investigated through random forest (RF) to reduce the number of input variables. Only four variables (shear strength, soil thickness, elastic modulus, and fine content) demonstrate a high reliability for LRA. The results show that it is possible to perform LRA with ML, and that four variables are enough when it is difficult to obtain more.
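The abstract sets the model output to the safety factor from Mohr-Coulomb failure theory for an infinite slope but does not give the expression. As a sketch, the textbook form for a dry slope with no pore pressure is FS = (c + γ z cos²β tan φ) / (γ z sin β cos β); whether the study used this exact form, or one with groundwater terms, is not stated, and the parameter values below are hypothetical.

```python
import math


def infinite_slope_fs(cohesion_kpa, unit_weight_kn_m3, depth_m,
                      slope_deg, friction_deg):
    """Factor of safety of a dry infinite slope (no pore pressure assumed).

    Resisting stress: c + normal stress * tan(phi), per Mohr-Coulomb.
    Driving stress:   gamma * z * sin(beta) * cos(beta).
    """
    if not 0.0 < slope_deg < 90.0:
        raise ValueError("slope angle must be between 0 and 90 degrees")
    beta = math.radians(slope_deg)
    phi = math.radians(friction_deg)
    normal_stress = unit_weight_kn_m3 * depth_m * math.cos(beta) ** 2
    shear_stress = unit_weight_kn_m3 * depth_m * math.sin(beta) * math.cos(beta)
    shear_strength = cohesion_kpa + normal_stress * math.tan(phi)
    return shear_strength / shear_stress


# Hypothetical soil: c = 5 kPa, gamma = 18 kN/m^3, z = 2 m, beta = phi = 30 deg.
fs = infinite_slope_fs(5.0, 18.0, 2.0, 30.0, 30.0)
```

For a cohesionless soil the expression reduces to tan φ / tan β, so a slope inclined at its friction angle sits exactly at FS = 1; any cohesion raises the value above that.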
8
A Novel Hybrid Method for Landslide Susceptibility Mapping-Based GeoDetector and Machine Learning Cluster: A Case of Xiaojin County, China. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2021. [DOI: 10.3390/ijgi10020093] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 01/22/2023]
Abstract
Landslide susceptibility mapping (LSM) could be an effective way to prevent landslide hazards and mitigate losses. The choice of conditional factors is crucial to the results of LSM, and the selection of models also plays an important role. In this study, a hybrid method including GeoDetector and a machine learning cluster was developed to provide a new perspective on how to address these two issues. We defined redundant factors by using GeoDetector to quantitatively analyze the single and interactive impacts of the factors; the effect of this step was examined using mean absolute error (MAE). The machine learning cluster contains four models (artificial neural network (ANN), Bayesian network (BN), logistic regression (LR), and support vector machines (SVM)) and automatically selects the best one for generating the LSM. The receiver operating characteristic (ROC) curve, prediction accuracy, and the seed cell area index (SCAI) methods were used to evaluate these methods. The results show that the SVM model had the best performance in the machine learning cluster, with an area under the ROC curve of 0.928 and an accuracy of 83.86%. Therefore, SVM was chosen as the assessment model to map the landslide susceptibility of the study area. The landslide susceptibility map fit well with the landslide inventory, indicating that the hybrid method is effective in screening landslide influences and assessing landslide susceptibility.
9
Modeling the Settling Velocity of a Sphere in Newtonian and Non-Newtonian Fluids with Machine-Learning Algorithms. Symmetry (Basel) 2021. [DOI: 10.3390/sym13010071] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 02/04/2023] Open
Abstract
The traditional procedure of predicting the settling velocity of a spherical particle is inconvenient as it involves iterations, complex correlations, and an unpredictable degree of uncertainty. The limitations can be addressed efficiently with artificial intelligence-based machine-learning algorithms (MLAs). The limited number of isolated studies conducted to date were restricted to a specific fluid rheology, a particular MLA, and insufficient data. In the current study, the generalized application of ML was comprehensively investigated for Newtonian fluids and three varieties of non-Newtonian fluids: power-law, Bingham, and Herschel-Bulkley. A diverse set of nine MLAs were trained and tested using a large dataset of 967 samples. The ranges of the dimensionless generalized particle Reynolds number (Re_G) and drag coefficient (C_D) for the dataset were 10⁻³ < Re_G < 10⁴ and 10⁻¹ < C_D < 10⁵, respectively. The performances of the models were statistically evaluated using the coefficient of determination (R²), root-mean-square error (RMSE), mean-squared error (MSE), and mean absolute error (MAE). The support vector regression with polynomial kernel demonstrated the optimum performance with R² = 0.92, RMSE = 0.066, MSE = 0.0044, and MAE = 0.044. Its generalization capability was validated using ten-fold cross-validation, a leave-one-feature-out experiment, and leave-one-data-set-out validation. The outcome of the current investigation was a generalized approach to modeling the settling velocity.
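The four evaluation measures named in this abstract are related by simple definitions, and the reported RMSE and MSE can be cross-checked against each other because RMSE is the square root of MSE. The sketch below computes the measures in plain Python on a small hypothetical sample (not data from the study) and then squares the reported RMSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """R^2, RMSE, MSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Hypothetical observed and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)

# Consistency check on the reported figures: 0.066 squared is about 0.0044.
reported_rmse = 0.066
implied_mse = reported_rmse ** 2
```

The implied MSE rounds to the reported 0.0044, so the two figures in the abstract agree with each other.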