1. Nguyen TT, Ngo HH, Guo W, Chang SW, Nguyen DD, Nguyen CT, Zhang J, Liang S, Bui XT, Hoang NB. A low-cost approach for soil moisture prediction using multi-sensor data and machine learning algorithm. Sci Total Environ 2022; 833:155066. [PMID: 35398433 DOI: 10.1016/j.scitotenv.2022.155066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Revised: 03/30/2022] [Accepted: 04/02/2022] [Indexed: 06/14/2023]
Abstract
High-resolution soil moisture prediction has recently gained importance in fields such as forestry, agriculture and land management. However, accurate, robust and affordable spatial monitoring of soil moisture remains challenging. This research proposes a new approach that combines advanced machine learning (ML) models with multi-sensor data fusion, including Sentinel-1 (S1) C-band dual-polarimetric synthetic aperture radar (SAR), Sentinel-2 (S2) multispectral data, and the ALOS Global Digital Surface Model (ALOS DSM), to predict soil moisture precisely at 10 m spatial resolution across research areas in Australia. A total of 52 predictor variables were generated from the S1, S2 and ALOS DSM data fusion, including vegetation indices, soil indices, a water index, SAR transformation indices, and ALOS DSM-derived variables such as the digital elevation model (DEM), slope, and topographic wetness index (TWI). Field soil data from Western Australia were used. The performance of extreme gradient boosting regression (XGBR) combined with a genetic algorithm (GA) optimizer for feature selection and optimization in predicting soil moisture on bare lands was examined and compared against various scenarios and ML models. The proposed model (XGBR-GA), using 21 optimal features selected by the GA, yielded the highest performance (R² = 0.891; RMSE = 0.875%) compared with random forest regression (RFR), support vector machine (SVM), and CatBoost gradient boosting regression (CBR). In conclusion, the new XGBR-GA approach, using features derived from a combination of reliable, free-of-charge remotely sensed data from Sentinel and ALOS imagery, can effectively estimate the spatial variability of soil moisture. The described framework can further support precision agriculture and drought resilience programs by improving water use efficiency and smart irrigation management for crop production.
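The GA wrapper feature selection described in this abstract can be illustrated with a minimal sketch. It assumes binary chromosomes (one bit per predictor), tournament selection, uniform crossover, bit-flip mutation and elitism, and it scores each feature subset by the hold-out R² of an ordinary least-squares fit minus a small per-feature penalty. The least-squares model is a deliberate stand-in for XGBR so that the example needs only NumPy; the function names, GA settings and synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def holdout_r2(X, y, mask, split):
    """Fitness proxy: hold-out R^2 of an OLS fit on the selected columns.

    OLS stands in for the XGBR model used in the paper (an assumption).
    """
    if not mask.any():
        return -np.inf  # an empty subset cannot be fitted
    Xs = np.column_stack([np.ones(len(X)), X[:, mask]])  # add an intercept column
    coef, *_ = np.linalg.lstsq(Xs[:split], y[:split], rcond=None)
    resid = y[split:] - Xs[split:] @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y[split:] - y[split:].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, penalty=0.002, seed=0):
    """Hypothetical GA wrapper: returns selected column indices and their hold-out R^2."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    split = int(0.7 * len(X))  # first 70% of rows train, the rest validate
    pop = rng.random((pop_size, n_feat)) < 0.5  # random binary chromosomes

    def fitness(mask):
        # Reward accuracy, lightly penalize large subsets.
        return holdout_r2(X, y, mask, split) - penalty * mask.sum()

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        children = [pop[np.argmax(scores)].copy()]  # elitism: keep the best as-is
        while len(children) < pop_size:
            parents = []
            for _ in range(2):  # tournament selection of size 3
                idx = rng.choice(pop_size, size=3, replace=False)
                parents.append(pop[idx[np.argmax(scores[idx])]])
            take_first = rng.random(n_feat) < 0.5  # uniform crossover
            child = np.where(take_first, parents[0], parents[1])
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)

    scores = np.array([fitness(m) for m in pop])
    best = pop[np.argmax(scores)]
    return np.flatnonzero(best), holdout_r2(X, y, best, split)


if __name__ == "__main__":
    # Synthetic data: only columns 0, 3 and 7 carry signal.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 12))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.1, size=200)
    selected, r2 = ga_select(X, y)
    print(selected, round(float(r2), 3))
```

Swapping `holdout_r2` for a cross-validated gradient-boosting score would bring the sketch closer to an XGBR-GA setup, at a much higher cost per fitness evaluation.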
Affiliation(s)
- Thu Thuy Nguyen
  - Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia
- Huu Hao Ngo
  - Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia
- Wenshan Guo
  - Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia
- Soon Woong Chang
  - Department of Environmental Energy Engineering, Kyonggi University, 442-760, Republic of Korea
- Dinh Duc Nguyen
  - Department of Environmental Energy Engineering, Kyonggi University, 442-760, Republic of Korea
- Chi Trung Nguyen
  - Faculty of Science, Agriculture, Business and Law, UNE Business School, University of New England, Elm Avenue, Armidale, NSW 2351, Australia
- Jian Zhang
  - School of Environmental Science and Engineering, Shandong University, Qingdao 266237, China
- Shuang Liang
  - School of Environmental Science and Engineering, Shandong University, Qingdao 266237, China
- Xuan Thanh Bui
  - Key Laboratory of Advanced Waste Treatment Technology & Faculty of Environment and Natural Resources, Ho Chi Minh City University of Technology (HCMUT), Vietnam National University Ho Chi Minh (VNU-HCM), Ho Chi Minh City 700000, Viet Nam
- Ngoc Bich Hoang
  - NTT Institute of Hi-Technology, Nguyen Tat Thanh University, Ho Chi Minh City, Viet Nam
2. Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran. Remote Sens 2020. [DOI: 10.3390/rs12142234] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 01/09/2023]
Abstract
Estimating soil organic carbon (SOC) content is of utmost importance for understanding the chemical, physical, and biological functions of soil. This study applies the machine learning algorithms support vector machines (SVM), artificial neural networks (ANN), regression tree, random forest (RF), extreme gradient boosting (XGBoost), and a conventional deep neural network (DNN) to advance SOC prediction models. The models are trained with 1879 composite surface soil samples and 105 auxiliary variables as predictors. A genetic algorithm is used for feature selection to identify effective variables. The results indicate that precipitation is the most important predictor, driving 14.9% of SOC spatial variability, followed by the normalized difference vegetation index (12.5%), the day temperature index of the moderate resolution imaging spectroradiometer (10.6%), multiresolution valley bottom flatness (8.7%) and land use (8.2%). Based on 10-fold cross-validation, the DNN model was the superior algorithm, with the lowest prediction error and uncertainty. In terms of accuracy, DNN yielded a mean absolute error of 0.59%, a root mean squared error of 0.75%, a coefficient of determination of 0.65, and a Lin’s concordance correlation coefficient of 0.83. SOC content was highest in the udic soil moisture regime class, with a mean value of 3.71%, followed by the aquic (2.45%) and xeric (2.10%) classes. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and alluvial fans had lower SOC. The proposed DNN (hidden layers = 7, size = 50) is a promising algorithm for handling large numbers of auxiliary data at province scale: owing to its flexible structure and its ability to extract more information from the auxiliary data surrounding the sampled observations, it predicted the baseline SOC map with high accuracy and minimal uncertainty.
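The four accuracy measures reported for the DNN can be computed as sketched below. The sketch assumes that the coefficient of determination is defined as 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation) and that Lin’s concordance correlation coefficient uses population (1/n) moments, CCC = 2·cov(obs, pred) / (var(obs) + var(pred) + (mean(obs) - mean(pred))²). The function name is hypothetical.

```python
import numpy as np


def regression_metrics(obs, pred):
    """MAE, RMSE, R^2 (1 - SS_res/SS_tot) and Lin's CCC for paired observations."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    # Lin's CCC: penalizes both scatter and systematic bias (population moments).
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    ccc = 2.0 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "CCC": ccc}


if __name__ == "__main__":
    # Predictions offset by a constant +1: perfectly correlated but biased.
    for name, value in regression_metrics([1, 2, 3, 4], [2, 3, 4, 5]).items():
        print(f"{name}: {value:.3f}")
    # prints MAE: 1.000, RMSE: 1.000, R2: 0.200, CCC: 0.714 (one per line)
```

In this example the Pearson correlation is exactly 1, yet CCC is 5/7 ≈ 0.714, which is why CCC is often reported alongside R² when agreement (not just correlation) with observed values matters.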
3. Zhang J, Xiong Y, Min S. A new hybrid filter/wrapper algorithm for feature selection in classification. Anal Chim Acta 2019; 1080:43-54. [DOI: 10.1016/j.aca.2019.06.054] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/04/2019] [Revised: 05/26/2019] [Accepted: 06/26/2019] [Indexed: 10/26/2022]
4. Miyao T, Funatsu K. Iterative Screening Methods for Identification of Chemical Compounds with Specific Values of Various Properties. J Chem Inf Model 2019; 59:2626-2641. [PMID: 31058504 DOI: 10.1021/acs.jcim.9b00093] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Identifying chemical compounds with desirable properties is a central goal of screening campaigns. Iterative screening is a means of surveying a set of compounds in which their property values are determined and used as feedback for regression models. Quantitative models relating chemical structures to properties or activities are repeatedly updated through this cycle, and efficient sampling of compounds for the next test is key to identifying target compounds early. Nevertheless, methods for comparing these approaches and for establishing the degree of extrapolation of sampled compounds, including the effects of applicability domains, are still needed. In the present study, we conducted a series of virtual experiments to assess the characteristics of different iterative screening methods. Genetic algorithm-based partial least-squares regression, support vector regression, Bayesian optimization with a Gaussian process (GP), and batch-based Bayesian optimization with GP (GP_batch) were compared on one million compounds extracted from the ZINC database. Our results show that, irrespective of the diversity of the initial set of compounds, a compound with the desired property value could be identified using an appropriate screening method. Overall, however, the GP_batch method was preferable when evaluating properties that are either difficult to predict or for which a key factor is present in the set of molecular descriptors.
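One iteration of GP-based screening toward a target property range can be sketched as follows. This is an illustrative assumption, not the authors' implementation: it fits a zero-mean GP with a squared-exponential (RBF) kernel to the already measured compounds (after centering their property values), scores every unmeasured candidate by the posterior probability that its property falls inside a hypothetical target window [lo, hi], and takes the top-scoring candidates as the next batch. Descriptors are plain numeric vectors here; the kernel hyperparameters and function names are hypothetical, and the greedy top-k batch ignores the correlation between picks that dedicated batch Bayesian optimization methods usually account for.

```python
import math

import numpy as np


def rbf_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return var * np.exp(-0.5 * np.maximum(sq, 0.0) / length ** 2)


def gp_posterior(X_train, y_train, X_test, length=1.0, var=1.0, noise=1e-4):
    """Posterior mean and standard deviation of a GP fitted to centered targets."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length, var) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - mean), via two triangular-system solves.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(X_train, X_test, length, var)
    mu = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    sd = np.sqrt(np.maximum(var - np.sum(v ** 2, axis=0), 1e-12))
    return mu, sd


_erf = np.vectorize(math.erf)


def normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def prob_in_range(mu, sd, lo, hi):
    """P(lo <= f(x) <= hi) under the Gaussian posterior."""
    return normal_cdf((hi - mu) / sd) - normal_cdf((lo - mu) / sd)


def select_batch(X_train, y_train, X_pool, lo, hi, batch_size=5, **gp_kwargs):
    """Greedy batch: indices of the pool points most likely to hit [lo, hi]."""
    mu, sd = gp_posterior(X_train, y_train, X_pool, **gp_kwargs)
    score = prob_in_range(mu, sd, lo, hi)
    return np.argsort(-score)[:batch_size], score


if __name__ == "__main__":
    # Toy 1-D "descriptor" whose property equals the descriptor value.
    X_train = np.arange(7, dtype=float)[:, None]
    y_train = np.arange(7, dtype=float)
    X_pool = np.array([[0.5], [2.5], [5.5]])
    picks, score = select_batch(X_train, y_train, X_pool, lo=2.4, hi=2.6, batch_size=1)
    print(picks, np.round(score, 3))
```

After the picked compounds are "measured", they would be appended to `X_train` and `y_train` and the cycle repeated, which is the feedback loop the abstract describes.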
Affiliation(s)
- Tomoyuki Miyao
  - Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Kimito Funatsu
  - Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
5. Zhang J, Yan H, Xiong Y, Li Q, Min S. An ensemble variable selection method for vibrational spectroscopic data analysis. RSC Adv 2019; 9:6708-6716. [PMID: 35548689 PMCID: PMC9087301 DOI: 10.1039/c8ra08754g] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/23/2018] [Accepted: 01/14/2019] [Indexed: 11/30/2022]
Abstract
Wavelength selection is a critical factor in pattern recognition of vibrational spectroscopic data. Not only does it alleviate the effect of dimensionality on an algorithm's generalization performance, but it also enhances the understanding and interpretability of multivariate classification models. In this study, a novel wavelength selection algorithm based on partial least squares discriminant analysis (PLSDA), termed ensemble of bootstrapping space shrinkage (EBSS), was devised for vibrational spectroscopic data analysis. In the algorithm, a set of subsets is generated from a data set by random sampling. For each subset, a feature space is determined by maximizing the expected 10-fold cross-validation accuracy with a weighted bootstrap sampling strategy. An ensemble strategy and a sequential forward selection method are then applied to the feature spaces to select characteristic variables. Experimental results on real vibrational spectroscopic data sets demonstrate that the ensemble wavelength selection algorithm can retain stable and informative variables for the final modeling and improve the predictive ability of multivariate classification models. A new ensemble method for wavelength selection.
Affiliation(s)
- Jixiong Zhang
  - College of Science, China Agricultural University, Beijing 100193, P.R. China
- Hong Yan
  - College of Science, China Agricultural University, Beijing 100193, P.R. China
- Yanmei Xiong
  - College of Science, China Agricultural University, Beijing 100193, P.R. China
- Qianqian Li
  - School of Marine Science, China University of Geosciences in Beijing, Beijing 100086, China
- Shungeng Min
  - College of Science, China Agricultural University, Beijing 100193, P.R. China
6. Lee LC, Liong CY, Jemain AA. Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps. Analyst 2018; 143:3526-3539. [DOI: 10.1039/c8an00599k] [Citation(s) in RCA: 261] [Impact Index Per Article: 43.5] [Indexed: 01/22/2023]
Abstract
This review critically discusses various knowledge gaps in classification modelling using PLS-DA for high-dimensional data.
Affiliation(s)
- Loong Chuen Lee
  - Forensic Science Programme, FSK, Universiti Kebangsaan Malaysia, 50300 Kuala Lumpur, Malaysia
- Choong-Yeun Liong
  - Statistics Programme, FST, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia
- Abdul Aziz Jemain
  - Statistics Programme, FST, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia
9. Quantifying the Impact of NDVIsoil Determination Methods and NDVIsoil Variability on the Estimation of Fractional Vegetation Cover in Northeast China. Remote Sens 2016. [DOI: 10.3390/rs8010029] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 11/16/2022]
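The NDVIsoil term in this title enters fractional vegetation cover (FVC) estimates through the widely used dimidiate pixel model, FVC = (NDVI - NDVIsoil) / (NDVIveg - NDVIsoil). The sketch below assumes that model (the cited paper may use variants of it) and shows why the choice of NDVIsoil matters: with NDVI = 0.4 and NDVIveg = 0.8, moving NDVIsoil from 0.05 to 0.15 lowers FVC from about 0.47 to about 0.38. Function names and example values are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def fractional_vegetation_cover(ndvi_values, ndvi_soil, ndvi_veg):
    """Dimidiate pixel model, clipped to the physically meaningful range [0, 1]."""
    ndvi_values = np.asarray(ndvi_values, dtype=float)
    fvc = (ndvi_values - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(fvc, 0.0, 1.0)


if __name__ == "__main__":
    # Same pixel NDVI, two candidate bare-soil endmembers.
    for soil in (0.05, 0.15):
        print(soil, round(float(fractional_vegetation_cover(0.4, soil, 0.8)), 3))
    # prints "0.05 0.467" then "0.15 0.385"
```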