51
Classifying Crop Types Using Two Generations of Hyperspectral Sensors (Hyperion and DESIS) with Machine Learning on the Cloud. REMOTE SENSING 2021. [DOI: 10.3390/rs13224704] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Advances in spaceborne hyperspectral (HS) remote sensing, cloud computing, and machine learning can help measure, model, map, and monitor agricultural crops to address global food and water security issues, for example by providing accurate estimates of crop area and yield to model agricultural productivity. Leveraging these advances, we used the Earth Observing-1 (EO-1) Hyperion historical archive and new-generation DLR Earth Sensing Imaging Spectrometer (DESIS) data to evaluate the performance of hyperspectral narrowbands in classifying major agricultural crops of the U.S. with machine learning (ML) on Google Earth Engine (GEE). EO-1 Hyperion images from the 2010–2013 growing seasons and DESIS images from the 2019 growing season were used to classify three world crops (corn, soybean, and winter wheat) along with other crops and non-crops near Ponca City, Oklahoma, USA. The supervised classification algorithms Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB), and the unsupervised clustering algorithm WekaXMeans (WXM), were run using selected optimal Hyperion and DESIS HS narrowbands (HNBs). RF and SVM returned the highest overall, producer’s, and user’s accuracies, whereas NB and WXM performed substantially worse. The best accuracies were achieved with two or three images throughout the growing season, especially a combination of an earlier month (June or July) and a later month (August or September). The narrow 2.55 nm bandwidth of DESIS captured numerous spectral features across the 400–1000 nm range, compared with the smoother Hyperion spectral signatures at 10 nm bandwidth across the 400–2500 nm range. Out of 235 DESIS HNBs, 29 were deemed optimal for agricultural study. Advances in ML and cloud computing can greatly facilitate HS data analysis, especially as more HS datasets, tools, and algorithms become available on the cloud.
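The overall, producer’s, and user’s accuracies mentioned above are standard confusion-matrix statistics. The sketch below shows how they are commonly computed; it assumes rows hold reference (ground-truth) classes and columns hold classified labels, and the counts are hypothetical, not results from the study.

```python
from typing import Dict, Sequence


def accuracy_report(cm: Sequence[Sequence[int]], labels: Sequence[str]) -> Dict[str, object]:
    """Overall, producer's and user's accuracy from a square confusion matrix.

    Assumed convention: cm[i][j] counts samples whose reference class is
    labels[i] and whose classified label is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    producers: Dict[str, float] = {}
    users: Dict[str, float] = {}
    for i, name in enumerate(labels):
        reference_total = sum(cm[i])                        # row sum
        classified_total = sum(cm[r][i] for r in range(n))  # column sum
        # Producer's accuracy: share of reference samples mapped correctly.
        producers[name] = cm[i][i] / reference_total if reference_total else float("nan")
        # User's accuracy: share of samples mapped as this class that truly are it.
        users[name] = cm[i][i] / classified_total if classified_total else float("nan")
    return {"overall": correct / total, "producers": producers, "users": users}


# Hypothetical counts for illustration only; not results from the paper.
labels = ["corn", "soybean", "winter wheat"]
cm = [
    [40, 5, 5],
    [4, 44, 2],
    [1, 3, 46],
]
report = accuracy_report(cm, labels)
print(round(report["overall"], 3))  # prints 0.867
```

Swapping the row/column convention swaps producer’s and user’s accuracy, so the convention should be stated alongside any reported values.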
52
Parsimonious Models of Precipitation Phase Derived from Random Forest Knowledge: Intercomparing Logistic Models, Neural Networks, and Random Forest Models. WATER 2021. [DOI: 10.3390/w13213022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The precipitation phase (PP) affects the hydrologic cycle, which in turn affects the climate system. A lower ratio of snow to rain due to climate change alters the timing and duration of streamflow. More knowledge about PP occurrence and its drivers is therefore needed, especially for cities that depend on water from glaciers, such as Quito, the capital of Ecuador (2.5 million inhabitants), which depends in part on the Antisana glacier. Logistic models (LM) of PP rely only on air temperature and relative humidity to predict PP; however, the processes governing PP are far more complex. The aims of this study were threefold: (i) to compare the performance of random forest (RF) and artificial neural networks (ANN) with LM in deriving PP; (ii) to identify the main drivers of PP occurrence using RF; and (iii) to develop LM using the meteorological drivers identified by RF. The results show that RF and ANN outperformed LM in predicting PP in 8 out of 10 metrics. RF indicated that temperature, dew point temperature, and specific humidity are more important than wind or radiation for PP occurrence. With these predictors, parsimonious and efficient models were developed, showing that data mining can help in understanding complex processes and complement expert knowledge.
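A two-predictor logistic model of the kind described above can be sketched as follows. The functional form (a logistic function of air temperature and relative humidity) follows the abstract, but the coefficient names `b0`, `b1`, `b2`, the example coefficient values, and the 0.5 decision threshold are assumptions; real coefficients would have to be fitted to local observations.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def snow_probability(temp_c: float, rh_pct: float, b0: float, b1: float, b2: float) -> float:
    """Probability of solid precipitation from air temperature (°C) and relative humidity (%).

    b0, b1 and b2 are placeholder coefficients; they must be fitted to local
    observations and are not taken from the paper.
    """
    return sigmoid(b0 + b1 * temp_c + b2 * rh_pct)


def precipitation_phase(temp_c: float, rh_pct: float, b0: float, b1: float, b2: float,
                        threshold: float = 0.5) -> str:
    """Classify the phase as 'snow' or 'rain' using an assumed probability threshold."""
    p = snow_probability(temp_c, rh_pct, b0, b1, b2)
    return "snow" if p >= threshold else "rain"


# Hypothetical coefficients chosen only to make the example run.
print(precipitation_phase(-2.0, 90.0, b0=1.0, b1=-1.5, b2=0.02))  # snow
print(precipitation_phase(5.0, 60.0, b0=1.0, b1=-1.5, b2=0.02))   # rain
```

Extending the linear predictor with further terms, such as the dew point temperature and specific humidity that RF ranked highly, keeps the same logistic structure.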
53
Paepae T, Bokoro PN, Kyamakya K. From Fully Physical to Virtual Sensing for Water Quality Assessment: A Comprehensive Review of the Relevant State-of-the-Art. SENSORS (BASEL, SWITZERLAND) 2021; 21:6971. [PMID: 34770278 PMCID: PMC8587795 DOI: 10.3390/s21216971] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2021] [Revised: 10/17/2021] [Accepted: 10/17/2021] [Indexed: 12/17/2022]
Abstract
Rapid urbanization, industrial development, and climate change have caused water pollution and the deterioration of surface water and groundwater quality at an alarming rate, making quick, accurate, and inexpensive detection imperative. Despite recent developments in sensor technologies, real-time determination of certain parameters remains difficult or uneconomical. In such cases, data-derived virtual sensors can be an effective alternative. This paper reviews the feasibility of virtual sensing for water quality assessment. The review first gives an overview of the key water quality parameters for a particular use case and develops the corresponding cost estimates for their monitoring. It then evaluates the current state of the art, based on relevant literature published between 2001 and 2021, in terms of the modeling approaches used, the parameters studied, and whether the inputs were pre-processed. The review identified artificial neural networks, random forest, and multiple linear regression as the dominant machine learning techniques used for developing inferential models. The survey also highlights the need for a comprehensive virtual sensing system in an Internet of Things environment. Accordingly, the review formulates a specification for an advanced water quality assessment process, including a virtual sensing module, that can enable near real-time monitoring of water quality.
Affiliation(s)
- Thulane Paepae
- Department of Mathematics and Applied Mathematics, University of Johannesburg, Doornfontein 2028, South Africa
- Pitshou N. Bokoro
- Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa
- Kyandoghere Kyamakya
- Institute for Smart Systems Technologies, Transportation Informatics Group, Alpen-Adria Universität Klagenfurt, 9020 Klagenfurt, Austria
54
Yang H, Huang K, Zhang K, Weng Q, Zhang H, Wang F. Predicting Heavy Metal Adsorption on Soil with Machine Learning and Mapping Global Distribution of Soil Adsorption Capacities. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:14316-14328. [PMID: 34617744 DOI: 10.1021/acs.est.1c02479] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Indexed: 06/13/2023]
Abstract
Studying heavy metal adsorption on soil is important for understanding the fate of heavy metals and properly assessing the related environmental risks. Existing experimental methods and traditional models for quantifying adsorption, however, are time-consuming and ineffective. In this study, we developed machine learning models for the soil adsorption of six heavy metals (Cd(II), Cr(VI), Cu(II), Pb(II), Ni(II), and Zn(II)) using 4420 data points (1105 soils) extracted from 150 journal articles. After a comprehensive comparison, the gradient boosting decision tree performed best as a combined model trained on all the data. The Shapley additive explanations (SHAP) method was used to identify feature importance and the effects of these features on adsorption; on this basis, six independent models were developed for the six metals, achieving better performance than the combined model. Using these independent models, the global distribution of heavy metal adsorption capacities on soils was predicted from known soil properties. Reversed models, comprising one combined model for all six metals and six independent models, were also built from the same data sets to predict the heavy metal concentration in water when the adsorbed amount on a soil/sediment is known.
Affiliation(s)
- Hongrui Yang
- College of Environmental & Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Kuan Huang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Kai Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Qin Weng
- College of Environmental & Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Huichun Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Feier Wang
- College of Environmental & Resource Sciences, Zhejiang University, Hangzhou 310058, China
55
Upscaling Evapotranspiration from a Single-Site to Satellite Pixel Scale. REMOTE SENSING 2021. [DOI: 10.3390/rs13204072] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Resolving the spatial-scale mismatch between site observations and remote sensing estimates is of great significance for validating remotely sensed evapotranspiration (ET) products. To overcome this challenge, this paper proposes a comprehensive framework for obtaining ground truth ET at the satellite pixel scale (1 × 1 km resolution in MODIS imagery). The framework first quantitatively evaluates the spatial heterogeneity of the land surface; it then uses eddy covariance (EC)-observed ET (ET_EC) to compare and optimize upscaling methods (five data-driven and three mechanism-driven) through direct validation and cross-validation; finally, it applies the optimal method to obtain ground truth ET at the satellite pixel scale. The results showed that ET_EC was superior over homogeneous underlying surfaces, with a root mean square error (RMSE) of 0.34 mm/d. Over moderately and highly heterogeneous underlying surfaces, the Gaussian process regression (GPR) method performed better (RMSEs of 0.51 mm/d and 0.60 mm/d, respectively). Finally, an integrated method (using ET_EC for homogeneous surfaces and GPR for moderately and highly heterogeneous surfaces) was proposed to obtain ground truth ET over fifteen typical underlying surfaces in the Heihe River Basin. The uncertainty of the ground truth ET was also quantitatively evaluated: the ground truth ET at the satellite pixel scale is relatively reliable, with an uncertainty of 0.02–0.41 mm/d. The proposed upscaling framework can be used to obtain ground truth ET at the satellite pixel scale together with its uncertainty, and it has great potential for validating remotely sensed ET products in more regions around the globe.
56
Machine Learning Reveals a Significant Shift in Water Regime Types Due to Projected Climate Change. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2021. [DOI: 10.3390/ijgi10100660] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
A water regime type is a cumulative representation of seasonal runoff variability, expressed in textual, qualitative, or quantitative form for a particular period. Assessing changes in water regime types is highly important for local communities and water management authorities, increasing their awareness and opening up adaptation strategies. In this study, we trained a machine learning model, the Random Forest classifier, to predict water regime types in northwest Russia from monthly climatological hydrographs derived for a historical period (1979–1991). Evaluation shows high efficiency of the trained model, with an accuracy of 91.6%. The Random Forest model was then used to predict water regime types from runoff projections for the end of the 21st century (2087–2099), forced by four General Circulation Models (GCMs) and three Representative Concentration Pathway (RCP) scenarios. The results indicate that climate change is expected to modify water regime types remarkably, in two primary directions. First, summer and winter flows tend to become less stable. Second, spring flood characteristics shift: while spring flooding is expected to remain the dominant phase of the water regime, the flood peak is expected to occur earlier and with lower magnitude. The projected changes in water regime types are more pronounced under more aggressive RCP scenarios.
57
Singha S, Pasupuleti S, Singha SS, Singh R, Kumar S. Prediction of groundwater quality using efficient machine learning technique. CHEMOSPHERE 2021; 276:130265. [PMID: 34088106 DOI: 10.1016/j.chemosphere.2021.130265] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 10/14/2020] [Revised: 03/07/2021] [Accepted: 03/11/2021] [Indexed: 06/12/2023]
Abstract
To ensure safe drinking water sources in the future, it is imperative to understand the quality and pollution level of existing groundwater. Highly accurate prediction of water quality is key to controlling water pollution and improving water management. In this study, a deep learning (DL) based model is proposed for predicting groundwater quality and compared with three other machine learning (ML) models, namely random forest (RF), eXtreme gradient boosting (XGBoost), and artificial neural network (ANN). A total of 226 groundwater samples were collected from Arang, an agriculturally intensive area of Raipur district, Chhattisgarh, India, and various physicochemical parameters were measured to compute the entropy weight-based groundwater quality index (EWQI). The prediction performance of the models was assessed with five error metrics. The DL model was the best prediction model, with the highest accuracy in terms of R2 (R2 = 0.996) against RF (R2 = 0.886), XGBoost (R2 = 0.927), and ANN (R2 = 0.917). The uncertainty of the DL model output was cross-verified by running the proposed algorithm ten times with newly randomized datasets, and only minor deviations in the mean values of the performance metrics were observed. Moreover, the input variable importance computed by the prediction models highlights that the DL model is the most realistic and accurate approach for predicting groundwater quality.
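The EWQI mentioned above weights each physicochemical parameter by its information entropy. A minimal sketch of one common formulation follows: min-max normalization across samples, Shannon entropy per parameter scaled by ln(m), weights proportional to 1 - entropy, and quality ratings of 100 times the concentration over a permissible limit. The exact variant used by the authors may differ, and the sample values and limits below are hypothetical.

```python
import math
from typing import List, Sequence


def entropy_weights(samples: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight of each parameter (column) in a samples x parameters table."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    n = len(samples[0])
    entropies: List[float] = []
    for j in range(n):
        col = [row[j] for row in samples]
        lo, hi = min(col), max(col)
        # Min-max normalisation of parameter j across all samples.
        y = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
        total = sum(y)
        if total == 0:
            entropies.append(1.0)  # a constant parameter carries no information
            continue
        p = [v / total for v in y]
        # Shannon entropy scaled to [0, 1]; terms with p = 0 contribute nothing.
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        entropies.append(e)
    d = [1.0 - e for e in entropies]  # degree of diversification
    s = sum(d)
    if s == 0:
        return [1.0 / n] * n  # fall back to equal weights
    return [di / s for di in d]


def ewqi(sample: Sequence[float], weights: Sequence[float], limits: Sequence[float]) -> float:
    """Weighted sum of quality ratings q_j = 100 * C_j / S_j."""
    return sum(w * 100.0 * c / s for w, c, s in zip(weights, sample, limits))


# Hypothetical data: three parameters per sample, the last one constant.
samples = [[250.0, 1.2, 40.0], [480.0, 0.8, 40.0], [900.0, 2.5, 40.0], [620.0, 1.6, 40.0]]
limits = [500.0, 1.5, 45.0]  # hypothetical permissible limits
weights = entropy_weights(samples)
print([round(x, 3) for x in weights])
print([round(ewqi(s, weights, limits), 1) for s in samples])
```

Parameters that vary more across samples get lower entropy and therefore larger weights, while a parameter with the same value everywhere receives zero weight.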
Affiliation(s)
- Sudhakar Singha
- Department of Civil Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, 826004, Jharkhand, India
- Srinivas Pasupuleti
- Department of Civil Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, 826004, Jharkhand, India
- Soumya S Singha
- Department of Civil Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, 826004, Jharkhand, India
- Rambabu Singh
- Exploration Department, Central Mine Planning and Design Institute Limited, Bilaspur, 495006, Chhattisgarh, India
- Suresh Kumar
- Central Ground Water Board, Patna, 800001, Bihar, India
58
Cheung YY, Cheung S, Mak J, Liu K, Xia X, Zhang X, Yung Y, Liu H. Distinct interaction effects of warming and anthropogenic input on diatoms and dinoflagellates in an urbanized estuarine ecosystem. GLOBAL CHANGE BIOLOGY 2021; 27:3463-3473. [PMID: 33934458 DOI: 10.1111/gcb.15667] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 01/28/2021] [Accepted: 04/20/2021] [Indexed: 06/12/2023]
Abstract
Diatoms and dinoflagellates are two major bloom-forming phytoplankton groups in coastal ecosystems, and their relative dominance notably affects marine ecosystems. By analyzing an 18-year monthly monitoring dataset (2000-2017) from the Pearl River Estuary (one of the most highly urbanized and populated estuaries in the world), we observe an increasing trend in the diatom to dinoflagellate ratio (Diatom/Dino). As revealed by multiple statistical models (generalized additive mixed model, random forest, and gradient boosting algorithms), both groups are positively correlated with temperature. Diatoms are positively correlated with nitrate and negatively correlated with ammonium, while dinoflagellates show the opposite pattern. The Diatom/Dino trend is explained by an altered nutrient composition caused by a decadal increase in anthropogenic input, during which nitrate increased rapidly while ammonium and phosphate remained relatively constant. Regarding the interaction of warming and nutrient dynamics, we observe an additive effect of warming and nitrate enrichment that promotes the increase in diatom cell density, whereas dinoflagellate cell density increases with warming only when nutrients are depleted. Our models predict that the Diatom/Dino ratio will increase further with increasing anthropogenic input and global warming in subtropical estuarine ecosystems where nitrate is the dominant form of inorganic nitrogen; its ecological consequences warrant further investigation.
Affiliation(s)
- Yan Yin Cheung
- Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Shunyan Cheung
- Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Southern Marine Science & Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
- Julian Mak
- Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Hong Kong Branch of Southern Marine Science & Engineering Guangdong Laboratory, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Kailin Liu
- Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xiaomin Xia
- Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- Xiaodong Zhang
- Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Yingkit Yung
- Water Policy and Planning Group, Hong Kong Government Environmental Protection Department, Hong Kong SAR, China
- Hongbin Liu
- Department of Ocean Science, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Hong Kong Branch of Southern Marine Science & Engineering Guangdong Laboratory, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- State Key Laboratory of Marine Pollution, Hong Kong SAR, China
59
de Oliveira RCG, Cunha CL, Tôrres AR, Corrêa SM. Forecasts of tropospheric ozone in the Metropolitan Area of Rio de Janeiro based on missing data imputation and multivariate calibration techniques. ENVIRONMENTAL MONITORING AND ASSESSMENT 2021; 193:531. [PMID: 34322768 DOI: 10.1007/s10661-021-09333-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Accepted: 07/22/2021] [Indexed: 06/13/2023]
Abstract
Multivariate calibration based on partial least squares, random forest, and support vector machine methods, combined with the MissForest imputation algorithm, was used to understand the interaction between ozone and nitrogen oxides, carbon monoxide, wind speed, solar radiation, temperature, relative humidity, and other variables, using data collected by air quality monitoring stations at four distinct sites in the metropolitan area of Rio de Janeiro between 2014 and 2018. These techniques provide an easy and feasible way of modeling and analyzing air pollutants and can also be used in combination with other methods. The results showed that random forest and support vector machine chemometric techniques can be used to model and predict tropospheric ozone concentrations, with a coefficient of determination for predictions of up to 0.92, a root-mean-square error of calibration between 4.66 and 27.15 µg m-3, and a root-mean-square error of prediction between 4.17 and 22.45 µg m-3, depending on the air quality monitoring station and season.
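The error measures quoted above are typically computed as sketched below: the root-mean-square error of calibration (RMSEC) applies `rmse` to the calibration set, and the root-mean-square error of prediction (RMSEP) applies it to held-out data. The coefficient of determination here uses the definition 1 - SS_res/SS_tot; some chemometric studies report the squared Pearson correlation instead, so the paper's definition may differ. The ozone values are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical ozone concentrations (µg m-3) and model predictions.
obs = [40.0, 55.0, 62.0, 48.0]
pred = [42.0, 53.0, 60.0, 50.0]
print(round(rmse(obs, pred), 2))       # 2.0
print(round(r_squared(obs, pred), 3))  # 0.94
```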
Affiliation(s)
- Rafael C G de Oliveira
- Faculty of Engineering, Rio de Janeiro State University, Rua São Francisco Xavier, 524 Maracanã, Rio de Janeiro, RJ, 20551-013, Brazil
- Camilla L Cunha
- Faculty of Technology, Rio de Janeiro State University, Rodovia Presidente Dutra km 298, Resende, RJ, 27537-000, Brazil
- Alexandre R Tôrres
- Faculty of Technology, Rio de Janeiro State University, Rodovia Presidente Dutra km 298, Resende, RJ, 27537-000, Brazil
- Sergio M Corrêa
- Faculty of Engineering, Rio de Janeiro State University, Rua São Francisco Xavier, 524 Maracanã, Rio de Janeiro, RJ, 20551-013, Brazil
- Faculty of Technology, Rio de Janeiro State University, Rodovia Presidente Dutra km 298, Resende, RJ, 27537-000, Brazil
60
Ward BJ, Andriessen N, Tembo JM, Kabika J, Grau M, Scheidegger A, Morgenroth E, Strande L. Predictive models using "cheap and easy" field measurements: Can they fill a gap in planning, monitoring, and implementing fecal sludge management solutions? WATER RESEARCH 2021; 196:116997. [PMID: 33744658 DOI: 10.1016/j.watres.2021.116997] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/10/2020] [Revised: 02/19/2021] [Accepted: 03/01/2021] [Indexed: 06/12/2023]
Abstract
The characteristics of fecal sludge delivered to treatment plants are highly variable. Adapting treatment process operations accordingly is challenging because many treatment plants lack the analytical capacity for characterization and monitoring. Cost-efficient and simple field measurements such as photographs and probe readings could serve as proxies for process control parameters that normally require laboratory analysis. To investigate this, we evaluated questionnaire data, expert assessments, and simple analytical measurements for fecal sludge collected from 421 onsite containments. These data served as inputs to models of varying complexity. Random forest and linear regression models were able to predict physical-chemical characteristics, including total solids (TS) and ammonium (NH4+-N) concentrations, and solid-liquid separation performance, including settling efficiency and filtration time (R2 from 0.51 to 0.66), based on image analysis of photographs (sludge color, supernatant color, and texture) and probe readings (conductivity (EC) and pH). Supernatant color was the best predictor of settling efficiency and filtration time, EC was the best predictor of NH4+-N, and texture was the best predictor of TS. Predictive models could be applied for real-time monitoring and process control if a database of measurements is developed and the models are validated in other cities. Simple decision tree models based on the single classifier of containment type can also be used to make predictions for citywide planning, where a lower degree of accuracy is required.
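A regression tree that splits only on one categorical feature, such as containment type, and gives every category its own leaf predicts the training mean of that category. The sketch below shows this simplified case; the category names, the target (total solids), and the values are hypothetical, and falling back to the overall mean for unseen categories is an assumption. The authors' trees may prune or merge categories differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Model = Tuple[Dict[str, float], float]


def fit_category_means(rows: Iterable[Tuple[str, float]]) -> Model:
    """Mean target per category (one leaf each) plus the overall mean as a fallback."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for category, target in rows:
        sums[category] += target
        counts[category] += 1
    if not counts:
        raise ValueError("no training rows")
    leaf_means = {c: sums[c] / counts[c] for c in sums}
    overall = sum(sums.values()) / sum(counts.values())
    return leaf_means, overall


def predict(model: Model, category: str) -> float:
    """Leaf mean for a known category, overall mean for an unseen one."""
    leaf_means, overall = model
    return leaf_means.get(category, overall)


# Hypothetical training data: (containment type, total solids in g/L).
train = [("pit latrine", 10.0), ("pit latrine", 14.0), ("septic tank", 3.0)]
model = fit_category_means(train)
print(predict(model, "pit latrine"))  # 12.0
print(predict(model, "septic tank"))  # 3.0
```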
Affiliation(s)
- Barbara J Ward
- Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland; Institute of Environmental Engineering, ETH Zürich, Zürich, Switzerland
- Nienke Andriessen
- Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- James M Tembo
- Department of Civil and Environmental Engineering, School of Engineering, University of Zambia, Lusaka, Zambia
- Joel Kabika
- Department of Civil and Environmental Engineering, School of Engineering, University of Zambia, Lusaka, Zambia
- Matt Grau
- Department of Physics, ETH Zürich, 8093, Zürich, Switzerland
- Andreas Scheidegger
- Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- Eberhard Morgenroth
- Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland; Institute of Environmental Engineering, ETH Zürich, Zürich, Switzerland
- Linda Strande
- Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
61
Papacharalampous G, Tyralis H, Papalexiou SM, Langousis A, Khatami S, Volpi E, Grimaldi S. Global-scale massive feature extraction from monthly hydroclimatic time series: Statistical characterizations, spatial patterns and hydrological similarity. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 767:144612. [PMID: 33454612 DOI: 10.1016/j.scitotenv.2020.144612] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/10/2020] [Revised: 11/27/2020] [Accepted: 12/17/2020] [Indexed: 06/12/2023]
Abstract
Hydroclimatic time series analysis typically focuses on a few feature types (e.g., autocorrelations, trends, extremes), which describe only a small portion of the entire information content of the observations. Aiming to exploit a larger part of the available information and thus to deliver more reliable results (e.g., in hydroclimatic time series clustering contexts), we approach hydroclimatic time series analysis differently here, by performing massive feature extraction. To this end, we develop a big data framework for characterizing hydroclimatic variable behaviour. The framework relies on approximately 60 diverse features and is completely automatic (in the sense that it does not depend on the hydroclimatic process at hand). We apply it to characterize mean monthly temperature, total monthly precipitation, and mean monthly river flow. The applications are conducted at the global scale using 40-year-long time series from over 13 000 stations. We extract interpretable knowledge on seasonality, trends, autocorrelation, long-range dependence, and entropy, as well as on feature types that are used less frequently. We further compare the examined hydroclimatic variable types in terms of this knowledge and identify patterns related to the spatial variability of the features. For this latter purpose, we also propose and apply a hydroclimatic time series clustering methodology based on Breiman's random forests. The descriptive and exploratory insights gained from the global-scale applications demonstrate the usefulness of the adopted feature compilation in hydroclimatic contexts. Moreover, the spatially coherent patterns characterizing the clusters delivered by the new methodology build confidence in its future use.
Given this spatial coherence and the scale-independent nature of the delivered feature values (which makes them particularly useful in forecasting and simulation contexts), we believe that this methodology could also be beneficial within regionalization frameworks, in which knowledge on hydrological similarity is exploited in technical and operative terms.
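To make the idea of feature extraction concrete, the sketch below computes two simple features of the kinds named above, lag-1 autocorrelation and a seasonality measure, for a monthly series. These are illustrative choices, not necessarily among the roughly 60 features used in the paper; in particular, the seasonality index defined here (the share of total variance explained by the mean seasonal cycle) is an assumption.

```python
from typing import List, Sequence


def lag1_autocorrelation(x: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation: lagged deviation products over the total sum of squares."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(x) / n
    dev = [v - m for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series")
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom


def seasonality_index(x: Sequence[float], period: int = 12) -> float:
    """Share of total variance explained by the mean seasonal cycle (1 = purely periodic)."""
    n = len(x)
    if n < period:
        raise ValueError("series shorter than one period")
    m = sum(x) / n
    ss_tot = sum((v - m) ** 2 for v in x)
    if ss_tot == 0:
        raise ValueError("constant series")
    # Mean value at each position of the cycle (e.g. each calendar month).
    cycle: List[float] = []
    for k in range(period):
        vals = list(x[k::period])
        cycle.append(sum(vals) / len(vals))
    ss_within = sum((v - cycle[i % period]) ** 2 for i, v in enumerate(x))
    return 1.0 - ss_within / ss_tot


# Hypothetical, perfectly periodic monthly series spanning three years.
monthly = [float(v) for v in range(12)] * 3
print(round(seasonality_index(monthly), 3))                    # 1.0
print(round(lag1_autocorrelation([1.0, 2.0, 3.0, 4.0]), 3))    # 0.25
```

Because both features are dimensionless, they can be compared across temperature, precipitation, and river flow series, which is the property that makes such feature sets usable as clustering inputs.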
Affiliation(s)
- Georgia Papacharalampous
- Department of Engineering, Roma Tre University, Rome, Italy; Department of Civil Engineering, School of Engineering, University of Patras, University Campus, Rio, 26504 Patras, Greece; Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Heroon Polytechneiou 5, 15780 Zographou, Greece
- Hristos Tyralis
- Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Heroon Polytechneiou 5, 15780 Zographou, Greece; Air Force Support Command, Hellenic Air Force, Elefsina Air Base, 19200 Elefsina, Greece
- Simon Michael Papalexiou
- Department of Civil, Geological and Environmental Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada; Global Institute for Water Security, Saskatoon, Saskatchewan, Canada; Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Czech Republic
- Andreas Langousis
- Department of Civil Engineering, School of Engineering, University of Patras, University Campus, Rio, 26504 Patras, Greece
- Sina Khatami
- Department of Physical Geography and the Bolin Centre for Climate Research, Stockholm University, SE-10691 Stockholm, Sweden; Climate & Energy College, University of Melbourne, Parkville, Victoria, Australia; Department of Infrastructure Engineering, University of Melbourne, Parkville, Victoria, Australia
- Elena Volpi
- Department of Engineering, Roma Tre University, Rome, Italy
- Salvatore Grimaldi
- Department for Innovation in Biological, Agro-food and Forest Systems, University of Tuscia, Viterbo, Italy; Department of Mechanical and Aerospace Engineering, Tandon School of Engineering, New York University, Brooklyn, NY 10003, USA
62
Tyralis H, Papacharalampous G. Boosting algorithms in energy research: a systematic review. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05995-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/17/2023]
63
Ho L, Jerves-Cobo R, Morales O, Larriva J, Arevalo-Durazno M, Barthel M, Six J, Bode S, Boeckx P, Goethals P. Spatial and temporal variations of greenhouse gas emissions from a waste stabilization pond: Effects of sludge distribution and accumulation. WATER RESEARCH 2021; 193:116858. [PMID: 33540345 DOI: 10.1016/j.watres.2021.116858] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/02/2020] [Revised: 01/12/2021] [Accepted: 01/18/2021] [Indexed: 06/12/2023]
Abstract
Due to the regular influx of organic matter and nutrients, waste stabilization ponds (WSPs) can release considerable quantities of greenhouse gases (GHGs). To investigate the spatiotemporal variations of GHG emissions from WSPs, with a focus on the effects of sludge accumulation and distribution, we conducted a bathymetry survey and two sampling campaigns at the Ucubamba WSP (Cuenca, Ecuador). The results indicated that the spatial variation of GHG emissions depended strongly on sludge distribution. Thick sludge layers in the aerated and facultative ponds caused substantial CO2 and CH4 emissions, which accounted for 21.3% and 78.7% of the total emissions from the plant. Conversely, the prevalence of anoxic conditions stimulated N2O consumption via complete denitrification, leading to a net uptake from the atmosphere of up to 1.4±0.2 mg-N m-2 d-1. CO2 emission rates in the facultative and maturation ponds were twice as high during the day as at night, indicating the important role of algal respiration, while no diel variation in CH4 and N2O emissions was found. Despite the N2O uptake, the total GHG emissions of the WSP were higher than those of constructed wetlands and conventional centralized wastewater treatment facilities. Hence, sludge management with proper desludging regulation is recommended as an important mitigation measure to reduce the carbon footprint of pond treatment facilities.
Affiliation(s)
- Long Ho
- Department of Animal Sciences and Aquatic Ecology, Ghent University, Ghent, Belgium.
- Ruben Jerves-Cobo
- Department of Animal Sciences and Aquatic Ecology, Ghent University, Ghent, Belgium; PROMAS, Universidad de Cuenca, Cuenca, Ecuador; BIOMATH, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Josue Larriva
- ETAPA, Empresa Pública Municipal de Telecomunicaciones, Agua Potable, Alcantarillado y Saneamiento de Cuenca, Cuenca, Ecuador; Facultad de Ciencia y Tecnología, Universidad del Azuay, Cuenca, Ecuador
- Matti Barthel
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
- Johan Six
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
- Samuel Bode
- Department of Green Chemistry and Technology, Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
- Pascal Boeckx
- Department of Green Chemistry and Technology, Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
- Peter Goethals
- Department of Animal Sciences and Aquatic Ecology, Ghent University, Ghent, Belgium
64
Assessment of Annual Composite Images Obtained by Google Earth Engine for Urban Areas Mapping Using Random Forest. REMOTE SENSING 2021. [DOI: 10.3390/rs13040748] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Urban areas are the primary source region of greenhouse gas emissions. Mapping urban areas (here also referring to impervious surfaces, i.e., artificial cover and structures) is essential for understanding land cover change, carbon cycles, and climate change. Remote sensing has greatly advanced urban area mapping over the last several decades, and in the current era of big data, long satellite time series such as Landsat and high-performance computing platforms such as Google Earth Engine (GEE) offer new opportunities to map urban areas. The objective of this research was to determine how annual time series of Landsat 8 Operational Land Imager (OLI) images can be composited effectively in GEE to map urban areas in three cities in China. Three reducer functions provided by GEE, ee.Reducer.min(), ee.Reducer.median(), and ee.Reducer.max(), were used to construct four schemes for synthesizing the dense annual time series of Landsat 8 OLI data for the three cities. Urban areas were then mapped with the random forest algorithm, and the accuracy was evaluated in detail. The results show that (1) the quality of the annual composite images improved significantly, particularly through a reduced impact of clouds and cloud shadows, and (2) composites obtained by combining multiple reducer functions performed better than those obtained with a single reducer function. Further, the overall accuracy of urban area mapping with combined reducer functions exceeded 90% in all three cities. In summary, a suitable combination of reducer functions for synthesizing annual time series images can enhance data quality, preserve the differences between land-cover characteristics, and yield higher precision in urban area mapping.
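As an illustration only, the per-pixel compositing idea behind ee.Reducer.min(), ee.Reducer.median(), and ee.Reducer.max() can be sketched locally with NumPy. This is not the authors' GEE workflow: it assumes one band already stacked as a (time, rows, cols) array with cloud- and shadow-masked observations set to NaN, and the function name `annual_composite` is hypothetical.

```python
import numpy as np


def annual_composite(stack: np.ndarray) -> np.ndarray:
    """Reduce a (time, rows, cols) stack of one band to min/median/max layers.

    Masked (cloudy or shadowed) observations are assumed to be NaN, so the
    NaN-aware reducers ignore them pixel by pixel. This only mimics the idea
    of ee.Reducer.min/median/max; it does not call Earth Engine.
    """
    layers = [
        np.nanmin(stack, axis=0),     # lowest clear value per pixel
        np.nanmedian(stack, axis=0),  # typical clear value per pixel
        np.nanmax(stack, axis=0),     # highest clear value per pixel
    ]
    # Result has shape (3, rows, cols): one output band per reducer
    return np.stack(layers, axis=0)


# Three acquisitions of a 1 x 2 scene; the second pixel is cloudy in the first.
stack = np.array([[[0.1, np.nan]], [[0.3, 0.2]], [[0.2, 0.4]]])
composite = annual_composite(stack)
print(composite.shape)
```

Keeping the three reduced layers as separate input bands is one way to combine reducers; the abstract does not say how its four schemes pair them.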
65
Ferreira RG, Silva DDD, Elesbon AAA, Fernandes-Filho EI, Veloso GV, Fraga MDS, Ferreira LB. Machine learning models for streamflow regionalization in a tropical watershed. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2021; 280:111713. [PMID: 33257181 DOI: 10.1016/j.jenvman.2020.111713] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/19/2020] [Revised: 11/17/2020] [Accepted: 11/21/2020] [Indexed: 06/12/2023]
Abstract
This study aims to assess different machine learning approaches for streamflow regionalization in a tropical watershed, analyzing their advantages and limitations, and to point out the benefits of using them for water resources management. The algorithms applied were Random Forest, Earth (an implementation of multivariate adaptive regression splines), and a linear model. The response variables were three minimum streamflow indices (Q7,10, Q95, and Q90) and the long-term average streamflow (Qmld). The database comprised 76 environmental covariates related to morphometry, topography, climate, land use and cover, and surface conditions. Covariates were eliminated in two steps: Pearson's correlation analysis and importance analysis by Recursive Feature Elimination (RFE). The models were validated with the following statistical metrics: Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), Willmott's index of agreement (d), coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and relative error (RE). The linear model was unsatisfactory for all response variables. The nonlinear models performed well, and their covariate of greatest predictive importance was the flow equivalent to the precipitated volume after subtracting an abstraction factor of 750 mm (Peq750). Overall, the Random Forest and Earth models showed similar performance and a strong ability to predict the minimum and long-term average streamflows assessed, making them powerful and promising alternatives for streamflow regionalization in support of integrated water resources planning and management at the river basin level.
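Several of the validation metrics listed above have common textbook definitions, sketched below in NumPy for a pair of observed and simulated series. These formulations are assumptions, since the abstract does not give them: PBIAS in particular is defined with opposite sign conventions in different sources, and R2, MAE, and RE are omitted for brevity. The function names are illustrative.

```python
import numpy as np


def _as_arrays(obs, sim):
    return np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 equals predicting the observed mean."""
    obs, sim = _as_arrays(obs, sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def pbias(obs, sim):
    """Percent bias; with this sign convention, positive values mean underestimation."""
    obs, sim = _as_arrays(obs, sim)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)


def willmott_d(obs, sim):
    """Willmott's index of agreement, from 0 (no agreement) to 1 (perfect)."""
    obs, sim = _as_arrays(obs, sim)
    mean = obs.mean()
    potential_error = np.sum((np.abs(sim - mean) + np.abs(obs - mean)) ** 2)
    return 1.0 - np.sum((obs - sim) ** 2) / potential_error


def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    obs, sim = _as_arrays(obs, sim)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]  # constant overestimate of 1
print(nse(observed, simulated), pbias(observed, simulated),
      willmott_d(observed, simulated), rmse(observed, simulated))
```

For this toy series, the constant overestimate of 1 gives NSE = 0.2, PBIAS = -40%, d = 0.84, and RMSE = 1, which shows how NSE and d penalize a pure bias differently.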
Affiliation(s)
- Renan Gon Ferreira
- Department of Agricultural Engineering, Federal University of Viçosa, Campus UFV, 36570-900, Viçosa, MG, Brazil.
- Demetrius David da Silva
- Department of Agricultural Engineering, Federal University of Viçosa, Campus UFV, 36570-900, Viçosa, MG, Brazil
- Gustavo Vieira Veloso
- Department of Soil and Plant Nutrition, Federal University of Viçosa, Campus UFV, 36570-900, Viçosa, MG, Brazil
- Lucas Borges Ferreira
- Department of Agricultural Engineering, Federal University of Viçosa, Campus UFV, 36570-900, Viçosa, MG, Brazil
66
Influence of Random Forest Hyperparameterization on Short-Term Runoff Forecasting in an Andean Mountain Catchment. ATMOSPHERE 2021. [DOI: 10.3390/atmos12020238] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
The Random Forest (RF) algorithm, a decision-tree-based technique, has become a promising approach for runoff forecasting in remote areas. This machine learning approach can overcome the limitations of the scarce spatio-temporal data and physical parameters needed by process-based hydrological models. However, the influence of RF hyperparameters is still uncertain and needs to be explored. Therefore, the aim of this study is to analyze the sensitivity of RF runoff forecasting models with varying lead times to the hyperparameters of the algorithm. For this, models were trained using (a) default and (b) extensive hyperparameter combinations through a grid search to reach the optimal set. Model performance was assessed with the R2, %Bias, and RMSE metrics. We found that: (i) the most influential hyperparameter is the number of trees in the forest; however, the combination of the tree depth and number-of-features hyperparameters produced the highest variability (instability) in the models. (ii) Hyperparameter optimization significantly improved model performance for longer lead times (12 and 24 h). For instance, the performance of the 12-h forecasting model improved to R2 = 0.41 after optimization, a gain of 0.17 over the default RF hyperparameters. However, for short lead times (4 h) there was no significant improvement (0.69 < R2 < 0.70). (iii) Each hyperparameter has a range of values within which model performance is not significantly affected and remains close to optimal. Thus, different compromises between hyperparameter values can produce similarly high model performance. The improvements after optimization can be explained from a hydrological point of view: for lead times longer than the concentration time of the catchment, the generalization ability of the models tends to rely more on hyperparameterization than on what they can learn from the input data. This insight can help in the development of operational early warning systems.
67
Machine Learning and Simulation-Optimization Coupling for Water Distribution Network Contamination Source Detection. SENSORS 2021; 21:s21041157. [PMID: 33562175 PMCID: PMC7916058 DOI: 10.3390/s21041157] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/07/2021] [Revised: 02/02/2021] [Accepted: 02/04/2021] [Indexed: 11/29/2022]
Abstract
This paper presents and explores a novel methodology for solving the problem of a water distribution network contamination event, which includes determining the exact source of contamination, the contamination start and end times, and the injected contaminant concentration. The methodology couples a machine learning algorithm that predicts the most probable contamination sources in a water distribution network with an optimization algorithm that determines the contamination start time, end time, and injected contaminant concentration for each predicted node separately. Two slightly different algorithmic frameworks were built on this methodology. Both use the Random Forest algorithm to classify the top candidate contamination source nodes. The first framework then directly applies the stochastic fireworks optimization algorithm to determine the contamination start time, end time, and injected contaminant concentration for each predicted node separately. The second framework uses Random Forest for an additional regression prediction of each top node's start time, end time, and contaminant concentration, and is then coupled with the deterministic global search optimization algorithm MADS. A small network (92 potential sources) with perfect sensor measurements and a medium-sized benchmark network (865 potential sources) with fuzzy sensor measurements were used to explore the proposed frameworks. Both frameworks perform well and are robust in determining the true source node, start and end times, and contaminant concentration, with the second framework being extremely efficient on the benchmark network with fuzzy sensor measurements.
68
Identification Framework of Contaminant Spill in Rivers Using Machine Learning with Breakthrough Curve Analysis. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18031023. [PMID: 33498931 PMCID: PMC7908193 DOI: 10.3390/ijerph18031023] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/04/2021] [Revised: 01/21/2021] [Accepted: 01/21/2021] [Indexed: 11/23/2022]
Abstract
To minimize the damage from contaminant accidents in rivers, early identification of the contaminant source is crucial. In this study, a framework combining Machine Learning (ML) and the Transient Storage zone Model (TSM) was therefore developed to predict the spill location and mass of a contaminant source. The TSM was employed to simulate non-Fickian Breakthrough Curves (BTCs), which contain relevant information about the contaminant source. ML models were then used to identify the BTC features, characterized by 21 variables, to predict the spill location and mass. The proposed framework was applied to Gam Creek, South Korea, where two tracer tests were conducted. Six ML methods were applied to predict the spill location and mass, and the most relevant BTC features were selected by Recursive Feature Elimination with Cross-Validation (RFECV). Applications to field data showed that the ensemble decision tree models, Random Forest (RF) and XGBoost (XGB), were the most efficient and feasible for predicting the contaminant source.
69
Explanation and Probabilistic Prediction of Hydrological Signatures with Statistical Boosting Algorithms. REMOTE SENSING 2021. [DOI: 10.3390/rs13030333] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Hydrological signatures, i.e., statistical features of streamflow time series, are used to characterize the hydrology of a region. A relevant problem is predicting hydrological signatures in ungauged regions from attributes obtained by remote sensing at both ungauged and gauged regions, together with hydrological signatures estimated at gauged regions. This is formulated as a regression problem in which the attributes are the predictor variables and the hydrological signatures are the dependent variables. Here we aim to provide probabilistic predictions of hydrological signatures using statistical boosting in a regression setting. We predict 12 hydrological signatures using 28 attributes in 667 basins in the contiguous US, and we formally assess the probabilistic predictions using quantile scores. We also exploit the properties of statistical boosting that make the derived models interpretable. Probabilistic predictions at quantile levels 2.5% and 97.5% using linear models as base learners perform better than more flexible boosting models that use both linear models and stumps (i.e., one-level decision trees). Conversely, boosting models that use both linear models and stumps perform better than boosting with linear models alone for point predictions. Moreover, climatic indices and topographic characteristics are the most important attributes for predicting hydrological signatures.
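Quantile scores are commonly computed as the pinball loss. The sketch below assumes that standard definition, averaged over basins; the abstract does not state the exact form the authors used, and the observations and predicted quantiles are hypothetical.

```python
import numpy as np


def quantile_score(y, q, tau):
    """Mean pinball (quantile) loss of predicted tau-quantiles q against observations y.

    Lower is better. An observation above the predicted quantile is penalised
    with weight tau; one below it is penalised with weight (1 - tau).
    """
    u = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    return float(np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u)))


# Hypothetical signature values and predicted bounds of a 95% interval
observed = np.array([3.0, 1.0])
lower = np.array([1.0, 0.5])   # predicted 2.5% quantiles
upper = np.array([1.0, 3.0])   # predicted 97.5% quantiles
print(quantile_score(observed, lower, 0.025), quantile_score(observed, upper, 0.975))
```

The asymmetric weights are what make the score proper for a given quantile level: at 97.5%, an observation that exceeds the predicted upper bound costs far more than one that falls below it.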
70
Wherry SA, Tesoriero AJ, Terziotti S. Factors Affecting Nitrate Concentrations in Stream Base Flow. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:902-911. [PMID: 33356185 DOI: 10.1021/acs.est.0c02495] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/12/2023]
Abstract
Elevated nitrogen concentrations in streams and rivers in the Chesapeake Bay watershed have adversely affected the ecosystem health of the bay. Much of this nitrogen is derived as nitrate from groundwater that discharges to streams as base flow. In this study, boosted regression trees (BRTs) were used to relate nitrate concentrations in base flow (n = 156) to explanatory variables describing nitrogen sources, geology, and soil and catchment characteristics. From these relations, a BRT model was developed to predict base flow nitrate concentrations in streams throughout the Chesapeake Bay watershed. The highest base flow nitrate concentrations were associated with intensive agricultural land use, carbonate geology, and sparse riparian canopy, which suggested that reduced nitrogen inputs, particularly over carbonate terrane, are critical for limiting nitrate concentrations. The lowest nitrate concentrations in the BRT model were associated with extensive riparian canopy, high levels of organic carbon in soils, and suboxic conditions at shallow depths, which suggested that denitrification in the subsurface, particularly in the riparian zone, is limiting base flow nitrate concentrations. Nitrate transport from aquifers to streams can take decades to occur, resulting in decades-long lag times between the time when a land-use activity is implemented and when its effects are fully observed in streams. Predictive models of base flow nitrate concentrations in streams will help identify which portions of a watershed are likely to have large fractions of total stream nitrogen load derived from pathways with significant lag times.
Affiliation(s)
- Susan A Wherry
- U.S. Geological Survey, 2130 SW 5th Avenue, Portland, Oregon 97201, United States
- Anthony J Tesoriero
- U.S. Geological Survey, 2130 SW 5th Avenue, Portland, Oregon 97201, United States
- Silvia Terziotti
- U.S. Geological Survey, 3916 Sunset Ridge Road, Raleigh, North Carolina 27607, United States
71
Development of a Regional Gridded Runoff Dataset Using Long Short-Term Memory (LSTM) Networks. HYDROLOGY 2021. [DOI: 10.3390/hydrology8010006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Gridded datasets provide spatially and temporally consistent runoff estimates that serve as reliable sources for assessing water resources from regional to global scales. This study presents LSTM-REG, a regional gridded runoff dataset for northwest Russia based on Long Short-Term Memory (LSTM) networks. LSTM-REG covers the period from 1980 to 2016 at a 0.5° spatial and daily temporal resolution. LSTM-REG has been extensively validated and benchmarked against GR4J-REG, a gridded runoff dataset based on a parsimonious regionalization scheme and the GR4J hydrological model. While both datasets provide runoff estimates with reliable prediction efficiency, LSTM-REG outperforms GR4J-REG for most basins in the independent evaluation set. Thus, the results demonstrate a higher generalization capacity of LSTM-REG than GR4J-REG, which can be attributed to the higher efficiency of the proposed LSTM-based regionalization scheme. The developed datasets are freely available in open repositories to foster further regional hydrology research in northwest Russia.
72
Wagenaar D, Hermawan T, van den Homberg MJC, Aerts JCJH, Kreibich H, de Moel H, Bouwer LM. Improved Transferability of Data-Driven Damage Models Through Sample Selection Bias Correction. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2021; 41:37-55. [PMID: 32830337 PMCID: PMC7891600 DOI: 10.1111/risa.13575] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/13/2019] [Revised: 03/30/2020] [Accepted: 07/07/2020] [Indexed: 06/11/2023]
Abstract
Damage models for natural hazards are used for decision making on reducing and transferring risk. The damage estimates from these models depend on many variables and their complex, sometimes nonlinear, relationships with damage. In recent years, data-driven modeling techniques have been used to capture those relationships. The data available to build such models are often limited, so in practice it is usually necessary to transfer models to a different context. We show that this often means the samples used to build a model are not fully representative of the situation in which it is applied, which leads to a "sample selection bias." In this article, we enhance data-driven damage models by applying methods, not previously used in damage modeling, that correct for this bias before the machine learning (ML) models are trained. We demonstrate this with case studies on flooding in Europe and typhoon wind damage in the Philippines. Two sample selection bias correction methods from the ML literature are applied, and one of them is also adapted to our problem. These three methods are combined with stochastic generation of synthetic damage data. For both case studies, the sample selection bias correction techniques reduce model errors; for the mean bias error in particular, the reduction can exceed 30%. The novel combination with stochastic data generation seems to enhance these techniques. This shows that sample selection bias correction methods are beneficial for damage model transfer.
Affiliation(s)
- Dennis Wagenaar
- Deltares, Delft, The Netherlands
- Institute for Environmental Studies, VU University Amsterdam, The Netherlands
- Jeroen C. J. H. Aerts
- Deltares, Delft, The Netherlands
- Institute for Environmental Studies, VU University Amsterdam, The Netherlands
- Heidi Kreibich
- GFZ German Research Centre for Geosciences, Potsdam, Germany
- Hans de Moel
- Institute for Environmental Studies, VU University Amsterdam, The Netherlands
- Laurens M. Bouwer
- Climate Service Center Germany, Helmholtz-Zentrum Geesthacht, Hamburg, Germany
73
Liu H, Hitchcock DB, Samadi SZ. Spatio-temporal analysis of flood data from South Carolina. JOURNAL OF STATISTICAL DISTRIBUTIONS AND APPLICATIONS 2020. [DOI: 10.1186/s40488-020-00112-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
To investigate the relationship between flood gage height and precipitation in South Carolina from 2012 to 2016, we built a conditional autoregressive (CAR) model in a Bayesian hierarchical framework. This approach models the main spatio-temporal properties of water height dynamics over multiple locations, accounting for the effects of the river network, geomorphology, and forcing rainfall. A proximity matrix based on watershed information was used to capture the spatial structure of gage height measurements in and around South Carolina, and the temporal structure was handled by a first-order autoregressive term in the model. Several covariates, including site elevation and seasonality effects, were examined along with daily rainfall amount. A non-normal error structure was used to account for the heavy-tailed distribution of maximum gage heights. The proposed model captured key features of the flood process, such as seasonality and a stronger association between precipitation and flooding during the summer. The model can forecast short-term flood gage height, which is crucial for informed emergency decisions. As a byproduct, we also developed a Python library to retrieve and handle environmental data provided by major agencies in the United States. This library can be generally useful for studies requiring rainfall, flow, and geomorphological information over specific areas of the conterminous US.
74
Ha NT, Nguyen HQ, Truong NCQ, Le TL, Thai VN, Pham TL. Estimation of nitrogen and phosphorus concentrations from water quality surrogates using machine learning in the Tri An Reservoir, Vietnam. ENVIRONMENTAL MONITORING AND ASSESSMENT 2020; 192:789. [PMID: 33241485 DOI: 10.1007/s10661-020-08731-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/03/2020] [Accepted: 11/03/2020] [Indexed: 06/11/2023]
Abstract
Surface water eutrophication due to excessive nutrients has become a major environmental problem around the world over the past few decades. Among these nutrients, nitrogen and phosphorus are two of the most important drivers of harmful cyanobacterial blooms (HCBs). Reliable prediction of these parameters is therefore necessary for the management of rivers, lakes, and reservoirs. The aim of this study is to test the suitability of a powerful machine learning (ML) algorithm, random forest (RF), for providing information on water quality parameters in the Tri An Reservoir (TAR). Three nitrogen and phosphorus species, nitrite (N-NO₂⁻), nitrate (N-NO₃⁻), and phosphate (P-PO₄³⁻), were estimated empirically from a field observation dataset (2009–2014) of six surrogates: total suspended solids (TSS), total dissolved solids (TDS), turbidity, electrical conductivity (EC), chemical oxygen demand (COD), and biochemical oxygen demand (BOD5). Field measurements showed that the TAR was eutrophic, with upward trends in N-NO₃⁻ and P-PO₄³⁻ during the study period. The RF regression model reliably predicted N-NO₂⁻, N-NO₃⁻, and P-PO₄³⁻, with high R2 values of 0.812–0.844 for the training phase (2009–2012) and 0.888–0.903 for the validation phase (2013–2014). The land use and land cover change (LUCC) analysis revealed that deforestation and shifting agriculture in the upper basin were the major factors increasing nutrient loading in the TAR. Among the meteorological parameters, rainfall pattern was one of the most influential factors in eutrophication, followed by average sunshine hours. Our results are expected to provide an advanced tool for predicting nutrient loading and giving early warning of HCBs in the TAR.
Affiliation(s)
- Nam-Thang Ha
- Environmental Research Institute, School of Science, The University of Waikato, Hamilton, 3216, New Zealand
- Faculty of Fisheries, The University of Agriculture and Forestry, Hue University, Hue, 530000, Vietnam
- Hao Quang Nguyen
- Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8572, Japan
- Thi Luom Le
- Dong Nai Technical Resources and Environment Center, Dong Khoi Street, Tan Hiep Ward, Bien Hoa City, Dong Nai Province, 810000, Vietnam
- Van Nam Thai
- Ho Chi Minh City University of Technology (HUTECH), 475A Dien Bien Phu Street, Binh Thanh District, Ho Chi Minh City, 700000, Vietnam
- Thanh Luu Pham
- Institute of Tropical Biology, Vietnam Academy of Science and Technology (VAST), 85 Tran Quoc Toan Street, District 3, Ho Chi Minh City, 700000, Vietnam.
- Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Street, Cau Giay district, Hanoi, 100000, Vietnam.
75
Modeling the Impacts of Climate Change on Crop Yield and Irrigation in the Monocacy River Watershed, USA. CLIMATE 2020. [DOI: 10.3390/cli8120139] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]
Abstract
Crop yield depends on multiple factors, including climate conditions, soil characteristics, and available water. The objective of this study was to evaluate the impact of projected temperature and precipitation changes on crop yields in the Monocacy River Watershed in the Mid-Atlantic United States under climate change scenarios. The Soil and Water Assessment Tool (SWAT) was applied to simulate watershed hydrology and crop yield. To evaluate the effect of future climate projections, four global climate models (GCMs) and three representative concentration pathways (RCP 4.5, 6, and 8.5) were used in the SWAT model. According to all GCMs and RCPs, a warmer climate with wetter autumns and springs and drier late summers is anticipated in this region by mid- and late century. To evaluate future management strategies, the water budget and crop yields were assessed for two scenarios: current rainfed conditions and adaptive irrigation. Irrigation would improve corn yields during mid-century across all scenarios. However, prolonged irrigation would have a negative impact on both corn and soybean yields compared to rainfed conditions, owing to nutrient runoff. Decision tree analysis indicated that corn and soybean yields are most influenced by soil moisture, temperature, and precipitation, as well as by the water management practice used (i.e., rainfed or irrigated). The values computed with the SWAT model can serve as guidelines for water resource managers in this watershed to plan for projected water shortages and manage crop yields under projected climate change conditions.
76
Combined Approach Using Clustering-Random Forest to Evaluate Partial Discharge Patterns in Hydro Generators. ENERGIES 2020. [DOI: 10.3390/en13225992] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Abstract
The measurement and analysis of partial discharges (PD) are analogous to medical examinations such as the electrocardiogram (ECG), for which there are pre-established criteria; however, each patient presents particularities that do not necessarily imply an adverse diagnosis. The established method for PD processing is highly capable in the statistical analysis of the insulation condition of electric generators. However, although the IEEE 1434 standard provides well-established criteria, it is not always simple to classify the signals measured at the hydro generator coupler, because the same type of PD can manifest differently owing to the uniqueness of each machine under evaluation. To streamline the machine diagnostic process, this article proposes a tool that provides this signal classification feature. The measurements are grouped into classes that represent each known form of partial discharge established in the literature. Supervised and unsupervised techniques were combined into a hybrid method that identified the patterns and classified the measurement signals with a high degree of precision. Specifically, data-mining techniques based on clustering are used to group the characteristic PD patterns in hydro generators defined in the standards, and random forest decision trees are then trained to classify cases from new measurements. A comparative analysis of eight clustering algorithms combined with random forest was performed to choose the combination that yields the best classification for equipment diagnosis. R2 was used to assess the data trend.
77
An Improved Approach for Downscaling Coarse-Resolution Thermal Data by Minimizing the Spatial Averaging Biases in Random Forest. REMOTE SENSING 2020. [DOI: 10.3390/rs12213507] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Land surface temperature (LST) plays a fundamental role in various geophysical processes at varying spatial and temporal scales. Satellite-based observations of LST provide a viable option for monitoring the spatial-temporal evolution of these processes. Downscaling is a widely adopted approach for resolving the spatial-temporal trade-off associated with satellite-based LST observations. However, despite the advances made in LST downscaling, issues related to spatial averaging in downscaling methodologies greatly hamper the use of coarse-resolution thermal data for downscaling applications in complex environments. In this study, an improved LST downscaling approach based on random forest (RF) regression is presented. The proposed approach addresses the spatial averaging biases associated with a downscaling model developed at the coarse resolution. It was applied to downscale the coarse-resolution Satellite Application Facility on Land Surface Analysis (LSA-SAF) LST product derived from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) weather satellite. The LSA-SAF product was downscaled to a spatial resolution of ~30 m using predictor variables derived from Sentinel-2 and the Advanced Land Observing Satellite (ALOS) digital elevation model (DEM). Both quantitatively and qualitatively, the proposed approach gave better downscaling results than the conventional RF-based approach widely adopted in LST downscaling studies. The enhanced performance indicates that the proposed approach can reduce the spatial averaging biases inherent in the LST downscaling methodology and is thus more suitable for downscaling applications in complex environments.
78
79
Combining Multi-Sensor Satellite Imagery to Improve Long-Term Monitoring of Temporary Surface Water Bodies in the Senegal River Floodplain. REMOTE SENSING 2020. [DOI: 10.3390/rs12193157] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
Accurate monitoring of surface water bodies is essential in numerous hydrological and agricultural applications. Combining imagery from multiple sensors can improve long-term monitoring; however, the benefits derived from each sensor and the methods to automate long-term water mapping must be better understood across varying periods and in heterogeneous water environments. All available observations from Landsat 7, Landsat 8, Sentinel-2 and MODIS over 1999–2019 are processed in Google Earth Engine to evaluate and compare the benefits of single- and multi-sensor approaches in long-term monitoring of temporary water bodies, against extensive ground truth data from the Senegal River floodplain. Otsu automatic thresholding is compared with default thresholds and site-specific calibrated thresholds to improve Modified Normalized Difference Water Index (MNDWI) classification accuracy. Otsu thresholding leads to the lowest Root Mean Squared Error (RMSE) and high overall accuracies on selected Sentinel-2 and Landsat 8 images, but its performance declines when applied to long-term monitoring compared to default or site-specific thresholds. On MODIS imagery, calibrated thresholds are crucial to improve classification in heterogeneous water environments, and results show excellent accuracies even in small (19 km²) water bodies despite the 500 m spatial resolution. Over 1999–2019, MODIS observations reduce average daily RMSE by 48% compared to the full Landsat 7 and 8 archive and by 51% compared to the published Global Surface Water datasets. Results reveal the need to integrate coarser MODIS observations in regional and global long-term surface water datasets to accurately capture flood dynamics that the full Landsat time series overlooks before 2013. From 2013, the Landsat 7 and Landsat 8 constellation becomes sufficient, and integrating MODIS observations degrades performance marginally. Combining Landsat and Sentinel-2 yields modest improvements after 2015.
These results have important implications to guide the development of multi-sensor products and for applications across large wetlands and floodplains.
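The abstract above compares Otsu automatic thresholding of the MNDWI with default and calibrated thresholds. As a rough illustration only, the sketch below computes MNDWI as (Green − SWIR) / (Green + SWIR) and picks an Otsu threshold from a one-dimensional histogram in plain Python. The function names, the 256-bin histogram, and the toy pixel values are assumptions for illustration; this is not the study's Google Earth Engine workflow.

```python
def mndwi(green, swir):
    """Modified Normalized Difference Water Index for one pixel (reflectances > 0)."""
    return (green - swir) / (green + swir)


def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # degenerate case: a single value, nothing to split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_t, best_score = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        # Class 0 = bins 0..i, class 1 = bins i+1..bins-1
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, lo + (i + 1) * width  # upper edge of bin i
    return best_t


# Toy MNDWI values: land pixels near -0.4, water pixels near 0.5
pixels = [-0.5, -0.45, -0.4, -0.35, -0.3, 0.4, 0.45, 0.5, 0.55, 0.6]
t = otsu_threshold(pixels)
water_mask = [v > t for v in pixels]
```

With two well-separated clusters, every threshold in the gap scores equally and the loop keeps the first one found. A fixed default threshold (commonly 0 for MNDWI) would classify these toy pixels the same way; the study's finding is that the choice matters on real, heterogeneous scenes.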
80
Uncertainty Analysis of Monthly Precipitation in GCMs Using Multiple Bias Correction Methods under Different RCPs. SUSTAINABILITY 2020. [DOI: 10.3390/su12187508] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022]
Abstract
This study quantified the uncertainties in historical and future average monthly precipitation arising from different bias correction methods, General Circulation Models (GCMs), Representative Concentration Pathways (RCPs), projection periods, and locations within the study area (i.e., the coastal and inland areas of South Korea). The GCMs were downscaled using deep learning, random forest, and nine quantile mapping bias correction methods for 22 gauge stations in South Korea. Data from the Korea Meteorological Administration (1970–2005) were used as the reference data. Two statistical measures, the standard deviation and the interquartile range, were used to quantify the uncertainties. The probability distribution density was used to assess the similarity/variation in rainfall distributions. For the historical period, the uncertainty in the selection of bias correction methods was greater than that in the selection of GCMs, whereas the opposite pattern was observed for the projection period. The projection period had the lowest level of uncertainty in the selection of RCP scenarios, and for the future, the uncertainty related to the time period was slightly lower than that for the other sources but much greater than that for the RCP selection. In addition, the level of uncertainty in inland areas was clearly much lower than that in coastal areas. The uncertainty in the selection of the GCMs was slightly greater than that in the selection of the bias correction method. Therefore, the uncertainty in the selection of coastal areas was intermediate between the selection of bias correction methods and GCMs. This paper contributes to an improved understanding of the uncertainties in climate change projections arising from various sources.
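The nine quantile mapping methods used in the study are not specified here. As a generic illustration only, the sketch below implements plain empirical quantile mapping: a model value is converted to its non-exceedance probability in the model's historical run and replaced by the observed value at that same probability. The function names and the clipping of values outside the historical range are assumptions of this sketch, which also ignores precipitation-specific refinements such as dry-day handling.

```python
from bisect import bisect_right


def empirical_cdf(sorted_vals, x):
    """Interpolated non-exceedance probability of x within a sorted sample (n >= 2)."""
    n = len(sorted_vals)
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return 1.0
    k = bisect_right(sorted_vals, x) - 1  # sorted_vals[k] <= x < sorted_vals[k + 1]
    lo, hi = sorted_vals[k], sorted_vals[k + 1]
    return (k + (x - lo) / (hi - lo)) / (n - 1)


def empirical_quantile(sorted_vals, p):
    """Inverse of empirical_cdf: linearly interpolated quantile for p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    k = int(pos)
    k1 = min(k + 1, len(sorted_vals) - 1)
    return sorted_vals[k] + (pos - k) * (sorted_vals[k1] - sorted_vals[k])


def quantile_map(model_hist, obs_hist, model_values):
    """Empirical quantile mapping of model values onto the observed distribution."""
    m_sorted, o_sorted = sorted(model_hist), sorted(obs_hist)
    # Values outside the historical model range are clipped to the observed extremes
    return [empirical_quantile(o_sorted, empirical_cdf(m_sorted, x)) for x in model_values]


# Hypothetical monthly totals: the model is biased low by a factor of two
corrected = quantile_map([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 2.5, 0, 9])
```

Here `corrected` is `[6.0, 5.0, 2.0, 10.0]`: interior values are rescaled onto the observed distribution, while the out-of-range inputs 0 and 9 are clipped to the observed minimum and maximum.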
81
Abstract
Reliable seasonal prediction of groundwater levels is not always possible when the quality and amount of available on-site groundwater data are limited. In the present work, a hybrid K-Nearest Neighbor-Random Forest (KNN-RF) model is proposed for predicting variations in groundwater levels (L) of an aquifer in which the groundwater is relatively close to the surface (<10 m). First, time-series smoothing methods are applied to improve the quality of the groundwater data. Then, the ensemble KNN-RF model is trained on hydro-climatic data to predict variations in groundwater-table levels up to three months ahead. Climatic and groundwater data collected in eastern Rwanda were used to validate the model on a rolling-window basis. The potential predictors were the observed daily mean temperature (T), precipitation (P), and daily maximum solar radiation (S). Previous-day precipitation P(t − 1), solar radiation S(t), temperature T(t), and groundwater level L(t) accounted for the most variation in the fluctuations of the groundwater tables. The KNN-RF model presents its results in an intelligible manner. Experimental results confirmed the high performance of the proposed model in terms of root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and coefficient of determination (R²).
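The evaluation metrics named above have standard definitions; a minimal sketch in plain Python follows. Note that "coefficient of determination" is defined differently across studies: here it is assumed to be the squared Pearson correlation between observed and simulated series, which may not match the authors' choice (the alternative, 1 − SSE/SST, is numerically the same as NSE).

```python
import math


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values (assumed definition)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


# Hypothetical groundwater levels (m) and predictions
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
scores = {"RMSE": rmse(obs, sim), "MAE": mae(obs, sim), "NSE": nse(obs, sim), "R2": r_squared(obs, sim)}
```

For this toy series RMSE is 0.5, MAE is 0.25, and NSE is 0.8; R² is higher than NSE because correlation ignores the systematic overshoot in the last value.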
82
Artificial intelligence for sustainability: Challenges, opportunities, and a research agenda. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT 2020. [DOI: 10.1016/j.ijinfomgt.2020.102104] [Citation(s) in RCA: 113] [Impact Index Per Article: 28.3] [Indexed: 11/23/2022]
83
Tyralis H, Papacharalampous G, Langousis A. Super ensemble learning for daily streamflow forecasting: large-scale demonstration and comparison with multiple machine learning algorithms. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05172-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 10/23/2022]
84
Assessment of Native Radar Reflectivity and Radar Rainfall Estimates for Discharge Forecasting in Mountain Catchments with a Random Forest Model. REMOTE SENSING 2020. [DOI: 10.3390/rs12121986] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
Discharge forecasting is a key component of early warning systems and extremely useful for decision makers. Forecasting models require accurate, high-spatial-resolution rainfall estimates and other geomorphological characteristics of the catchment, which are rarely available in remote mountain regions such as the Andean highlands. While radar data are available in some mountain areas, the absence of a well-distributed rain gauge network makes it hard to obtain accurate rainfall maps. Thus, this study explored a Random Forest model and its ability to leverage native radar data (i.e., reflectivity) to provide a simplified but efficient discharge forecasting model for a representative mountain catchment in the southern Andes of Ecuador. This model was compared with another that used derived radar rainfall (i.e., rainfall depth) as input, obtained by transforming reflectivity to rainfall rate with a local Z-R relation and applying a rain gauge-based bias adjustment. In addition, the influence of a soil moisture proxy was evaluated. Radar and runoff data from April 2015 to June 2017 were used. Results showed that (i) model performance was similar using either native or derived radar data as inputs (0.66 < NSE < 0.75; 0.72 < KGE < 0.78), so exhaustive pre-processing to obtain radar rainfall estimates can be avoided for discharge forecasting; and (ii) including a soil moisture representation as model input did not significantly improve model performance (i.e., NSE increased from 0.66 to 0.68). Finally, this native radar data-based model constitutes a promising alternative for discharge forecasting in remote mountain regions where ground monitoring is scarce.
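The KGE ranges reported above refer to the Kling–Gupta efficiency. The sketch below implements the original Gupta et al. (2009) formulation, KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of simulated to observed standard deviation, and β the ratio of simulated to observed mean. Whether the study used this or a later modified variant is not stated here, so treat the choice as an assumption.

```python
import math
from statistics import mean, pstdev


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form): 1 is a perfect match."""
    mu_o, mu_s = mean(obs), mean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    # Population covariance, consistent with the population standard deviations
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)   # correlation component
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical discharges: a perfectly correlated forecast that doubles every value
obs = [1.0, 2.0, 3.0, 4.0]
doubled = [2 * o for o in obs]
```

`kge(obs, doubled)` is 1 − √2 even though r = 1, because both α and β equal 2. This is the design motivation for KGE: unlike NSE, it separates correlation, variability, and bias errors.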
85
Use of Machine Learning in Evaluation of Drought Perception in Irrigated Agriculture: The Case of an Irrigated Perimeter in Brazil. WATER 2020. [DOI: 10.3390/w12061546] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
Abstract
This study aimed to understand the perception of drought among farmers, in order to support decision-making in the water allocation process. This study was carried out in the Tabuleiro de Russas irrigated perimeter, in northeast Brazil, over the drought period of 2012–2018. Two analyses were conducted: (i) drought characterization, using the Standardized Precipitation Index (SPI) based on drought duration and frequency criteria; and (ii) analysis of farmers’ perceptions of drought via selection of explanatory variables using the Random Forest (RF) and the Decision Tree (DT) methods. The 2012–2018 drought period was defined as a meteorological phenomenon by local farmers; however, an SPI evaluation indicated that the drought was of a hydrological nature. According to the RF analysis, four of the nine study variables were more statistically important than the others in influencing farmers’ perception of drought: number of cultivated land plots, farmer’s age, years of experience in the agriculture sector, and education level. These results were confirmed using DT analysis. Understanding the relationship between these variables and farmers’ perception of drought could aid in the development of an adaptation strategy to water deficit scenarios. Farmers’ perception can be beneficial in reducing conflicts, adopting proactive management practices, and developing a holistic and efficient early warning drought system.
86
Machine Learning Methods for Improved Understanding of a Pumping Test in Heterogeneous Aquifers. WATER 2020. [DOI: 10.3390/w12051342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]
Abstract
Pumping tests are a very important means of investigating aquifer properties; however, common analytical solutions become invalid for interpreting the test data in complex aquifer systems. This paper explores the potential of machine learning methods for retrieving pumping test information at a field site in the Democratic Republic of Congo. A newly planned mining site, with a one-month pumping test involving three pumping wells and 28 observation wells, was chosen to analyze the significance of machine learning methods in pumping test analysis. Widely used machine learning methods, including correlation, cluster, and time-series analysis, artificial neural networks (ANN), support vector regression (SVR), random forest (RF), and linear regression, are all used in this study. Correlation and cluster analyses among wells provide visual pictures of possible hydraulic connections. The pathway with the best permeability ranges from a depth of 250 m to 350 m. Time-series analysis perfectly captured the changes in drawdown within the three pumping wells. The RF method was found to have higher accuracy and lower sensitivity to model parameters than the ANN and SVR methods. A linear regression model coupled with analytical solutions is applied to estimate hydraulic conductivities. The results show that ML methods can significantly and effectively improve our understanding of pumping tests by revealing information hidden in those tests.
87
Establishing an Empirical Model for Surface Soil Moisture Retrieval at the U.S. Climate Reference Network Using Sentinel-1 Backscatter and Ancillary Data. REMOTE SENSING 2020. [DOI: 10.3390/rs12081242] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022]
Abstract
Progress in sensor technologies has allowed real-time monitoring of soil water, but modeling soil water content from remote sensing data remains a challenge. Here, we retrieved and modeled surface soil moisture (SSM) at the U.S. Climate Reference Network (USCRN) stations using Sentinel-1 backscatter data from 2016 to 2018 and ancillary data. Empirical machine learning models were established between soil water content measured at the USCRN stations and Sentinel-1 data from 2016 to 2017, the National Land Cover Dataset, terrain parameters, and Polaris soil data, and were evaluated in 2018 at the same USCRN stations. The Cubist model performed better than the multiple linear regression (MLR) and Random Forest (RF) models (R² = 0.68 and RMSE = 0.06 m³ m⁻³ for validation). The Cubist model performed best in Shrub/Scrub, followed by Herbaceous and Cultivated Crops, but poorly in Hay/Pasture. The success of SSM retrieval was mostly attributed to soil properties, followed by Sentinel-1 backscatter data, terrain parameters, and land cover. The approach shows potential for retrieving SSM using Sentinel-1 data in combination with high-resolution ancillary data across the conterminous United States (CONUS). Future work is required to improve model performance by including more SSM network measurements and by assimilating Sentinel-1 data with other microwave, optical, and thermal remote sensing products. There is also a need to improve the spatial resolution and accuracy of land surface parameter products (e.g., soil properties and terrain parameters) at the regional and global scales.
88
Castrillo M, García ÁL. Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods. WATER RESEARCH 2020; 172:115490. [PMID: 31972414 DOI: 10.1016/j.watres.2020.115490] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 09/26/2019] [Revised: 12/24/2019] [Accepted: 01/07/2020] [Indexed: 06/10/2023]
Abstract
Continuous high-frequency water quality monitoring is becoming a critical task in support of water management. Despite advances in sensor technologies, certain variables cannot be easily and/or economically monitored in situ and in real time. In these cases, surrogate measures can be used to make estimations by means of data-driven models. In this work, variables that are commonly measured in situ are used as surrogates to estimate nutrient concentrations in a rural catchment and an urban one, using machine learning models, specifically Random Forests. The results are compared with those of linear models using the same number of surrogates, yielding a reduction in Root Mean Squared Error (RMSE) of up to 60.1%. The benefit of including up to seven surrogate sensors was computed, concluding that adding more than four and five sensors in the two catchments, respectively, was not worthwhile in terms of error reduction.
Affiliation(s)
- María Castrillo
- Instituto de Física de Cantabria (UC - CSIC), Avda. Los Castros S/n, 39005, Santander, Spain.
- Álvaro López García
- Instituto de Física de Cantabria (UC - CSIC), Avda. Los Castros S/n, 39005, Santander, Spain.
89
Klåvus A, Kokla M, Noerman S, Koistinen VM, Tuomainen M, Zarei I, Meuronen T, Häkkinen MR, Rummukainen S, Farizah Babu A, Sallinen T, Kärkkäinen O, Paananen J, Broadhurst D, Brunius C, Hanhineva K. "notame": Workflow for Non-Targeted LC-MS Metabolic Profiling. Metabolites 2020; 10:E135. [PMID: 32244411 PMCID: PMC7240970 DOI: 10.3390/metabo10040135] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 03/02/2020] [Revised: 03/25/2020] [Accepted: 03/28/2020] [Indexed: 02/06/2023]
Abstract
Metabolomics analysis generates vast arrays of data, necessitating comprehensive workflows involving expertise in analytics, biochemistry and bioinformatics in order to provide coherent and high-quality data that enable discovery of robust and biologically significant metabolic findings. In this protocol article, we introduce notame, an analytical workflow for non-targeted metabolic profiling approaches, utilizing liquid chromatography-mass spectrometry analysis. We provide an overview of lab protocols and statistical methods that we commonly practice for the analysis of nutritional metabolomics data. The paper is divided into three main sections: the first and second sections introducing the background and the study designs available for metabolomics research and the third section describing in detail the steps of the main methods and protocols used to produce, preprocess and statistically analyze metabolomics data and, finally, to identify and interpret the compounds that have emerged as interesting.
Affiliation(s)
- Anton Klåvus
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Marietta Kokla
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Stefania Noerman
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Ville M. Koistinen
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Marjo Tuomainen
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Iman Zarei
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Topi Meuronen
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Merja R. Häkkinen
- School of Pharmacy, University of Eastern Finland, 70210 Kuopio, Finland
- Soile Rummukainen
- School of Pharmacy, University of Eastern Finland, 70210 Kuopio, Finland
- Ambrin Farizah Babu
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Taisa Sallinen
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- School of Pharmacy, University of Eastern Finland, 70210 Kuopio, Finland
- Olli Kärkkäinen
- School of Pharmacy, University of Eastern Finland, 70210 Kuopio, Finland
- Jussi Paananen
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
- David Broadhurst
- Centre for Integrative Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, WA 6027, Australia
- Carl Brunius
- Department of Biology and Biological Engineering, Chalmers University of Technology, 41296 Gothenburg, Sweden
- Chalmers Mass Spectrometry Infrastructure, Chalmers University of Technology, 41296 Gothenburg, Sweden
- Kati Hanhineva
- Department of Clinical Nutrition and Public Health, University of Eastern Finland, 70210 Kuopio, Finland
- Department of Biology and Biological Engineering, Chalmers University of Technology, 41296 Gothenburg, Sweden
- Department of Biochemistry, Food Chemistry and Food Development unit, University of Turku, 20014 Turun yliopisto, Finland
90
Soil Temperature Dynamics at Hillslope Scale—Field Observation and Machine Learning-Based Approach. WATER 2020. [DOI: 10.3390/w12030713] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/04/2023]
Abstract
Soil temperature plays an important role in understanding hydrological, ecological, meteorological, and land surface processes. However, studies of soil temperature variability are very scarce in various parts of the world, especially in the Indian Himalayan Region (IHR). Thus, this study aims to analyze the spatio-temporal variability of soil temperature in two nested hillslopes of the lesser Himalaya and to evaluate the efficiency of different machine learning algorithms in estimating soil temperature in this data-scarce region. To accomplish this goal, grassed (GA) and agro-forested (AgF) hillslopes were instrumented with Odyssey water level sensors and Decagon soil moisture and temperature sensors. The average soil temperature of the south-aspect hillslope (i.e., the GA hillslope) was higher than that of the north-aspect hillslope (i.e., the AgF hillslope). After analyzing 40 rainfall events from both hillslopes, it was observed that a rainfall duration greater than 7.5 h or an average rainfall intensity greater than 7.5 mm/h results in a soil temperature drop of more than 2 °C. Further, a soil temperature drop of less than 1 °C was observed during very high-intensity rainfall events of very short duration. During the rainy season, the soil temperature drop on the GA hillslope is larger than on the AgF hillslope, as the former infiltrates more water. This observation indicates a significant correlation between soil moisture rise and soil temperature drop. The potential of four machine learning algorithms for predicting soil temperature under data-scarce conditions was also explored. Among the four, extreme gradient boosting (XGBoost) performed best for both hillslopes, followed by random forests (RF), multilayer perceptron (MLP), and support vector machines (SVMs). Adding rainfall to the meteorological and meteorological + soil moisture datasets did not improve the models considerably.
However, the addition of soil moisture to meteorological parameters improved the model significantly.
91
Kim Y, Johnson MS, Knox SH, Black TA, Dalmagro HJ, Kang M, Kim J, Baldocchi D. Gap-filling approaches for eddy covariance methane fluxes: A comparison of three machine learning algorithms and a traditional method with principal component analysis. GLOBAL CHANGE BIOLOGY 2020; 26:1499-1518. [PMID: 31553826 DOI: 10.1111/gcb.14845] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/07/2019] [Accepted: 09/12/2019] [Indexed: 06/10/2023]
Abstract
Methane flux (FCH4) measurements using the eddy covariance technique have increased over the past decade. FCH4 measurements commonly include data gaps, as is the case with CO2 and energy fluxes. However, gap-filling FCH4 data is more challenging than gap-filling other fluxes because of its unique characteristics, including multidriver dependency, variability across multiple timescales, nonstationarity, spatial heterogeneity of flux footprints, and the lagged influence of biophysical drivers. Some researchers have applied the marginal distribution sampling (MDS) algorithm, a standard gap-filling method for other fluxes, to FCH4 datasets, and others have applied artificial neural networks (ANN) to address the challenging characteristics of FCH4. However, there is still no consensus regarding FCH4 gap-filling methods because of limited comparative research. We are not aware of applications of machine learning (ML) algorithms beyond ANN to FCH4 datasets. Here, we compare the performance of MDS and three ML algorithms (ANN, random forest [RF], and support vector machine [SVM]) using multiple combinations of ancillary variables. In addition, we used principal component analysis (PCA) to generate inputs to the algorithms, to address the multidriver dependency of FCH4 and reduce the internal complexity of the algorithmic structures. We applied this approach to five benchmark FCH4 datasets from both natural and managed systems located in temperate and tropical wetlands and rice paddies. Results indicate that PCA improved the performance of MDS compared to traditional inputs. ML algorithms performed better when using all available biophysical variables than when using PCA-derived inputs. Overall, RF outperformed the other techniques at all sites. We found that gap-filling uncertainty is much larger than measurement uncertainty in the accumulated CH4 budget.
Therefore, the approach used for FCH4 gap filling can have important implications for characterizing annual ecosystem-scale methane budgets, the accuracy of which is important for evaluating natural and managed systems and their interactions with global change processes.
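The MDS algorithm mentioned above fills a gap with the mean of observed fluxes recorded under similar meteorological conditions within a time window around the gap. The sketch below is a heavily simplified, single-driver version of that look-up idea, for illustration only: the function name, the `window` and `tol` defaults, and the use of one driver are hypothetical choices. The full MDS algorithm uses several drivers and typically widens its search window when no similar conditions are found.

```python
def lookup_gap_fill(flux, driver, window=7, tol=2.5):
    """Fill None gaps in `flux` with the mean of observed fluxes within +/- `window`
    time steps whose driver value (e.g., air temperature) is within `tol` of the gap's.

    Assumes `driver` has no gaps. Gaps without any match are left as None.
    """
    n = len(flux)
    filled = list(flux)
    for i, value in enumerate(flux):
        if value is not None:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Only originally observed values are used, never previously filled ones
        candidates = [
            flux[j] for j in range(lo, hi)
            if flux[j] is not None and abs(driver[j] - driver[i]) <= tol
        ]
        if candidates:
            filled[i] = sum(candidates) / len(candidates)
    return filled


# Hypothetical half-hourly methane fluxes with one gap, and air temperature as the driver
fluxes = [1.0, None, 3.0, 10.0]
temps = [20, 21, 22, 30]
result = lookup_gap_fill(fluxes, temps, window=2, tol=2.5)
```

The gap is filled with 2.0, the mean of the two neighbours recorded at similar temperatures; the 10.0 value at 30 °C is excluded by the tolerance. This kind of look-up is exactly what struggles with the lagged, multidriver behaviour of FCH4 described in the abstract.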
Affiliation(s)
- Yeonuk Kim
- Institute for Resources Environment and Sustainability, University of British Columbia, Vancouver, BC, Canada
- Mark S Johnson
- Institute for Resources Environment and Sustainability, University of British Columbia, Vancouver, BC, Canada
- Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, Vancouver, BC, Canada
- Sara H Knox
- Department of Geography, University of British Columbia, Vancouver, BC, Canada
- T Andrew Black
- Faculty of Land and Food Systems, University of British Columbia, Vancouver, BC, Canada
- Higo J Dalmagro
- Environmental Sciences Graduate Program, University of Cuiabá, Cuiabá, Brazil
- Minseok Kang
- National Center for AgroMeteorology, Seoul, South Korea
- Joon Kim
- National Center for AgroMeteorology, Seoul, South Korea
- Department of Landscape Architecture & Rural Systems Engineering, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Agricultural & Forest Meteorology, Seoul National University, Seoul, South Korea
- Dennis Baldocchi
- Department of Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
92
Mapping Forest Composition with Landsat Time Series: An Evaluation of Seasonal Composites and Harmonic Regression. REMOTE SENSING 2020. [DOI: 10.3390/rs12040610] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]
Abstract
The Landsat program has supported pioneering research on retrieving forest information by remote sensing for several decades, and efforts to improve the thematic resolution and accuracy of forest compositional products remain an area of continued innovation. Recent development and application of Landsat time series analysis offer unique opportunities for quantifying seasonality and trend components among different forest types, and thus for developing alternative feature sets for forest vegetation mapping. Within a large forested landscape in Southeastern Ohio, USA, we examined the use of harmonic metrics derived from time series of all available Landsat-8 observations (2013–2019) relative to seasonal image composites, including accompanying spectral components and vegetation indices. A reference dataset integrated from three sources was used to categorize forest inventory data into seven forest type classes and a gradient compositional response. Results showed that the combination of harmonic metrics and topographic variables achieved 74.9% agreement with the reference data, compared with seasonal composites (71.6%) and spectral indices (70.3%). Differences in agreement were attributed to improved discrimination of three heterogeneous upland hardwood classes and an early-successional, young forest class, all forest types of primary interest to managers across the region. Variable importance metrics often identified the cosine and sine terms that quantify the seasonality of spectral values in the harmonic feature space, suggesting that these terms best support the characterization of forest types at greater thematic detail than seasonal compositing procedures. This study demonstrates how advanced time series metrics can improve forest type modeling and forest gradient quantification, highlighting the need for continued exploration of such approaches across different forest types.
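The harmonic metrics described above are typically coefficients of a regression of a spectral value on cosine and sine terms of time. The sketch below fits a first-order harmonic model, y(t) = c + a·cos(2πt/T) + b·sin(2πt/T), by ordinary least squares in plain Python and derives amplitude and phase from the cosine and sine coefficients. The single harmonic, the period T = 365.25 days, and the function names are assumptions; the study's exact harmonic configuration is not given here.

```python
import math


def harmonic_design(days, period=365.25):
    """Design matrix rows [1, cos(2*pi*t/T), sin(2*pi*t/T)] for day-of-series values t."""
    w = 2 * math.pi / period
    return [[1.0, math.cos(w * d), math.sin(w * d)] for d in days]


def solve(A, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_harmonic(days, values, period=365.25):
    """OLS fit via the normal equations (X^T X) beta = X^T y."""
    X = harmonic_design(days, period)
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, values)) for i in range(k)]
    c, a, b = solve(XtX, Xty)
    return {"intercept": c, "cos": a, "sin": b,
            "amplitude": math.hypot(a, b), "phase": math.atan2(b, a)}


# Hypothetical NDVI-like series sampled every 16 days (the Landsat revisit interval)
days = list(range(0, 365, 16))
values = [0.5 + 0.2 * math.cos(2 * math.pi * d / 365.25)
          + 0.1 * math.sin(2 * math.pi * d / 365.25) for d in days]
fit = fit_harmonic(days, values)
```

On this noise-free series the fit recovers the generating coefficients (0.5, 0.2, 0.1). The cosine and sine coefficients, or equivalently amplitude and phase, are the kind of per-pixel features the abstract reports as most important.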
93
Probabilistic Hydrological Post-Processing at Scale: Why and How to Apply Machine-Learning Quantile Regression Algorithms. WATER 2019. [DOI: 10.3390/w11102126] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 11/16/2022]
Abstract
We conduct a large-scale benchmark experiment aiming to advance the use of machine-learning quantile regression algorithms for probabilistic hydrological post-processing “at scale” within operational contexts. The experiment is set up using 34-year-long daily time series of precipitation, temperature, evapotranspiration and streamflow for 511 catchments over the contiguous United States. Point hydrological predictions are obtained using the Génie Rural à 4 paramètres Journalier (GR4J) hydrological model and exploited as predictor variables within quantile regression settings. Six machine-learning quantile regression algorithms and their equal-weight combiner are applied to predict conditional quantiles of the hydrological model errors. The individual algorithms are quantile regression, generalized random forests for quantile regression, generalized random forests for quantile regression emulating quantile regression forests, gradient boosting machine, model-based boosting with linear models as base learners and quantile regression neural networks. The conditional quantiles of the hydrological model errors are transformed to conditional quantiles of daily streamflow, which are finally assessed using proper performance scores and benchmarking. The assessment concerns various levels of predictive quantiles and central prediction intervals, while it is made both independently of the flow magnitude and conditional upon this magnitude. Key aspects of the developed methodological framework are highlighted, and practical recommendations are formulated. In technical hydro-meteorological applications, the algorithms should be applied preferably in a way that maximizes the benefits and reduces the risks from their use. This can be achieved by (i) combining algorithms (e.g., by averaging their predictions) and (ii) integrating algorithms within systematic frameworks (i.e., by using the algorithms according to their identified skills), as our large-scale results point out.
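Conditional quantile predictions such as those described above are commonly scored with the quantile (pinball) loss, and an equal-weight combiner simply averages the algorithms' predicted quantiles element-wise. The sketch below shows both in plain Python. The function names are hypothetical, and the pinball loss is an assumed example of a proper score, since the abstract does not name the specific scores used.

```python
def pinball_loss(y, q_pred, tau):
    """Mean quantile (pinball) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for yi, qi in zip(y, q_pred):
        diff = yi - qi
        # Under-prediction is penalized by tau, over-prediction by (1 - tau)
        total += max(tau * diff, (tau - 1) * diff)
    return total / len(y)


def equal_weight_combiner(predictions):
    """Average several algorithms' predicted quantiles, element by element."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


# Hypothetical 0.9-quantile streamflow predictions from two algorithms
algo_a = [10.0, 12.0, 8.0]
algo_b = [14.0, 10.0, 9.0]
combined = equal_weight_combiner([algo_a, algo_b])
observed = [11.0, 11.5, 8.0]
score = pinball_loss(observed, combined, tau=0.9)
```

For a high quantile such as 0.9, missing low costs nine times more than missing high, so a well-calibrated 0.9-quantile forecast sits above most observations. Averaging predictions, as recommended in the abstract, tends to reduce the risk of relying on a single poorly performing algorithm.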
94
Abstract
This study investigated the potential of random forest (RF) algorithms for regionalizing the parameters of an hourly hydrological model. The relationships between model parameters and climate/landscape catchment descriptors were multidimensional and exhibited nonlinear features. In such cases, machine-learning tools offer a way to handle these relationships efficiently using a large sample of data. The performance of the regionalized model using RF was assessed in comparison with local calibration and two benchmark regionalization approaches. Two catchment sets were considered: (1) a target set of 120 urban catchments treated as pseudo-ungauged, and (2) 2105 gauged American and French catchments used to construct the RF. By using pseudo-ungauged urban catchments, we aimed to assess the potential of the RF to detect the specificities of urban catchments. Results showed that RF-regionalized models yielded slightly better streamflow simulations at ungauged sites than the benchmark regionalization approaches. Yet the constructed RFs were only weakly sensitive to the urbanization features of the catchments, which prevents their straightforward use in scenarios of the hydrological impacts of urbanization.
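The regionalization idea above (learn a mapping from catchment descriptors to a calibrated model parameter on gauged catchments, then predict the parameter for an ungauged one) can be sketched with a deliberately small random forest written from scratch: bootstrap samples of the gauged catchments, regression trees that consider a random subset of descriptors at each split, and an average over trees. This is a simplified illustration under stated assumptions, not the study's implementation; the class and function names, the single-parameter target and the synthetic data are all hypothetical. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random
from statistics import mean

def build_tree(X, y, depth, max_depth, n_features, rng, min_leaf=2):
    """Grow a regression tree; each split considers only n_features randomly
    chosen descriptors (the feature subsampling of a random forest)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: mean parameter value of the catchments in this node
    best = None  # (sum of squared errors, descriptor index, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return mean(y)
    _, f, t = best

    def grow(rows):
        return build_tree([r for r, _ in rows], [v for _, v in rows],
                          depth + 1, max_depth, n_features, rng, min_leaf)

    return (f, t,
            grow([(r, v) for r, v in zip(X, y) if r[f] <= t]),
            grow([(r, v) for r, v in zip(X, y) if r[f] > t]))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class RegionalizationForest:
    """Hypothetical minimal random forest mapping descriptors to one model parameter."""

    def __init__(self, n_trees=50, max_depth=5, n_features=1, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(y)
        self.trees = []
        for _ in range(self.n_trees):
            idx = rng.choices(range(n), k=n)  # bootstrap sample of gauged catchments
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.n_features, rng))
        return self

    def predict(self, x):
        return mean(predict_tree(tree, x) for tree in self.trees)

# Synthetic gauged catchments with two descriptors; the "parameter" is 2 * descriptor 0.
X = [[float(i), 20.0 - i] for i in range(20)]
y = [2.0 * i for i in range(20)]
forest = RegionalizationForest(n_trees=50, max_depth=5, n_features=1, seed=42).fit(X, y)
estimate = forest.predict([5.0, 15.0])  # a pseudo-ungauged target catchment
```

Because the trees can only reproduce descriptor patterns present in the gauged training set, a target catchment whose distinguishing feature (here, hypothetically, urbanization) is rare among the training catchments may receive a prediction that barely reflects that feature.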
95
Duan ZY, Wang LM, Mammadov M, Lou H, Sun MH. Discriminatory Target Learning: Mining Significant Dependence Relationships from Labeled and Unlabeled Data. ENTROPY 2019; 21:e21050537. [PMID: 33267251 PMCID: PMC7515026 DOI: 10.3390/e21050537] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/27/2019] [Revised: 05/20/2019] [Accepted: 05/24/2019] [Indexed: 11/16/2022]
Abstract
Machine learning techniques have shown superior predictive power; among them, Bayesian network classifiers (BNCs) have remained of great interest due to their capacity to represent complex dependence relationships. Most traditional BNCs build only one model to fit the training instances, analyzing the independence between attributes using conditional mutual information. However, when attributes take different values, the conditional dependence relationships may vary across class labels rather than remain invariant, which may result in classification bias. To address this issue, we propose a novel framework, called discriminatory target learning, which can be regarded as a trade-off between the probabilistic model learned from unlabeled instances at the uncertain end and the one learned from labeled training data at the certain end. The final model can discriminately represent the dependence relationships hidden in unlabeled instances with respect to different possible class labels. Taking the k-dependence Bayesian classifier as an example, an experimental comparison on 42 publicly available datasets indicated that the final model achieved classification performance competitive with state-of-the-art learners such as random forest and averaged one-dependence estimators.
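The quantity that BNCs such as the k-dependence Bayesian classifier use to measure dependence between two attributes given the class is the conditional mutual information, I(X; Y | C) = Σ P(x, y, c) log[P(x, y | c) / (P(x | c) P(y | c))]. The sketch below estimates it from counts of discrete values and also computes the dependence separately within each class value, which illustrates the abstract's point that a single averaged dependence measure can hide class-specific differences. It is not the discriminatory target learning algorithm itself; the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log

def conditional_mutual_information(xs, ys, cs):
    """I(X; Y | C) in nats, estimated from empirical counts of discrete values."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # P(x,y,c) * log[P(x,y,c) P(c) / (P(x,c) P(y,c))]; the 1/n factors cancel in the log
        total += (count / n) * log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total

def per_class_dependence(xs, ys, cs):
    """Mutual information between X and Y computed separately within each class value."""
    result = {}
    for c in set(cs):
        idx = [i for i, ci in enumerate(cs) if ci == c]
        result[c] = conditional_mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx], [c] * len(idx))
    return result

# Toy data: X and Y are fully dependent in class "a" but independent in class "b".
xs = [0, 1, 0, 1]
ys = [0, 1, 0, 0]
cs = ["a", "a", "b", "b"]
deps = per_class_dependence(xs, ys, cs)
```

The overall value equals the class-weighted average Σ_c P(c) I(X; Y | C = c); here it is 0.5 · log 2, which understates the dependence in class "a" and overstates it in class "b".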
Affiliation(s)
- Zhi-Yi Duan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Li-Min Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Musa Mammadov
- Faculty of Science, Engineering & Built Environment, Deakin University Geelong, Burwood, VIC 3125, Australia
- Hua Lou
- Changzhou College of Information Technology, Changzhou 213164, China
- Ming-Hui Sun
- College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Correspondence: ; Tel.: +86-0431-8515-9403