1
Wahlquist Y, Sundell J, Soltesz K. Learning pharmacometric covariate model structures with symbolic regression networks. J Pharmacokinet Pharmacodyn 2024; 51:155-167. [PMID: 37864654 DOI: 10.1007/s10928-023-09887-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 09/18/2023] [Indexed: 10/23/2023]
Abstract
Efficiently finding covariate model structures that minimize the need for random effects to describe pharmacological data is challenging. The standard approach focuses on identifying relevant covariates, and current methodology lacks tools for automatically identifying covariate model structures. Although neural networks could potentially be used to approximate covariate-parameter relationships, such approximations are not human-readable and risk poor generalizability due to high model complexity. In the present study, a novel methodology for simultaneously selecting the covariate model structure and optimizing its parameters is proposed. It is based on symbolic regression, posed as an optimization problem with a smooth loss function. This enables training of the model through back-propagation using efficient gradient computations. Feasibility and effectiveness are demonstrated by application to a clinical pharmacokinetic data set for propofol, containing infusion and blood sample time series from 1031 individuals. The resulting model is compared to a published state-of-the-art model for the same data set. Our methodology finds a covariate model structure and corresponding parameter values with a slightly better fit, while relying on notably fewer covariates than the state-of-the-art model. Unlike contemporary practice, the covariate model structure is found without an iterative procedure involving manual interaction.
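The abstract describes choosing covariate terms and fitting their parameters in a single optimization. As a much simpler illustration of that idea, and not the authors' symbolic regression network, the sketch below selects among a hand-picked, hypothetical library of candidate covariate terms using an L1 penalty solved by proximal gradient descent (ISTA). Unlike the paper's smooth loss, the L1 term is non-smooth and is handled by a soft-thresholding step. The candidate terms, the assumed weight exponent of 0.75 and the penalty weight `lam` are all assumptions made for the example.

```python
import math
import random

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero and returns exactly 0 when |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(F, y, lam, iters=2000):
    """Minimize (1/2n)*||F w - y||^2 + lam*||w||_1 by proximal gradient descent (ISTA)."""
    n, p = len(F), len(F[0])
    # trace(F^T F)/n bounds the gradient's Lipschitz constant, so 1/trace is a safe step size.
    step = n / sum(v * v for row in F for v in row)
    w = [0.0] * p
    for _ in range(iters):
        resid = [sum(row[j] * w[j] for j in range(p)) - yi for row, yi in zip(F, y)]
        grad = [sum(F[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(p)]
    return w

# Hypothetical population: log clearance depends on body weight only (assumed exponent 0.75).
random.seed(0)
names = ["log(WT/70)", "log(AGE/50)", "SEX"]  # hypothetical candidate covariate terms
F, y = [], []
for _ in range(200):
    wt, age = random.uniform(40, 100), random.uniform(20, 80)
    sex = 1.0 if random.random() < 0.5 else 0.0
    F.append([math.log(wt / 70), math.log(age / 50), sex])
    y.append(0.75 * math.log(wt / 70))  # noise-free for clarity

w = ista(F, y, lam=0.001)  # lam is an assumed penalty weight
for name, coef in zip(names, w):
    print(f"{name:12s} {coef:+.3f}")
```

Terms whose coefficient is driven exactly to zero drop out of the covariate model. The penalty also shrinks the retained coefficient slightly below 0.75, which is why sparse fits are often refitted without the penalty once the structure has been chosen.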
Affiliation(s)
- Ylva Wahlquist
- Department of Automatic Control, Lund University, P.O. Box 118, 221 00, Lund, Sweden
- Jesper Sundell
- Department of Automatic Control, Lund University, P.O. Box 118, 221 00, Lund, Sweden
- Kristian Soltesz
- Department of Automatic Control, Lund University, P.O. Box 118, 221 00, Lund, Sweden
2
Hu W, Zhang L. First-principles, machine learning and symbolic regression modelling for organic molecule adsorption on two-dimensional CaO surface. J Mol Graph Model 2023; 124:108530. [PMID: 37321063 DOI: 10.1016/j.jmgm.2023.108530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 05/15/2023] [Accepted: 05/22/2023] [Indexed: 06/17/2023]
Abstract
Data-driven methods have received significant attention in chemical and materials research in recent years; however, more work is needed to leverage this new paradigm, beyond traditional simulation methods, to model and analyze the adsorption of organic molecules on low-dimensional surfaces. In this manuscript, we employ machine learning and symbolic regression coupled with density functional theory (DFT) calculations to investigate the adsorption of atmospheric organic molecules on a low-dimensional metal oxide mineral system. The starting dataset, consisting of the atomic structures of the organic/metal oxide interfaces, is obtained via DFT calculations, and different machine learning algorithms are compared, with the random forest algorithm achieving high accuracy for the target output. The feature ranking step identifies the polarizability and bond type of the organic adsorbates as the key descriptors for the adsorption energy. In addition, symbolic regression coupled with genetic programming automatically identifies a series of new hybrid descriptors with improved relevance to the target output, suggesting that symbolic regression can complement traditional machine learning techniques for descriptor design and fast modeling. This manuscript provides a framework for effectively modeling and analyzing the adsorption of organic molecules on low-dimensional surfaces via comprehensive data-driven approaches.
Affiliation(s)
- Wenguang Hu
- Department of Materials Physics, School of Chemistry and Materials Science, Nanjing University of Information Science & Technology, 210044, Nanjing, China
- Lei Zhang
- Department of Materials Physics, School of Chemistry and Materials Science, Nanjing University of Information Science & Technology, 210044, Nanjing, China
3
Liu J, Li W, Yu L, Wu M, Sun L, Li W, Li Y. SNR: Symbolic network-based rectifiable learning framework for symbolic regression. Neural Netw 2023; 165:1021-1034. [PMID: 37467584 DOI: 10.1016/j.neunet.2023.06.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 06/14/2023] [Accepted: 06/30/2023] [Indexed: 07/21/2023]
Abstract
Symbolic regression (SR) can be used to uncover the underlying mathematical expressions that describe a given set of observed data. At present, SR methods fall into two categories: learning-from-scratch and learning-with-experience. Compared to learning-from-scratch, learning-with-experience yields results comparable to those of several benchmarks at a significantly lower time cost for obtaining expressions. However, learning-with-experience models perform poorly on unseen data distributions and lack a rectification tool apart from constant optimization, which has limited effect. In this study, we propose a Symbolic Network-based Rectifiable Learning Framework (SNR) that is able to correct errors. SNR adopts a Symbolic Network (SymNet) to represent an expression, and the encoding of SymNet is designed to provide supervised information, from numerous self-generated expressions, to train a policy network (PolicyNet). The trained PolicyNet provides prior knowledge to guide effective searches. Subsequently, incorrectly predicted expressions are revised via a rectification mechanism, which gives SNR broader applicability. Experimental results demonstrate that our proposed method achieves the highest average coefficient of determination on self-generated datasets compared with other state-of-the-art methods and yields more accurate results on public datasets.
Affiliation(s)
- Jingyi Liu
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Center of Materials Science and Optoelectronics Engineering & School of Integrated Circuits, University of Chinese Academy of Sciences, 100049, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
- Weijun Li
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Center of Materials Science and Optoelectronics Engineering & School of Integrated Circuits, University of Chinese Academy of Sciences, 100049, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
- Lina Yu
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Center of Materials Science and Optoelectronics Engineering & School of Integrated Circuits, University of Chinese Academy of Sciences, 100049, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
- Min Wu
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Center of Materials Science and Optoelectronics Engineering & School of Integrated Circuits, University of Chinese Academy of Sciences, 100049, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
- Linjun Sun
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Center of Materials Science and Optoelectronics Engineering & School of Integrated Circuits, University of Chinese Academy of Sciences, 100049, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
- Wenqiang Li
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
- Yanjie Li
- Institute of Semiconductors, Chinese Academy of Sciences, 100083, Beijing, China; Center of Materials Science and Optoelectronics Engineering & School of Integrated Circuits, University of Chinese Academy of Sciences, 100049, Beijing, China; Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, 100083, Beijing, China
4
Liu C, Lyu W, Zang X, Zheng F, Zhao W, Xu Q, Lu J. Exploring the factors effecting on carbon emissions in each province in China: A comprehensive study based on symbolic regression, LMDI and Tapio models. Environ Sci Pollut Res Int 2023; 30:87071-87086. [PMID: 37418189 DOI: 10.1007/s11356-023-28608-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 07/01/2023] [Indexed: 07/08/2023]
Abstract
Carbon emission (CE) has led to increasingly severe climate problems. The key to reducing CE is to identify the dominant influencing factors and quantify their influence. The CE of 30 Chinese provinces from 1997 to 2020 was calculated using the IPCC method. On this basis, symbolic regression was used to rank the importance of six factors affecting provincial CE: GDP, Industrial Structure (IS), Total Population (TP), Population Structure (PS), Energy Intensity (EI), and Energy Structure (ES). The LMDI and Tapio models were then established to explore in depth the degree of influence of the different factors on CE. The 30 provinces were divided into five categories according to their primary factor. GDP was the most important factor, followed by ES and EI, then IS, with TP and PS the least important. Growth in per capita GDP promoted the increase of CE, while reduced EI inhibited it. An increase in ES promoted CE in some provinces but inhibited it in others. An increase in TP weakly promoted the increase of CE. These results can provide a reference for governments formulating CE reduction policies under the dual carbon goal.
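The abstract pairs symbolic regression with an LMDI decomposition. As a sketch only, assuming a hypothetical four-factor identity CE = population × GDP per capita × energy intensity × emission factor (the paper's six factors are not reproduced, and all numbers are invented), the additive LMDI-I formula attributes the change in CE to each factor as ΔC_k = L(C_T, C_0) · ln(x_k,T / x_k,0), where L(a, b) = (a - b) / (ln a - ln b) is the logarithmic mean. The attributed effects sum exactly to the total change.

```python
import math

def log_mean(a: float, b: float) -> float:
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))

def lmdi_additive(base: dict, final: dict) -> dict:
    """Additive LMDI-I: effect_k = L(C_T, C_0) * ln(x_k,T / x_k,0), where C is the product of all factors."""
    c0 = math.prod(base.values())
    ct = math.prod(final.values())
    weight = log_mean(ct, c0)
    return {k: weight * math.log(final[k] / base[k]) for k in base}

# Hypothetical provincial data (arbitrary units), base year vs. final year.
base = {"population": 50.0, "gdp_per_capita": 40.0, "energy_intensity": 0.8, "emission_factor": 2.5}
final = {"population": 52.0, "gdp_per_capita": 48.0, "energy_intensity": 0.7, "emission_factor": 2.4}

effects = lmdi_additive(base, final)
total_change = math.prod(final.values()) - math.prod(base.values())
for name, value in effects.items():
    print(f"{name:18s} {value:+9.2f}")
print(f"{'sum of effects':18s} {sum(effects.values()):+9.2f}  (total change {total_change:+.2f})")
```

Because the weights are shared and the log-ratios add up to ln(C_T / C_0), LMDI leaves no unexplained residual term, which is one reason it is a common choice for this kind of factor attribution.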
Affiliation(s)
- Chunjing Liu
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Weiran Lyu
- Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, 84112, USA
- Xuanhao Zang
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Fei Zheng
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Wenchang Zhao
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Qing Xu
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Jianyi Lu
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
5
Shakouri B, Ismail I, Safari MJS. Energy loss and contraction coefficients-based vertical sluice gate's discharge coefficient under submerged flow using symbolic regression. Environ Sci Pollut Res Int 2023:10.1007/s11356-023-27388-1. [PMID: 37247139 DOI: 10.1007/s11356-023-27388-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 04/28/2023] [Indexed: 05/30/2023]
Abstract
Accurate calculation of discharge is critical for environmental and operational regulation. In the current study, a new approach for determining the flow discharge of vertical sluice gates with minor bias is proposed. Energy-momentum equations are used to characterize the physics of the phenomenon in order to derive the coefficient of discharge, which is then expressed in terms of the coefficients of energy loss and contraction. Next, the coefficients of discharge, contraction, and energy loss are calculated using an optimization approach. Dimensional analysis is then conducted, and regression equations quantifying the coefficient of energy loss are produced using the symbolic regression method. The derived contraction coefficient and energy loss coefficient formulas are then used to compute the coefficient of discharge of the vertical sluice gate and, in turn, the flow discharge. Five different scenarios are considered for computing discharge. The performance of the developed approaches is examined against selected benchmarks from the literature. The results show that the symbolic regression method computes discharge more accurately than its alternatives.
Affiliation(s)
- Behzad Shakouri
- Department of Civil Engineering, Urmia University, Urmia, Iran
- Imren Ismail
- Department of Agrarian and Industrial, University of Ruse, Ruse, Bulgaria
6
Carreres-Prieto D, García JT, Carrillo JM, Vigueras-Rodríguez A. Towards highly economical and accurate wastewater sensors by reduced parts of the LED-visible spectrum. Sci Total Environ 2023; 871:162082. [PMID: 36754331 DOI: 10.1016/j.scitotenv.2023.162082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Revised: 02/02/2023] [Accepted: 02/03/2023] [Indexed: 06/18/2023]
Abstract
Interest is growing in simple, fast, and inexpensive systems for analyzing urban wastewater quality in real time. In this research project, a methodology is presented for characterizing the COD, BOD5, TSS, TN, and TP of wastewater samples, without altering the samples or using chemical reagents, from a few wavelengths belonging to each of the color groups that make up the visible spectrum (380-700 nm), taken in isolation: violet (380-427 nm), blue (427-476 nm), cyan (476-497 nm), green (497-570 nm), yellow (570-581 nm), orange (581-618 nm), and red (618-700 nm). In this study, about 650 raw and treated urban wastewater samples from over 43 WWTPs were used, and a total of 36 estimation models based on genetic algorithms were calculated. Seven models were calculated for each pollutant parameter, one for each color group of the visible spectrum, with one additional model for TN combining wavelengths from the violet and red regions of the spectrum. All the calculated models showed high accuracy, with an R² between 80 and 85% for COD, BOD5, and TSS, and between 66 and 74% for TN and TP. The tests showed that the accuracy of the models for the different color groups was very similar. However, the models using wavelengths between 497 and 570 nm (green) performed best for all the parameters under study. This research work lays the foundations for the development of cheaper, faster, and simpler wastewater monitoring and characterization equipment.
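The abstract reports model accuracy as R², expressed as a percentage. The sketch below computes the common definition R² = 1 - SS_res / SS_tot; the abstract does not say which R² definition the authors used, so treat this as an assumption, and the COD values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical laboratory COD values (mg/L) and model estimates from a few wavelengths.
cod_lab = [120.0, 340.0, 560.0, 210.0, 450.0]
cod_model = [150.0, 310.0, 500.0, 260.0, 470.0]
print(f"R^2 = {100 * r_squared(cod_lab, cod_model):.1f}%")
```

Under this definition, R² is the share of the variance in the laboratory values that the spectral model reproduces. It can become negative for a model that does worse than simply predicting the mean.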
Affiliation(s)
- Daniel Carreres-Prieto
- Department of Mining and Civil Engineering, Universidad Politécnica de Cartagena, 30202 Cartagena, Spain
- Juan T García
- Department of Mining and Civil Engineering, Universidad Politécnica de Cartagena, 30202 Cartagena, Spain
- José M Carrillo
- Department of Mining and Civil Engineering, Universidad Politécnica de Cartagena, 30202 Cartagena, Spain
- Antonio Vigueras-Rodríguez
- Department of Mining and Civil Engineering, Universidad Politécnica de Cartagena, 30202 Cartagena, Spain
7
Popov S, Lazarev M, Belavin V, Derkach D, Ustyuzhanin A. Symbolic expression generation via variational auto-encoder. PeerJ Comput Sci 2023; 9:e1241. [PMID: 37346583 PMCID: PMC10280571 DOI: 10.7717/peerj-cs.1241] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/25/2022] [Accepted: 01/17/2023] [Indexed: 06/23/2023]
Abstract
There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widely used deep neural networks do not provide interpretable solutions, whereas symbolic expressions give a clear relation between observations and the target variable. However, there is currently no dominant solution for the symbolic regression task, and we aim to narrow this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via a variational autoencoder (VAE). We use a VAE to generate mathematical expressions, and our training strategy forces the generated formulas to fit a given dataset. Our framework allows a priori knowledge of the formulas to be encoded into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that it outperforms its competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported state of the art (SOTA) by 20%. We demonstrate that this value depends on the dataset and can be even higher.
Affiliation(s)
- Sergei Popov
- Department of Computer Science, Higher School of Economics, Moscow, Russia
- National University of Science and Technology MISIS, Moscow, Russia
- Mikhail Lazarev
- Department of Computer Science, Higher School of Economics, Moscow, Russia
- Vladislav Belavin
- Department of Computer Science, Higher School of Economics, Moscow, Russia
- Denis Derkach
- Department of Computer Science, Higher School of Economics, Moscow, Russia
- Andrey Ustyuzhanin
- Department of Computer Science, Higher School of Economics, Moscow, Russia
- Constructor University, Bremen, Germany
- Institute for Functional Intelligent Materials, National University of Singapore, Singapore
8
Liu C, Lyu W, Zhao W, Zheng F, Lu J. Exploratory research on influential factors of China's sulfur dioxide emission based on symbolic regression. Environ Monit Assess 2022; 195:41. [PMID: 36301357 DOI: 10.1007/s10661-022-10595-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
China's sulfur dioxide emissions have remained large in recent years. To reduce sulfur dioxide emissions further, the key is to identify the leading factors affecting them and then take corresponding control measures. To investigate the factors influencing sulfur dioxide emissions in different provinces, sulfur dioxide emission data for 30 Chinese provinces from 2001 to 2020 were collected. For each province, we established a symbolic regression model relating sulfur dioxide emission (dependent variable) to GDP (x1), total population (x2), total energy consumption (x3), and thermal power installed capacity (x4). The results show that China's total sulfur dioxide emission and the sulfur dioxide emissions of most provinces follow the environmental Kuznets curve (EKC). In decreasing order of influence, the factors affecting China's sulfur dioxide emission are GDP, total energy consumption, thermal power installed capacity, and total population. Provinces whose primary factor is GDP have the lowest average total energy consumption and average thermal power installed capacity, and their average sulfur dioxide emissions are also relatively low. These provinces show no obvious geographical pattern, but the provinces whose primary factor is total energy consumption are all located in southern China. Based on the results, several control measures are also proposed.
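The environmental Kuznets curve is conventionally checked with a quadratic in income, E = a + b·GDP + c·GDP², whose inverted-U shape (c < 0) peaks at GDP* = -b / (2c). The paper fitted symbolic regression models instead, so the sketch below only illustrates that conventional quadratic check, using an ordinary least-squares fit on hypothetical, noise-free data.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x**2 by solving the 3x3 normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                   # sums of x^0 .. x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]  # sums of y*x^0 .. y*x^2
    A = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]      # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= factor * A[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        coef[i] = (A[i][3] - sum(A[i][j] * coef[j] for j in range(i + 1, 3))) / A[i][i]
    return coef  # [a, b, c]

# Hypothetical noise-free data: GDP per capita (arbitrary units) vs. SO2 emission.
gdp = [float(g) for g in range(1, 11)]
so2 = [5.0 + 4.0 * g - 0.4 * g * g for g in gdp]

a, b, c = fit_quadratic(gdp, so2)
if c < 0:
    print(f"inverted U, turning point at GDP = {-b / (2 * c):.2f}")
else:
    print("no inverted-U shape in this range")
```

With real provincial data, the turning point is only meaningful when it falls inside the observed GDP range. A symbolic regression model may also produce a non-quadratic peak that this simple check would miss.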
Affiliation(s)
- Chunjing Liu
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Weiran Lyu
- School of Computing, University of Utah, Salt Lake City, 84112, USA
- Wenchang Zhao
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Fei Zheng
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
- Jianyi Lu
- Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Department of Environmental Science and Engineering, North China Electric Power University, Baoding, 071003, China
9
Wilstrup C, Cave C. Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths. BMC Med Inform Decis Mak 2022; 22:196. [PMID: 35879758 DOI: 10.1186/s12911-022-01943-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2021] [Accepted: 07/20/2022] [Indexed: 12/31/2022]
Abstract
BACKGROUND Heart failure is a clinical syndrome characterised by a reduced ability of the heart to pump blood. Patients with heart failure have a high mortality rate, and physicians need reliable prognostic predictions to make informed decisions about the appropriate application of devices, transplantation, medications, and palliative care. In this study, we demonstrate that combining symbolic regression with the Cox proportional hazards model improves the ability to predict death due to heart failure compared to using the Cox proportional hazards model alone. METHODS We used a newly invented symbolic regression method called the QLattice to analyse a data set of medical records for 299 Pakistani patients diagnosed with heart failure. The QLattice identified non-linear mathematical transformations of the available covariates, which we then used in a Cox model to predict survival. RESULTS An exponential function of age, the inverse of ejection fraction, and the inverse of serum creatinine were identified as the best risk factors for predicting heart failure deaths. A Cox model fitted on these transformed covariates had improved predictive performance compared with a Cox model on the same covariates without mathematical transformations. CONCLUSION Symbolic regression is a way to find transformations of covariates from patients' medical records which can improve the performance of survival regression models. At the same time, these simple functions are intuitive and easy to apply in clinical settings. The direct interpretability of the simple forms may help researchers gain new insights into the actual causal pathways leading to deaths.
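The methods feed non-linear transforms of covariates into a Cox model. A minimal sketch of that pipeline is shown below. It assumes the transform shapes named in the abstract (an exponential of age, the inverse of ejection fraction, and the inverse of serum creatinine), with a hypothetical age scale of 20 years, and evaluates the Cox partial log-likelihood that a fitting routine would maximize. The patient records and coefficients are invented, and this is neither the QLattice nor the authors' fitted model.

```python
import math

def transform(age, ejection_fraction, creatinine, age_scale=20.0):
    """Hypothetical covariate transforms: exp(age / age_scale), 1 / EF, 1 / creatinine."""
    return [math.exp(age / age_scale), 1.0 / ejection_fraction, 1.0 / creatinine]

def cox_partial_loglik(beta, X, times, events):
    """Cox partial log-likelihood; tied event times are handled with the Breslow approximation."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    loglik = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored patients contribute only through the risk sets
        risk_set = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(math.exp(e) for e in risk_set))
    return loglik

# Hypothetical records: (age in years, ejection fraction in %, serum creatinine in mg/dL).
records = [(75, 20, 1.9), (60, 38, 1.1), (82, 25, 2.4), (50, 45, 0.9)]
times = [5, 8, 12, 20]  # follow-up days
events = [1, 0, 1, 1]   # 1 = death observed, 0 = censored
X = [transform(*r) for r in records]

print(cox_partial_loglik([0.0, 0.0, 0.0], X, times, events))
print(cox_partial_loglik([0.02, 5.0, 0.5], X, times, events))
```

Only the covariate columns change when transforms are introduced; the Cox likelihood itself stays the same. This is why the transformed covariates can be compared directly with the untransformed ones in the same model.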
10
Carreres-Prieto D, García JT, Cerdán-Cartagena F, Suardiaz-Muro J, Lardín C. Implementing Early Warning Systems in WWTP. An investigation with cost-effective LED-VIS spectroscopy-based genetic algorithms. Chemosphere 2022; 293:133610. [PMID: 35051514 DOI: 10.1016/j.chemosphere.2022.133610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Revised: 01/08/2022] [Accepted: 01/11/2022] [Indexed: 06/14/2023]
Abstract
Measuring how the pollution load evolves in real time along sewer networks is key to properly managing water resources and protecting the environment. Molecular spectroscopy is increasingly used for water characterization because it is a non-invasive technique that correlates the physical-chemical conditions of wastewater with spectroscopic surrogates through a series of mathematical estimation models. In the present research work, different symbolic regression models obtained with evolutionary genetic algorithms are evaluated for estimating chemical oxygen demand (COD), five-day biochemical oxygen demand (BOD5), total suspended solids (TSS), total phosphorus (TP), and total nitrogen (TN) from the spectral response of samples measured between 380 and 700 nm, without the use of chemicals or pre-treatment. Around 650 wastewater samples from 43 different wastewater treatment plants (WWTPs) were used in the campaign, covering both raw/influent and treated/effluent water. They were examined through 18 models built with the Classical Genetic Algorithm (CGA), the Age-Layered Population Structure (ALPS), and Offspring Selection (OS) using the HeuristicLab software, in order to compare them and determine which models and wavelengths are most suitable for the correlation. Models are proposed both for raw and treated samples together (15) and only for tertiary-treated effluent reclaimed for agricultural irrigation (3). For the combined models, the Pearson correlation coefficients for the test data were in the range of 67-91%. The results constitute the first steps towards real-time monitoring of WWTPs.
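The combined models are scored with Pearson correlation coefficients, reported as percentages. A short sketch of the coefficient r = cov(x, y) / (σx σy) follows, using hypothetical measured and estimated TSS values. Note that r measures linear association only, so a model with a constant bias can still reach r = 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical test-set values (mg/L): laboratory TSS vs. spectroscopic estimate.
tss_lab = [35.0, 80.0, 120.0, 60.0, 150.0]
tss_est = [42.0, 71.0, 130.0, 66.0, 138.0]
print(f"r = {100 * pearson_r(tss_lab, tss_est):.0f}%")

# A constant offset leaves r unchanged, so r alone does not reveal systematic bias.
biased = [v + 50.0 for v in tss_lab]
print(f"r with +50 mg/L bias = {pearson_r(tss_lab, biased):.3f}")
```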
Affiliation(s)
- Daniel Carreres-Prieto
- Department of Mining and Civil Engineering, Universidad Politécnica de Cartagena, 30202, Cartagena, Spain
- Juan T García
- Department of Mining and Civil Engineering, Universidad Politécnica de Cartagena, 30202, Cartagena, Spain
- Fernando Cerdán-Cartagena
- Department of Information and Communications Technologies, Universidad Politécnica de Cartagena, 30202, Cartagena, Spain
- Juan Suardiaz-Muro
- Department of Electronic Technology, Universidad Politécnica de Cartagena, 30202, Cartagena, Spain
- Carlos Lardín
- Entidad de Saneamiento y Depuración de Aguas Residuales de la Región de Murcia (ESAMUR), c/Madre Paula Gil Cano, s/n, E-30009, Murcia, Spain
11
Kronberger G, de Franca FO, Burlacu B, Haider C, Kommenda M. Shape-Constrained Symbolic Regression-Improving Extrapolation with Prior Knowledge. Evol Comput 2022; 30:75-98. [PMID: 34623432 DOI: 10.1162/evco_a_00294] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2019] [Accepted: 03/26/2021] [Indexed: 06/13/2023]
Abstract
We investigate the addition of constraints on the function image and its derivatives for the incorporation of prior knowledge in symbolic regression. The approach is called shape-constrained symbolic regression and allows us to enforce, for example, monotonicity of the function over selected inputs. The aim is to find models that conform to expected behavior and have improved extrapolation capabilities. We demonstrate the feasibility of the idea and propose and compare two evolutionary algorithms for shape-constrained symbolic regression: (i) an extension of tree-based genetic programming that discards infeasible solutions in the selection step, and (ii) a two-population evolutionary algorithm that separates the feasible from the infeasible solutions. In both algorithms, we use interval arithmetic to approximate bounds for models and their partial derivatives. The algorithms are tested on a set of 19 synthetic and four real-world regression problems. Both algorithms are able to identify models that conform to the shape constraints, which is not the case for the unmodified symbolic regression algorithms. However, the predictive accuracy of models with constraints is worse on both the training set and the test set. Shape-constrained polynomial regression produces the best results on the test set but also significantly larger models.
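The abstract bounds models and their partial derivatives with interval arithmetic. A minimal sketch of that check for a single input follows, assuming a hand-derived derivative; the paper's algorithms operate on genetic-programming trees and are not reproduced here. If the lower bound of df/dx over the input domain is non-negative, monotone increase is certified. A negative lower bound only means the check is inconclusive, because interval bounds can be loose.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the basic arithmetic needed for bounding expressions."""
    lo: float
    hi: float

    def __add__(self, other):
        other = as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __mul__(self, other):
        other = as_interval(other)
        products = (self.lo * other.lo, self.lo * other.hi, self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    __rmul__ = __mul__

def as_interval(value):
    """Promote a plain number to a degenerate interval."""
    return value if isinstance(value, Interval) else Interval(float(value), float(value))

def certifies_increasing(derivative_bounds):
    """df/dx >= 0 on the whole domain if the derivative's lower bound is non-negative."""
    return derivative_bounds.lo >= 0.0

x = Interval(0.0, 4.0)  # hypothetical input domain

d1 = x + 1          # f(x) = 0.5*x^2 + x  ->  f'(x) = x + 1
d2 = 2 * x + (-3)   # g(x) = x^2 - 3*x    ->  g'(x) = 2*x - 3

print(d1, certifies_increasing(d1))  # Interval(lo=1.0, hi=5.0) True
print(d2, certifies_increasing(d2))  # Interval(lo=-3.0, hi=5.0) False

# Loose bounds: x*x on [-1, 1] is really [0, 1], but naive interval arithmetic gives [-1, 1].
print(Interval(-1.0, 1.0) * Interval(-1.0, 1.0))
```

The last line shows the dependency problem: treating both factors of x*x as independent widens the bound. This over-approximation is conservative, so a feasible model can be rejected, but an infeasible one is never certified.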
Affiliation(s)
- G Kronberger
- Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- F O de Franca
- Center for Mathematics, Computation and Cognition (CMCC), Heuristics, Analysis and Learning Laboratory (HAL), Federal University of ABC, Santo Andre, Brazil
- B Burlacu
- Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- C Haider
- Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- M Kommenda
- Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
12
Urbanová P, Hejna P, Zátopková L, Šafr M. What is the appropriate approach in sex determination of hyoid bones? J Forensic Leg Med 2013; 20:996-1003. [PMID: 24237807 DOI: 10.1016/j.jflm.2013.08.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 10/09/2012] [Revised: 05/24/2013] [Accepted: 08/20/2013] [Indexed: 10/26/2022]
Abstract
The hyoid bone has sexually dimorphic features, which allow it occasionally to be used for sex determination when establishing the biological profile of skeletal remains. Based on a sample of 298 fused and non-fused hyoid bones, the present paper compares several methodological approaches to sexing human hyoid bones in order to test the legitimacy of osteometrics-based linear discriminant equations and to explore the potential of symbolic regression and geometric morphometrics. In addition, two sets of published predictive models, one of which originated in an indigenous population, were validated on the studied sample. The results showed that hyoid shape itself is a moderate sex predictor and that a combination of linear measurements better represents sex-related differences. Symbolic regression exceeded the predictive power of linear discriminant function analysis, with two models, based on logistic and step regression, correctly classifying 96% of cases. There was a positive correlation between discriminant scores and individual age, as the sex assessment was highly skewed in favour of males. This suggests that the human hyoid undergoes age-related modifications that facilitate the identification of male bones and complicate the identification of female bones in older individuals. Validation of the discriminant equations of Komenda and Černý (1990) and Kindschuh et al. (2010) revealed marked inter-population and inter-sample differences, which reduced the power to correctly identify female hyoid bones.
Affiliation(s)
- Petra Urbanová
- Department of Anthropology, Faculty of Science, Masaryk University, Brno, Czech Republic.