1
Zhang F, Chen X, Liu P, Fan C. Weighted Expectile Regression Neural Networks for Right Censored Data. Stat Med 2024. [PMID: 39343041 DOI: 10.1002/sim.10221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 07/29/2024] [Accepted: 08/30/2024] [Indexed: 10/01/2024]
Abstract
As a favorable alternative to censored quantile regression, censored expectile regression has been popular in survival analysis due to its flexibility in modeling the heterogeneous effects of covariates. The existing weighted expectile regression (WER) method assumes that the censoring variable and covariates are independent, and that the covariate effects have a global linear structure. However, these two assumptions are too restrictive to capture the complex and nonlinear pattern of the underlying covariate effects. In this article, we develop a novel weighted expectile regression neural network (WERNN) method by incorporating the deep neural network structure into the censored expectile regression framework. To handle the random censoring, we employ the inverse probability of censoring weighting (IPCW) technique in the expectile loss function. The proposed WERNN method is flexible enough to fit nonlinear patterns and therefore achieves more accurate prediction performance than the existing WER method for right censored data. Our findings are supported by extensive Monte Carlo simulation studies and a real data application.
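To make the IPCW-weighted expectile loss mentioned in this abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes Kaplan-Meier weights, which themselves rely on censoring being independent of covariates; the function names `km_censoring_survival` and `ipcw_expectile_loss` are hypothetical, and ties are handled naively.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function,
    evaluated as the left limit G(t-) at each observed time.
    Here a censored observation (event == 0) counts as the 'event'."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv_before = {}
    s = 1.0
    for t in np.unique(time):          # unique times in increasing order
        surv_before[t] = s             # value just before time t
        at_risk = np.sum(time >= t)
        censored = np.sum((time == t) & (event == 0))
        s *= 1.0 - censored / at_risk
    return np.array([surv_before[t] for t in time])

def ipcw_expectile_loss(y, pred, event, g_minus, tau):
    """Asymmetric squared loss with weights event_i / G(y_i-).
    Censored observations receive weight zero."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = y - pred
    asym = np.where(resid >= 0, tau, 1.0 - tau)
    weights = np.asarray(event, dtype=float) / g_minus
    return np.mean(weights * asym * resid ** 2)

# Toy data (hypothetical): the second observation is censored.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 0, 1, 1])
g_minus = km_censoring_survival(time, event)
weights = event / g_minus
loss = ipcw_expectile_loss(time, np.zeros(4), event, g_minus, tau=0.5)
```

In this toy example the uncensored observations after the censoring time are up-weighted, so the weights still sum to the sample size. In a neural network setting the same loss would be minimized over the network output `pred`.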
Affiliation(s)
- Feipeng Zhang
  - School of Economics and Finance, Xi'an Jiaotong University, Xi'an, China
- Xi Chen
  - School of Economics and Finance, Xi'an Jiaotong University, Xi'an, China
- Peng Liu
  - Department of Mathematical Sciences, Loughborough University, Loughborough, UK
- Caiyun Fan
  - School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
2
Lin J, Tong X, Li C, Lu Q. Expectile Neural Networks for Genetic Data Analysis of Complex Diseases. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:352-359. [PMID: 35085091 PMCID: PMC10201460 DOI: 10.1109/tcbb.2022.3146795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The genetic etiologies of common diseases are highly complex and heterogeneous. Classic methods, such as linear regression, have successfully identified numerous variants associated with complex diseases. Nonetheless, for most diseases, the identified variants only account for a small proportion of heritability. Challenges remain to discover additional variants contributing to complex diseases. Expectile regression is a generalization of linear regression and provides complete information on the conditional distribution of a phenotype of interest. While expectile regression has many nice properties, it has rarely been used in genetic research. In this paper, we develop an expectile neural network (ENN) method for genetic data analyses of complex diseases. Similar to expectile regression, ENN provides a comprehensive view of relationships between genetic variants and disease phenotypes, which can be used to discover variants predisposing to sub-populations. We further integrate the idea of neural networks into ENN, making it capable of capturing non-linear and non-additive genetic effects (e.g., gene-gene interactions). Through simulations, we showed that the proposed method outperformed an existing expectile regression when there exist complex genotype-phenotype relationships. We also applied the proposed method to the data from the Study of Addiction: Genetics and Environment (SAGE), investigating the relationships of candidate genes with smoking quantity.
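The statement that expectiles generalize the mean can be illustrated with a short sketch. The function `sample_expectile` below is a hypothetical helper, not part of the ENN method: it computes the tau-expectile of a sample by iterating a weighted mean in which points above the current estimate get weight tau and the rest get weight 1 - tau.

```python
import numpy as np

def sample_expectile(y, tau, n_iter=200, tol=1e-12):
    """tau-expectile of a sample via a fixed-point (reweighted mean) iteration."""
    y = np.asarray(y, dtype=float)
    m = y.mean()                                  # start from the ordinary mean
    for _ in range(n_iter):
        w = np.where(y > m, tau, 1.0 - tau)       # asymmetric weights
        m_new = np.sum(w * y) / np.sum(w)         # weighted mean under current weights
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

y = [1.0, 2.0, 3.0, 10.0]
mean_like = sample_expectile(y, 0.5)   # equals the sample mean, 4.0
upper = sample_expectile(y, 0.9)       # approximately 8.0, pulled toward the large value
lower = sample_expectile(y, 0.1)       # approximately 2.0
```

With tau = 0.5 the weights are symmetric and the result is the sample mean; other values of tau move the estimate toward the upper or lower part of the distribution, which is what lets expectile-based models describe more than the conditional mean.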
3
Barry A, Bhagwat N, Misic B, Poline JB, Greenwood CMT. Asymmetric influence measure for high dimensional regression. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2020.1841793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Amadou Barry
  - Departments of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Lady Davis Institute, Jewish General Hospital, Montreal, Québec, Canada
- Nikhil Bhagwat
  - Faculty of Medicine, Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McConnell Brain Imaging Centre, McGill University, Montreal, Québec, Canada
- Bratislav Misic
  - Faculty of Medicine, Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McConnell Brain Imaging Centre, McGill University, Montreal, Québec, Canada
- Jean-Baptiste Poline
  - Faculty of Medicine, Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McConnell Brain Imaging Centre, McGill University, Montreal, Québec, Canada
  - Henry H. Wheeler Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
  - Ludmer Centre for Neuroinformatics & Mental Health, McGill University, Montreal, Québec, Canada
- Celia M. T. Greenwood
  - Departments of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Lady Davis Institute, Jewish General Hospital, Montreal, Québec, Canada
  - Ludmer Centre for Neuroinformatics & Mental Health, McGill University, Montreal, Québec, Canada
  - Departments of Oncology and Human Genetics, McGill University, Montreal, Québec, Canada
4
Zhou X, Wang J, Wang H, Lin J. Panel semiparametric quantile regression neural network for electricity consumption forecasting. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2021.101489] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
5
Xu Q, Ding X, Jiang C, Yu K, Shi L. An elastic-net penalized expectile regression with applications. J Appl Stat 2021; 48:2205-2230. [DOI: 10.1080/02664763.2020.1787355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Q.F. Xu
  - School of Management, Hefei University of Technology, Hefei, People's Republic of China
  - Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, People's Republic of China
- X.H. Ding
  - School of Management, Hefei University of Technology, Hefei, People's Republic of China
- C.X. Jiang
  - School of Management, Hefei University of Technology, Hefei, People's Republic of China
- K.M. Yu
  - Department of Mathematics, Brunel University London, Uxbridge, UK
- L. Shi
  - School of Computer Science and Technology, Huaibei Normal University, Huaibei, People's Republic of China
6
Barry A, Oualkacha K, Charpentier A. A new GEE method to account for heteroscedasticity using asymmetric least-square regressions. J Appl Stat 2021; 49:3564-3590. [PMID: 36246864 PMCID: PMC9559327 DOI: 10.1080/02664763.2021.1957789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2020] [Accepted: 07/15/2021] [Indexed: 10/20/2022]
Abstract
Generalized estimating equations (GEE) are widely used to analyze longitudinal data; however, they are not appropriate for heteroscedastic data, because they only estimate regressor effects on the mean response - and therefore do not account for data heterogeneity. Here, we combine the GEE with the asymmetric least squares (expectile) regression to derive a new class of estimators, which we call generalized expectile estimating equations (GEEE). The GEEE model estimates regressor effects on the expectiles of the response distribution, which provides a detailed view of regressor effects on the entire response distribution. In addition to capturing data heteroscedasticity, the GEEE extends the various working correlation structures to account for within-subject dependence. We derive the asymptotic properties of the GEEE estimators and propose a robust estimator of its covariance matrix for inference (see our R package, github.com/AmBarry/expectgee). Our simulations show that the GEEE estimator is non-biased and efficient, and our real data analysis shows it captures heteroscedasticity.
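As a rough illustration of how asymmetric least squares exposes heteroscedasticity, the sketch below fits a linear expectile regression by iteratively reweighted least squares on simulated data. It is an assumption-laden simplification: it treats all observations as independent (no working correlation structure), it is not the GEEE estimator, and it does not use the API of the `expectgee` package; `expectile_regression` and the simulated data are hypothetical.

```python
import numpy as np

def expectile_regression(x, y, tau, n_iter=100, tol=1e-10):
    """Linear expectile regression (intercept + slopes) via IRLS."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # start from ordinary least squares
    for _ in range(n_iter):
        resid = y - X @ beta
        w = np.where(resid > 0, tau, 1.0 - tau)       # asymmetric weights
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted normal equations
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Simulated heteroscedastic data: the noise scale grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=2000)
y = 1.0 + 2.0 * x + x * rng.normal(size=2000)

slopes = {tau: expectile_regression(x, y, tau)[1] for tau in (0.1, 0.5, 0.9)}
```

Because the spread of `y` increases with `x`, the fitted slope rises with tau, whereas a mean-only model (tau = 0.5, which coincides with ordinary least squares) reports a single slope and hides this pattern.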
Affiliation(s)
- Amadou Barry
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
  - Lady Davis Institute, Jewish General Hospital, Montréal, QC, Canada
- Karim Oualkacha
  - Department of Mathematics and Statistics, Université du Québec à Montréal, Montréal, QC, Canada
- Arthur Charpentier
  - Department of Mathematics and Statistics, Université du Québec à Montréal, Montréal, QC, Canada
7
Setshedi KJ, Mutingwende N, Ngqwala NP. The Use of Artificial Neural Networks to Predict the Physicochemical Characteristics of Water Quality in Three District Municipalities, Eastern Cape Province, South Africa. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18105248. [PMID: 34069195 PMCID: PMC8155895 DOI: 10.3390/ijerph18105248] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/16/2021] [Revised: 04/22/2021] [Accepted: 04/23/2021] [Indexed: 12/07/2022]
Abstract
Reliable prediction of water quality changes is a prerequisite for early water pollution control and is vital in environmental monitoring, ecosystem sustainability, and human health. This study uses Artificial Neural Network (ANN) technique to develop the best model fits to predict water quality parameters by employing multilayer perceptron (MLP) neural network and the radial basis function (RBF) neural network, using data collected from three district municipalities. Two input combination models, MLP-4-5-4 and MLP-4-9-4, were trained, verified, and tested for their predictive performance ability, and their physicochemical prediction accuracy was compared by using each model's observed data with the predicted data. The MLP-4-5-4 model showed a better understanding of the data sets and water quality predictive ability giving an MSE of 39.06589 and a correlation coefficient (R2) of the observed and the predicted water quality of 0.989383 compared to the MLP-4-9-4 model (R2 = 0.993532, MSE = 39.03087). These results apply to natural water resources management in South Africa and similar catchment systems. The MLP-4-5-4 system can be scaled up for future water quality prediction of the Waste Water Treatment Plants (WWTPs), groundwater, and surface water while raising awareness among the public and industry on future water quality.
8
Artificial Neural Network, Quantile and Semi-Log Regression Modelling of Mass Appraisal in Housing. MATHEMATICS 2021. [DOI: 10.3390/math9070783] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
We used a large sample of 188,652 properties, which represented 4.88% of the total housing stock in Catalonia from 1994 to 2013, to make a comparison between different real estate valuation methods based on artificial neural networks (ANNs), quantile regressions (QRs) and semi-log regressions (SLRs). A literature gap in regard to the comparison between ANN and QR modelling of hedonic prices in housing was identified, with this article being the first paper to include this comparison. Therefore, this study aimed to answer (1) whether QR valuation modelling of hedonic prices in the housing market is an alternative to ANNs, (2) whether it is confirmed that ANNs produce better results than SLRs when assessing housing in Catalonia, and (3) which of the three mass appraisal models should be used by Spanish banks to assess real estate. The results suggested that the ANNs and SLRs obtained similar and better performances than the QRs and that the SLRs performed better when the datasets were smaller. Therefore, (1) QRs were not found to be an alternative to ANNs, (2) it could not be confirmed whether ANNs performed better than SLRs when assessing properties in Catalonia and (3) whereas small and medium banks should use SLRs, large banks should use either SLRs or ANNs in real estate mass appraisal.
9
Galvan D, Effting L, Cremasco H, Adam Conte-Junior C. Can Socioeconomic, Health, and Safety Data Explain the Spread of COVID-19 Outbreak on Brazilian Federative Units? INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E8921. [PMID: 33266276 PMCID: PMC7730726 DOI: 10.3390/ijerph17238921] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/19/2020] [Revised: 11/12/2020] [Accepted: 11/14/2020] [Indexed: 12/15/2022]
Abstract
Countless factors can influence the spread of COVID-19. Evaluating factors related to the spread of the disease is essential to point out measures that take effect. In this study, the influence of 14 variables was assessed together by Artificial Neural Networks (ANN) of the type Self-Organizing Maps (SOM), to verify the relationship between numbers of cases and deaths from COVID-19 in Brazilian states for 110 days. The SOM analysis showed that the variables that presented a more significant relationship with the numbers of cases and deaths by COVID-19 were influenza vaccine applied, Intensive Care Unit (ICU), ventilators, physicians, nurses, and the Human Development Index (HDI). In general, Brazilian states with the highest rates of influenza vaccine applied, ICU beds, ventilators, physicians, and nurses, per 100,000 inhabitants, had the lowest number of cases and deaths from COVID-19, while the states with the lowest rates were most affected by the disease. According to the SOM analysis, other variables such as Personal Protective Equipment (PPE), tests, drugs, and Federal funds, did not have as significant an effect as expected.
Affiliation(s)
- Diego Galvan
  - COVID-19 Research Group, Center for Food Analysis (NAL), Technological Development Support Laboratory (LADETEC), Cidade Universitária, Rio de Janeiro RJ 21941-598, Brazil
  - Laboratory of Advanced Analysis in Biochemistry and Molecular Biology (LAABBM), Department of Biochemistry, Federal University of Rio de Janeiro (UFRJ), Cidade Universitária, Rio de Janeiro RJ 21941-909, Brazil
  - Nanotechnology Network, Carlos Chagas Filho Research Support Foundation of the State of Rio de Janeiro (FAPERJ), Rio de Janeiro RJ 20020-000, Brazil
- Luciane Effting
  - Chemistry Department, State University of Londrina (UEL), Londrina PR 86057-970, Brazil
- Hágata Cremasco
  - Chemistry Department, State University of Londrina (UEL), Londrina PR 86057-970, Brazil
- Carlos Adam Conte-Junior
  - COVID-19 Research Group, Center for Food Analysis (NAL), Technological Development Support Laboratory (LADETEC), Cidade Universitária, Rio de Janeiro RJ 21941-598, Brazil
  - Laboratory of Advanced Analysis in Biochemistry and Molecular Biology (LAABBM), Department of Biochemistry, Federal University of Rio de Janeiro (UFRJ), Cidade Universitária, Rio de Janeiro RJ 21941-909, Brazil
  - Nanotechnology Network, Carlos Chagas Filho Research Support Foundation of the State of Rio de Janeiro (FAPERJ), Rio de Janeiro RJ 20020-000, Brazil
10
Estimation of Coal's Sorption Parameters Using Artificial Neural Networks. MATERIALS 2020; 13:ma13235422. [PMID: 33260556 PMCID: PMC7730821 DOI: 10.3390/ma13235422] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/29/2020] [Revised: 11/24/2020] [Accepted: 11/24/2020] [Indexed: 12/02/2022]
Abstract
This article presents research results into the application of an artificial neural network (ANN) to determine coal’s sorption parameters, such as the maximal sorption capacity and effective diffusion coefficient. Determining these parameters is currently time-consuming, and requires specialized and expensive equipment. The work was conducted with the use of feed-forward back-propagation networks (FNNs); it was aimed at estimating the values of the aforementioned parameters from information obtained through technical and densitometric analyses, as well as knowledge of the petrographic composition of the examined coal samples. Analyses showed significant compatibility between the values of the analyzed sorption parameters obtained with regressive neural models and the values of parameters determined with the gravimetric method using a sorption analyzer (prediction error for the best match was 6.1% and 0.2% for the effective diffusion coefficient and maximal sorption capacity, respectively). The established determination coefficients (0.982, 0.999) and the values of standard deviation ratios (below 0.1 in each case) confirmed very high prediction capacities of the adopted neural models. The research showed the great potential of the proposed method to describe the sorption properties of coal as a material that is a natural sorbent for methane and carbon dioxide.
11
Impact of Uncertainty in the Input Variables and Model Parameters on Predictions of a Long Short Term Memory (LSTM) Based Sales Forecasting Model. MACHINE LEARNING AND KNOWLEDGE EXTRACTION 2020. [DOI: 10.3390/make2030014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
A Long Short Term Memory (LSTM) based sales model has been developed to forecast the global sales of hotel business of Travel Boutique Online Holidays (TBO Holidays). The LSTM model is a multivariate model; input to the model includes several independent variables in addition to a dependent variable, viz., sales from the previous step. One of the input variables, “number of active bookers per day”, is estimated for the same day as sales. This need for estimation requires the development of another LSTM model to predict the number of active bookers per day. The number of active bookers is variable, so the predicted value is used as an input to the sales forecasting model. The use of a predicted variable as an input variable to another model increases the chance of uncertainty entering the system. This paper discusses the quantum of variability observed in sales predictions for various uncertainties or noise due to the estimation of the number of active bookers. For the purposes of this study, different noise distributions such as normalized, uniform, and logistic distributions are used, among others. Analyses of predictions demonstrate that the addition of uncertainty to the number of active bookers via dropouts as well as to the lagged sales variables leads to model predictions that are close to the observations. The least squared error between observations and predictions is higher for uncertainties modeled using other distributions (without dropouts) with the worst predictions being for Gumbel noise distribution. Gaussian noise added directly to the weights matrix yields the best results (minimum prediction errors). One possibility of this uncertainty could be that the global minimum of the least squared objective function with respect to the model weight matrix is not reached, and therefore, model parameters are not optimal. The two LSTM models used in series are also used to study the impact of corona virus on global sales. By introducing a new variable called the corona virus impact variable, the LSTM models can predict corona-affected sales within five percent (5%) of the actuals. The research discussed in the paper finds LSTM models to be effective tools that can be used in the travel industry as they are able to successfully model the trends in sales. These tools can also be reliably used to simulate various hypothetical scenarios.
12
Chen T, Su Z, Yang Y, Ding S. Efficient estimation in expectile regression using envelope models. Electron J Stat 2020. [DOI: 10.1214/19-ejs1664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
13
A novel (U)MIDAS-SVR model with multi-source market sentiment for forecasting stock returns. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04063-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 10/27/2022]