1
|
Kuizinienė D, Savickas P, Kunickaitė R, Juozaitienė R, Damaševičius R, Maskeliūnas R, Krilavičius T. A comparative study of feature selection and feature extraction methods for financial distress identification. PeerJ Comput Sci 2024; 10:e1956. [PMID: 38855232 PMCID: PMC11157601 DOI: 10.7717/peerj-cs.1956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 03/04/2024] [Indexed: 06/11/2024]
Abstract
Financial distress identification remains an essential topic in the scientific literature due to its importance for society and the economy. The advancements in information technology and the escalating volume of stored data have led to the emergence of financial distress that transcends the realm of financial statements and its indicators (ratios). The feature space could be expanded by incorporating new perspectives on feature data categories such as macroeconomics, sectors, social, board, management, judicial incident, etc. However, the increased dimensionality results in sparse data and overfitted models. This study proposes a new approach for efficient financial distress classification assessment by combining dimensionality reduction and machine learning techniques. The proposed framework aims to identify a subset of features leading to the minimization of the loss function describing the financial distress in an enterprise. During the study, 15 dimensionality reduction techniques with different numbers of features and 17 machine-learning models were compared. Overall, 1,432 experiments were performed using Lithuanian enterprise data covering the period from 2015 to 2022. Results revealed that the artificial neural network (ANN) model with 30 ranked features identified using the Random Forest mean decreasing Gini (RF_MDG) feature selection technique provided the highest AUC score. Moreover, this study has introduced a novel approach for feature extraction, which could improve financial distress classification models.
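The abstract above ranks features by mean decrease in Gini impurity. As a rough illustration of the idea only, the sketch below scores each feature by the largest Gini decrease that a single threshold split can achieve, then keeps the top-ranked ones. This is a simplification and not the authors' pipeline: a random forest averages such decreases over many splits in many trees. The function names, feature names, and the tiny dataset are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on a single feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(rows, labels, names, top_k):
    """Return the top_k feature names ordered by single-split Gini decrease."""
    scores = {
        name: best_gini_decrease([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: label 1 marks a distressed enterprise.
names = ["liquidity_ratio", "noise"]
rows = [[0.9, 5], [0.8, 1], [0.2, 4], [0.1, 2]]
labels = [0, 0, 1, 1]

print(rank_features(rows, labels, names, top_k=1))
```

In this toy data `liquidity_ratio` separates the two classes perfectly, so it outranks the `noise` column. With real data, the selected subset would then be passed to a classifier and compared by AUC.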
Affiliation(s)
- Dovilė Kuizinienė
  - Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Paulius Savickas
  - Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Rimantė Kunickaitė
  - Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Rūta Juozaitienė
  - Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Tomas Krilavičius
  - Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
|
2
|
Assessing Bank Default Determinants via Machine Learning. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
3
|
Luo Q, Jia Z, Li H, Wu Y. Analysis of parametric and non-parametric option pricing models. Heliyon 2022; 8:e11388. [PMID: 36387555 PMCID: PMC9641221 DOI: 10.1016/j.heliyon.2022.e11388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 09/03/2022] [Accepted: 10/31/2022] [Indexed: 11/06/2022]
Abstract
In this paper, a closed-form analytical solution of option price under the Bi-Heston model is derived. Through empirical analysis, the advantages and disadvantages of the parametric pricing model are compared and analysed with those of the non-parametric model. The analysis shows that: (1) the parametric pricing model significantly outperforms the machine learning model in terms of in-sample pricing effects, while the Bi-Heston model slightly outperforms the Heston model. (2) In terms of out-of-sample pricing, the machine learning model is inferior to the parametric model for call options, while the Bi-Heston model is significantly better than the other two models for put options, and the other two models are similar. (3) In the robustness analysis of the three pricing models, the machine learning model shows strong instability, while the Bi-Heston model shows a more stable side.
|
4
|
A score-based preprocessing technique for class imbalance problems. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01084-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
5
|
ADA-INCVAE: Improved data generation using variational autoencoder for imbalanced classification. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02566-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
|
6
|
Liao Z, Huang J, Cheng Y, Li C, Liu PX. A novel decomposition-based ensemble model for short-term load forecasting using hybrid artificial neural networks. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02864-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
|
7
|
Long term and short term forecasting of horticultural produce based on the LSTM network model. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02845-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/02/2022]
|
8
|
Cassales G, Gomes H, Bifet A, Pfahringer B, Senger H. Improving the performance of bagging ensembles for data streams through mini-batching. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.08.085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
9
|
|
10
|
Yang D, Zhang W, Wu X, Ablanedo-Rosas JH, Yang L, Yu W. A novel multi-stage ensemble model with fuzzy clustering and optimized classifier composition for corporate bankruptcy prediction. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-200741] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
With the rapid development of commercial credit mechanisms, credit funds have become fundamental in promoting the development of manufacturing corporations. However, large-scale, imbalanced credit application information poses a challenge to accurate bankruptcy predictions. A novel multi-stage ensemble model with fuzzy clustering and optimized classifier composition is proposed herein by combining the fuzzy clustering-based classifier selection method, the random subspace (RS)-based classifier composition method, and the genetic algorithm (GA)-based classifier compositional optimization method to achieve accuracy in predicting bankruptcy among corporates. To overcome the inherent inflexibility of traditional hard clustering methods, a new fuzzy clustering-based classifier selection method is proposed based on the mini-batch k-means algorithm to obtain the best performing base classifiers for generating classifier compositions. The RS-based classifier composition method was applied to enhance the robustness of candidate classifier compositions by randomly selecting several subspaces in the original feature space. The GA-based classifier compositional optimization method was applied to optimize the parameters of the promising classifier composition through the iterative mechanism of the GA. Finally, six datasets collected from the real world were tested with four evaluation indicators to assess the performance of the proposed model. The experimental results showed that the proposed model outperformed the benchmark models with higher predictive accuracy and efficiency.
Affiliation(s)
- Dongqi Yang
  - School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, Hangzhou, China
- Wenyu Zhang
  - School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, Hangzhou, China
- Xin Wu
  - China Academy of Financial Research, Zhejiang University of Finance and Economics, Hangzhou, China
- Jose H. Ablanedo-Rosas
  - College of Business Administration, University of Texas at El Paso, El Paso, TX, United States
- Lingxiao Yang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Wangzhi Yu
  - School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, Hangzhou, China
|
11
|
|
12
|
Predicting Financial Distress of Slovak Enterprises: Comparison of Selected Traditional and Learning Algorithms Methods. SUSTAINABILITY 2020. [DOI: 10.3390/su12103954] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 01/19/2023]
Abstract
Predicting the risk of financial distress of enterprises is an inseparable part of financial-economic analysis, helping investors and creditors reveal the performance stability of any enterprise. The acceptance of national conditions, proper use of financial predictors and statistical methods enable achieving relevant results and predicting the future development of enterprises as accurately as possible. The aim of the paper is to compare models developed by using three different methods (logistic regression, random forest and neural network models) in order to identify a model with the highest predictive accuracy of financial distress when it comes to industrial enterprises operating in the specific Slovak environment. The results indicate that all models demonstrated high discrimination accuracy and similar performance; neural network models yielded better results measured by all performance characteristics. The outputs of the comparison may contribute to the development of a reputable prediction model for industrial enterprises, which has not been developed yet in the country, which is one of the world’s largest car producers.
|
13
|
Le T, Vo MT, Kieu T, Hwang E, Rho S, Baik SW. Multiple Electric Energy Consumption Forecasting Using a Cluster-Based Strategy for Transfer Learning in Smart Building. SENSORS 2020; 20:s20092668. [PMID: 32392858 PMCID: PMC7362249 DOI: 10.3390/s20092668] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/11/2020] [Revised: 04/15/2020] [Accepted: 05/03/2020] [Indexed: 11/18/2022]
Abstract
Electric energy consumption forecasting is an interesting, challenging, and important issue in energy management and equipment efficiency improvement. Existing approaches are predictive models that have the ability to predict for a specific profile, i.e., a time series of a whole building or an individual household in a smart building. In practice, there are many profiles in each smart building, which leads to time-consuming and expensive system resources. Therefore, this study develops a robust framework for the Multiple Electric Energy Consumption forecasting (MEC) of a smart building using Transfer Learning and Long Short-Term Memory (TLL), the so-called MEC-TLL framework. In this framework, we first employ a k-means clustering algorithm to cluster the daily load demand of many profiles in the training set. In this phase, we also perform Silhouette analysis to specify the optimal number of clusters for the experimental datasets. Next, this study develops the MEC training algorithm, which utilizes a cluster-based strategy for transfer learning the Long Short-Term Memory models to reduce the computational time. Finally, extensive experiments are conducted to compare the computational time and different performance metrics for multiple electric energy consumption forecasting on two smart buildings in South Korea. The experimental results indicate that our proposed approach is capable of economical overheads while achieving superior performances. Therefore, the proposed approach can be applied effectively for intelligent energy management in smart buildings.
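The clustering step described in the abstract above (k-means on load profiles, with Silhouette analysis to pick the number of clusters) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the MEC-TLL implementation: the profiles are hypothetical 2-D points rather than real daily load curves, the k-means initialisation is a simple deterministic choice, and the function names are made up for this example.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with evenly spaced starting centroids (deterministic)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient; assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(own) / len(own)  # mean distance inside the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical profiles forming three well-separated groups.
profiles = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]
best_k, silhouette_by_k = choose_k(profiles, range(2, 5))
print(best_k, silhouette_by_k)
```

On these toy points the silhouette score peaks at three clusters, matching the three groups the data were built from. In a transfer-learning setup like the one the abstract describes, one model per cluster would then be trained and reused for the profiles assigned to it.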
Affiliation(s)
- Tuong Le
  - Informetrics Research Group, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
  - Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
- Minh Thanh Vo
  - Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
- Tung Kieu
  - University of Science, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Eenjun Hwang
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Seungmin Rho
  - Department of Software, Sejong University, Seoul 05006, Korea
- Sung Wook Baik
  - Department of Software, Sejong University, Seoul 05006, Korea
|
14
|
Chen Z, Duan J, Yang C, Kang L, Qiu G. SMLBoost-adopting a soft-margin like strategy in boosting. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
15
|
|
16
|
|
17
|
Improving Electric Energy Consumption Prediction Using CNN and Bi-LSTM. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9204237] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Indexed: 02/07/2023]
Abstract
The electric energy consumption prediction (EECP) is an essential and complex task in intelligent power management system. EECP plays a significant role in drawing up a national energy development policy. Therefore, this study proposes an Electric Energy Consumption Prediction model utilizing the combination of Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM) that is named EECP-CBL model to predict electric energy consumption. In this framework, two CNNs in the first module extract the important information from several variables in the individual household electric power consumption (IHEPC) dataset. Then, Bi-LSTM module with two Bi-LSTM layers uses the above information as well as the trends of time series in two directions including the forward and backward states to make predictions. The obtained values in the Bi-LSTM module will be passed to the last module that consists of two fully connected layers for finally predicting the electric energy consumption in the future. The experiments were conducted to compare the prediction performances of the proposed model and the state-of-the-art models for the IHEPC dataset with several variants. The experimental results indicate that EECP-CBL framework outperforms the state-of-the-art approaches in terms of several performance metrics for electric energy consumption prediction on several variations of IHEPC dataset in real-time, short-term, medium-term and long-term timespans.
|
18
|
Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: a case from the Spanish market. PROGRESS IN ARTIFICIAL INTELLIGENCE 2019. [DOI: 10.1007/s13748-019-00197-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 10/26/2022]
|