1
Wang Y, Wu Z, Gao J, Liu C, Guo F. A multi-level classification based ensemble and feature extractor for credit risk assessment. PeerJ Comput Sci 2024; 10:e1915. [PMID: 38435611 PMCID: PMC10909241 DOI: 10.7717/peerj-cs.1915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 02/07/2024] [Indexed: 03/05/2024]
Abstract
As demand for loans grows, banks and other financial institutions place higher requirements on classifying customers' credit risk levels, with the aim of making better decisions on loan approval and loan amount allocation and reducing pre-loan risk. This article proposes a Multi-Level Classification based Ensemble and Feature Extractor (MLCEFE) that incorporates the strengths of sampling, feature extraction, and ensemble classification. MLCEFE uses SMOTE + Tomek links to address data imbalance and then uses a deep neural network (DNN), an auto-encoder (AE), and principal component analysis (PCA) to transform the original variables into higher-level abstract features. Finally, it combines multiple ensemble learners to improve multi-class classification of personal credit risk. In the performance evaluation, MLCEFE showed remarkable results in the multi-class classification of personal credit risk compared with other classification methods.
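To make the Tomek-links part of the sampling step concrete, here is a minimal sketch, not the authors' implementation. It assumes Euclidean distance and uses a hypothetical toy data set; the helper names `nearest_neighbor` and `tomek_links` are illustrative only. A Tomek link is a pair of points from different classes that are each other's nearest neighbour.

```python
from math import dist


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


# Hypothetical toy data: two classes on a line with one borderline pair.
points = [(0.0,), (1.0,), (2.0,), (2.4,), (4.0,), (5.0,)]
labels = [0, 0, 0, 1, 1, 1]
links = tomek_links(points, labels)  # only the borderline pair (2, 3) qualifies
```

In cleaning schemes of this kind, one or both members of each link are usually removed after oversampling; which variant MLCEFE uses is not stated in the abstract.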
Affiliation(s)
- Yuanyuan Wang
  - School of Management and Engineering, Capital University of Economics and Business, Fengtai District, Beijing, China
- Zhuang Wu
  - School of Management and Engineering, Capital University of Economics and Business, Fengtai District, Beijing, China
- Jing Gao
  - School of Management and Engineering, Capital University of Economics and Business, Fengtai District, Beijing, China
- Chenjun Liu
  - School of Management and Engineering, Capital University of Economics and Business, Fengtai District, Beijing, China
- Fangfang Guo
  - School of Management and Engineering, Capital University of Economics and Business, Fengtai District, Beijing, China
2
Przybyła-Kasperek M, Kusztal K. New Classification Method for Independent Data Sources Using Pawlak Conflict Model and Decision Trees. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1604. [PMID: 36359694 PMCID: PMC9689716 DOI: 10.3390/e24111604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 10/31/2022] [Accepted: 11/01/2022] [Indexed: 06/16/2023]
Abstract
The research concerns data collected in independent sets, more specifically in local decision tables. A possible approach to managing these data is to build local classifiers based on each table individually. The literature offers many approaches to combining the final prediction results of independent classifiers, but insufficient effort has been devoted to studying the cooperation of tables and the formation of coalitions. Such an approach was expected to matter on two levels. First, it affects the quality of classification: the ability to build combined classifiers for coalitions of tables should allow more generalized concepts to be learned, which in turn should improve the classification of new objects. Second, combining tables into coalitions reduces computational complexity, because fewer classifiers are built. The paper proposes a new method for creating coalitions of local tables and generating an aggregated classifier for each coalition. Coalitions are generated by determining certain characteristics of attribute values occurring in local tables and applying the Pawlak conflict analysis model. In the study, classification and regression trees with the Gini index are built on the aggregated table for each coalition. The system has a hierarchical structure, as in the next stage the decisions generated by the classifiers for coalitions are aggregated using majority voting. The classification quality of the proposed system was compared with an approach that does not use local data cooperation and coalition creation. In that approach, the system has a parallel structure and decision trees are built independently for local tables. The paper shows that the proposed approach provides a significant improvement in classification quality and execution time. The Wilcoxon test confirmed that the differences in accuracy between the proposed method and the approach without coalitions are significant, with a p level = 0.005. The average accuracy values obtained for the proposed approach and the approach without coalitions are 0.847 and 0.812, respectively, so the difference is quite large. Moreover, the algorithm implementing the proposed approach ran up to 21 times faster than the algorithm implementing the approach without coalitions.
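The final aggregation stage described above, majority voting over the coalition classifiers, can be sketched as follows. This is an illustrative sketch only: the label names are hypothetical, and the tie-breaking rule (first label seen wins) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical decisions from three coalition-level decision trees for one object.
coalition_predictions = ["approve", "reject", "approve"]
final_decision = majority_vote(coalition_predictions)
```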
3
Ma H, Li G, Liu R, Shen M, Liu X. The personal credit default discrimination model based on DF21. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-212780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Background: Personal credit default discrimination measures the size of credit default risk and provides an essential decision-making basis for banks. Methods: This article constructs a three-stage default discriminant model based on DF21. In the first stage, the feature combination is selected. Default prediction results are obtained by varying the number of decision trees from 20 to 500 and the learning rate from 0.08 to 0.12 in XGBoost. Taking the lowest Type II error, the highest AUC, and the highest accuracy as the first, second, and third principles (the TAA principle), respectively, the article works backwards to the optimal number of decision trees and learning rate and obtains the feature importance. Forward selection is then used to determine the optimal feature combination according to the TAA principle. In the second stage, the base classifiers for DF21 are screened. Considering how well each classifier applies to different data sets, the classifiers with good classification performance on each data set are selected as base classifiers. In the third stage, the default discriminant model based on DF21 is constructed. Following the idea that combining strong classifiers yields a stronger result, four strong classifiers are used as base classifiers to improve the cascade structure of DF21. Results: Compared with the first stage, the Type II error (the proportion of the banks' principal loss) dropped by 4.41%, 5.98%, and 13.00% on the Japanese, Australian, and German data sets, respectively, which supports the effectiveness of DF21. Conclusion: DF21 is significantly better than other classifiers and other scholars' models according to the TAA principle.
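The TAA principle amounts to a lexicographic ranking of candidate configurations: lowest Type II error first, then highest AUC, then highest accuracy. A minimal sketch of that ordering is given below. The `Run` record and all metric values are hypothetical placeholders, not results from the paper; only the parameter ranges (20 to 500 trees, learning rate 0.08 to 0.12) come from the abstract.

```python
from typing import NamedTuple


class Run(NamedTuple):
    n_trees: int
    learning_rate: float
    type2_error: float
    auc: float
    accuracy: float


def taa_best(runs):
    """Pick the run with the lowest Type II error, then highest AUC, then highest accuracy."""
    return min(runs, key=lambda r: (r.type2_error, -r.auc, -r.accuracy))


# Hypothetical evaluation results for four parameter settings.
runs = [
    Run(20, 0.08, 0.12, 0.90, 0.85),
    Run(100, 0.10, 0.10, 0.91, 0.86),
    Run(300, 0.10, 0.10, 0.93, 0.84),
    Run(500, 0.12, 0.10, 0.93, 0.87),
]
best = taa_best(runs)
```

Here three runs tie on Type II error, two of those tie on AUC, and accuracy decides between them.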
Affiliation(s)
- Hongdong Ma
  - School of Business Administration, Northeastern University, Shenyang, China
- Gang Li
  - School of Business Administration, Northeastern University, Shenyang, China
- Rongyue Liu
  - School of Business Administration, Northeastern University, Shenyang, China
- Mengdi Shen
  - School of Business Administration, Northeastern University, Shenyang, China
- Xiaohui Liu
  - School of Business Administration, Northeastern University, Shenyang, China
4
Wu J, Zhao X, Yuan H, Si YW. CDGAT: a graph attention network method for credit card defaulters prediction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03996-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
5
Hammad M, Alkinani MH, Gupta BB, Abd El-Latif AA. Myocardial infarction detection based on deep neural network on imbalanced data. MULTIMEDIA SYSTEMS 2022; 28:1373-1385. [DOI: 10.1007/s00530-020-00728-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 08/09/2020] [Accepted: 12/01/2020] [Indexed: 09/02/2023]
6
Li Z, Liu L, Zhu L, Deng F, Zhang Y, Zhang Y. Parallel double-layer prediction model construction and empirical analysis for enterprise credit assessment. INTELL DATA ANAL 2022. [DOI: 10.3233/ida-215943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Credit is part of an enterprise's external image and directly affects its interests. Currently, most research on enterprise credit prediction uses a single algorithm model, or optimizes a single model, to predict an enterprise's credit score. The accuracy of each model differs, and generalization ability is generally weak. To improve the generalization ability of models and the accuracy of prediction results, this paper proposes a parallel double-layer prediction model. The model is based on Stacking and Bagging methods, which can improve generalization ability while maintaining high accuracy. Through experiments, we compare three single-algorithm models and four ensemble learning models using other combination strategies with the parallel double-layer prediction model. The average values of four evaluation indexes are increased by 4.2349%, 63.1464%, 34.11837%, 1.26104%, 15.7862%, 10.1457% and 25.6310%, respectively. The results show that the parallel double-layer prediction model is accurate and feasible.
Affiliation(s)
- Zhanli Li
  - School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi, China
- Linchao Liu
  - School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi, China
- Li Zhu
  - School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi, China
- Fan Deng
  - School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi, China
- Yun Zhang
  - School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi, China
- Yu Zhang
  - University of Texas at Dallas, USA
7
Miniak-Górecka A, Podlaski K, Gwizdałła T. Self-optimizing neural network in the classification of real valued data. PeerJ Comput Sci 2022; 8:e1020. [PMID: 35875630 PMCID: PMC9299286 DOI: 10.7717/peerj-cs.1020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
The classification of multi-dimensional patterns is one of the most popular and often most challenging problems of machine learning. For this reason, new approaches are being tried in the expectation that they will improve on existing ones. The article proposes a new technique based on a decision network called self-optimizing neural networks (SONN). The proposed approach works on discretized data. Using a special procedure, we assign a feature vector to each element of the real-valued dataset. The feature vectors are then analyzed, and decision patterns are created using so-called discriminants. We focus on how these discriminants are used and how they influence the final classifier prediction. Moreover, we discuss the influence of the neighborhood topology. In the article, we use three different datasets with different properties. All results obtained by the derived methods are compared with those obtained with the well-known support vector machine (SVM) approach. The results show that the proposed solutions give better results than SVM: the information obtained from a training set is better generalized, and the final accuracy of the classifier is higher.
Affiliation(s)
- Alicja Miniak-Górecka
  - Department of Intelligent Systems, Faculty of Physics and Applied Informatics, University of Lodz, Lodz, Poland
- Krzysztof Podlaski
  - Department of Intelligent Systems, Faculty of Physics and Applied Informatics, University of Lodz, Lodz, Poland
- Tomasz Gwizdałła
  - Department of Intelligent Systems, Faculty of Physics and Applied Informatics, University of Lodz, Lodz, Poland
8
Agarwal D, Covarrubias-Zambrano O, Bossmann SH, Natarajan B. Early Detection of Pancreatic Cancers Using Liquid Biopsies and Hierarchical Decision Structure. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2022; 10:4300208. [PMID: 35937463 PMCID: PMC9342860 DOI: 10.1109/jtehm.2022.3186836] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Revised: 05/30/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVE: Pancreatic cancer (PC) is a silent killer because its detection is difficult and, to date, no effective treatment has been developed. In the US, the current 5-year survival rate is 11%. Therefore, PC has to be detected as early as possible. METHODS AND PROCEDURES: In this work, we have combined the use of ultrasensitive nanobiosensors for protease/arginase detection with an information-fusion-based hierarchical decision structure to detect PC at the localized stage by means of a simple Liquid Biopsy. The problem of early-stage detection of pancreatic cancer is modelled as a multi-class classification problem. We propose a Hard Hierarchical Decision Structure (HDS) along with appropriate feature engineering steps to improve the performance of conventional multi-class classification approaches. Further, a Soft Hierarchical Decision Structure (SDS) is developed to additionally provide confidences of predicted labels in the form of class probability values. These frameworks overcome the limitations of existing research studies that employ simple biostatistical tools and do not effectively exploit the information provided by ultrasensitive protease/arginase analyses. RESULTS: The experimental results demonstrate that an overall mean classification accuracy of around 92% is obtained using the proposed approach, as opposed to 75% with conventional multi-class classification approaches. This illustrates that the proposed HDS framework outperforms traditional classification techniques for early-stage PC detection. CONCLUSION: Although this study is based on only 31 pancreatic cancer patients and a healthy control group of 48 human subjects, it has enabled combining Liquid Biopsies and Machine Learning methodologies to reach the goal of earliest PC detection. The provision of both decision labels (via HDS) and class probabilities (via SDS) helps clinicians identify instances where statistical model-based predictions lack confidence. This further aids in determining whether more tests are required for better diagnosis. Such a strategy makes the output of our decision model more interpretable and can assist with the diagnostic procedure. CLINICAL IMPACT: With further validation, the proposed framework can be employed as a decision support tool for clinicians to help detect pancreatic cancer at early stages.
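One common way to obtain class probabilities from a hierarchy of classifiers is to multiply the branch probabilities along the path to each leaf class. The sketch below illustrates that idea only; it is an assumption about how a soft hierarchical structure can work in general, not the SDS formulation from the paper, and the two-level hierarchy, class names, and probabilities are all hypothetical.

```python
def leaf_probabilities(tree, prob=1.0):
    """tree is either a class label (str) or a list of (branch_probability, subtree)."""
    if isinstance(tree, str):
        return {tree: prob}
    out = {}
    for branch_prob, subtree in tree:
        # Probability of a leaf is the product of branch probabilities on its path.
        for label, p in leaf_probabilities(subtree, prob * branch_prob).items():
            out[label] = out.get(label, 0.0) + p
    return out


# Hypothetical two-level hierarchy: healthy vs. cancer, then early vs. late stage.
tree = [
    (0.3, "healthy"),
    (0.7, [(0.8, "early-stage"), (0.2, "late-stage")]),
]
probs = leaf_probabilities(tree)
most_likely = max(probs, key=probs.get)
```

A low maximum probability in `probs` is the kind of signal the abstract describes as flagging predictions that lack confidence.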
Affiliation(s)
- Deepesh Agarwal
  - Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS 66506, USA
- Stefan H. Bossmann
  - Department of Cancer Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
9
Kamara AF, Chen E, Pan Z. An ensemble of a boosted hybrid of deep learning models and technical analysis for forecasting stock prices. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.02.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 02/05/2023]
10
Abstract
Credit scoring is an effective tool for banks and lending companies to manage the potential credit risk of borrowers. Machine learning algorithms have made great progress in the automatic and accurate discrimination of good and bad borrowers. Notably, ensemble approaches are a group of powerful tools for enhancing the performance of credit scoring. Random forest (RF) and Gradient Boosting Decision Tree (GBDT) have become the mainstream ensemble methods for precise credit scoring. RF is a Bagging-based ensemble that realizes accurate credit scoring and enriches the diversity of base learners by modifying the training object. However, the optimization pattern that works on invariant training targets may increase the statistical independence of base learners. GBDT is a boosting-based ensemble approach that reduces the credit scoring error by iteratively changing the training target while keeping the training features unchanged. This may harm the diversity of base learners. In this study, we incorporate the advantages of the Bagging ensemble training strategy and the boosting ensemble optimization pattern to enhance the diversity of base learners. An extreme learning machine-based supervised augmented GBDT is proposed to enhance the discriminative ability for credit scoring. Experimental results on 4 public credit datasets show a significant improvement in credit scoring and suggest that the proposed method is a good solution for accurate credit scoring.
11
Hadji Misheva B, Jaggi D, Posth JA, Gramespacher T, Osterrieder J. Audience-Dependent Explanations for AI-Based Risk Management Tools: A Survey. Front Artif Intell 2022; 4:794996. [PMID: 35028559 PMCID: PMC8751385 DOI: 10.3389/frai.2021.794996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022]
Abstract
Artificial Intelligence (AI) is one of the most sought-after innovations in the financial industry. With its growing popularity, however, comes the call for AI-based models to be understandable and transparent. Yet explaining the inner mechanism of the algorithms and their interpretation in an understandable way is entirely audience-dependent. The established literature fails to match the increasing number of explainable AI (XAI) methods with the different stakeholders' explainability needs. This study addresses this gap by exploring how various stakeholders within the Swiss financial industry view explainability in their respective contexts. Based on a series of interviews with practitioners within the financial industry, we provide an in-depth review and discussion of their views on the potential and limitations of the current XAI techniques needed to address the different requirements for explanations.
Affiliation(s)
- Branka Hadji Misheva
  - ZHAW, School of Engineering, Institute of Data Analysis and Process Design, Winterthur, Switzerland
- David Jaggi
  - ZHAW, School of Management and Law, Department Banking and Finance, Winterthur, Switzerland
- Jan-Alexander Posth
  - ZHAW, School of Management and Law, Department Banking and Finance, Winterthur, Switzerland
- Thomas Gramespacher
  - ZHAW, School of Management and Law, Department Banking and Finance, Winterthur, Switzerland
- Joerg Osterrieder
  - ZHAW, School of Engineering, Institute of Data Analysis and Process Design, Winterthur, Switzerland
12
Chen J, Zhang D, Suzauddola M, Zeb A. Identifying crop diseases using attention embedded MobileNet-V2 model. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107901] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
13
Beheshti Roui M, Zomorodi M, Sarvelayati M, Abdar M, Noori H, Pławiak P, Tadeusiewicz R, Zhou X, Khosravi A, Nahavandi S, Acharya UR. A novel approach based on genetic algorithm to speed up the discovery of classification rules on GPUs. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
14
Elgendy IA, Muthanna A, Hammoudeh M, Shaiba H, Unal D, Khayyat M. Advanced Deep Learning for Resource Allocation and Security Aware Data Offloading in Industrial Mobile Edge Computing. BIG DATA 2021; 9:265-278. [PMID: 33656352 DOI: 10.1089/big.2020.0284] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
The Internet of Things (IoT) is permeating our daily lives through continuous environmental monitoring and data collection. The promise of low-latency communication, enhanced security, and efficient bandwidth utilization has led to the shift from mobile cloud computing to mobile edge computing. In this study, we propose an advanced deep reinforcement resource allocation and security-aware data offloading model that considers the constrained computation and radio resources of industrial IoT devices to guarantee efficient sharing of resources between multiple users. This model is formulated as an optimization problem with the goal of decreasing energy consumption and computation delay. This type of problem is NP-hard (non-deterministic polynomial time-hard) due to the curse-of-dimensionality challenge; thus, a deep learning optimization approach is presented to find an optimal solution. In addition, a 128-bit Advanced Encryption Standard-based cryptographic approach is proposed to satisfy the data security requirements. Experimental evaluation results show that the proposed model can reduce offloading overhead in terms of energy and time by up to 64.7% in comparison with the local execution approach. It also outperforms the full offloading scenario by up to 13.2%, as it can select some computation tasks to be offloaded while optimally rejecting others. Finally, it is adaptable and scalable to a large number of mobile devices.
Affiliation(s)
- Ibrahim A Elgendy
  - Department of Computer Science and Technology, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - Department of Computer Science, Faculty of Computers and Information, Menoufia University, Menoufia, Egypt
- Ammar Muthanna
  - Department of Communication Networks and Data Transmission, St. Petersburg State University of Telecommunication, St. Petersburg, Russia
  - Applied Mathematics and Communications Technology Institute, Peoples' Friendship University of Russia (RUDN University), Moscow, Russia
- Mohammad Hammoudeh
  - Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
- Hadil Shaiba
  - Department of Computer Science, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Devrim Unal
  - Department of Electrical Engineering, KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
- Mashael Khayyat
  - Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
15
Ye Z, Yu J. Health condition monitoring of machines based on long short-term memory convolutional autoencoder. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107379] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/25/2022]
16
Tian Y, Bian B, Tang X, Zhou J. A new non-kernel quadratic surface approach for imbalanced data classification in online credit scoring. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.02.026] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
17
Transmission Quality Classification with Use of Fusion of Neural Network and Genetic Algorithm in Pay&Require Multi-Agent Managed Network. SENSORS 2021; 21:s21124090. [PMID: 34198587 PMCID: PMC8231990 DOI: 10.3390/s21124090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/26/2021] [Revised: 05/14/2021] [Accepted: 06/11/2021] [Indexed: 11/21/2022]
Abstract
Modern computer systems practically cannot function without a computer network. New concepts of data transmission are emerging, e.g., programmable networks. However, the development of computer networks entails the need for development in one more aspect, i.e., the quality of data transmission through the network. Data transmission quality can be described using parameters such as delay, bandwidth, packet loss ratio, and jitter. On the basis of the obtained values, specialists are able to state how the measured parameters affect the overall quality of the provided service. Unfortunately, for a non-expert user, these parameters can be too complex to understand. Hence the problem arises of translating the parameters describing transmission quality into a form understandable to the user. This article presents the concept of using Machine Learning (ML) to solve this problem, i.e., a dynamic classification of the measured parameters describing transmission quality on a certain scale. Thanks to this approach, describing the quality becomes less complex and more understandable for the user. Some studies have been conducted to date; here, a different approach was used, i.e., the fusion of a neural network (NN) and a genetic algorithm (GA). GAs were chosen for the selection of weights, replacing the classic gradient descent algorithm. For learning purposes, 100 samples were obtained, each described by four features and a label that describes the quality. In the research carried out so far, single classifiers and ensemble learning have been used. The current result is better than the previous ones. A relatively high classification quality was obtained with 10-fold stratified cross-validation, i.e., SEN = 95% (overall accuracy). There were 5 incorrect classifications out of 100, which is a better result than in previous studies.
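The evaluation protocol mentioned above, 10-fold stratified cross-validation on 100 samples, can be sketched as follows. This is a simplified illustration without shuffling; the 60/40 class split and the label names are hypothetical, since the abstract does not give the class distribution. Only the sample count, the fold count, and the 5 errors out of 100 come from the abstract.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps roughly the class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the members of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical class split for the 100 samples.
labels = ["good"] * 60 + ["bad"] * 40
folds = stratified_folds(labels, k=10)

# Accuracy implied by 5 misclassified samples out of 100.
accuracy = (100 - 5) / 100
```

Each fold serves once as the test set while the other nine are used for training, so every sample is classified exactly once.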
18
Hall A, Victor B, He Z, Langer M, Elipot M, Nibali A, Morgan S. The detection, tracking, and temporal action localisation of swimmers for automated analysis. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05485-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
19
Li G, Ma HD, Liu RY, Shen MD, Zhang KX. A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest. ENTROPY (BASEL, SWITZERLAND) 2021; 23:582. [PMID: 34066807 PMCID: PMC8150340 DOI: 10.3390/e23050582] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/16/2021] [Revised: 04/27/2021] [Accepted: 04/27/2021] [Indexed: 11/29/2022]
Abstract
Background: The credit scoring model is an effective tool for banks and other financial institutions to distinguish potential default borrowers. Credit scoring models based on machine learning methods such as deep learning perform well in terms of the accuracy of default discrimination, but they also have shortcomings, such as many hyperparameters and a heavy dependence on big data. There is still considerable room to improve their interpretability and robustness. Methods: The deep forest, or multi-Grained Cascade Forest (gcForest), is a deep decision tree model based on the random forest algorithm. Using multidimensional scanning and cascading processing, gcForest can effectively identify and process high-dimensional feature information. At the same time, gcForest has fewer hyperparameters and is strongly robust. This paper therefore constructs a two-stage hybrid default discrimination model based on multiple feature selection methods and the gcForest algorithm, and optimizes the parameters with the lowest type II error as the first principle and the highest AUC and accuracy as the second and third principles. gcForest can not only reflect the advantages of traditional statistical models in terms of interpretability and robustness but also take into account the advantages of deep learning models in terms of accuracy. Results: The validity of the hybrid default discrimination model is verified on three real, open credit data sets (Australian, Japanese, and German) from the UCI database. Conclusions: gcForest performs better than currently popular single classifiers such as ANN, and than common ensemble classifiers such as LightGBM, and CNNs in type II error, AUC, and accuracy. In addition, comparison with other similar research results further verifies the robustness and effectiveness of this model.
Affiliation(s)
- Gang Li
  - School of Business Administration, Northeastern University, Shenyang 110819, China
  - School of Economics, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China
  - Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
- Hong-Dong Ma
  - School of Business Administration, Northeastern University, Shenyang 110819, China
- Rong-Yue Liu
  - School of Business Administration, Northeastern University, Shenyang 110819, China
- Meng-Di Shen
  - School of Business Administration, Northeastern University, Shenyang 110819, China
- Ke-Xin Zhang
  - School of Business Administration, Northeastern University, Shenyang 110819, China
20
Pałka F, Książek W, Pławiak P, Romaszewski M, Książek K. Hyperspectral Classification of Blood-Like Substances Using Machine Learning Methods Combined with Genetic Algorithms in Transductive and Inductive Scenarios. SENSORS 2021; 21:s21072293. [PMID: 33805937 PMCID: PMC8037346 DOI: 10.3390/s21072293] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/08/2021] [Revised: 03/22/2021] [Accepted: 03/23/2021] [Indexed: 01/10/2023]
Abstract
This study is focused on applying genetic algorithms (GAs) to model and band selection in hyperspectral image classification. We use a forensic-inspired data set of seven hyperspectral images with blood and five visually similar substances to test GA-optimised classifiers in two scenarios: when the training and test data come from the same image and when they come from different images, which is a more challenging task due to significant spectral differences. In our experiments, we compare GA with a classic model optimisation through a grid search. Our results show that GA-based model optimisation can reduce the number of bands and create an accurate classifier that outperforms the GS-based reference models, provided that, during model optimisation, it has access to examples similar to test data. We illustrate this with experiments highlighting the importance of a validation set.
Affiliation(s)
- Filip Pałka
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, 31-155 Krakow, Poland
| | - Wojciech Książek
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, 31-155 Krakow, Poland
| | - Paweł Pławiak
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, 31-155 Krakow, Poland
- Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, 44-100 Gliwice, Poland
| | - Michał Romaszewski
- Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, 44-100 Gliwice, Poland
| | - Kamil Książek
- Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, 44-100 Gliwice, Poland
- Department of Data Sciences and Engineering, Silesian University of Technology, 44-100 Gliwice, Poland
|
21
|
Wu M, Lu Y, Yang W, Wong SY. A Study on Arrhythmia via ECG Signal Classification Using the Convolutional Neural Network. Front Comput Neurosci 2021; 14:564015. [PMID: 33469423 PMCID: PMC7813686 DOI: 10.3389/fncom.2020.564015] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 05/21/2020] [Accepted: 11/02/2020] [Indexed: 11/13/2022]
Abstract
Cardiovascular diseases (CVDs) are the leading cause of death today. The current method of identifying these diseases is to analyze the electrocardiogram (ECG), a medical monitoring technology that records cardiac activity. Unfortunately, relying on experts to analyze a large amount of ECG data consumes too many medical resources. Therefore, identifying ECG characteristics with machine learning has gradually become prevalent. However, these typical methods have drawbacks, such as manual feature recognition, complex models, and long training time. This paper proposes a robust and efficient 12-layer deep one-dimensional convolutional neural network for classifying the five micro-classes of heartbeat types in the MIT-BIH Arrhythmia database. The five types of heartbeat features are classified, and a wavelet self-adaptive threshold denoising method is used in the experiments. Compared with a BP neural network, random forest, and other CNN networks, the results show that the model proposed in this paper has better performance in accuracy, sensitivity, robustness, and anti-noise capability. Its accurate classification effectively saves medical resources, which has a positive effect on clinical practice.
Affiliation(s)
- Mengze Wu
- Department of Information Engineering, Wuhan University of Technology, Wuhan, China
| | - Yongdi Lu
- Department of Electrical and Electronics Engineering, Xiamen University Malaysia, Sepang, Malaysia
| | - Wenli Yang
- Department of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China
| | - Shen Yuong Wong
- Department of Electrical and Electronics Engineering, Xiamen University Malaysia, Sepang, Malaysia
|
22
|
Abstract
Accurate segmentation of retinal blood vessels is a key step in the diagnosis of fundus diseases, among which cataracts, glaucoma, and diabetic retinopathy (DR) are the main diseases that cause blindness. Most segmentation methods based on deep convolutional neural networks can effectively extract features. However, convolution and pooling operations also filter out some useful information, and the final segmented retinal vessels have problems such as low classification accuracy. In this paper, we propose a multi-scale residual attention network called MRA-UNet. Multi-scale inputs enable the network to learn features at different scales, which increases the robustness of the network. In the encoding phase, we reduce the negative influence of the background and eliminate noise by using the residual attention module. We use the bottom reconstruction module to aggregate the feature information under different receptive fields, so that the model can extract the information of blood vessels of different thicknesses. Finally, the spatial activation module is used to process the up-sampled image to further increase the difference between blood vessels and background, which promotes the recovery of small blood vessels at the edges. Our method was verified on the DRIVE, CHASE, and STARE datasets. The segmentation accuracy reached 96.98%, 97.58%, and 97.63%, respectively; the specificity reached 98.28%, 98.54%, and 98.73%; and the F-measure scores reached 82.93%, 81.27%, and 84.22%. We compared the experimental results with some state-of-the-art methods, such as U-Net, R2U-Net, and AG-UNet, in terms of accuracy, sensitivity, specificity, F-measure, and AUCROC. In particular, MRA-UNet outperformed U-Net by 1.51%, 3.44%, and 0.49% on the DRIVE, CHASE, and STARE datasets, respectively.
|
23
|
Iosifidis A. Class mean vector component and discriminant analysis. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.10.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
24
|
Janković R, Mihajlović I, Štrbac N, Amelio A. Machine learning models for ecological footprint prediction based on energy parameters. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05476-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022]
|
25
|
Książek W, Hammad M, Pławiak P, Acharya UR, Tadeusiewicz R. Development of novel ensemble model using stacking learning and evolutionary computation techniques for automated hepatocellular carcinoma detection. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.08.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022]
|
26
|
One-Dimensional Convolutional Neural Networks with Feature Selection for Highly Concise Rule Extraction from Credit Scoring Datasets with Heterogeneous Attributes. Electronics 2020. [DOI: 10.3390/electronics9081318] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
Convolutional neural networks (CNNs) have proven effective, but they are not applicable to all datasets, such as those with heterogeneous attributes, which are often used in the finance and banking industries. Such datasets are difficult to classify, and to date, existing high-accuracy classifiers and rule-extraction methods have not been able to achieve sufficiently high classification accuracies or concise classification rules. This study aims to provide a new approach for achieving transparency and conciseness in credit scoring datasets with heterogeneous attributes by using a one-dimensional (1D) fully-connected layer first CNN combined with the Recursive-Rule Extraction (Re-RX) algorithm with a J48graft decision tree (hereafter 1D FCLF-CNN). Based on a comparison between the proposed 1D FCLF-CNN and existing rule extraction methods, our architecture enabled the extraction of the most concise rules (6.2) and achieved the best accuracy (73.10%), i.e., the highest interpretability–priority rule extraction. These results suggest that the 1D FCLF-CNN with Re-RX with J48graft is very effective for extracting highly concise rules for heterogeneous credit scoring datasets. Although it does not completely overcome the accuracy–interpretability dilemma for deep learning, it does appear to resolve this issue for credit scoring datasets with heterogeneous attributes, and thus could lead to a new era in the financial industry.
|
27
|
Gholami J, Pourpanah F, Wang X. Feature selection based on improved binary global harmony search for data classification. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106402] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 02/06/2023]
|
28
|
A permutation entropy-based EMD–ANN forecasting ensemble approach for wind speed prediction. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05141-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 10/23/2022]
|
29
|
A New Machine Learning Algorithm Based on Optimization Method for Regression and Classification Problems. Mathematics 2020. [DOI: 10.3390/math8061007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
A convex minimization problem in the form of the sum of two proper lower-semicontinuous convex functions has received much attention from the optimization community due to its broad applications in many disciplines, such as machine learning, regression and classification problems, image and signal processing, compressed sensing, and optimal control. Many methods have been proposed to solve such problems, but most of them rely on a Lipschitz continuity assumption on the derivative of one of the two functions in the sum. In this work, we introduce a new accelerated algorithm for solving this convex minimization problem by using a linesearch technique together with a viscosity inertial forward–backward algorithm (VIFBA). A strong convergence result for the proposed method is obtained under some control conditions. As applications, we apply our proposed method to solve regression and classification problems by using an extreme learning machine model. Moreover, we show that our proposed algorithm is more efficient and has better convergence behavior than some algorithms mentioned in the literature.
|
30
|
Basiri ME, Abdar M, Cifci MA, Nemati S, Acharya UR. A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105949] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 02/08/2023]
|
31
|
J-LDFR: joint low-level and deep neural network feature representations for pedestrian gender classification. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05015-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
|