1
|
Towards Explainable Machine Learning for Bank Churn Prediction Using Data Balancing and Ensemble-Based Methods. MATHEMATICS 2022. [DOI: 10.3390/math10142379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]
Abstract
The diversity of data collected on both social networks and digital interfaces is extremely increased, raising the problem of heterogeneous variables that are not often favourable to classification algorithms. Despite the significant improvement in machine learning (ML) and predictive analysis efficiency for classification in customer relationship management systems (CRM), their performance remains very limited by heterogeneous data processing, class imbalance, and feature scales. This impact turned out to be more important for simple ML methods which in addition often suffer from over-fitting. This paper proposes a succinct and detailed ML model building process including cross-validation of the combination of SMOTE to balance data and ensemble methods for modelling. From the conducted experiments, the random forest (RF) model yielded the best performance of 0.86 in terms of accuracy and f1-scoreusing balanced data. It confirms the literature summary about this topic which shows that RF was among the most effective algorithms for customer predictive classification issues. The constructed and optimized models were interpreted by Shapley values and feature importance analysis which shows that the “age” feature was the most significant while “HasCrCard” was the less one. This process has proven effective in bridging previously reported research gaps and the resulting model should be used for supporting bank customer loyalty decision-making.
Collapse
|
2
|
Customer Churn Prediction in B2B Non-Contractual Business Settings Using Invoice Data. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12105001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Customer churn is a problem virtually all companies face, and the ability to predict it reliably can be a cornerstone for successful retention campaigns. In this study, we propose an approach to customer churn prediction in non-contractual B2B settings that relies exclusively on invoice-level data for feature engineering and uses multi-slicing to maximally utilize available data. We cast churn as a binary classification problem and assess the ability of three established classifiers to predict it when using different churn definitions. We also compare classifier performance when different amounts of historical data are used for feature engineering. The results indicate that robust models for different churn definitions can be derived by using invoice-level data alone and that using more historical data for creating some of the features tends to lead to better performing models for some classifiers. We also confirm that the multi-slicing approach to dataset creation yields better performing models compared to the traditionally used single-slicing approach.
Collapse
|