Customer Churn Prediction Employing Ensemble Learning
2024 IEEE 6th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA). DOI: 10.1109/ICCCMLA63077.2024.10871687
5th Lahari K, Department of CSE, SRM University AP, Amaravati, India, lahari [email protected]
6th Md. Asadul Haque, Paari School of Business, SRM University AP, Amaravati, India, [email protected]
7th Pavan Kumar Pagadala, Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Hyderabad, India, [email protected]
Abstract—In recent years, there has been an enormous increase in the number of companies and of customers in almost every industry. The increase in the number of companies has given customers more choice, but it has also created new challenges. Companies must therefore work not only to improve their products or services but also to retain customers in a competitive world. Churn prediction is the prediction of customers who are at potential risk of discontinuing a company's product or service, which makes it especially relevant in today's competitive environment. In the present work, we employ various machine learning models for the early prediction of churn in order to mitigate the risk of losing customers. We have chosen ensemble models for this task and trained them on the dataset. The results of the various models are compared using accuracy, precision, recall, and F1 score. It is also observed that, for our dataset, XGBoost outperformed the other models.

Index Terms—Churn prediction, Random Forest, AdaBoost, XGBoost, Gradient Boosting Machine, accuracy, precision, recall, F1 score
I. INTRODUCTION
Customer churn is one of the major concerns for an organization: it not only reduces revenue and hands customers to competitors, it also hinders the organization's growth and its chances of thriving in the market [1], [2]. Customer churn can also signal underlying issues in the organization, such as poor customer service or support systems [3], [4]. Identifying potential churners beforehand can greatly help an organization take measures to retain them, as retaining existing customers is said to be the least expensive option compared to other strategies such as acquiring new customers or up-selling to existing ones [5]. Hence, churn prediction is an important task for a company in any industry, and the growing field of machine learning provides a range of techniques for performing it [6]. Whatever method is used for churn prediction, the main aim is to identify potential churners and develop strategies for retaining them.

In the present era of digital dominance, predicting the churn behaviour of customers is an uphill task for organizations striving to retain their customer base and sustain profitability. The present study examines the effectiveness of deploying various ensemble models, namely Random Forest, AdaBoost, Gradient Boosting, and XGBoost, to predict customer churn behaviour with improved precision and robustness. The study uses the Iranian telecom churn dataset available in the UC Irvine (UCI) Machine Learning Repository. Through extensive data analysis and validation, it is found that ensemble models perform better than traditional approaches, attaining higher precision, recall, and F1 scores. The study not only sheds light on the significance of modern predictive modeling but also provides practical insight for stakeholders such as organizations developing effective customer-retention strategies. The outcome suggests that advanced machine learning techniques can serve as an effective instrument for customer churn prediction, empowering businesses to recognize customers who are likely to switch and to devise appropriate strategies, enabling organizations to minimize churn rates and nurture customer loyalty.
A. Problem description

To tackle the problem of churn, companies can predict the potential churners and establish strategies to retain them; focusing on prediction is more helpful and actionable than making changes and devising strategies only after customers have started to leave. Almost all companies hold huge amounts of data, and machine learning techniques can be very helpful in making predictions from it. In this work, we tackle the problem with the following ensemble models: Random Forest, AdaBoost, Gradient Boosting Machine, and XGBoost. The data used to train and evaluate these models has been pre-processed and balanced using the Synthetic Minority Oversampling Technique (SMOTE).

B. Summary of the contribution

The work presented in this paper is an effort to assess the effectiveness of ensemble models for churn prediction. Its distinctive features are the following:
• The data has been balanced using the Synthetic Minority Oversampling Technique (SMOTE).
• After balancing, the data was preprocessed and used to train well-known ensemble learning models: Random Forest, AdaBoost, Gradient Boosting, and XGBoost, exploiting the power of ensemble learning to achieve better performance.
• The trained models are then evaluated on test data using accuracy, precision, recall, and F1 score, and the models are compared to identify the best-performing algorithm on this dataset.

The rest of this paper is organized as follows. The next section reviews work carried out on customer churn prediction. Section 3 briefly describes the models used in this paper, and section 4 presents the methodology. The results obtained on the test set with these models, in terms of accuracy, precision, recall, and F1 score, are evaluated and compared in section 5. Finally, section 6 discusses future work and concludes the paper.

II. LITERATURE REVIEW

Preetha and Rayapeddi [7] employed various machine learning techniques such as decision trees, artificial neural networks (ANN), K-nearest neighbours, and support vector machines (SVM) and observed improved churn prediction in the telecommunications industry. This approach offers flexibility for particular predictive requirements by allowing the trade-off between precision and recall to be adjusted. In addition to improving churn prediction accuracy, the study provides a methodological framework that may be useful in other customer-focused industries. In Pejić et al. [8], a framework combining clustering and classification is used to manage churn in the telecommunications industry, focusing on market segmentation. It employs a three-stage approach: dataset preparation, k-means cluster analysis, and CHAID decision tree modeling. By analyzing demographics, service usage, contracts, and billing data, the study identifies high-churn market segments and develops classification models to predict churn determinants. The structured approach enhances churn management effectiveness by continuously refining strategies based on market segmentation and predictive rules, contributing to improved customer retention.

In the competitive e-commerce sector, customer retention is paramount as acquiring new customers becomes increasingly costly. Detecting potential churn and implementing timely retention measures is crucial, alongside understanding the reasons for customer departure to tailor win-back strategies. By leveraging vast amounts of customer data, machine learning techniques such as support vector machines (SVM) enable predictive analysis of customer attrition, complemented by hybrid recommendation strategies for targeted retention efforts. Empirical results demonstrate significant improvements in key metrics, while RFM principles aid in categorizing lost customer types for effective churn retention strategies [6]. Ullah et al. [9] introduce an intelligent hybrid structure for churn prediction of customers in the telecom sector, aiming to address the revenue impact of churn. By integrating classification and clustering algorithms into ensemble models, the proposed model enhances churn prediction performance. Experimentation with various clustering algorithms such as k-means, coupled with classifiers such as Gradient Boosted Trees and deep learning, achieves high accuracies, surpassing conventional churn prediction methods; in particular, the stacking-based hybrid model demonstrates enhanced accuracy. Ahmad et al. [10] outline a data discretization approach: continuous data are converted into intervals, the training dataset is partitioned using weighted K-means clustering, rules are extracted from the partitioned data, and predictions are made using the extracted rules. Two steps are usually involved in hybrid models: preprocessing and mining of the pre-processed data. In contrast, that study presents a revised system that assumes the presented data has already been prepared and skips the preprocessing step; instead, the emphasis shifts to dividing the training data into discrete groups, after which a set of sub-classifiers is built for each group using rule induction. Xiahou and Harada [11] tried to predict customer churn behaviour using K-means and SVM in a business-to-customer (B2C) e-commerce setting. They segmented customers into three categories and identified the core customer segment. For predicting customer churn, they compared logistic regression and SVM, finding SVM accuracy superior, signifying better customer relationship management (CRM) for B2C e-commerce organizations.
III. MACHINE LEARNING MODELS USED

A. Random Forest:

Random Forest is an ensemble learning algorithm. It constructs decision trees during the training phase and, for classification tasks, outputs the mode of the classes predicted by those trees. Decision trees are simple yet effective for classification and regression tasks, and since Random Forest is composed of decision trees at its core, it handles the large datasets typically encountered in real-world problems efficiently. For classification, decision trees partition the feature space into regions and make predictions based on majority voting. Random Forest can model complex, nonlinear interactions between features and the target variable. This is essential in churn prediction, as the causes of customer attrition can be complex and difficult to represent with linear models. Random Forest reduces overfitting by combining the forecasts of several trees, as opposed to an individual decision tree; it strikes a balance between bias and variance, making it more robust and less susceptible to memorizing noise in the training data.
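The paper reports only the number of estimators used for Random Forest (40, per Table III); everything else below, including the synthetic stand-in data generated with make_classification, is an illustrative assumption. A minimal scikit-learn sketch of training such a classifier might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed, SMOTE-balanced churn data.
X, y = make_classification(n_samples=3000, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Each tree is grown on a bootstrap sample with a random feature subset;
# the forest predicts the majority (mode) class across trees.
rf = RandomForestClassifier(n_estimators=40, random_state=42)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```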
B. AdaBoost:
Adaptive Boosting works by combining many weak learn-
ers, often decision trees that have only a few levels (sometimes
known as ”stumps”), to create an efficient learner. A weak
learner is a model that works marginally better than random
guessing on training data. AdaBoost trains these weak learners
iteratively. In each iteration, it gives greater weight to instances
that were incorrectly classified by the preceding weak learner.
This focuses the next weak learner on cases that are more
difficult to accurately classify. AdaBoost distributes weights to
each weak learner according to its performance. Weak learners
with higher accuracy receive more weight in the final forecast.
AdaBoost combines the predictions of all weak learners using
a weighted majority voting technique. The final prediction is
based on the weighted sum of predictions from each weak
learner.
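As above, this is only a hedged sketch: the stand-in data is synthetic, and the estimator count of 90 is borrowed from Table III as an assumption. scikit-learn's AdaBoostClassifier uses depth-1 trees (stumps) as its default base learner and exposes the per-learner weights described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the balanced churn data.
X, y = make_classification(n_samples=3000, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Each boosting round re-weights misclassified samples and assigns the
# new stump a weight proportional to its accuracy.
ada = AdaBoostClassifier(n_estimators=90, random_state=42)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
print("First few learner weights:", ada.estimator_weights_[:5])
```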
D. XGBoost:

XGBoost, or eXtreme Gradient Boosting, is a highly optimized and scalable implementation of the Gradient Boosting Machine (GBM) method. It has grown in recognition and is often used in machine-learning contests and real-world applications due to its efficiency and excellent predictive accuracy. XGBoost uses the gradient boosting framework, in which weak learners (usually decision trees) are added sequentially to the ensemble to fix the errors of earlier models; it minimizes a differentiable loss function through gradient descent. XGBoost supports early stopping, a technique for preventing overfitting that evaluates performance on a validation set during training: training ends when performance on the validation set fails to improve for a specified number of consecutive iterations. XGBoost can also perform k-fold cross-validation for hyperparameter tuning and model evaluation; it divides the data into training and validation folds and averages the performance measures across the folds to provide a more accurate estimate of model performance.
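A short sketch of the two capabilities mentioned here, early stopping and built-in k-fold cross-validation, using XGBoost's native Python API; the synthetic data, the parameter values, and the round counts are illustrative assumptions rather than settings reported in the paper:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the balanced churn data.
X, y = make_classification(n_samples=3000, n_features=13, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
params = {"objective": "binary:logistic", "eval_metric": "logloss",
          "max_depth": 4, "eta": 0.1}

# Boosting stops if the validation log-loss does not improve for 10 rounds.
booster = xgb.train(params, dtrain, num_boost_round=200,
                    evals=[(dval, "validation")],
                    early_stopping_rounds=10, verbose_eval=False)

# Built-in k-fold cross-validation over the training data.
cv_results = xgb.cv(params, dtrain, num_boost_round=100, nfold=5,
                    metrics="error", seed=42)
print("CV error (last round):", cv_results["test-error-mean"].iloc[-1])
```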
IV. PROPOSED WORK

The proposed work has been divided into six phases, namely:
1) Data collection
2) Data pre-processing
3) Model selection
4) Model training and hyperparameter tuning
5) Churn prediction using the trained model
6) Model evaluation

The first phase starts with the collection of the dataset of interest. For our experimental study we have taken the "Iranian Churn" dataset available in the UC Irvine (UCI) Machine Learning Repository. The dataset is a random collection from an Iranian telecom company's database over a period of 13 months. It consists of 3150 tuples, each representing one customer, and 13 features. The metadata for the employed dataset is presented in Table I.

TABLE I
METADATA OF THE EMPLOYED DATASET.

Features / Attributes    Description                      Datatype   Any Missing Values
Call Failures            Count of unsuccessful calls      Integer    No
Complains                Indicates customer complaints    Binary     No
Subscription Length      Total subscription months        Integer    No

Phase 3, model selection, involves the selection of the machine learning models. As part of our experimental study we have used various ensemble learning models, namely Random Forest, AdaBoost, XGBoost, and Gradient Boosting Machine; a minimal sketch covering these phases is given below.
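The paper does not provide an implementation, so the following is only a hypothetical end-to-end sketch of the six phases. The file name iranian_churn.csv, the Churn column name, and the assumption that all features are already numeric are ours; the estimator counts are taken from Table III, and SMOTE comes from the imbalanced-learn package:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Phase 1: data collection (hypothetical local export of the UCI dataset).
data = pd.read_csv("iranian_churn.csv")
X, y = data.drop(columns=["Churn"]), data["Churn"]

# Phase 2: pre-processing, split, and SMOTE balancing of the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Phases 3-4: model selection and training (estimator counts per Table III).
models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=90, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=40, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=80, eval_metric="logloss"),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=90,
                                                    random_state=42),
}

# Phases 5-6: churn prediction on the held-out test set and evaluation.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          round(accuracy_score(y_test, pred), 4),
          round(precision_score(y_test, pred), 4),
          round(recall_score(y_test, pred), 4),
          round(f1_score(y_test, pred), 4))
```

Oversampling only the training split keeps synthetic SMOTE samples out of the test set, so the reported metrics reflect performance on genuine customer records.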
V. RESULTS

As mentioned in the previous section, we have employed the Iranian Churn dataset for experimentation; however, the same model settings can be applied to other real-world datasets. The efficacy of the models has been observed using metrics such as accuracy, precision, recall, and F1 score. In this context, true positives, false positives, true negatives, and false negatives are to be interpreted solely in terms of churners and non-churners.
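The metrics themselves are standard: with churners as the positive class, precision is TP/(TP+FP), recall is TP/(TP+FN), F1 is their harmonic mean, and accuracy is (TP+TN) over all predictions. A small sketch with placeholder labels (not data from the paper) showing how they can be computed with scikit-learn:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

# Placeholder labels; 1 = churner (positive class), 0 = non-churner.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / all
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))         # 2PR / (P + R)
```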
TABLE III
RESULTS FROM THE EVALUATION OF ENSEMBLE MODELS ON THE TEST SET.

Model                n_est   Acc.     Prec.    Rec.     F1
AdaBoost             90      0.9200   0.9000   0.9500   0.9200
Random Forest        40      0.9700   0.9600   0.9800   0.9700
XGBoost              80      0.9600   0.9700   0.9500   0.9600
Gradient Boosting    90      0.9500   0.9200   0.9800   0.9500

Table III shows that Random Forest achieves the best prediction accuracy of the four classifiers, whereas the precision of XGBoost is the highest and, at the same time, the recall of the Gradient Boosting algorithm is the highest, at 98%. An important observation is that all of the ensemble models show very high accuracy values, in most cases above 95%, which means that any of these ensemble models will give competitive prediction accuracy for churn prediction.
VI. CONCLUSION
In this paper, we introduced a basic user-adaptable system for churn prediction, intended to assist businesses in identifying probable churners so that they can retain those customers. Using the Iranian Churn dataset from UCI, we demonstrated the framework's usefulness, with an accuracy and F1 score of 97% and a recall of 98%. Our research emphasizes that ensemble learning models can be used for the early prediction of churn. Companies need to identify potential churners to stay competitive and keep their current customer base. We conducted a comparative analysis of customer churn prediction in the telecom sector using popular ensemble models, namely Random Forest, AdaBoost, Gradient Boosting Machine, and XGBoost. This research presents insights into the efficacy of different ensemble models for churn prediction in the telecom sector.

REFERENCES

[1] H. Ribeiro, B. Barbosa, A. C. Moreira, and R. G. Rodrigues, "Determinants of churn in telecommunication services: a systematic literature review," Management Review Quarterly, pp. 1–38, 2023.
[2] M. Pejić Bach, J. Pivar, and B. Jaković, "Churn management in telecommunications: Hybrid approach using cluster analysis and decision trees," Journal of Risk and Financial Management, vol. 14, no. 11, p. 544, 2021.
[3] J. Shobana, C. Gangadhar, R. K. Arora, P. Renjith, J. Bamini, and Y. Devidas Chincholkar, "E-commerce customer churn prevention using machine learning-based business intelligence strategy," Measurement: Sensors, vol. 27, p. 100728, 2023.
[4] R. Liu, S. Ali, S. F. Bilal, Z. Sakhawat, A. Imran, A. Almuhaimeed, A. Alzahrani, and G. Sun, "An intelligent hybrid scheme for customer churn prediction integrating clustering and classification algorithms," Applied Sciences, vol. 12, no. 18, p. 9355, 2022.
[5] Y. Huang and T. Kechadi, "An effective hybrid learning system for telecommunication churn prediction," Expert Systems with Applications, vol. 40, no. 14, pp. 5635–5647, 2013.
[6] A. M. Almana, M. S. Aksoy, and R. Alzahrani, "A survey on data mining techniques in customer churn analysis for telecom industry," International Journal of Engineering Research and Applications, vol. 4, no. 5, pp. 165–171, 2014.
[7] X. Xiahou and Y. Harada, "B2C e-commerce customer churn prediction based on k-means and SVM," Journal of Theoretical and Applied Electronic Commerce Research, vol. 17, no. 2, pp. 458–475, 2022.
[8] H. Hwang, T. Jung, and E. Suh, "An LTV model and customer segmentation based on customer value: a case study on the wireless telecommunication industry," Expert Systems with Applications, vol. 26, no. 2, pp. 181–188, 2004.