
2024 IEEE 6th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA)
979-8-3315-0579-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICCCMLA63077.2024.10871687

Customer Churn Prediction employing Ensemble Learning

1st Mudassir Rafi*, Department of CSE, SRM University AP, Amaravati, India ([email protected])
2nd Md. Faiz Ahmad, Paari School of Business, SRM University AP, Amaravati, India ([email protected])
3rd Varshitha K, Department of CSE, SRM University AP, Amaravati, India (varshitha [email protected])
4th Siri Varsha T, Department of CSE, SRM University AP, Amaravati, India (sirivarsha [email protected])
5th Lahari K, Department of CSE, SRM University AP, Amaravati, India (lahari [email protected])
6th Md. Asadul Haque, Paari School of Business, SRM University AP, Amaravati, India ([email protected])
7th Pavan Kumar Pagadala, Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Hyderabad, India ([email protected])
8th Sushama Rani Dutta, Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Hyderabad, India ([email protected])

Abstract—In recent years, there has been an enormous increase in the number of companies and customers in almost every industry. The growth in the number of companies has given customers more choices, but it has also created new challenges: companies must work not only to improve their products and services but also to sustain customers in a competitive world. Churn prediction is the prediction of customers who are at potential risk of discontinuing the product or service of the company, which makes it all the more relevant in today's competitive world. In the present work, we have employed various machine learning models for early prediction of churn, to mitigate the potential risk of losing customers. The authors have chosen ensemble models for this task, and the models are trained on the dataset. The results for the various models are compared using accuracy, precision, recall, and F1 score. Moreover, it is observed that, on our dataset, XGBoost outperformed the other models.

Index Terms—Churn prediction, Random Forest, AdaBoost, XGBoost, Gradient Boosting Machine, accuracy, precision, recall, F1 score

I. INTRODUCTION

Customer churn is one of the major concerns for an organization: it not only affects the organization by decreasing revenue and losing customers to competitors, but also hinders the organization's growth and reduces its chances of thriving in the market [1], [2]. Customer churn can also signify underlying issues in the organization, such as poor customer service or support systems [3], [4]. Predicting which customers are potential churners beforehand may greatly help an organization take measures to retain them, as retaining existing customers is said to be the least expensive method compared to other strategies such as acquiring new customers and up-selling existing customers [5]. Hence, churn prediction stands as an important task for a company in any industry, and the growing field of machine learning brings the power to use various techniques to do so [6]. Whatever method is used for churn prediction, the main aim is to predict potential churners and develop strategies for retaining them. In the present era of digital dominance, predicting the churn behaviour of customers is an uphill task for organizations striving to retain their customer base and increase sustained profitability. The present study examines the efficacy of deploying various ensemble models to predict customer churn behaviour, leveraging the strengths of different machine learning (ML) algorithms, viz. Random Forest, AdaBoost, Gradient Boosting, and XGBoost, to enhance precision and robustness. The study uses the Iranian telecom dataset available on the UC Irvine Machine Learning Repository (UCI). Through extensive data analysis and validation, it is found that ensemble models exhibit better performance than traditional approaches, attaining higher precision, recall, and F1 scores. The present study not only sheds light on the significance of modern predictive modeling but also imparts practical insight for stakeholders such as organizations developing effective strategies for retaining customers. The outcome suggests that advanced machine learning techniques can serve as an effective instrument in customer churn prediction, empowering businesses to recognize customers who might be on the verge of switching and to devise appropriate strategies enabling
organizations to minimize churn rates and nurture customer loyalty.

A. Problem description

To tackle the problem of churn, companies can predict the potential churners and establish strategies to retain them; focusing on prediction is more helpful and actionable than making the necessary changes, developments, and strategies only after customers start to leave the company. Generally, almost all companies have huge amounts of data, and machine learning techniques can be very helpful in making predictions from that enormous amount of data. In this work, to tackle this problem we have used the following ensemble models: Random Forest, AdaBoost, Gradient Boosting Machine, and XGBoost. The data used to train and evaluate these models has been pre-processed and balanced using the Synthetic Minority Over-sampling Technique (SMOTE).

B. Summary of the contribution

The work presented in this paper is an effort to assess the effectiveness of ensemble models for churn prediction. The following are its distinctive features:
• The data has been balanced using the Synthetic Minority Over-sampling Technique (SMOTE).
• After balancing, the data was preprocessed and used to train well-known ensemble learning models: Random Forest, AdaBoost, Gradient Boosting, and XGBoost, using the power of ensemble learning to achieve better performance.
• The trained models are then evaluated on test data using accuracy, precision, recall, and F1 score, and a comparison of the models is made to show the best-performing algorithm on this dataset.

The rest of this paper is organized as follows. The next section surveys the work carried out on customer churn prediction. In section 3, the models used in this paper are briefly explained, and the following section discusses the methodology. The results obtained on the test set (accuracy, precision, recall, and F1 score) are evaluated and compared in section 5. Finally, section 6 discusses future work and concludes the paper.

II. LITERATURE REVIEW

Preetha and Rayapeddi [7] employed various machine learning techniques such as decision trees, artificial neural networks (ANN), K-nearest neighbors, and support vector machines (SVM), and observed improved churn prediction in the telecommunications industry. This approach offers flexibility for particular predictive requirements by allowing adjustments to be made between precision and recall. In addition to improving churn prediction accuracy, the study provides a methodological framework that may be useful in other customer-focused industries. In Pejić et al. [8], a framework combining clustering and classification is used to manage churn in the telecommunications industry, focusing on market segmentation. It employs a three-stage approach: dataset preparation, k-means cluster analysis, and CHAID decision tree modeling. By analyzing demographics, service usage, contracts, and billing data, the study identifies high-churn market segments and develops classification models to predict churn determinants. The structured approach enhances churn management effectiveness by continuously refining strategies based on market segmentation and predictive rules, contributing to improved customer retention. In the competitive e-commerce sector, customer retention is paramount as acquiring new customers becomes increasingly costly. Detecting potential churn and implementing temporary retention measures is crucial, alongside understanding reasons for customer departure to tailor win-back strategies. By leveraging vast amounts of customer data, machine learning techniques like support vector machines (SVM) enable predictive analysis of customer attrition, complemented by hybrid recommendation strategies for targeted retention efforts. Empirical results demonstrate significant improvements in key metrics, while RFM principles aid in categorizing lost customer types for effective churn retention strategies [6]. Ullah et al. [9] introduce an intelligent hybrid structure for churn prediction in the telecom sector, aiming to address the revenue impact of churn. By integrating classification and clustering algorithms into ensemble models, the proposed model enhances churn prediction performance. Experimentation with various clustering algorithms like k-means, coupled with classifiers such as Gradient Boosted Tree and Deep Learning, achieves high accuracies, surpassing conventional churn prediction methods; specifically, the stacking-based hybrid model demonstrates enhanced accuracy. Ahmad et al. [10] outline data discretization: continuous data are converted into intervals, the training dataset is then partitioned using weighted K-means clustering, rules are extracted from the separated data, and predictions are made using the extracted rules. Two steps are usually involved in hybrid models: preprocessing and mining of the pre-processed data. In contrast, that study presents a revised system which assumes the presented data has already been prepared and skips the preprocessing step; instead, the emphasis switches to dividing the training data into discrete groups, after which a set of sub-classifiers is built for every collection of data using rule induction. Xiahou and Harada [11] tried to predict customer churn behaviour using K-means and SVM in a business-to-customer (B2C) e-commerce setup. They segmented customers into three categories and identified the core customer segment. For predicting customer churn, they compared logistic regression and SVM, finding SVM accuracy superior, signifying better customer relationship management (CRM) for B2C e-commerce organizations.

III. MACHINE LEARNING MODELS USED

A. Random Forest:

Random forest is an ensemble learning algorithm. It constructs decision trees during the training phase and outputs the mode of the classes for classification tasks. Decision

trees are simple yet effective for classification and regression tasks, and random forest at its core is composed of decision trees, making it efficient at handling the large datasets typically encountered in the real world. For classification, decision trees partition the feature space into regions and make predictions based on majority voting. Random Forest can model complex, nonlinear interactions between features and the target variable. This is essential in churn prediction, as the causes of customer attrition can be complex and difficult to represent using linear models. Random Forest lowers overfitting by combining forecasts from several trees, as opposed to individual decision trees. It strikes a balance between bias and variance, making it more robust and less susceptible to memorizing noise in the training data.
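The Random Forest recipe described above (bootstrap sampling, random feature selection, and majority voting) can be illustrated with a deliberately minimal sketch. The toy data, the use of one-feature "stumps" in place of full decision trees, and all function names here are our own illustrative assumptions, not the paper's implementation.

```python
import random
from collections import Counter

def stump_fit(X, y, feat):
    """One-feature threshold stump: split at the feature mean and
    predict the majority label on each side of the split."""
    thr = sum(row[feat] for row in X) / len(X)
    left = [lab for row, lab in zip(X, y) if row[feat] <= thr]
    right = [lab for row, lab in zip(X, y) if row[feat] > thr]
    maj = lambda labs: Counter(labs).most_common(1)[0][0] if labs else 0
    return (feat, thr, maj(left), maj(right))

def stump_predict(stump, row):
    feat, thr, left_lab, right_lab = stump
    return left_lab if row[feat] <= thr else right_lab

def forest_fit(X, y, n_estimators=25, seed=0):
    """Each tree sees a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        forest.append(stump_fit(Xb, yb, rng.randrange(len(X[0]))))
    return forest

def forest_predict(forest, row):
    """Majority vote over all trees (the mode of the predicted classes)."""
    return Counter(stump_predict(s, row) for s in forest).most_common(1)[0][0]

# Toy churn data: two features, class 1 = churner.
X = [[0.1, 1.0], [0.2, 1.5], [0.3, 1.2], [0.7, 3.2], [0.8, 3.0], [0.9, 3.5]]
y = [0, 0, 0, 1, 1, 1]
forest = forest_fit(X, y)
print(forest_predict(forest, [0.15, 1.1]), forest_predict(forest, [0.85, 3.4]))
```

Averaging many high-variance trees over resampled data is what gives the bias/variance trade-off mentioned above: any single stump may be fitted to a pathological bootstrap sample, but the vote washes those out.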

B. AdaBoost:

Adaptive Boosting works by combining many weak learners, often decision trees with only a few levels (sometimes called "stumps"), into an efficient learner. A weak learner is a model that performs only marginally better than random guessing on the training data. AdaBoost trains these weak learners iteratively: in each iteration, it gives greater weight to instances that were incorrectly classified by the preceding weak learner, focusing the next weak learner on cases that are harder to classify accurately. AdaBoost also assigns a weight to each weak learner according to its performance: weak learners with higher accuracy receive more weight in the final prediction. AdaBoost combines the predictions of all weak learners using weighted majority voting, so the final prediction is based on the weighted sum of the predictions from each weak learner.
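The reweighting loop just described can be sketched for one-dimensional data with threshold stumps. This is a hedged, minimal rendering of the classical AdaBoost update (labels encoded as +1/-1); the toy data and function names are our own assumptions, not taken from the paper.

```python
import math

def adaboost_fit(X, y, n_estimators=10):
    """AdaBoost with threshold stumps on 1-D inputs; labels are +1 / -1."""
    w = [1.0 / len(X)] * len(X)          # start with uniform instance weights
    ensemble = []                         # list of (alpha, threshold, polarity)
    for _ in range(n_estimators):
        best = None
        for thr in sorted(set(X)):        # exhaustive stump search
            for pol in (1, -1):
                # stump predicts pol if x > thr, else -pol; err is weighted
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = max(err, 1e-10)             # avoid division by zero on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)   # learner weight: higher accuracy, more say
        ensemble.append((alpha, thr, pol))
        # up-weight misclassified instances, then renormalize
        w = [wi * math.exp(-alpha * yi * (pol if xi > thr else -pol))
             for xi, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted majority vote: sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * (pol if x > thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score > 0 else -1

X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y)
print([adaboost_predict(model, x) for x in [2.5, 7.5]])  # [-1, 1]
```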

C. Gradient Boosting Machine:

GBM sequentially constructs an ensemble of weak learners, which are often decision trees. Unlike AdaBoost, which focuses on modifying instance weights, GBM improves the model's predictions by minimizing a loss function using gradient descent. GBM begins by training a weak learner on the original data; it then sequentially trains more weak learners to remedy the errors of the prior ones. Each new weak learner is trained on the residuals of the preceding weak-learner ensemble. GBM optimizes the weak learners' parameters by descending the gradient of a loss function with respect to the ensemble's predictions, iteratively adjusting the weak learners' settings to minimize the loss. GBM is resistant to noisy data and outliers because it focuses on reducing the overall loss function rather than fitting each data point individually.
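The residual-fitting idea is easiest to see for squared loss, where the negative gradient is exactly the residual. The sketch below (a 1-D regression with threshold stumps and a shrinkage rate; all names and data are our own illustrative assumptions) is a minimal instance of the procedure described above, not the paper's implementation.

```python
def gbm_fit(X, y, n_estimators=50, lr=0.1):
    """Gradient boosting for squared loss on 1-D inputs.
    Each weak learner is a threshold stump fitted to the current
    residuals, which are the negative gradient of the squared loss."""
    f0 = sum(y) / len(y)                    # initial constant prediction
    preds = [f0] * len(X)
    stumps = []
    for _ in range(n_estimators):
        resid = [yi - pi for yi, pi in zip(y, preds)]   # negative gradient
        best = None
        for thr in X:                       # pick the split minimizing squared error
            left = [r for xi, r in zip(X, resid) if xi <= thr]
            right = [r for xi, r in zip(X, resid) if xi > thr]
            lv = sum(left) / len(left) if left else 0.0
            rv = sum(right) / len(right) if right else 0.0
            sse = sum((r - (lv if xi <= thr else rv)) ** 2
                      for xi, r in zip(X, resid))
            if best is None or sse < best[0]:
                best = (sse, thr, lv, rv)
        _, thr, lv, rv = best
        stumps.append((thr, lv, rv))
        # shrinkage: take a small gradient-descent step along the new learner
        preds = [p + lr * (lv if xi <= thr else rv) for xi, p in zip(X, preds)]
    return f0, stumps

def gbm_predict(model, x, lr=0.1):
    f0, stumps = model
    return f0 + lr * sum(lv if x <= thr else rv for thr, lv, rv in stumps)

X = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
model = gbm_fit(X, y)
print(round(gbm_predict(model, 1.5), 3))  # ≈ 1.005, converging toward the target 1.0
```

Each boosting round shrinks the residuals by a constant factor here, which is why the prediction approaches but never exactly reaches the target, the usual behaviour of shrinkage-regularized boosting.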

D. XGBoost:

XGBoost, or eXtreme Gradient Boosting, is a highly optimized and scalable implementation of the Gradient Boosting Machine (GBM) method. It has grown in recognition and is often used in machine-learning competitions and real-world applications due to its efficiency and excellent predictive accuracy. XGBoost uses the gradient boosting framework, in which weak learners (usually decision trees) are added sequentially to the ensemble to fix errors resulting from earlier models. It minimizes a differentiable loss function through gradient descent. XGBoost supports early stopping, a technique for preventing overfitting that evaluates performance on a validation set while training: training ends when the performance on the validation set fails to improve for a specific number of consecutive iterations. XGBoost can also perform k-fold cross-validation for hyperparameter adjustment and model evaluation, automatically dividing the data into training and validation sets and averaging the performance measures across folds to provide a more accurate approximation of model performance.

Fig. 1. Steps involved in the Experiment.

IV. PROPOSED WORK

The proposed work has been divided into six phases, namely:
1) Data collection
2) Data Pre-processing
3) Model selection

4) Model training and hyperparameter tuning
5) Churn prediction using the trained model
6) Model evaluation

The first phase starts with the collection of the dataset of interest. For our experimental study we have taken the "Iranian Churn" dataset available on the UC Irvine Machine Learning Repository (UCI). The dataset is a random collection from an Iranian telecom company's database over a period of 13 months. It consists of 3150 tuples, each representing one customer, with 13 features. The metadata of the employed dataset is presented in Table I.

Features / Attributes   | Description                                 | Datatype   | Any Missing Values
Call Failures           | Count of unsuccessful calls                 | Integer    | No
Complains               | Indicates customer complaints               | Binary     | No
Subscription Length     | Total subscription months                   | Integer    | No
Charge Amount           | Charge on a scale (0-9)                     | Integer    | No
Seconds of Use          | Call duration in seconds                    | Integer    | No
Frequency of use        | Regularity of calls (count)                 | Integer    | No
Frequency of SMS        | Overall number of texts                     | Integer    | No
Distinct Called Numbers | Count of unique dialed numbers              | Integer    | No
Age Group               | Customer's age range (1-5)                  | Integer    | No
Tariff Plan             | Plan type (1: Pay-per-use, 2: Contract)     | Binary     | No
Status                  | Customer activity (1: Active, 2: Dormant)   | Binary     | No
Churn                   | Customer churn (1: Churned, 0: Not churned) | Binary     | No
Customer Value          | Calculated customer value (CLTV)            | Continuous | No

TABLE I. Metadata of the used dataset.

Phase 2 is data pre-processing, which involves various steps including data cleaning and treatment of missing values. Our dataset exhibits class imbalance, as is clear from the numbers of churn and non-churn instances. The imbalance can negatively impact the performance of machine learning models; hence, we have used an over-sampling technique, the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE is a class-balancing technique used to address class imbalance in datasets where one class has a much smaller representation than the others. The numbers of churn and non-churn instances before and after SMOTE are presented in Table II.

Class     | Before SMOTE | After SMOTE
Churn     | 495          | 2655
Non-Churn | 2655         | 2655

TABLE II. Class distribution before and after application of SMOTE on the Iranian Churn dataset.

Phase 3, Model selection, involves the selection of machine learning models. As part of our experimental study we have used various ensemble learning models, namely Random Forest, AdaBoost, XGBoost, and Gradient Boosting Machine.

As part of Phase 4, all the models selected in phase 3 have been implemented and their parameters tuned. We have focused on n_estimators, i.e., the number of trees or boosting rounds used in the ensemble models, for three reasons: firstly, all four ensemble models rely on building an ensemble of weak learners; secondly, the number of estimators determines the ensemble size; and lastly, the number of estimators significantly affects model complexity and generalization. Tuning this parameter therefore helps achieve a balance between accuracy and generalization.

In the next phase, the trained models are used for churn prediction on the test data, and in the final phase, Phase 6, the models are evaluated on various metrics, namely accuracy, precision, recall, and F1 score. These metrics are described in section V.

Model             | n_est | Acc.   | Prec.  | Rec.   | F1
AdaBoost          | 90    | 0.9200 | 0.9000 | 0.9500 | 0.9200
Random Forest     | 40    | 0.9700 | 0.9600 | 0.9800 | 0.9700
XGBoost           | 80    | 0.9600 | 0.9700 | 0.9500 | 0.9600
Gradient Boosting | 90    | 0.9500 | 0.9200 | 0.9800 | 0.9500

TABLE III. Results from the evaluation of ensemble models on the test set.

V. RESULT AND DISCUSSION

With respect to churn prediction, accuracy, precision, recall, and F1 score are defined as follows. Accuracy is the ratio of correct churn predictions to the total number of predictions:

Accuracy = (Number of correct predictions) / (Total number of predictions)    (1)

Precision is the ratio of correctly predicted positive churns to the total number of positive churn predictions:

Precision = (Correctly predicted positive instances) / (Total number of positive predictions)    (2)

Recall is the ratio of correctly predicted positive samples to the actual positive samples:

Recall = (Number of correct positive predictions) / (Total positive instances present in the dataset)    (3)

F1 score is the harmonic mean of precision and recall:

F1 score = (2 × Recall × Precision) / (Recall + Precision)    (4)
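The four metrics in Eqs. (1)-(4) can be computed directly from the confusion-matrix counts. The following self-contained sketch (with toy labels and a function name of our own choosing) mirrors the definitions above, treating label 1 as the churn class.

```python
def churn_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for churn labels (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)                  # Eq. (1)
    precision = tp / (tp + fp) if tp + fp else 0.0      # Eq. (2)
    recall = tp / (tp + fn) if tp + fn else 0.0         # Eq. (3)
    f1 = (2 * precision * recall / (precision + recall) # Eq. (4), harmonic mean
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Ten test customers: 4 actual churners, one missed churner, one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(churn_metrics(y_true, y_pred))  # (0.8, 0.75, 0.75, 0.75)
```

Note that true positives, false positives, true negatives, and false negatives here are interpreted strictly in terms of churners and non-churners, consistent with the discussion in this section.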

As mentioned in the previous section, we have employed the Iranian churn dataset for the experimentation. However, the same settings of the models can be applied to any real-world dataset. The efficacy of the models has been observed using metrics such as accuracy, precision, recall, and F1 score. In addition, true positives, false positives, true negatives, and false negatives should be interpreted only in terms of churners and non-churners.

Fig. 2. Churn Prediction by various ensemble models.

Table III shows that Random Forest achieves the best prediction accuracy of the four classifiers, whereas the precision of XGBoost is the highest and, at the same time, the recall of the Gradient Boosting algorithm is the highest, at 98%. One important observation across the ensemble models is that all of them show very high accuracy values, in most cases above 95%, which means any of the ensemble models will give competitive prediction accuracy for churn prediction.

VI. CONCLUSION

In this paper, we introduced a basic user-adaptable system for churn prediction, intending to assist businesses in identifying probable churners so they can retain customers. Using the Iranian Churn dataset from UCI, we demonstrated the framework's usefulness, with an accuracy and F1 score of 97% and a recall of 98%. Our research emphasizes that ensemble learning models can be used for early prediction of churn. Companies need to identify potential churners to stay competitive and keep their current customer base. We conducted a comparative analysis of customer churn prediction in the telecom sector using popular ensemble models, namely Random Forest, AdaBoost, Gradient Boosting Machine, and XGBoost. This research presents insights into the efficacy of different ensemble models for churn prediction in the telecom sector.

REFERENCES

[1] H. Ribeiro, B. Barbosa, A. C. Moreira, and R. G. Rodrigues, "Determinants of churn in telecommunication services: a systematic literature review," Management Review Quarterly, pp. 1–38, 2023.
[2] M. Pejić Bach, J. Pivar, and B. Jaković, "Churn management in telecommunications: Hybrid approach using cluster analysis and decision trees," Journal of Risk and Financial Management, vol. 14, no. 11, p. 544, 2021.
[3] J. Shobana, C. Gangadhar, R. K. Arora, P. Renjith, J. Bamini, and Y. Devidas Chincholkar, "E-commerce customer churn prevention using machine learning-based business intelligence strategy," Measurement: Sensors, vol. 27, p. 100728, 2023.
[4] R. Liu, S. Ali, S. F. Bilal, Z. Sakhawat, A. Imran, A. Almuhaimeed, A. Alzahrani, and G. Sun, "An intelligent hybrid scheme for customer churn prediction integrating clustering and classification algorithms," Applied Sciences, vol. 12, no. 18, p. 9355, 2022.
[5] Y. Huang and T. Kechadi, "An effective hybrid learning system for telecommunication churn prediction," Expert Systems with Applications, vol. 40, no. 14, pp. 5635–5647, 2013.
[6] A. M. Almana, M. S. Aksoy, and R. Alzahrani, "A survey on data mining techniques in customer churn analysis for telecom industry," International Journal of Engineering Research and Applications, vol. 4, no. 5, pp. 165–171, 2014.
[7] X. Xiahou and Y. Harada, "B2C e-commerce customer churn prediction based on k-means and SVM," Journal of Theoretical and Applied Electronic Commerce Research, vol. 17, no. 2, pp. 458–475, 2022.
[8] H. Hwang, T. Jung, and E. Suh, "An LTV model and customer segmentation based on customer value: a case study on the wireless telecommunication industry," Expert Systems with Applications, vol. 26, no. 2, pp. 181–188, 2004.

Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:02:02 UTC from IEEE Xplore. Restrictions apply.