This repository studies how to reduce customer churn, that is, customers leaving a telecommunications provider for a competitor. By analyzing customer data, we aim to build a machine learning model that identifies which customers are most likely to leave. Such a model helps companies tailor personalized offers and services to retain at-risk customers.
A telecommunications company is particularly concerned about the number of clients switching from its fixed-line services to cable competitors. The goal is to understand who is leaving and why. As a data analyst for this company, your task is to identify these customers and uncover the reasons behind their migration, so that the business can act before they leave and improve customer retention.
The model is built with CatBoost. CatBoost, short for Categorical Boosting, is a gradient boosting algorithm that handles categorical features directly and efficiently, without requiring them to be encoded by hand first. It builds an ensemble of decision trees and is designed for speed and accuracy while limiting overfitting. CatBoost also has built-in support for missing values, which reduces the need for extensive data preprocessing and makes it well suited to structured data tasks.
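To make "handles categorical features directly" more concrete, the sketch below illustrates the idea of ordered target statistics, the technique the CatBoost authors describe for turning a category into a number. It is a simplified illustration in plain Python, not the library's implementation and not the code used in this repository. The function name, the `prior` and `prior_weight` parameters, and the toy data are all hypothetical, and the real library adds details such as random permutations of the rows.

```python
from collections import defaultdict


def ordered_target_stats(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each categorical value using only the targets of earlier rows.

    For each row, the encoding is
        (sum of targets seen so far in this category + prior_weight * prior)
        / (rows seen so far in this category + prior_weight)
    so a row's own label never contributes to its own feature value.
    """
    seen_count = defaultdict(int)   # rows seen so far, per category
    seen_sum = defaultdict(float)   # sum of targets seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        numerator = seen_sum[category] + prior_weight * prior
        denominator = seen_count[category] + prior_weight
        encoded.append(numerator / denominator)
        # Update the running statistics only after the current row is encoded.
        seen_count[category] += 1
        seen_sum[category] += target
    return encoded


# Hypothetical toy data: contract type and whether the customer churned (1) or not (0).
contracts = ["monthly", "yearly", "monthly", "monthly", "yearly"]
churned = [1, 0, 1, 0, 0]
print(ordered_target_stats(contracts, churned))
```

The first row of each category falls back to the prior because no earlier rows exist for it. Later rows move toward the churn rate observed so far in that category.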
For this project, we used the WA_Fn-UseC_-Telco-Customer-Churn dataset, available on Kaggle.
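As a first look at who is leaving, a minimal sketch like the one below reads the CSV with the standard library and computes the churn rate per category of a chosen column. It assumes the file has a `Churn` column with `Yes`/`No` values and a `Contract` column. The local file name and both helper functions are hypothetical and are not part of this repository.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts, one per customer."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def churn_rate_by(rows, column, label="Churn", positive="Yes"):
    """Return {category: share of rows in that category whose label is `positive`}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row[label] == positive:
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in totals}


# Usage (assumes the Kaggle CSV was saved locally under this hypothetical name):
# rows = load_rows("WA_Fn-UseC_-Telco-Customer-Churn.csv")
# print(churn_rate_by(rows, "Contract"))
```

Comparing these rates across columns gives a quick indication of which segments deserve a closer look before any model is trained.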
- WA_Fn-UseC_-Telco-Customer-Churn. Kaggle, 2018. Available at: https://www.kaggle.com/datasets/palashfendarkar/wa-fnusec-telcocustomerchurn. Accessed on: September 17, 2024.
- KOTLER, Philip; KELLER, Kevin Lane. Administração de Marketing. 14. ed. São Paulo: Pearson Prentice Hall, 2012.
- RICHARDSON, Alan. Retention Strategies for Telecom Customers. Telecommunications Policy, v. 34, n. 11, p. 666-679, 2010.
- TUKEY, John W. Exploratory Data Analysis. Reading: Addison-Wesley, 1977.
- AGARWAL, Ruchi; AGGARWAL, Ajay. Data Analytics: Principles, Tools and Practices. 1. ed. New York: Apress, 2019.
- LITTLE, Roderick J. A.; RUBIN, Donald B. Statistical Analysis with Missing Data. 2. ed. New York: John Wiley & Sons, 2019.
- ALLISON, Paul D. Handling Missing Data by Maximum Likelihood. Sociological Methods & Research, v. 28, n. 3, p. 301-309, 2000.
- GÉRON, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2. ed. Sebastopol: O'Reilly Media, 2019.
- WITTEN, Ian H.; FRANK, Eibe; HALL, Mark A. Data Mining: Practical Machine Learning Tools and Techniques. 4. ed. Burlington: Morgan Kaufmann, 2016.
- QUINLAN, J. R. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann, 1993.
- BREIMAN, Leo et al. Classification and Regression Trees. Belmont: Wadsworth International Group, 1984.
- Classification: ROC and AUC. Google Machine Learning. Available at: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc?hl=en. Accessed on: September 20, 2024.
Copyright 2024 Mindful-AI-Assistants. Code released under the MIT license.