Posted at 11.12.2018
Abstract-Customer churn is the business term used to describe the loss of clients or customers. Financial institutions, telecom companies, ISPs, insurance firms, etc. use customer churn analysis and the customer churn rate as key business metrics, because retaining an existing customer costs far less than acquiring a new one. Corporations have dedicated departments that try to win back defecting clients, because recovered long-term customers can be worth much more to a company than newly recruited ones. Customer churn can be classified into voluntary churn and involuntary churn. In voluntary churn, the customer decides to switch to another service provider, whereas in involuntary churn the customer leaves the service due to relocation, death, etc. Businesses usually exclude involuntary churn from churn prediction models and focus on voluntary churn, since it usually arises from the company-customer relationship, over which the business has full control. Churn is usually measured as gross churn and net churn. Gross churn is calculated as the loss of existing customers and the recurring revenue associated with them. Net churn is measured as gross churn offset by the addition of new, comparable customers. In financial systems this is measured in terms of Recurring Monthly Revenue (RMR).
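The gross and net churn definitions above can be expressed in recurring-revenue terms with a short sketch (the formulas follow the definitions in the abstract; all figures below are made-up examples):

```python
def churn_metrics(rmr_lost, rmr_gained, rmr_start):
    """Gross churn: recurring revenue lost to departing customers,
    as a fraction of starting RMR. Net churn: the same loss offset
    by recurring revenue added from new, comparable customers."""
    gross = rmr_lost / rmr_start
    net = (rmr_lost - rmr_gained) / rmr_start
    return gross, net

# Hypothetical month: $100k starting RMR, $5k churned, $3k added.
gross, net = churn_metrics(rmr_lost=5_000, rmr_gained=3_000,
                           rmr_start=100_000)
print(f"gross churn {gross:.1%}, net churn {net:.1%}")
```

A business can thus show positive growth (low or negative net churn) while still bleeding existing customers, which is why the two are tracked separately.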
Predicting and preventing customer churn has become a primary goal of many enterprises. Every enterprise wants to hold on to each of its customers in order to maximize the revenue and earnings derived from them. With the introduction of business and management systems and the automation of process flows, corporations have accumulated large amounts of customer and business data through their daily operating activities, which gives data mining techniques fertile ground for modeling and prediction. Many data mining algorithms and models have emerged to address this problem of customer loss, and they have been widely used in this field for decades.
For the prediction of customer churn, many algorithms and models have been applied. The most common of these are the Decision Tree, the Artificial Neural Network, and Logistic Regression. Furthermore, other algorithms such as the Bayesian Network, the Support Vector Machine, Rough Sets, and Survival Analysis are also used.
In addition to algorithms and models, other techniques, such as input variable selection, feature selection, outlier detection, etc., have also been applied to get better results out of the above algorithms.
The first three models, i.e. the Decision Tree, the Artificial Neural Network, and Logistic Regression, have been applied maturely at many corporations. Each algorithm has been refined over multiple iterations and is now fairly stable. But as business processes and activities grow, customer churn becomes an increasingly complex problem to solve, which calls for a new generation of churn prediction models that are fast and robust and can be quickly trained and scored on huge amounts of data.
Jiayin and Yuanquan provided a step-by-step methodology for selecting effective input variables for a customer churn prediction model in the telecommunication industry. In telecommunications, a very large number of input variables is usually available for churn prediction models. Among all these variables, some have a positive effect on the model, while others are redundant. These redundant variables overload the churn prediction model, so it is always better to select only the important features and remove redundant, noisy, and less informative variables. In their study, they suggested the Area Under the ROC curve (AUC), where ROC is the Receiver Operating Characteristic, as a measure of each variable's classifying ability, and then selecting the variables with the highest classifying ability. Furthermore, they also suggested computing the mutual information among all selected variables and finally retaining the variables that have relatively low mutual information coefficients.
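The per-variable AUC ranking step can be sketched in plain Python (the feature names and toy data below are invented for illustration, and the subsequent mutual-information filter among selected variables is omitted):

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random churner scores above a random
    non-churner, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_features(features, labels):
    """Return (name, AUC) pairs sorted by classifying ability.
    An AUC near 0.5 carries little information; an AUC near 1.0
    (or 0.0, after flipping direction) separates churners well."""
    scored = []
    for name, values in features.items():
        a = auc(values, labels)
        scored.append((name, max(a, 1 - a)))  # direction-agnostic
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy data: 1 = churned, 0 = retained (hypothetical).
labels = [1, 1, 1, 0, 0, 0]
features = {
    "monthly_minutes": [120, 80, 95, 300, 410, 280],
    "tenure_months":   [2, 5, 3, 48, 36, 60],
    "random_noise":    [7, 1, 4, 6, 2, 9],
}
for name, score in rank_features(features, labels):
    print(name, round(score, 2))
```

In this toy run the two informative variables rank at 1.0 while the noise variable falls near 0.5, which is exactly the behavior the selection procedure exploits.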
Huang and Kechadi proposed a new feature selection technique for churn prediction models. Their primary focus was the telecommunication industry, where the number of input parameters/features is very large, and it is always better to select a subset of features with the most ability to discriminate the target classes; otherwise, running an algorithm on all input variables consumes too much time and too many resources. The most commonly used approaches to feature selection only judge whether an input feature is useful for classifying the classes or not. The approach they proposed takes into account the relationship between a given categorical value of a feature and a class when selecting or eliminating that feature.
Luo, Shao and Lie proposed customer churn prediction using a Decision Tree for the Personal Handyphone System Service (PHSS), where the number of variables in the input data set is very small. The Decision Tree is probably the most commonly used data mining algorithm. A decision tree model is a predictive model that predicts through a classification process. It is represented as an upside-down tree, in which the root is at the top and the leaves are at the bottom. A decision tree is a representation of rules, which helps us understand why a record has been classified in a specific way, and these rules can be used to find records that fall into a specific category. In their work they determined the optimal settings for the input dataset with respect to the time sub-period, the cost of misclassification, and the sampling method. From their research, they came to the conclusion that a 10-fold sub-period, a 1:5 cost of misclassification, and random sampling are the most optimal settings when training a decision tree model with a very small number of input variables.
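The role of the misclassification-cost setting can be illustrated with a cost-sensitive split search on a single variable (a one-level "decision stump", not the authors' full tree; the 1:5 weighting mirrors their reported optimum, and the data is made up):

```python
def stump_split(xs, ys, fn_cost=5, fp_cost=1):
    """Pick the threshold on one numeric variable that minimizes
    total misclassification cost, charging a missed churner (false
    negative) fn_cost and a false alarm (false positive) fp_cost."""
    best = None
    for t in sorted(set(xs)):
        cost = 0
        for x, y in zip(xs, ys):
            pred = 1 if x <= t else 0   # predict churn below threshold
            if pred == 0 and y == 1:
                cost += fn_cost
            elif pred == 1 and y == 0:
                cost += fp_cost
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Hypothetical data: short-tenure customers tend to churn (1).
tenure = [1, 2, 3, 10, 12, 15, 20, 24]
churn  = [1, 1, 1, 0,  0,  1,  0,  0]
print(stump_split(tenure, churn))
```

With the 1:5 weighting the stump widens its threshold to catch the late churner at tenure 15, accepting two false alarms; with equal costs it would stop at tenure 3 and miss that churner, which is why the cost ratio matters for retention campaigns.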
Ming, Huili and Yuwei proposed a model for churn prediction using a Bayesian Network. The concept of the Bayesian network was initially proposed by Judea Pearl (1986). It is a kind of graphical model used to represent the joint probability among different variables. It provides a natural way to describe causal information, which can be used to discover potential relations in the data. This algorithm has been used successfully in knowledge representation for expert systems, in data mining, and in machine learning. Recently, it has also been applied in areas of artificial intelligence, including causal reasoning, uncertain knowledge representation, pattern recognition, cluster analysis, etc.
A Bayesian network consists of many nodes representing attributes, connected by edges, so it can handle problems in which more than one attribute determines another, involving a multivariate probability distribution. Besides, since different Bayesian networks have different structures, and concepts from graph theory such as trees, graphs, and directed acyclic graphs describe these structures clearly, graph theory is an important theoretical basis for Bayesian networks alongside probability theory. As a result, customer churn prediction using Bayesian networks yields very promising results.
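The joint-probability factorization a Bayesian network encodes can be shown on a toy two-parent network, Contract -> Churn <- Complaints (the structure and all conditional probability tables below are hand-made for illustration, not taken from the paper):

```python
# Priors over the parent nodes (hypothetical values).
P_contract = {"monthly": 0.6, "yearly": 0.4}
P_complaints = {True: 0.3, False: 0.7}
# Conditional table P(Churn=1 | Contract, Complaints).
P_churn = {
    ("monthly", True): 0.70, ("monthly", False): 0.30,
    ("yearly",  True): 0.25, ("yearly",  False): 0.05,
}

def p_churn():
    """Marginal P(Churn=1): sum over the joint factorization
    P(contract) * P(complaints) * P(churn | contract, complaints)."""
    return sum(
        P_contract[c] * P_complaints[k] * P_churn[(c, k)]
        for c in P_contract for k in P_complaints
    )

print(round(p_churn(), 4))
```

Because the graph factorizes the joint distribution into small local tables, inference only ever multiplies each node's probability by its parents' values, which is what keeps small networks tractable.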
Jiayin, Yangming, Yingying and Shuang proposed a new algorithm for churn prediction called TreeLogit. This algorithm is a combination of the ADTree and Logistic Regression models. It incorporates the advantages of both algorithms, making it as good as the TreeNet model, which won first prize in the 2003 customer churn prediction contest. Because TreeLogit combines the strengths of both base algorithms, it is a very powerful tool for customer churn prediction.
The modeling procedure for TreeLogit starts by constructing customer characteristic variables based on prior knowledge. The characteristic variables are then grouped into m sub-vectors, and a decision tree is built for each sub-vector. Once we have the decision tree for each sub-vector, we develop a logistic regression model for each sub-vector. Finally, we evaluate the precision and interpretability of the model. If these are acceptable, the customer retention process is started; otherwise the model is re-tuned for better results.
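The tree-then-regression idea can be sketched in miniature: a customer is routed to a leaf of a tree built over one sub-vector, and the leaf indicator feeds a logistic model (this is a simplified stand-in, not the authors' ADTree; the split points and weights below are hypothetical and would be fitted in the real procedure):

```python
import math

def leaf(customer):
    """Tiny stand-in decision tree over a usage sub-vector."""
    if customer["tenure"] < 6:
        return "new_short"
    if customer["monthly_minutes"] < 100:
        return "long_light"
    return "long_heavy"

# Hypothetical logistic coefficients, one per leaf indicator.
weights = {"new_short": 1.2, "long_light": 0.1, "long_heavy": -0.8}
bias = -0.5

def churn_probability(customer):
    """Logistic regression on the leaf-membership indicator."""
    z = bias + weights[leaf(customer)]
    return 1 / (1 + math.exp(-z))

print(round(churn_probability({"tenure": 3, "monthly_minutes": 250}), 3))
```

The tree contributes interpretable, data-driven segments, while the logistic layer contributes calibrated probabilities, which is the complementarity the authors exploit.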
Jing and Xinghua, in their work on customer churn prediction, presented a model based on Support Vector Machines. Support Vector Machines were developed on the basis of statistical learning theory, which is regarded as the best theory for small-sample estimation and predictive learning. Studies on machine learning from finite samples were begun by Vapnik in the sixties of the last century, and a relatively complete theoretical system called statistical learning theory was established in the nineties. After that, the Support Vector Machine, a new learning machine, was proposed. The SVM is built on the structural risk minimization principle, which aims to reduce the true error probability, and is mainly used to solve pattern recognition problems. Due to the SVM's complete theoretical framework and its good results in application, it has been widely respected in the machine learning field.
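A minimal linear SVM can be trained by stochastic sub-gradient descent on the regularized hinge loss (the Pegasos-style formulation of the structural risk minimization objective mentioned above; the churn data below is invented and two-dimensional purely for illustration):

```python
import random

def train_svm(data, lam=0.01, epochs=200, seed=0):
    """Minimize lam/2 * ||w||^2 + average hinge loss over (x, y)
    pairs with y in {-1, +1}, using a decaying learning rate."""
    data = list(data)
    rng = random.Random(seed)
    w, b, t = [0.0, 0.0], 0.0, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            w = [wi * (1 - eta * lam) for wi in w]  # L2 shrinkage
            if margin < 1:                           # hinge sub-gradient
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

# Churners (+1): low tenure, many complaints; loyal (-1): the reverse.
data = [((1, 5), 1), ((2, 4), 1), ((1, 6), 1),
        ((9, 0), -1), ((8, 1), -1), ((10, 0), -1)]
w, b = train_svm(data)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
print(predict((2, 5)), predict((9, 1)))
```

The shrinkage term keeps the weight vector small (the "structural" part of the risk) while the hinge updates fix classification errors (the empirical part), which is the trade-off the principle formalizes.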
Xu E, Liangshan Shao, Xuedong Gao and Zhai Baofeng introduced the Rough Set algorithm for customer churn prediction. Deng Hu also analyzed the applications of rough sets for customer churn prediction. According to them, Rough Set is a data analysis theory proposed by Z. Pawlak. Its main idea is to derive decision or classification rules through knowledge reduction while keeping the classification ability unchanged. This theory has some unique notions, such as knowledge granularity, which make rough set theory especially suited to data analysis. Rough sets are built on a classification mechanism, and the partition of the space induced by an equivalence relation is regarded as knowledge. In most cases, the theory describes imprecise or uncertain knowledge using knowledge that has already been established. In this theory, knowledge is regarded as a kind of classification ability over data, and the objects in the universe are usually described by a decision table, a two-dimensional table in which each row represents an object and each column an attribute. Attributes are divided into decision attributes and condition attributes. The objects in the universe can be distributed into decision classes with different decision attributes according to their condition attributes. One of the core concepts in rough set theory is reduction, a process in which unimportant or irrelevant knowledge is deleted while keeping the classification ability unchanged. A decision table may have several reducts, whose intersection is defined as the core of the decision table. The attributes of the core are important because of their impact on classification.
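The basic rough-set machinery, equivalence classes under the condition attributes plus lower/upper approximations of a decision class, fits in a few lines (the toy decision table below is hypothetical):

```python
# Toy decision table: condition attributes (contract, complaints),
# decision attribute churn. Rows 2 and 3 are deliberately
# inconsistent, so the churn class is only "roughly" definable.
table = [
    (("monthly", "yes"), 1),
    (("monthly", "yes"), 1),
    (("monthly", "no"),  1),
    (("monthly", "no"),  0),
    (("yearly",  "no"),  0),
]

def approximations(table, decision=1):
    """Group objects by condition attributes (the indiscernibility
    relation), then return the lower approximation (objects certainly
    in the class) and upper approximation (possibly in the class)."""
    classes = {}
    for i, (cond, _) in enumerate(table):
        classes.setdefault(cond, []).append(i)
    lower, upper = set(), set()
    for members in classes.values():
        decisions = {table[i][1] for i in members}
        if decisions == {decision}:
            lower.update(members)
        if decision in decisions:
            upper.update(members)
    return lower, upper

lower, upper = approximations(table)
print(sorted(lower), sorted(upper))
```

The gap between the two approximations (the boundary region) measures how uncertain the churn class is under the chosen condition attributes; attribute reduction then removes condition attributes that leave these approximations unchanged.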
Survival analysis is a family of statistical methods for assessing and inferring the life expectancy of organisms or products from data obtained through surveys or experiments. It combines the occurrence of certain events with the corresponding time spans to analyze a problem. It was initially used in medical science to study a drug's influence on the life expectancy of the study subjects. Survival time should be understood broadly: it is the duration of some condition in nature, society, or a technical process. In this context, the churn of a customer is regarded as the end of that customer's survival time. In the fifties of the last century, statisticians began to study the reliability of industrial products, which advanced the development of survival analysis in both theory and application. The proportional hazards regression model, first proposed by Cox in 1972, is a commonly used survival analysis technique.
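A customer "survival" (retention) curve can be estimated with the standard Kaplan-Meier estimator, a basic survival-analysis tool (not the Cox model itself); the tenures and churn flags below are hypothetical, with 0 marking customers still active at the end of observation (censored):

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))]: at each observed churn time t, multiply
    the running survival probability by (1 - d_t / n_t), where d_t
    churn at t and n_t are still at risk just before t."""
    event_times = sorted({t for t, e in zip(durations, churned) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        churns = sum(1 for d, e in zip(durations, churned) if e and d == t)
        survival *= 1 - churns / at_risk
        curve.append((t, survival))
    return curve

# Months until churn (1) or end of observation (0 = censored).
durations = [2, 3, 3, 5, 8, 8, 10, 12]
churned   = [1, 1, 0, 1, 1, 0, 0,  0]
for t, s in kaplan_meier(durations, churned):
    print(t, round(s, 3))
```

Censored customers still count in the at-risk denominator until they leave observation, which is exactly what lets survival methods use incomplete customer histories that ordinary classifiers would have to discard.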
Jiayin and Yuanquan suggested a simple method for variable selection. The technique proposed is very effective and practical, but more systematic methods are available that use advanced neural networks, induction algorithms, and rough sets.
Huang's and Kechadi's strategy of taking categorical values into consideration when performing feature selection is good, but their theory is limited to categorical values and cannot be applied to continuous values. Continuous values need to be discretized into categorical values before their feature selection idea can be applied, and this conversion from continuous to discrete may result in loss of information.
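The discretization step that their approach requires can be as simple as equal-width binning, which also makes the information loss concrete: every value in a bin becomes indistinguishable from its neighbors (bin count and data below are illustrative):

```python
def equal_width_bins(values, k=3):
    """Discretize a continuous variable into k equal-width bins,
    returning a bin index per value. Variation inside each bin is
    discarded, which is the information loss noted above."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1   # guard against a constant column
    return [min(int((v - lo) / width), k - 1) for v in values]

print(equal_width_bins([5, 12, 18, 30, 44, 60], k=3))
```

Supervised discretization schemes (e.g. entropy-based splitting) lose less class-relevant information than fixed-width bins, but no discretization is lossless.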
Luo, Shao and Lie chose the Decision Tree as their data mining algorithm for churn prediction; it is the simplest and most understandable classification algorithm, and its straightforwardness also makes it the most widely used. But decision trees have their own limitations: they are very unstable, and a small change in the input variables, such as the addition of new ones, requires rebuilding and retraining the complete tree. In addition, the authors could also have focused on how to enrich the input variables by adding new derived parameters that could improve the performance of the model.
Ming, Huili and Yuwei's Bayesian network model has advantages and some shortcomings. It has the ability to produce good results even when the input datasets are incomplete. In addition, it has the capacity to take relationships between variables into account when predicting churn and to incorporate prior knowledge. This algorithm is also effective at preventing overfitting. But if the dataset is large, structure learning for the Bayesian network becomes too difficult; thus this model is not a good fit for telecom, where the dataset is usually large.
Jiayin, Yangming, Yingying and Shuang's TreeLogit combines the advantages of both algorithms, i.e. ADTree and logistic regression; thus it is both data-driven and assumption-driven, and it has the capability of studying objects with incomplete information. Furthermore, its performance is not degraded by poor-quality data, and it produces continuous output with relatively low complexity.
Jing and Xinghua used the Support Vector Machine algorithm for churn prediction. This algorithm performs best when you have a limited number of samples, but on the other hand its theory is highly complex and there are many variants of it, so it is difficult to find the variant that best suits your problem.
There are multiple solutions available for customer churn prediction, each with its own advantages and disadvantages, so a single solution may not be best for every organization. A business may need to use a combination of algorithms and techniques to obtain the best results for churn prediction.