A COMPARISON OF DATA MINING TECHNIQUES FOR CREDIT SCORING IN BANKING: A MANAGERIAL PERSPECTIVE

Credit scoring is an important task for lenders, who use it to evaluate the loan applications they receive from consumers, and for insurance companies, which use scoring systems to evaluate new policyholders and the risks these prospective customers might present to the insurer. Credit scoring systems model the potential risk of loan applications. They can handle a large volume of credit applications quickly with minimal labour, thus reducing operating costs, and they may be an effective substitute for judgment among inexperienced loan officers, thus helping to control bad debt losses. This study explores the performance of credit scoring models using traditional and artificial intelligence approaches: discriminant analysis, logistic regression, neural networks, and classification and regression trees. Experimental studies using real-world data sets demonstrate that classification and regression trees and neural networks outperform the traditional credit scoring models in terms of predictive accuracy and Type II errors.


Introduction
One of the main tasks of a bank is to lend money. As a financial intermediary, one of its roles is to reduce lending risks. Bank lending is an art as well as a science. Success depends on the techniques used, on knowledge, and on an aptitude for assessing both the creditworthiness of a potential borrower and the merits of the proposition to be financed. In recent years, banks have increasingly used credit scoring techniques to evaluate the loan applications they receive from consumers (Blochlinger and Leippold 2006; Vojtek and Kočenda 2006; Mačerinskienė and Ivaskevičiūtė 2008; Karan and Arslan 2008). Because of severe competition and rapid growth in the consumer credit market, credit scoring models have been used extensively for credit admission evaluation. Credit scoring is a method of modeling the potential risk of credit applications (Vojtek and Kočenda 2006; Zhao 2007; Avery et al. 2004; Bodur and Teker 2005; Crook and Banasik 2004; Jacobson and Roszbach 2003). Credit scoring models have been developed by financial institutions and researchers to solve the problems that arise during the evaluation process.
In the beginning, financial institutions relied on rules or principles built by analysts to decide whom to give credit. As the number of applicants has increased tremendously, it has become impossible in both economic and manpower terms to evaluate every credit application this way. Several quantitative methods have therefore been developed for the credit admission decision. Credit scoring models categorize applicants as either accepted or rejected with respect to the applicants' characteristics. The objective of credit scoring models is to assign credit applicants to either a 'good credit' group that is likely to repay its financial obligation or a 'bad credit' group whose application will be denied because of a high probability of defaulting on the financial obligation (Lee et al. 2006). Statistical methods, nonparametric statistical methods, and artificial intelligence approaches have been proposed to support the credit decision (Thomas 2000). Credit scoring problems fall within the domain of the more general and widely discussed classification problems (Lee et al. 2002).
Classification problems have long played an important role in business-related decision making because of their wide application in decision support, financial forecasting, fraud detection, marketing strategy, process control, and other related fields (Chen et al. 1996; Fayyad et al. 1996; Lee et al. 2006). Classification problems can be solved with techniques ranging from statistical methods to artificial intelligence algorithms.
Statistical methods, including regression, linear and nonlinear discriminant analysis, and logit and probit models, have been the most commonly applied to construct credit scoring models (Vojtek and Kočenda 2006; Lee et al. 2002; Lee et al. 2006). The most popular methods are linear discriminant analysis, logistic regression, and their variations. They are relatively easy to implement and generate straightforward results that can be readily interpreted. However, there are some limitations associated with their application in credit scoring. First, these methods are not effective for problems with high-dimensional inputs and small sample sizes. Most importantly, these techniques rely on linear separability and normality assumptions. Furthermore, it is difficult to automate the modeling process and design a continuous update flow. According to Yang (2007), static models usually fail to adapt when the environment or population changes over time. Therefore, these models may need to be rebuilt from scratch.
In addition to these classical methodologies, artificial intelligence techniques have been applied to credit scoring. Practitioners and researchers have developed a variety of techniques for credit scoring, including k-nearest neighbor (Henley and Hand 1996), decision trees (Lee et al. 2006), neural networks (Lee et al. 2002; Malhotra, R. and Malhotra, D. K. 2002; West 2000), genetic programming (Ong et al. 2005), and support vector machines. These techniques can be used as an alternative to discriminant analysis and logistic regression in situations where the dependent and independent variables exhibit complex nonlinear relationships (Lee et al. 2006).
The purpose of this study is to explore the performance of credit scoring using discriminant analysis, logistic regression, neural networks, and classification and regression trees. The rest of the paper is organized as follows. Section 2 briefly reviews the literature on credit scoring models and outlines the statistical methods and artificial intelligence techniques. Section 3 presents the analytic results of credit scoring models using discriminant analysis, logistic regression, neural networks, and classification and regression trees. Finally, Section 4 presents the conclusions.

Research methodology and literature review
Credit scoring models investigate the objective and subjective factors that may influence individuals. To predict an individual's ability to fulfill his or her financial commitments as expected, credit scoring models have been developed using quantitative and qualitative analysis. Next, we briefly review the background and related literature on credit scoring models.

Statistical methods
Two models have been used widely in credit scoring: discriminant analysis and logistic regression. Several variations of these methods have been proposed. Discriminant analysis, proposed by Fisher (1936), involves the linear combination of explanatory variables that best differentiates between a priori defined groups. To achieve this, one has to maximize the between-group variance relative to the within-group variance. The following equation expresses the discriminant analysis:

$$Z = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \qquad (1)$$

where $Z$ is the discriminant score, $\beta_i$ are the coefficients and $X_i$ are the independent variables. Discriminant analysis can be used if the dependent variable is categorical and the independent variables are metric. To use discriminant analysis, the data have to be independent and normally distributed, and the covariance matrices of the groups must satisfy the homogeneity assumption (Rencher 2002). If the covariance matrices of the given populations are not equal, then the separation surface of the discriminant function is quadratic, and quadratic discriminant analysis (QDA) needs to be used. Although linear discriminant analysis (LDA) is only a special case of QDA with stronger assumptions, LDA has been reported to be a more robust method when the theoretical presumptions are violated (Lee and Chen 2005). Discriminant analysis has been used to solve classification problems in finance, business, and marketing research (Lee et al. 1997; Kim et al. 2000; Trevino and Daniels 1995). For credit scoring problems, several researchers have proposed and used discriminant analysis and its variations (Lee et al. 2002; Lee et al. 2006).
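To make the idea of a discriminant score concrete, the following is a minimal sketch of a two-group Fisher discriminant with two predictors, written in plain Python. It is an illustration only and not the procedure used in this study: the applicant values, the names `fisher_fit` and `score`, and the midpoint cut-off (which assumes equal group priors) are all assumptions introduced here.

```python
# Two-group Fisher discriminant with two predictors, in pure Python.
# The applicants below are invented for illustration only.

def score(beta, x):
    """Discriminant score Z = beta_1 * x_1 + beta_2 * x_2."""
    return sum(b * xi for b, xi in zip(beta, x))

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    """2x2 within-group scatter matrix: sum of (x - mean)(x - mean)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_fit(good, bad):
    m_good, m_bad = mean_vector(good), mean_vector(bad)
    s_good, s_bad = scatter(good, m_good), scatter(bad, m_bad)
    # Pooled within-group scatter S_w
    sw = [[s_good[i][j] + s_bad[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m_good[0] - m_bad[0], m_good[1] - m_bad[1]]
    # beta = S_w^{-1} (m_good - m_bad), using the closed-form 2x2 inverse
    beta = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (sw[0][0] * diff[1] - sw[1][0] * diff[0]) / det]
    # Cut-off (assumes equal priors): midpoint of the projected group means
    cutoff = 0.5 * (score(beta, m_good) + score(beta, m_bad))
    return beta, cutoff

# Hypothetical (income, education) pairs for each group
good = [(5.0, 3.0), (6.0, 4.0), (7.0, 3.5), (6.5, 4.5)]
bad = [(2.0, 1.0), (3.0, 2.0), (2.5, 1.5), (3.5, 1.0)]

beta, cutoff = fisher_fit(good, bad)
for x in good + bad:
    label = "good" if score(beta, x) > cutoff else "bad"
    print(x, round(score(beta, x), 3), label)
```

The coefficient vector is the direction that maximizes between-group variance relative to within-group variance; an applicant is assigned to the 'good credit' group when the score exceeds the cut-off.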
Logistic regression is a widely used statistical modeling technique in which the probability of a dichotomous outcome is related to a set of potential explanatory variables in the form (Hosmer and Lemeshow 1989):

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \qquad (2)$$

where $p$ is the probability of the outcome of interest, $\beta_0$ is the intercept term, and $\beta_i$ represents the coefficient associated with the corresponding independent variable $x_i$ ($i = 1, \ldots, n$). According to Lee et al. (2002), the logistic regression model does not necessarily require the assumptions of discriminant analysis. However, logistic regression can be as efficient and accurate as discriminant analysis even when the assumptions of discriminant analysis are satisfied. An advantage of discriminant analysis is that the ordinary least squares estimation procedure can be used to estimate the coefficients of the linear discriminant function, whereas maximum likelihood methods are required to estimate logistic regression models. Logistic regression models have been widely adopted in many areas ranging from business to engineering (Laitinen, E. K. and Laitinen, T. 2000; Suh et al. 1999; Vellido et al. 1999). Logistic regression has also been explored by several researchers in building credit scoring models for personal loan, business loan, and credit card applications (Lee et al. 2006).
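As an illustration of maximum likelihood estimation for the logistic model, the sketch below fits a one-variable model by plain gradient ascent on the log-likelihood. It is a toy example under stated assumptions: the data, the function names `predict_proba` and `fit_logistic`, the learning rate, and the number of epochs are hypothetical, and statistical packages typically use faster iterative procedures.

```python
import math

def predict_proba(beta0, betas, x):
    """p = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Maximize the mean log-likelihood by batch gradient ascent."""
    beta0, betas = 0.0, [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * len(betas)
        for x, y in zip(xs, ys):
            err = y - predict_proba(beta0, betas, x)  # d logL / d z
            grad0 += err
            for j, xj in enumerate(x):
                grad[j] += err * xj
        beta0 += learning_rate * grad0 / n
        betas = [b + learning_rate * g / n for b, g in zip(betas, grad)]
    return beta0, betas

# Hypothetical data: one predictor (e.g. scaled income), 1 = good credit
xs = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
ys = [0, 0, 1, 0, 1, 1]

beta0, betas = fit_logistic(xs, ys)
for x in xs:
    print(x[0], round(predict_proba(beta0, betas, x), 3))
```

Taking the log-odds of each fitted probability recovers the linear predictor, which is the relationship the logistic model assumes.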

Artificial intelligence techniques
Artificial intelligence techniques, which have made a significant contribution to the field of information science (Chen and Liu 2004), can be adopted to construct credit scoring models. Several artificial intelligence techniques, such as decision trees, neural networks, genetic programming, and k-nearest neighbor models, have been developed by practitioners and researchers for credit scoring (Malhotra, R. and Malhotra, D. K. 2002; West 2000; Ong et al. 2005; Lee et al. 2006; Lee and Chen 2005; Lee et al. 2002). In this study, we develop and compare credit scoring models based on neural networks and on classification and regression trees.
A neural network (NN), which is an algorithmic procedure for transforming inputs into desired outputs using highly interconnected networks of relatively simple processing elements (nodes), belongs to a class of nonlinear regression and discrimination models. A neural network consists of the nodes, the network topology describing the connections between nodes, and the training algorithm used to determine the values of the network weights for a particular network. The nodes are connected to one another in the sense that the output from one node can serve as the input to other nodes. Each node transforms an input to an output using a transfer function. The network topology gives the organization of nodes and the types of connections. The nodes are arranged in a series of layers with connections between nodes in different layers. The first layer, called the input layer, receives the inputs. An example of a neural network with one hidden layer is shown in Fig. 1 (Crook et al. 2007). The appropriate network topology (i.e., the number of hidden neurons in the hidden layer) can be determined by equation (3), where h is the number of hidden units and n and m are the numbers of input and output units, respectively.
Neural networks can be classified into different categories such as feedforward and feedback networks. The nodes in feedforward networks can take inputs only from the previous layer and send outputs to the next layer. The multilayer perceptron (MLP) is trained with the back propagation (BP) algorithm, which is a gradient (steepest) descent algorithm. To find the optimal weights, BP tries to minimize the network error. The step size, called the learning rate, must be specified first. The learning rate is crucial for a back propagation network (BPN): smaller values tend to slow down the training process before convergence, while larger ones may cause the network to oscillate and fail to converge. Several variations of the BP algorithm have been proposed to overcome difficulties such as reaching a local minimum, slow convergence, and overtraining. Detailed information on neural networks can be found in Haykin (1998).
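The effect of the learning rate can be seen even on a toy problem. The sketch below applies gradient descent to a single weight with squared error E(w) = (w - target)^2; it is not a back propagation network, and the function name, starting point, target, and the three learning rates are assumptions chosen only to show slow convergence, fast convergence, and divergence.

```python
def gradient_descent(learning_rate, steps=50, w0=0.0, target=3.0):
    """Minimize E(w) = (w - target)^2 with a fixed step size."""
    w = w0
    for _ in range(steps):
        gradient = 2.0 * (w - target)  # dE/dw
        w -= learning_rate * gradient
    return w

# A small, a moderate, and an overly large learning rate
for lr in (0.01, 0.4, 1.1):
    w = gradient_descent(lr)
    print(f"learning rate {lr}: w = {w:.4g}, |error| = {abs(w - 3.0):.4g}")
```

With the small rate the weight is still far from the target after 50 steps, with the moderate rate it has converged, and with the large rate each update overshoots by more than the previous error, so the weight oscillates with growing amplitude.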
A decision tree is one of the approaches to building a classification model by inductive reasoning. It produces a tree-shaped model representing a segmentation of the data that is created by applying a series of simple rules. These rules can be used for prediction through a repetitive process of splitting. Decision trees are very suitable for credit scoring models and are widely used (Lee and Chen 2005). The following decision tree algorithms have been used for prediction and classification: ID3, C4.5, Classification and Regression Trees (CART), and Chi-squared Automatic Interaction Detector (CHAID) models.
ID3 (Iterative Dichotomiser 3) was proposed by Quinlan (1993) to generate decision trees. It is based on the theory of information gain. ID3 selects the attribute with the highest information gain for branching, so that the resulting tree has a simple structure (Zhao 2007). Information gain is computed from the entropy of the subsets produced by splitting a node on a given attribute, compared with the entropy of the data set before the split. The disadvantage of ID3 is that using information gain as the rule for selecting branching attributes biases the selection toward attributes with many distinct values. To remove this drawback, C4.5, which is an extension and revision of ID3, was proposed (Chang and Chen 2008). The C4.5 algorithm uses the information gain ratio to select attributes. The C5 algorithm, which offers improvements over C4.5, can be used to process huge data sets because it uses boosting to increase modeling accuracy (Chang and Chen 2008). In addition, it is much faster and more efficient than C4.5 in terms of memory usage. According to Tso and Yau (2007), C5 has the following advantages over the C4.5 algorithm: "(1) the branch-merging option for nominal splits is the default; (2) misclassification costs can be specified; (3) boosting and cross-validation are available; and (4) the algorithm for creating rule sets from trees is much improved".
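The difference between information gain and gain ratio can be illustrated with a short sketch. The six records below are invented, and the function names are hypothetical; the point is only that an attribute with a unique value per record (here a customer id) achieves the maximum information gain, while the gain ratio penalizes it.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy before the split minus weighted entropy after it (ID3)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information (C4.5)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical records
credit = ["good", "good", "bad", "bad", "good", "bad"]
income = ["high", "high", "low", "low", "high", "low"]
customer_id = ["c1", "c2", "c3", "c4", "c5", "c6"]

for name, attr in (("income", income), ("customer_id", customer_id)):
    print(name,
          round(information_gain(attr, credit), 3),
          round(gain_ratio(attr, credit), 3))
```

Both attributes separate the classes perfectly and so have the same information gain, but the gain ratio of the id attribute is much lower because its split information is large.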
Besides these algorithms, several researchers have proposed other decision tree techniques. One of them is classification and regression trees, known as CART, a statistical procedure introduced by Breiman et al. (1984). It is a recursive partitioning method that can be used for both regression and classification. It is primarily used as a classification tool to classify an object into two or more populations, and it can also be used to analyze continuous data. The CART algorithm can be summarized in three stages as follows (Chang and Chen 2008):
1. Recursive partitioning is used to select variables and split points using a splitting criterion. The best predictor is chosen using one of a variety of impurity or diversity measures (Gini, twoing, ordered twoing and least-squared deviation). Detailed information about how to compute these impurity measures can be found in Breiman et al. (1984). The objective is to produce subsets of the data that are as homogeneous as possible with respect to the target variable (Breiman et al. 1984).
2. After a large tree has been grown, CART applies a pruning procedure based on minimal cost complexity. The pruning procedure yields a nested sequence of subtrees, starting from the largest tree grown and continuing until only one node of the tree remains.
3. In the last stage, the optimal tree is selected as the one with the lowest cross-validated or testing set error.
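The first stage can be illustrated for a single continuous predictor. The sketch below searches all midpoints between consecutive sorted values and keeps the threshold with the lowest weighted Gini impurity; pruning and tree selection are not shown. The income values, labels, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity for a two-class node with labels 'good' / 'bad'."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_good = sum(1 for y in labels if y == "good") / n
    return 1.0 - p_good ** 2 - (1.0 - p_good) ** 2

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical incomes and credit status
income = [12, 15, 18, 22, 30, 35, 40, 55]
credit = ["bad", "bad", "good", "bad", "good", "good", "good", "good"]

threshold, impurity = best_split(income, credit)
print(threshold, round(impurity, 4))
```

A full CART implementation repeats this search over every predictor at every node and then prunes the resulting tree.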
Neural networks and decision trees have been widely used to solve problems in engineering, science, business, and forecasting (Vellido et al. 1999; Lee et al. 2002; Lee et al. 2006). Both have also been used to deal with credit scoring problems (Lee et al. 2006; Lee et al. 2002), and decision trees in particular have been used widely in the context of credit scoring models. In this study, we use multilayer perceptron (MLP) networks and the CART decision tree algorithm.

Empirical study
To verify the feasibility and effectiveness of the credit scoring models using discriminant analysis, logistic regression, decision trees (C5, CART), and neural networks, a credit card data set provided by a Turkish bank is used. Each customer record in the data set contains nine predictor variables, namely gender, age, marital status, educational level, occupation, job position, income, customer type, and credit cards from other banks. The response variable is the credit status of the customer: good or bad credit. The data set is composed of 1260 customer records. Among them, 890 records were randomly selected, preserving the ratio of good to bad credit, as the training sample used to estimate the parameters of the corresponding credit scoring model. The remaining 370 records are retained for validation (evaluating the classification capability of the scoring models).
Weka data mining software (Witten and Frank 2005) is used to develop the neural network, decision tree, and logistic regression credit scoring models. The discriminant analysis credit scoring model is implemented using SPSS 13.0. All the modeling tasks are run on an IBM PC with an Intel Pentium D 3.0 GHz CPU and 2 GB of RAM. The detailed credit scoring results using the above-mentioned five modeling techniques can be summarized as follows.
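Sampling that preserves the ratio of good to bad credit can be sketched as a stratified split. The example below is an illustration only: the record layout, the function name `stratified_split`, the random seed, and in particular the assumed 700/560 class counts are hypothetical, since the class distribution of the bank data is not stated here.

```python
import random

def stratified_split(records, train_fraction, seed=0):
    """Split records so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec["credit"], []).append(rec)
    train, test = [], []
    for group in by_class.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        k = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:k])
        test.extend(shuffled[k:])
    return train, test

# Hypothetical data set: 1260 records with an assumed 700 good / 560 bad split
records = [{"id": i, "credit": "good" if i < 700 else "bad"} for i in range(1260)]

train, test = stratified_split(records, train_fraction=890 / 1260)
share = lambda part: sum(r["credit"] == "good" for r in part) / len(part)
print(len(train), len(test), round(share(train), 3), round(share(test), 3))
```

Sampling within each class separately keeps the share of good-credit customers nearly identical in the training and testing samples.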

Discriminant Analysis
The stepwise discriminant approach (Rencher 2002) is adopted in building the discriminant analysis credit scoring model. The final discriminant function has five significant predictor variables, namely income, education, age, occupation, and marital status. The credit scoring results for the training and testing samples using the obtained discriminant function are summarized in Table 1. The average correct classification rates for the training and testing samples are 65.23% and 62.00%, respectively. In the training set, 137 customers with good credit are classified as bad credit customers and 169 customers with bad credit are classified as good credit customers. In the testing set, 52 customers with good credit are classified as bad credit customers and 81 customers with bad credit are classified as good credit customers.

Logistic Regression
The stepwise logistic regression procedure is used in building the credit scoring model. The variables included in the credit scoring model are income, education, and customer type. Table 2 shows the credit scoring results for the training and testing samples. As can be seen from Table 2, the average correct classification rates for training and testing are 66.37% and 62.33%, respectively. In the training set, 113 customers with good credit are classified as bad credit customers and 186 customers with bad credit are classified as good credit customers. In the testing set, 81 customers with good credit are classified as bad credit customers and 58 customers with bad credit are classified as good credit customers.

Neural Networks
The most widely used algorithm for neural networks is the back propagation (BPN) algorithm (Lee et al. 2006). According to Vellido et al. (1999), more than 75% of business applications using neural networks adopted the BPN algorithm. For these reasons, we use the BPN algorithm for the credit scoring model. In BPN, the data set is split into two subsets: a training set of 70% (860) and a holdout (testing) set of 30% (370) of the total data (1230).
According to Lee et al. (2002), any complex system can be modeled by a one-hidden-layer network. Determining the optimal number of hidden nodes (neurons) is crucial and complicated. The most common way to determine the number of hidden nodes is by experiment or trial and error. Alternatively, equation (3) can be used to determine the number of hidden neurons. In this study, we used equation (3), which gave thirteen hidden neurons. The convergence criteria used for training are a root-mean-squared error (RMSE) less than or equal to 0.0001 or a maximum of 5000 iterations.
The prediction results of the neural network for the training and testing sets are summarized in Table 3. From Table 3, the average correct classification rates for training and testing are 78.85% and 61.52%, respectively. In the training set, 49 customers with good credit are classified as bad credit customers and 139 customers with bad credit are classified as good credit customers. In the testing set, 99 customers with good credit are classified as bad credit customers and 43 customers with bad credit are classified as good credit customers.

Decision Trees
We use a single classification tree for the credit scoring model. We employ the most commonly used decision tree algorithm, CART, with the 1-SE rule in the pruning procedure. CART always prefers the most effective variable when splitting a node. Therefore, the order of the split nodes reflects the importance of the variables in credit scoring. The variable, income,  Table 4 shows that the average correct classification rates for training and testing are 72.89% and 65.58%, respectively. In the training set, 100 customers with good credit are classified as bad credit customers and 141 customers with bad credit are classified as good credit customers. In the testing set, 64 customers with good credit are classified as bad credit customers and 63 customers with bad credit are classified as good credit customers.

Comparison of the credit scoring models
To evaluate the overall credit scoring capability of the designed credit scoring models, the predictive results of the models and the misclassification costs are used. The predictive results are measured by the average correct classification rate for the testing set. Table 5 shows the predictive accuracy of the four credit scoring models.
The misclassification costs associated with a Type I error (a customer with good credit is misclassified as a customer with bad credit) and a Type II error (a customer with bad credit is misclassified as a customer with good credit) are significantly different. The misclassification costs associated with Type II errors are much higher than those associated with Type I errors. Since the relative ratio of misclassification costs associated with Type I and Type II errors is 1:5 (West 2000), special attention should be paid to the Type II errors of the four constructed models in order to evaluate their overall credit scoring capability. Table 6 shows the Type I and Type II errors of the four models being discussed.
As the results in Table 6 reveal, the neural network model has the lowest Type II error in comparison with the other three approaches. Therefore, we can conclude that neural networks can successfully reduce the risk of extra losses due to the high misclassification costs associated with Type II errors.
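The role of the 1:5 cost ratio can be made explicit with a short calculation. The sketch below weights the testing-set misclassification counts reported in the model subsections above by relative costs of 1 (Type I) and 5 (Type II). The weighted total is an illustrative summary introduced here, not a measure reported in the tables, and the names used are hypothetical.

```python
# Relative misclassification costs, Type I : Type II = 1 : 5 (West 2000)
COST_TYPE_I = 1
COST_TYPE_II = 5

# Testing-set counts quoted in the text:
# (good classified as bad = Type I, bad classified as good = Type II)
test_errors = {
    "discriminant analysis": (52, 81),
    "logistic regression": (81, 58),
    "neural networks": (99, 43),
    "CART": (64, 63),
}

def relative_cost(type_i, type_ii):
    """Total misclassification cost in units of one Type I error."""
    return COST_TYPE_I * type_i + COST_TYPE_II * type_ii

costs = {name: relative_cost(*errors) for name, errors in test_errors.items()}
best = min(costs, key=costs.get)

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: {cost}")
print("lowest relative cost:", best)
```

Under this weighting the neural network has the lowest relative cost, in line with its lowest Type II error count, even though CART has the highest average correct classification rate on the testing set.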

Conclusions
In this paper, four different techniques have been applied to explore credit scoring and evaluate the bank's credit card policy. Credit scoring has become an important issue as the competition among financial institutions has become very intense. Increasingly, financial institutions are seeking better strategies with the help of credit scoring models. Credit scoring is therefore one of the applications that has gained serious attention over the past decades, along with advances in information technology and modeling techniques. Modeling techniques such as traditional statistical analyses and artificial intelligence techniques have been developed to tackle credit scoring tasks. The purpose of this study is to explore the performance of credit scoring using discriminant analysis, logistic regression, neural networks, and classification and regression trees. To evaluate the feasibility and effectiveness of these techniques, the credit scoring task is performed on one bank's credit card data set. The analytic results demonstrate that CART has a better average correct classification rate than discriminant analysis, logistic regression, and neural networks. On the other hand, the neural network credit scoring model has fewer Type II errors, which carry high misclassification costs, and therefore has better overall credit scoring capability.