A COST-SENSITIVE LOGISTIC REGRESSION CREDIT SCORING MODEL BASED ON MULTI-OBJECTIVE OPTIMIZATION APPROACH

Credit scoring is an important process for peer-to-peer (P2P) lending companies as it determines whether loan applicants are likely to default. The aim of most credit scoring models is to minimize the classification error rate, which implies that all classification errors bear the same cost; in reality, however, the costs of different classification errors in credit scoring differ significantly. Therefore, this paper proposes a new cost-sensitive logistic regression credit scoring model based on a multi-objective optimization approach, in which the cost-sensitive logistic regression process has two objectives. The cost-sensitive logistic regression parameters are solved using a multi-objective particle swarm optimization (MOPSO) algorithm. In the empirical analysis, the proposed model was applied to the credit scoring of a well-known Chinese P2P company. Compared with other common credit scoring models, the proposed model effectively reduced the type II error rate and the total classification error cost, and improved the AUC, the F1 value (the harmonic mean of Recall and Precision), and the G-mean. The proposed model was also compared with other multi-objective optimization algorithms to further demonstrate that MOPSO is the best approach for cost-sensitive logistic regression credit scoring models.


Introduction
Due to the prevalence of financial frictions and credit constraints, it is difficult for individuals and small to medium-sized enterprises (SMEs) to obtain loans from banks (Rashid & Jabeen, 2018; Abraham, 2018). The peer-to-peer (P2P) lending market therefore began to develop as an emerging lending channel. The P2P lending market is a platform-based portal that allows individual lenders to come into contact with borrowers seeking loans, for which the lender assumes full risk. P2P lending has become popular because of the reduced financing costs (Guo, Zhou, Luo, Liu, & Xiong, 2016). In the P2P lending market, the borrower submits a loan application through the website platform, and the lender decides whether or not to lend the amount requested based on the relevant borrower information (Zhu, Li, Wu, Wang, & Liang, 2013). However, relaxed regulation has caused many credit risks to society as well as increased losses for investors (Chen, Li, Wu, & Luo, 2017). Therefore, as predicting whether a borrower is able to repay the loan in the requested or granted time frame is extremely important, algorithmic online credit scoring has become a vital tool for P2P lending companies (Verbraken, Bravo, Weber, & Baesens, 2014). In credit scoring models, the dependent variable is dichotomous, with a default being assigned a "0" and a successful loan being assigned a "1" (Serrano-Cinca & Gutiérrez-Nieto, 2016). Credit scoring models estimate default probabilities (PD) based on applicant credit histories so that lending institutions can determine whether to approve or reject the loan application (Bequé, Coussement, Gayler, & Lessmann, 2017).
Traditional credit scoring models build a series of statistical models based on the premise that different classification errors have a consistent cost using classifiers such as logistic regression (LR) (Wiginton, 1980), support vector machines (SVM) (Min & Lee, 2005), decision trees (DT) (Huysmans, Dejaeger, Mues, Vanthienen, & Baesens, 2011), neural networks (NN) (Khashman, 2010), or ensemble approaches such as bagging (Yu, Yue, Wang, & Lai, 2010), and boosting (Wang, Ma, Huang, & Xu, 2012). However, in the credit scoring process, cost-sensitivity is more important as the cost associated with approving an application for someone who defaults on the loan is far greater than the cost associated with rejecting an application from a customer who may have successfully repaid the loan (Kao, C. C. Chiu, & F. Y. Chiu, 2012).
Cost-sensitive learning is a relatively new machine learning approach that trains classifiers to recognize the different costs associated with different classification errors. In Figure 1, the orange circles indicate loan applicants who may default in the future and the green circles indicate applicants who may not default in the future. Assume that the cost of misjudging a non-defaulter as a defaulter is 1 and the cost of judging a defaulter as a non-defaulter is 5. The left side of Figure 1 shows a traditional classifier that minimizes the error rate: it judges one defaulter as a non-defaulter, so the classification error cost is 5. The right side shows a cost-sensitive learner that minimizes the cost of the classification errors: it judges two non-defaulting borrowers as possible defaulters, so the resulting classification error cost is 1 + 1 = 2. Therefore, the cost-sensitive classifier has a lower total cost and is more in line with the actual credit scoring situation (Günnemann & Pfeffer, 2017). Cost-sensitive learning algorithms can be divided into direct and indirect cost-sensitive methods (Ling & Sheng, 2011). The direct method builds a cost-sensitive learning algorithm by directly introducing different misclassification costs into the learning process. The indirect method converts cost-insensitive classifiers into cost-sensitive classifiers either by preprocessing the training data through undersampling or oversampling, or through postprocessing such as thresholding, which changes the cutoff value used to assign a sample to the positive or negative class (Xia, C. Liu, & N. Liu, 2017). García, Marqués, and Sánchez (2012) examined whether resampling methods could improve classifier performance and found that, for any classifier, performance after resampling was better than performance on the unbalanced data. Marqués, García, and Sánchez (2013) then compared resampling methods for credit scoring and found that they were able to balance cost-sensitive data and that oversampling was better than undersampling. As indirect cost-sensitive methods operate only at the data level and do not change the model itself, direct cost-sensitive methods may have more value. Therefore, the proposed method in this paper is based on a direct cost-sensitive method.
Among direct cost-sensitive approaches, J. Kim, Choi, G. Kim, and Suh (2012) compared a traditional classifier with a cost-sensitive decision tree and found that the lowest classification cost was incurred when a MetaCost approach was used and when the non-fraud data and fraud data were balanced. Bahnsen, Aouada, and Ottersten (2015) constructed an example-dependent cost-sensitive decision tree with a novel impurity measure and pruning criteria, which was found to outperform baseline models in terms of cost savings and training time. Of the various direct methods, cost-sensitive decision trees have been the most popular because they are simple to operate; however, as cost-sensitive decision trees are sensitive to data patterns, any small change in the training set can result in a completely different tree and a significant change in the predictions (James, Witten, Hastie, & Tibshirani, 2013). Logistic regression, however, is not sensitive to such changes in the data, is suitable for continuous variables, and yields a parameter for each variable whose magnitude and sign indicate whether and how the variable affects the loan applicant's default probability; therefore, logistic regression is highly interpretable and practical (Bequé et al., 2017).
Even though logistic regression has been found to have significant interpretability advantages, few studies have focused on cost-sensitive logistic regression. Bahnsen, Aouada, and Ottersten (2014) developed an example-dependent cost-sensitive logistic regression model using two publicly available datasets, which achieved high cost savings compared to the benchmarks; however, the loss function was linear, which caused weak differentiation between the falsely and correctly classified instances. Günnemann and Pfeffer (2017) introduced a new cost-sensitive prediction model based on a nonlinear loss function, which extended logistic regression and allowed for the different costs of misclassified instances, thereby obtaining prediction results with an overall lower cost. However, most of these cost-sensitive models have been based on a single optimization criterion. Therefore, in order to ensure the excellent performance of the classification model while identifying the defaulters who have the higher costs as accurately as possible, this paper combines the two goals in a multi-objective optimization logistic regression model based on a maximum AUC and a minimum total classification cost, and then solves the proposed model using a MOPSO evolutionary algorithm.
In this paper, a logarithmic loss function and the AUC objective function are used to optimize the logistic regression parameters. The two objective functions are embedded in the MOPSO optimization algorithm to determine the optimal parameter set. This paper aims to construct a direct cost-sensitive logistic regression method based on multiple objectives to obtain a model that minimizes the total cost and maximizes the AUC. As the constructed model is better able to identify the sample categories with higher losses, it can help lenders identify possible defaulters, reduce default losses, and increase profits. The remainder of this paper is structured as follows. Section 1 introduces the preliminary knowledge used in this paper: logistic regression, the cost matrix, the AUC, and the multi-objective optimization algorithm. In Section 2, the multi-objective optimization cost-sensitive logistic regression is constructed, and in Section 3, the proposed model and algorithm are applied to a P2P dataset to evaluate their performance, and the results are compared with single-objective optimization and other multi-objective optimization algorithms to verify their effectiveness. Finally, conclusions and further research recommendations are given in the last section.

Preliminary knowledge
This section gives a brief introduction to the theoretical knowledge used in the following sections: the theory of logistic regression, the derivation of the AUC, and the meaning of the cost matrix.

Logistic regression
Logistic regression is a classification model that estimates the posterior probability of the positive class in a binary classification context. It is one of the most widely used statistical models for deriving classification algorithms (Abdou, Tsafack, Ntim, & Baker, 2016). Given an instance $x_i$, the estimated probability of the positive class is

$$p_i = P(y_i = 1 \mid x_i) = f(g(x_i, \beta)),$$

and $1 - p_i$ is the corresponding probability of the negative class. Here, $f(g(x_i, \beta))$ is defined as the sigmoid function, which is also known as the logistic function (Desai, Crook, & Overstreet Jr, 1996):

$$f(g(x_i, \beta)) = \frac{1}{1 + e^{-g(x_i, \beta)}},$$

where $g(x_i, \beta) = \beta_0 + \sum_{j} \beta_j x_{ij}$ is a linear function of the logistic regression parameters $\beta$ and the explanatory variables, and $0 \le p_i \le 1$ refers to the hypothesis of $i$ given the parameters $\theta$. The constructed logistic regression model is solved to ensure that the estimate is consistent with the true value, usually by maximum likelihood estimation (MLE), which assumes that each sample in the dataset is independent. When the true value $y_i = 1$, $p_i$ represents the probability that the sample is predicted to be 1; that is, the bigger the $p_i$, the smaller the difference between the predicted value and the true value. When the true value $y_i = 0$, $1 - p_i$ represents the probability that the sample is predicted to be 0; that is, the bigger the $1 - p_i$, the smaller the difference between the predicted value and the true value. The MLE principle ensures that the difference between the predicted value and the true value is the smallest; that is, when $y_i = 1$, $p_i$ is the largest and when $y_i = 0$, $1 - p_i$ is the largest. The likelihood function for MLE is derived as follows (Günnemann & Pfeffer, 2017):

$$L(x, \beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$

As the $\beta$ in logistic regression is obtained by maximizing $L(x, \beta)$, maximum likelihood estimation solves the parameter by minimizing the difference between the predicted value and the true value, with every error treated alike; therefore, there is no cost-sensitive thinking in the MLE.
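The quantities above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming an intercept stored as the first parameter and the standard Bernoulli log-likelihood; the function names and the toy data are hypothetical and do not come from the paper.

```python
import math

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(beta, x):
    """p_i = f(beta_0 + sum_j beta_j * x_ij); beta[0] is the assumed intercept."""
    g = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return sigmoid(g)

def log_likelihood(beta, X, y):
    """ln L = sum_i [ y_i ln p_i + (1 - y_i) ln(1 - p_i) ]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = predict_proba(beta, x_i)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
    return total

# Hypothetical toy data: one feature, label 1 when the feature is large.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
ll_flat = log_likelihood([0.0, 0.0], X, y)  # uninformative parameters
ll_fit = log_likelihood([0.0, 2.0], X, y)   # parameters that follow the labels
print(ll_flat, ll_fit)
```

The second parameter vector gives the larger log-likelihood, which is the sense in which MLE prefers parameters whose predictions are close to the true labels. Every misprediction is weighted equally here.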

AUC
In traditional credit scoring models, the AUC, the area under the receiver operating characteristic (ROC) curve, is an effective measure for evaluating model performance. To predict the default probability of applicants $P(y_i = 1 \mid x_i)$, traditional credit scoring models construct a suitable classifier $h(x_i)$ based on a historical dataset of the loan applicants, for which an appropriate threshold $t$ is chosen, typically 0.5; when $p_i < t$, $h(x_i) = 1$ and the applicant's loan is approved, and when $p_i > t$, $h(x_i) = 0$ and the applicant's loan is declined. However, the choice of the threshold $t$ greatly affects the evaluation index values (Ala'raj & Abbod, 2016). As the calculation of the AUC does not need to turn the prediction probabilities into categories, it is more useful for evaluating model performance.
The performance of the classifier $h(x_i)$ needs to be evaluated after it is estimated. In practice, the AUC has been widely used to assess the performance of a credit scoring model. In a dichotomous problem, samples have two categories: positive and negative. When a classifier predicts samples, positive samples can be predicted as negative samples or positive samples, and negative samples can be predicted as negative samples or positive samples; that is, there are four circumstances. A sample that is truly positive and is also predicted to be positive is called a true positive (TP), a sample that is truly positive but predicted to be negative is called a false negative (FN), a sample that is truly negative and also predicted to be negative is called a true negative (TN), and a sample that is truly negative but predicted to be positive is called a false positive (FP). These four cases constitute the classification matrix, as shown in Table 1.
The ROC curve is based on the false positive rate on the horizontal axis and the true positive rate on the vertical axis. First, the classification threshold is adjusted to the maximum, at which time the true and false positive rates are both 0, after which the classification threshold is set as the predicted value of each example in sequence so that each predicted value corresponds to a set of true positive rates and false positive rates. The ROC curve is obtained by plotting these true positive rates and false positive rates on a graph, with the area under the ROC curve being the AUC; the larger the AUC, the better the model performance.
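This article uses the following formula to calculate the AUC (Fawcett, 2006):

$$AUC = \frac{\sum_{i \in \text{positive}} rank_i - \frac{M(M+1)}{2}}{M \times N},$$

where $M$ is the number of positive samples, $N$ is the number of negative samples, and $rank_i$ is the rank of the forecast probability of the $i$th positive sample when all samples are sorted in ascending order of forecast probability.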
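A rank-based AUC calculation of this kind can be sketched as follows. The sketch assumes the usual rank-sum form, AUC = (sum of the ranks of the positive samples - M(M+1)/2) / (M × N), with ranks taken in ascending order of predicted probability and tied scores sharing their average rank; the function name and the example scores are hypothetical.

```python
def auc_by_rank(scores, labels):
    """Rank-sum AUC: (sum of positive ranks - M(M+1)/2) / (M * N).

    Ranks are 1-based positions after sorting scores in ascending order;
    tied scores share the average of their ranks.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2.0 + 1.0  # average 1-based rank of the group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    m = sum(1 for label in labels if label == 1)  # number of positives (M)
    n = len(labels) - m                           # number of negatives (N)
    rank_sum = sum(r for r, label in zip(ranks, labels) if label == 1)
    return (rank_sum - m * (m + 1) / 2.0) / (m * n)

# Hypothetical scores: one positive is ranked below one negative.
example_auc = auc_by_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)  # 0.75
```

In this example the positives hold ranks 2 and 4, so the result is (6 - 3) / (2 × 2) = 0.75. No threshold is involved at any point.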

Cost matrix
In a cost-sensitive learning method, different categories have different classification error costs, and the performance evaluation of a cost-sensitive learning model is not limited to the AUC. The cost matrix is used to express the different classification error costs. As classification errors have different costs, different weights are assigned to the different classification outcomes, as shown in Table 2. The weight for classifying a positive class as a positive class is $c_{TP}$, the weight for classifying a positive class as a negative class is $c_{FN}$, the weight for classifying a negative class as a positive class is $c_{FP}$, and the weight for classifying a negative class as a negative class is $c_{TN}$ (Günnemann & Pfeffer, 2017).
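In general, only the cost of misclassification is considered; that is, $c_{TP} = c_{TN} = 0$. Therefore, for the classifier $h(x_i)$, the expected cost based on this cost matrix is

$$Cost = \sum_{i=1}^{n} \left[ y_i \left(1 - h(x_i)\right) c_{FN} + \left(1 - y_i\right) h(x_i)\, c_{FP} \right],$$

where $h(x_i)$ is the predicted category for instance $i$, with $n$ being the number of instances.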
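The cost calculation can be sketched in Python as follows. The sketch assumes that the cost is accumulated as a plain sum over instances, with $c_{TP} = c_{TN} = 0$; the labels follow the paper's convention (1 = non-default, 0 = default), and the example predictions are hypothetical, chosen to mirror the costs of 1 and 5 used in the Figure 1 illustration.

```python
def total_cost(y_true, y_pred, c_fn, c_fp):
    """Sum c_FN over false negatives and c_FP over false positives.

    Correct classifications are assumed to cost nothing (c_TP = c_TN = 0).
    """
    cost = 0.0
    for y, h in zip(y_true, y_pred):
        if y == 1 and h == 0:    # positive predicted as negative
            cost += c_fn
        elif y == 0 and h == 1:  # negative predicted as positive
            cost += c_fp
    return cost

# Hypothetical sample: two defaulters (0) and three non-defaulters (1).
y_true = [0, 0, 1, 1, 1]
error_rate_pred = [1, 0, 1, 1, 1]      # one defaulter accepted
cost_sensitive_pred = [0, 0, 0, 0, 1]  # two non-defaulters rejected

cost_left = total_cost(y_true, error_rate_pred, c_fn=1.0, c_fp=5.0)
cost_right = total_cost(y_true, cost_sensitive_pred, c_fn=1.0, c_fp=5.0)
print(cost_left, cost_right)  # 5.0 2.0
```

The second set of predictions makes more errors but incurs a lower cost, which is the trade-off a cost-sensitive classifier is designed to exploit.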

Multi-objective particle swarm optimization
Among single-objective optimization algorithms, particle swarm optimization (PSO), an optimization algorithm inspired by the swarming behavior of birds, provides high-speed convergence and is more popular than other evolutionary computing models (Kou, Chao, Peng, Alsaadi, & Herrera-Viedma, 2019). Multi-objective particle swarm optimization (MOPSO) was developed based on PSO. However, when a single-objective PSO algorithm is applied to a multi-objective optimization problem, three problems arise: (1) selecting the leader particles for the updating process, (2) maintaining the non-dominated solutions to obtain the optimal Pareto front, and (3) preventing the particles from converging to a local optimum and maintaining group diversity (Ding, Chen, Xin, & Pardalos, 2018). To resolve these three problems, each particle may have a set of different leaders from which just one can be selected to update its position. As this leader set is usually stored separately from the swarm, it is called an external archive, which is a repository in which the non-dominated solutions found so far are stored. The solutions in the external archive are used as the leaders when the positions of the swarm particles have to be updated, and the external archive contents are also usually reported as the final algorithmic output (Reyes-Sierra & Coello, 2006); therefore, updating the external archive is a core issue for MOPSO.
The main advantage of single-target PSO is the use of ε-dominance (Coello, Pulido, & Lechuga, 2004). In MOPSO, however, this method is primarily used to filter and update the external archive, as shown in Figure 2. In an optimization problem in which both f1 and f2 are minimized, the particles in the external archive are analyzed as follows. First, particle 7 is inferior to all other particles because box (2ε, 3ε) is dominated by box (2ε, 2ε). Particle 1 dominates particle 2 because it is closer to the lower-left corner of the box at (ε, 3ε). Similarly, particle 3 is superior to particle 4, and particle 5 is superior to particle 6. Therefore, after each iteration, the external archive is filtered according to the ε-dominance method. After the final iteration, the optimized solution is randomly selected from among all externally archived particles.
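The archive filtering described above can be sketched as follows. This is a simplified reading of ε-dominance rather than the exact procedure of the cited work: each objective vector is assigned to a grid box of width ε, only one particle per box is kept (the one closest to the box's lower-left corner), and boxes dominated by another box are discarded. The function names and sample points are hypothetical.

```python
import math

def box(point, eps):
    """Index of the eps-wide grid box containing the objective vector."""
    return tuple(math.floor(f / eps) for f in point)

def dominates(a, b):
    """a dominates b (minimization): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def eps_filter(points, eps):
    """Keep one representative per non-dominated box."""
    best = {}
    for p in points:
        b = box(p, eps)
        corner = tuple(i * eps for i in b)
        d = math.dist(p, corner)
        # Within a box, keep the point closest to the lower-left corner.
        if b not in best or d < best[b][0]:
            best[b] = (d, p)
    boxes = list(best)
    keep = [b for b in boxes if not any(dominates(o, b) for o in boxes)]
    return sorted(best[b][1] for b in keep)

# Hypothetical archive of (f1, f2) values with eps = 1.
points = [(0.2, 3.3), (0.6, 3.7), (1.2, 2.1), (2.5, 2.5), (3.1, 0.4)]
filtered = eps_filter(points, 1.0)
print(filtered)  # [(0.2, 3.3), (1.2, 2.1), (3.1, 0.4)]
```

Here (0.6, 3.7) loses to (0.2, 3.3) inside the same box, and (2.5, 2.5) is removed because its box (2, 2) is dominated by box (1, 2).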
The pseudo code for the MOPSO algorithm is shown in Figure 3 (Reyes-Sierra & Coello, 2006).
Input: the number of particles max; training rounds N.
Initialize swarm
Initialize leaders in an external archive
Quality (leaders)
Process:
for t = 1, 2, ..., N do
    for i = 1, 2, ..., max do
        Select leader
        Update position
        Mutation
        Update pbest
    end for
    Update leaders in the external archive
    Quality (leaders)
end for
Output: the optimal particle is randomly selected from the repository.
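The loop structure of the pseudo code can be illustrated with a compact Python sketch. It is not the authors' implementation: the toy objective (Schaffer's two-objective problem), the inertia and acceleration coefficients, the mutation rate, and the use of plain Pareto dominance for the archive are all assumptions made for illustration, and the "Quality (leaders)" step is omitted.

```python
import random

def objectives(x):
    """Toy bi-objective problem (assumed for illustration): minimize both."""
    return (x[0] ** 2, (x[0] - 2.0) ** 2)

def dominates(a, b):
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))

def update_archive(archive, candidate):
    """Insert (position, objectives) unless an archived entry dominates it."""
    pos, f = candidate
    if any(dominates(g, f) or g == f for _, g in archive):
        return archive
    return [(p, g) for p, g in archive if not dominates(f, g)] + [(list(pos), f)]

def mopso(n_particles=20, n_rounds=50, low=-4.0, high=4.0, seed=0):
    rng = random.Random(seed)
    swarm = [[rng.uniform(low, high)] for _ in range(n_particles)]
    velocity = [[0.0] for _ in range(n_particles)]
    pbest = [(list(x), objectives(x)) for x in swarm]
    archive = []
    for entry in pbest:                       # initialize leaders
        archive = update_archive(archive, entry)
    for _ in range(n_rounds):
        for i, x in enumerate(swarm):
            leader = rng.choice(archive)[0]   # select leader
            for d in range(len(x)):           # update velocity and position
                velocity[i][d] = (0.4 * velocity[i][d]
                                  + 1.5 * rng.random() * (pbest[i][0][d] - x[d])
                                  + 1.5 * rng.random() * (leader[d] - x[d]))
                x[d] = min(high, max(low, x[d] + velocity[i][d]))
            if rng.random() < 0.1:            # mutation: reset one coordinate
                x[rng.randrange(len(x))] = rng.uniform(low, high)
            f = objectives(x)
            if not dominates(pbest[i][1], f):  # update pbest
                pbest[i] = (list(x), f)
        for entry in pbest:                   # update leaders in the archive
            archive = update_archive(archive, entry)
    return archive

archive = mopso()
chosen = random.Random(1).choice(archive)  # output: random archived particle
print(len(archive), chosen)
```

The returned archive holds mutually non-dominated solutions, and the final choice is drawn at random from it, as in the output step of the pseudo code.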

Proposed credit scoring model
The essence of the credit scoring model is the data mining and machine learning of large quantities of historical information to ensure the most accurate prediction of the loan applicant default probabilities. After the prediction result is obtained, the prediction value is compared with the true value to evaluate the performance of the credit scoring model using the AUC. The AUC indicator does not need to set the threshold in advance, which avoids the impact of the threshold on the classification result. It is the comprehensive indicator used to evaluate model performance. Therefore, this paper uses the AUC as one of the model optimization goals.
However, one of the assumptions behind the AUC indicator is that different classification errors have consistent costs. This is unreasonable in some realistic credit scoring problems, in which the cost of recognizing a possible defaulter as a non-defaulter is greater than the cost of recognizing a non-defaulter as a defaulter. If only the AUC is used as the optimization goal, the model has the same ability to recognize defaulters and non-defaulters, which increases its total classification error cost. In the realistic credit business, however, the total cost is closely related to the investor's revenue, so to reduce the total classification error cost, the model should be better able to identify defaulters. Therefore, while the comprehensive performance of the classifier (evaluated by the AUC) is considered, the total classification error cost cannot be ignored, and this paper uses the total classification error cost as the second optimization goal. The construction of the cost-sensitive loss function is described in the following paragraphs.
From the introduction to logistic regression in the preliminary knowledge section, logistic regression uses a maximum likelihood function to solve the parameter:

$$L(x, \beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$

Taking the logarithm of the above formula gives

$$\ln L(x, \beta) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \right].$$

If the deviation of the predicted value from the true value is defined as the loss, the parameter can also be solved from a loss minimization perspective. The problem is then to find the optimal parameters by minimizing the given cost function. In logistic regression, the cost function $J(\theta)$, where $\theta$ denotes the model parameters, usually refers to the negative logarithm of the likelihood; therefore, maximizing $L(x, \beta)$ is equivalent to minimizing

$$J(\theta) = -\sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \right].$$

The function $J(\theta)$ constructed in this way is a logarithmic loss function. Its characteristic is that when the actual sample $y_i = 1$ and the predicted probability $p_i = 1$, $J(\theta) = 0$; however, $J(\theta)$ increases as $p_i$ decreases, as shown in Figure 4. If the threshold selected is 0.5, when $p_i$ is greater than 0.5, the classification cost is $F_{TP}$, and when it is less than 0.5, the classification error cost is $F_{FN}$. When the actual sample $y_i = 0$ and the predicted probability $p_i = 0$, $J(\theta) = 0$; however, $J(\theta)$ increases as $p_i$ increases, as shown in Figure 5. If the threshold selected is 0.5, when $p_i$ is less than 0.5, the classification cost is $F_{TN}$, and when it is greater than 0.5, the classification error cost is $F_{FP}$.
If the threshold is not determined in advance, the logarithmic loss function can effectively measure the deviation between the predicted value and the true value; that is, the greater the deviation, the greater the loss function value. However, the above logarithmic loss function does not reflect cost-sensitive thinking. The area $F_{FN}$ under the curve in Figure 4 is the cost of classifying the positive category into the negative category, and the area $F_{FP}$ under the curve in Figure 5 is the cost of classifying the negative category into the positive category, but it can be seen that the integrals of the two curves are the same. Therefore, this cost function treats the costs of the different classification errors as equal and needs to be improved: the different classification error costs need to be given different weights.
The cost matrix shows that the cost of classifying a positive category into a negative category is $c_{FN}$, and the cost of classifying a negative category into a positive category is $c_{FP}$. For the loss function $J(\theta)$, $-y_i \ln p_i$ is the cost of classifying a positive category into a negative category and $-(1 - y_i) \ln (1 - p_i)$ is the cost of classifying a negative category into a positive category. Therefore, the cost-sensitive loss function for the logistic regression is:

$$Cost_i(\theta) = -\left[ c_{FN}\, y_i \ln p_i + c_{FP}\, (1 - y_i) \ln (1 - p_i) \right], \qquad Cost(\theta) = \sum_{i=1}^{n} Cost_i(\theta).$$

This method is equivalent to directly increasing the area under the loss function to increase the cost of the classification error; that is, instances with higher costs receive a higher average loss value. As a smaller value of $Cost_i(\theta)$ means a smaller classification error cost and a greater profit, the cost-sensitive loss function has practical business significance. In summary, the cost-sensitive loss function is constructed from the cost matrix and the logarithmic loss function, and the maximization of the AUC and the minimization of the cost-sensitive loss function are the two goals used to determine the parameters of the proposed multi-objective optimization cost-sensitive logistic regression.
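A weighted logarithmic loss of this kind can be sketched as follows. The sketch assumes the form $-\sum_i [c_{FN}\, y_i \ln p_i + c_{FP}\,(1 - y_i) \ln(1 - p_i)]$, that is, the ordinary log loss with each term multiplied by the matching entry of the cost matrix. The function name, the clipping constant, and the example probabilities are hypothetical; the weight 3.22 is the cost ratio reported for the research dataset.

```python
import math

def cost_sensitive_log_loss(p, y, c_fn, c_fp, tiny=1e-12):
    """-sum_i [ c_FN * y_i * ln p_i + c_FP * (1 - y_i) * ln(1 - p_i) ]."""
    loss = 0.0
    for p_i, y_i in zip(p, y):
        p_i = min(max(p_i, tiny), 1.0 - tiny)  # guard against log(0)
        loss -= (c_fn * y_i * math.log(p_i)
                 + c_fp * (1 - y_i) * math.log(1.0 - p_i))
    return loss

# An equally "wrong" prediction costs more for a defaulter (y = 0)
# than for a non-defaulter (y = 1) once the weights differ.
loss_defaulter = cost_sensitive_log_loss([0.7], [0], c_fn=1.0, c_fp=3.22)
loss_non_defaulter = cost_sensitive_log_loss([0.3], [1], c_fn=1.0, c_fp=3.22)
print(loss_defaulter / loss_non_defaulter)  # approximately 3.22
```

With both weights set to 1 the function reduces to the ordinary logarithmic loss, so the cost matrix is the only change to the objective.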
The multi-objective optimization algorithm MOPSO is then used to solve the proposed model parameters. As mentioned, MOPSO uses an ε-dominance method to update the external archive when searching for the optimal parameter set. In the proposed model, the horizontal axis of the external archive is the total cost of the classification errors, and the vertical axis is the inverse AUC value. During each iteration, the mesh cells closest to the origin and, within each cell, the particle closest to the lower-left corner are selected and stored in the external archive. The position and velocity of each particle are updated using the formula before proceeding to the subsequent iteration. After the final iteration, the particles in the external archive are output, and the optimal solution is randomly selected from them.
The algorithmic steps for solving the multi-objective optimization cost-sensitive logistic regression parameter values using MOPSO are shown in Figure 6.
Input: the range of the parameters of the proposed model; the number of parameter groups max; training rounds N.

Dataset description
The credit dataset used in this paper was collected from a well-known P2P lending platform in China, for which a loan is defined as a default when the repayment is at least three times overdue (Greene, 1998); otherwise, the loan is classified as non-default. After deleting the credit target data, 5000 default and 10000 non-default loans were randomly selected as the experimental data for this paper; the default rate was about 33.33% (5000/15000). Each loan instance in the credit data had 22 variables, which included 21 observation variables and the loan status label. The input variables for each sample included the borrower's basic information, work information, credit certificate information, and asset information. Because the selected models include decision trees and Bayes, the selected variables were discretized in this paper. The definition for each variable is given in Table 3; for example:
Company size (size): 1 = less than 10 persons, 2 = 10−100, 3 = 100−500, 4 = more than 500
Working time (length): 1 = less than 1 year, 2 = 1−3 years, 3 = 3−5 years, 4 = more than 5 years
The descriptive statistical analysis of these variables is shown in Table 4.

Expected loss
The cost matrix for the research dataset was developed before credit scoring. The cost matrix can describe the default risk in the lending process (Nayak & Misra, 2018). If credit agencies classify non-defaulters as possible loan defaulters, they could lose the loan interest; however, if they classify possible loan defaulters as non-defaulters, they could lose both the loan interest and the principal. This loss is usually measured as the expected loss (EL) and is calculated as shown in the following formula (Altman & Sabato, 2005; Thomas, 2010):

$$EL = PD \times EAD \times LGD,$$

where PD is the probability of a loan applicant defaulting, EAD is the exposure at default, or the total amount owed at default, expressed in monetary units, and LGD is the loss given default, which indicates the financial loss after the borrower defaults and is expressed as a percentage.
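As a small worked example, the expected loss can be computed as the product of the three quantities defined above, EL = PD × EAD × LGD. The numbers below are hypothetical and are not taken from the research dataset.

```python
def expected_loss(pd, ead, lgd):
    """EL = PD * EAD * LGD.

    pd and lgd are fractions between 0 and 1; ead is in monetary units.
    """
    return pd * ead * lgd

# Hypothetical loan: 5% default probability, 10,000 owed at default,
# and 60% of the exposure lost if the borrower defaults.
el = expected_loss(pd=0.05, ead=10000.0, lgd=0.6)
print(el)  # about 300 monetary units
```

Because EAD is the only term with a unit, the result is expressed in the same monetary unit as the exposure.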

Recoveries costs
The loan amount and loan interest for a non-defaulter in the training set are used to calculate the interest income that could be lost when a non-defaulting applicant is judged as a possible defaulter. When the model recognizes a non-defaulter as a defaulter, the interest income is lost, which corresponds to the value of $c_{FN}$ in the cost matrix. The expected loss (EL) is calculated using the total default amount, the recovery amount, and the recovery costs of a defaulter sample in the training set. When the model recognizes a defaulter as a non-defaulter, the loss is calculated from the EL, which corresponds to the value of $c_{FP}$ in the cost matrix. As the value of the EL is 3.22 times the interest loss in the research dataset, the cost of recognizing a possible defaulter as a non-defaulter is 3.22 times the cost of recognizing a non-defaulter as a possible defaulter.

Experimental design
K-fold cross-validation was used to train the model and to ensure the stability of the experimental results, as it makes full use of the dataset to test the algorithmic effect and thereby helps avoid over-fitting. The basic idea is to randomly divide the dataset into k parts and, in each round, to use one part as the test set and the remaining k − 1 parts as the training set. The experimental process is shown in Figure 7.
This paper used a 10-fold cross-validation to train the model; that is, the dataset was divided into 10 parts, 10 tests were performed, and the average result of the ten tests used to evaluate the model performance.
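The splitting procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' experimental code: the helper names, the fixed random seed, and the `evaluate` callback (which stands in for training a model on the training indices and scoring it on the test indices) are all hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_validate(n_samples, evaluate, k=10):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# With 15000 samples and k = 10, every fold tests 1500 and trains on 13500.
splits = list(k_fold_indices(15000, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 13500 1500
```

Averaging the ten fold scores, as `cross_validate` does, yields the single figure per model that is then compared across classifiers.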

Experimental results
The empirical results were comprehensively evaluated using type I error rate, type II error rate, F value, G-mean, accuracy, AUC, and total cost (TC) indicators. The definitions and calculation formulas for the above indicators are described in the following.
The type I error rate indicates the ratio of false negatives to all positive examples (Tsai, 2009):

$$\text{type I error rate} = \frac{FN}{TP + FN}.$$

The type II error rate indicates the ratio of false positives to all counterexamples (Tsai, 2009):

$$\text{type II error rate} = \frac{FP}{FP + TN}.$$

The F1 indicator is the harmonic mean of Recall (also known as TPR) and Precision, which is presented as Eq. (19). The Recall and Precision are respectively calculated using Eq. (17) and Eq. (18) (Baldi, Brunak, Chauvin, Andersen, & Nielsen, 2000).

$$Recall = \frac{TP}{TP + FN}, \tag{17}$$

$$Precision = \frac{TP}{TP + FP}, \tag{18}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}. \tag{19}$$

The G-mean (geometric mean) indicator is a comprehensive evaluation measure for imbalanced datasets (Shen, Zhao, Z. Li, K. Li, & Meng, 2019), and is determined using Eq. (20). A higher G-mean indicates that the binary classification model balances the two classes reasonably and performs well.

$$G\text{-}mean = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}. \tag{20}$$
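The indicators above can all be derived from the four cells of the classification matrix, as in the following sketch. It uses the standard definitions (type I error rate = FN / (TP + FN), type II error rate = FP / (FP + TN), Recall = TP / (TP + FN), Precision = TP / (TP + FP), F1 as their harmonic mean, and G-mean as the geometric mean of the true positive and true negative rates); the function name and the example counts are hypothetical and are not results from the paper.

```python
import math

def credit_metrics(tp, fn, fp, tn):
    """Evaluation indicators computed from a 2x2 classification matrix."""
    recall = tp / (tp + fn)     # true positive rate
    precision = tp / (tp + fp)
    tnr = tn / (tn + fp)        # true negative rate
    return {
        "type_I": fn / (tp + fn),
        "type_II": fp / (fp + tn),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "g_mean": math.sqrt(recall * tnr),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical counts: 1000 non-defaulters (positive), 500 defaulters (negative).
m = credit_metrics(tp=900, fn=100, fp=150, tn=350)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With non-defaulters as the positive class, a false positive is a defaulter who was accepted, so the type II error rate is the indicator most closely tied to default losses.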
From the collected data, 15000 samples were selected for training and prediction, and 10-fold cross-validation was used to prevent model overfitting. To assess the performance of the proposed multi-objective optimization cost-sensitive logistic regression, it was compared with other explanatory models: traditional logistic regression, decision tree, linear discriminant, Bayes, SVM, Bagging, Adaboost, and cost-sensitive decision tree (CS-DT). The experimental results are shown in Table 5 and Table 6. As can be seen from the experimental results, compared to the other classifiers, the proposed model had the smallest total classification error cost and a large AUC. Figures 8 and 9 show the box plots for the total costs and AUC obtained from the 10-fold cross-validation by the proposed model and the comparative models. Each box plot summarizes the sorted data with up to six values: the maximum, the upper quartile, the median, the lower quartile, the minimum, and the outliers; if there are no outliers, only five values are shown. As can be seen from Figures 8 and 9, the total cost distribution for the proposed model was lower in all experiments with the 10-fold cross-validation, and its maximum total cost across all experiments was lower than the minimum total cost of the other classifiers. Therefore, the proposed model was clearly effective in dealing with cost-sensitive issues. At the same time, the proposed model was found to have a large AUC, lower only than that of the SVM and Adaboost models. As the logistic regression used in this paper was a simple classifier and the data dimension was small, it was acceptable that the AUC for the proposed model was slightly lower than that of these two models.
To further verify that the AUC and cost indicators were better than the other comparison models, a paired t-test was performed, the results for which are shown in Table 6. As can be seen, the AUC for the proposed model was significantly higher than the Decision Tree, Bayes, Bagging, and the CS-DT, and the total cost was significantly lower than all other comparison models; therefore, the proposed model was found to be effective in dealing with cost-sensitive issues.
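The paired comparison can be illustrated with a short sketch that computes the paired t statistic from per-fold results of two models. The formula is the standard one, t = mean(d) / (stdev(d) / sqrt(n)) for the fold-wise differences d; the fold costs below are hypothetical and are not the values behind Table 6.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

# Hypothetical total costs of two models on the same four folds.
proposed_costs = [10.0, 12.0, 11.0, 9.0]
baseline_costs = [14.0, 15.0, 13.0, 14.0]
t_stat = paired_t_statistic(proposed_costs, baseline_costs)
print(t_stat)  # negative: the first model has the lower cost on every fold
```

The statistic has n − 1 degrees of freedom; turning it into a p-value requires the t distribution, which is not part of the Python standard library.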
As can be seen from Table 5, the type II error rate of the proposed model was significantly lower, which indicates that the model was better able to identify possible future loan defaulters. As this article defined non-defaulters as 1, FP denotes defaulters classified as non-defaulters; therefore, $\mathrm{Precision} = \frac{TP}{TP + FP}$ measures how well the model identified defaulters. As can be seen from Table 5, the precision of the proposed model was the highest except for that of the Bayes model. However, the F1, G-mean, and Recall of the Bayes model were low, whereas the proposed model was better at identifying defaulters without reducing its ability to identify non-defaulters. Figure 10 shows about 100 defaulted samples that the proposed model classified correctly but that traditional logistic regression classified incorrectly. The black dots show the traditional logistic regression prediction results and the blue dots show the proposed model prediction results. The true label for these samples was 0 (loan defaulters). When traditional logistic regression was used to classify these samples, the prediction probability was greater than the 0.5 threshold, so the samples were judged to be non-defaulters. However, when the proposed model was used to classify these samples, the prediction probability was less than 0.5, and the samples were judged to be probable future loan defaulters. Therefore, the proposed model classifications were correct. If the proposed credit scoring model can better identify loan defaulters, then credit institutions would be able to reduce their default loans and improve their overall profit performance.

Figure 10. Some examples of the prediction probabilities for the two models
In terms of F1, G-mean, and accuracy, the proposed model achieved large values on all three indicators, as shown in the line graph in Figure 11. The G-mean of the proposed model was only slightly lower than those of SVM and Adaboost. As the G-mean is a key indicator for evaluating performance on unbalanced data, a high G-mean indicates that the model performed well on unbalanced data sets.
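The confusion-matrix indicators discussed above can be computed as in the following sketch. It assumes the paper's label convention (1 = non-defaulter as the positive class, 0 = defaulter) and assumes that the type II error rate is the share of defaulters classified as non-defaulters, FP / (FP + TN); the function name and the toy data are hypothetical.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Precision, recall, F1, G-mean, accuracy and type II error rate.

    Label convention: 1 = non-defaulter (positive), 0 = defaulter (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # G-mean balances accuracy on both classes, hence its use on
        # unbalanced data.
        "g_mean": math.sqrt(recall * specificity),
        "accuracy": (tp + tn) / len(pairs),
        # Assumed definition: defaulters wrongly accepted as non-defaulters.
        "type_ii_error": fp / (fp + tn) if fp + tn else 0.0,
    }


m = confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(m)
```

Because the G-mean is a product of the per-class rates, a classifier that ignores one class scores zero regardless of its overall accuracy.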

Discussion and robustness test
This section first compares the results of the proposed multi-objective model with those of single-objective models, in which the PSO method is used to estimate the logistic regression parameters based on either minimum cost or maximum AUC. The comparative experimental results are shown in Table 7, where PSO(Cost) represents the experimental results of the PSO optimization based on the minimum cost, and PSO(AUC) represents the experimental results of the PSO optimization based on the maximum AUC. The multi-objective optimization had the lowest total cost compared to the single-objective optimizations. Although the AUC based on the multi-objective optimization was significantly lower than that of the AUC-based single-objective optimization, by 3‰, the total cost based on the multi-objective optimization was less than half as large. The box plots of the corresponding comparisons are shown in Figures 12 and 13. As can be seen from Figure 12, the total cost based on the multi-objective optimization was the lowest and had the most stable distribution. At the same time, the significance test shows that the cost of the multi-objective optimization model proposed in this paper was significantly lower than that of the single-objective optimizations. Therefore, the model had a better practical effect with multi-objective optimization than with single-objective optimization. When a credit scoring model is used in practice to decide whether to issue a loan, greater attention must be paid to the total cost indicator.
The multi-objective optimization results of the MOPSO were also compared with those of NSGA-II (Deb, Agrawal, Pratap, & Meyarivan, 2000), PAES (Knowles & Corne, 2000), and microGA (Coello & Pulido, 2001), which are common multi-objective optimization algorithms. The experimental results of the performance comparison are shown in Table 8.
It can be seen from the experimental results that MOPSO had the largest AUC, while its total cost was significantly lower than those of the other multi-objective optimization methods. Therefore, determining the optimal solution with MOPSO produced the best results for the constructed multi-objective credit scoring model.
As this paper's focus was on the total cost and AUC indicators, this section uses box plots to show the results of the 10-fold cross-validation. The total cost and AUC box plots for the MOPSO and the other multi-objective optimization methods are shown in Figures 14 and 15. As can be seen from Figure 14, the total cost based on the MOPSO was the lowest and had the most stable distribution. Figure 15 shows that the MOPSO optimization also performed better than the other three multi-objective optimization algorithms on AUC.
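All of the multi-objective algorithms compared here search for solutions that trade total cost off against AUC, and the notion they share is Pareto dominance. The following sketch illustrates that notion only; it is not the MOPSO procedure itself, the function names are hypothetical, and the candidate values are invented for illustration rather than taken from the tables.

```python
def dominates(a, b):
    """True if candidate a dominates b.

    Each candidate is a (total_cost, auc) pair: lower cost and higher AUC
    are better. a dominates b if it is no worse on both objectives and
    strictly better on at least one.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Invented (total_cost, auc) pairs for four candidate parameter vectors.
candidates = [(100, 0.70), (120, 0.75), (130, 0.72), (90, 0.60)]
front = pareto_front(candidates)
print(front)  # (130, 0.72) is dominated by (120, 0.75) and is dropped
```

A single-objective optimizer returns one end of this front (lowest cost or highest AUC), whereas a multi-objective optimizer returns the whole front, from which a compromise solution can then be selected.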
Further, to verify the robustness of the proposed model, this paper used a dataset from the Lending Club website for the empirical analysis. Lending Club data with a credit period of three years were downloaded from the associated website in 2015. After missing value processing and variable selection, 420,000 data points and 17 characteristic variables were obtained, from which 20,000 samples (10,000 non-defaulters and 10,000 defaulters) were randomly extracted for the empirical research. The empirical results are shown in Table 9.
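Drawing an equal number of defaulters and non-defaulters at random can be sketched as below. This is only an assumed procedure, since the sampling details are not described beyond the class sizes; the function name, the seed, and the toy label vector are hypothetical.

```python
import random


def balanced_sample(labels, n_per_class, seed=0):
    """Return shuffled indices of n_per_class samples from each class.

    labels: sequence of 0 (defaulter) / 1 (non-defaulter).
    Sampling is without replacement and reproducible for a fixed seed.
    """
    rng = random.Random(seed)
    defaulters = [i for i, y in enumerate(labels) if y == 0]
    non_defaulters = [i for i, y in enumerate(labels) if y == 1]
    chosen = (rng.sample(defaulters, n_per_class)
              + rng.sample(non_defaulters, n_per_class))
    rng.shuffle(chosen)  # avoid all defaulters preceding the non-defaulters
    return chosen


# Toy label vector: 30 defaulters and 70 non-defaulters, 10 drawn from each.
labels = [0] * 30 + [1] * 70
idx = balanced_sample(labels, n_per_class=10)
print(len(idx), sum(labels[i] for i in idx))
```

With `n_per_class=10000` applied to the full label vector, the same function would yield a balanced subset of the size described above.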
From the experimental results in Table 9, it can be seen that the AUC of the proposed model was significantly higher than those of the decision tree, Bayes, SVM, Bagging, and CS-DT models, and did not differ significantly from those of LDA and Adaboost. The proposed model was therefore found to have a better classification performance. The total cost of the proposed model was significantly lower than those of all comparison models; therefore, the proposed model had better classification results for the Lending Club dataset. Figures 16 and 17 show the results of the 10-fold cross-validation, from which it can be seen that the AUC of the proposed model was at a high level, its total cost was at the lowest level, and its distribution was the most stable. Therefore, the proposed model was shown to be reasonably robust.

Conclusion and future work
Credit scoring is an important process for P2P lending companies as it determines the probability that loan applicants will default. Traditional credit scoring models focus only on classification error rates and assume that different classification errors incur the same costs. However, this is unrealistic, as there is an inherent cost-sensitive problem in the credit scoring process. Therefore, in this paper, a new cost-sensitive logistic regression credit scoring model based on a multi-objective optimization approach was proposed, in which the logistic regression has two objectives: maximizing the AUC and minimizing the total classification error cost. An empirical analysis applied the proposed model to the credit scoring of a well-known Chinese P2P company, from which it was found that, compared to traditional credit scoring models, the proposed model was able to significantly reduce the type II error rate and the total classification error cost. This paper then compared the proposed model with single-objective optimization methods and other multi-objective optimization approaches, which showed that MOPSO was the best multi-objective optimization approach for the cost-sensitive logistic regression credit scoring model. In future research, we plan to integrate cost-sensitive problems into other base classifiers to develop other direct cost-sensitive credit scoring models. Ensemble classifiers have become more popular in recent years because of the unstable and inaccurate results from individual classifiers. We will also incorporate cost-sensitivity into an ensemble model to develop a cost-sensitive ensemble learning approach.