COMBINING B&B-BASED HYBRID FEATURE SELECTION AND THE IMBALANCE-ORIENTED MULTIPLE-CLASSIFIER ENSEMBLE FOR IMBALANCED CREDIT RISK ASSESSMENT

An ideal model for credit risk assessment is supposed to select important features and process imbalanced data sets in an effective manner. This paper proposes an integrated method that combines B&B (branch and bound)-based hybrid feature selection (BBHFS) with the imbalance-oriented multiple-classifier ensemble (IOMCE) for imbalanced credit risk assessment and uses the support vector machine (SVM) and the multiple discriminant analysis (MDA) as base predictors. BBHFS is a hybrid feature selection method that integrates the t-test and B&B with the k-fold cross-validation method to search for a satisfactory feature subset. The IOMCE divides majority samples into several subsets and then combines them with minority samples to construct several training sets for a multiple-classifier ensemble model. We conduct main experiments using a 1:3 imbalanced corporate credit risk data set with continuous features and extended experiments using a 1:5 imbalanced data set with continuous features and a 1:3 imbalanced data set with discrete and nominal features. We combine no feature selection and five feature selection methods (the pure B&B, the factor analysis, the pure t-test, t-test & correlation analysis, and BBHFS) with the single classifier and the IOMCE to construct SVM and MDA models for an empirical comparison. When all features are continuous, the BBHFS-IOMCE method generally outperforms all the other methods. More specifically, BBHFS provides more stable and satisfactory results than the other feature selection methods, and compared with single-classifier models, IOMCE models can significantly enhance the recognition rate for minority samples while incurring only a small reduction in the recognition rate for majority samples and maintaining an acceptable overall accuracy.
When the features are mostly discrete or nominal, the IOMCE method retains its ability to deal with an imbalanced data set, although the five feature selection methods have no significant advantage over no feature selection. This suggests that BBHFS is effective in retaining useful information when reducing the dimensionality of continuous features and that the BBHFS-IOMCE method is an important tool for imbalanced credit risk assessment.


Introduction
To prevent new financial crises derived from credit crises, commercial banks as well as domain researchers place great emphasis on risk management, particularly credit risk management. In this regard, the assessment of credit risk is a particularly important issue. Research on credit risk assessment has received considerable attention from both scholars and practitioners, who typically employ a wide range of traditional statistical methods as well as new data-mining and artificial intelligence methods for credit risk modeling (Lahsasna et al. 2010) because the use of various techniques is generally expected to produce better modeling performance.
This study addresses three issues that have received little attention in previous research on credit risk assessment: the hybrid optimization of feature selection, the optimization of the feature extraction rate, and imbalanced modeling. The feature selection process entails the identification of an optimal feature subset based on a given data set with multiple features. Feature selection is used to increase model performance, reduce dimensionality, accelerate modeling, and reduce computing time and cost. An increase in the size of a bank credit data set is likely to increase the amount of irrelevant information in the data set, and therefore feature selection can play an important role in credit decision modeling. However, previous research on credit risk modeling has focused mainly on the innovation of modeling algorithms (e.g. Zhou et al. 2010a; Wang et al. 2011) or the application of straightforward feature selection methods and has thus paid little attention to performance improvements through feature selection. In addition, the performance of some feature selection methods tends to vary with the feature extraction rate. In this regard, an important issue in the feature selection process is the determination of a suitable feature extraction rate for better classifier performance. More importantly, a credit data set is typically imbalanced: the number of samples with good credit often exceeds that of samples with bad credit. In this case, the overall accuracy of a credit risk model trained and tested on imbalanced data sets may overestimate its performance. A high recognition rate for samples from the majority group may increase the model's overall recognition rate, but the recognition rate may remain low for samples from the minority group. Therefore, it is necessary to address imbalanced data sets so that the credit decision model identifies minority samples more accurately without incurring a significant decrease in its ability to identify majority samples.
Based on the above discussion, this study proposes the BBHFS-IOMCE credit risk modeling approach by combining the B&B (branch and bound)-based hybrid feature selection (BBHFS) method with the imbalance-oriented multiple-classifier ensemble (IOMCE). The study employs the multiple-classifier ensemble to deal with imbalanced data sets. In addition, the study uses the support vector machine (SVM) and the multiple discriminant analysis (MDA) as base predictors because the SVM is verified to be one of the most efficient models for assessing credit risk (Yu et al. 2010; Tseng et al. 2011; Bellotti et al. 2011; Xie et al. 2011; Xu et al. 2011; Liu et al. 2011; Wu et al. 2010; Lahsasna et al. 2010) and the MDA is a classical credit risk assessment method (Eisenbeis 1997). By combining the t-test and B&B, this study proposes a hybrid feature selection method for determining an optimal feature extraction rate driven by data rather than by subjective judgment. Through experiments with three data sets, the study compares the credit risk assessment performance of BBHFS with that of the other feature selection methods as well as no feature selection, and of the IOMCE with that of the single classifier.
The rest of this paper is organized as follows: Section 1 outlines the theoretical background. Section 2 presents the BBHFS-IOMCE methodology, and Section 3 describes the empirical experiments for the assessment of corporate credit risk. Section 4 provides extended experiments using two additional data sets.

Previous research
This study classifies existing credit risk assessment models into three categories: credit risk assessment based on expert scoring, that based on capital market theory or information economics, and that through multivariate classification analysis. Credit risk assessment based on expert scoring focuses mainly on evaluation elements and methods. Because this type of assessment embodies the idea of qualitative decision making, it facilitates subjective credit decisions, and decision outcomes are determined mainly by experts' experience and knowledge. Previous studies (e.g. Stiglitz, Weiss 1981; Bester 1985; Williamson 1986; Besanko, Thakor 1987; Hillier, Ibrahimo 1993) have extended the literature on credit risk assessment to include that based on capital market theory or information economics. This type of modeling is based on rigorous mathematical theories and the principles of information economics and reflects a better theoretical foundation. However, it cannot directly generate credit risk assessment results. Credit risk assessment through a multivariate classification analysis is the most effective and widely used method in the field of credit management. Eisenbeis (1997) employs the MDA for credit risk modeling, and Laitinen (1999) and Boguslauskas et al. (2011) verify the effectiveness of the logistic regression analysis in credit risk modeling. Finlay (2008) takes a utility approach to predict continuous financial measures such as contributions to profits in the context of the credit-scoring problem. Bellotti and Crook (2009) investigate the use of the survival analysis with macroeconomic variables for credit scoring. With the development of information sciences and decision sciences, previous studies have proposed some new techniques that overcome the limitations of traditional statistical models, including the fuzzy k-nearest neighbor algorithm (Laha 2007), neural networks (Khashman 2010; Angelini et al. 2008; Tsai, Wu 2008), the SVM (Chen et al. 2007; Huang 2009), rule extraction techniques based on neural networks (Baesens et al. 2003; Martens et al. 2007), ant colony optimization (Martens et al. 2010), and ensemble approaches (West et al. 2005; Paleologo et al. 2010), among others.
However, although the real-world problem of credit risk assessment involves imbalanced data sets, the above studies have focused mainly on balanced data sets. In an imbalanced data set, samples with good credit generally outnumber those with bad credit. Kennedy et al. (2009) compare the performance of a variety of one-class and two-class classifiers for imbalanced credit scoring and find that the performance of two-class classifiers falls off as the rate of class imbalance increases and that one-class classifiers outperform two-class classifiers when the data sets are extremely imbalanced, with less than 2% default samples. Brown and Mues (2012) make an experimental comparison of classification algorithms such as logistic regression, MDA, neural networks, decision trees, gradient boosting, least squares SVM, and random forests for imbalanced credit scoring data sets. They find that the random forest and gradient boosting classifiers perform better in an imbalanced credit scoring context, followed by logistic regression and linear MDA. Resampling is one of the most widely used methods for dealing with class imbalance in the two-class classification problem. Garcia et al. (2012) empirically show that over-sampling the minority class consistently outperforms under-sampling the majority class when data sets are strongly imbalanced, whereas there are no significant differences for databases with a low imbalance. These studies provide some pioneering evidence for the problem of imbalance-oriented credit risk assessment modeling. However, further study of new and effective imbalance-oriented modeling methods for credit risk assessment is still necessary.
In general, objects in data mining have many features, and noisy and redundant features often occur, increasing the computing cost and the likelihood of misleading results. Guyon, Elisseeff (2003) provide an in-depth analysis of variable and feature selection and suggest that feature selection has many advantages, such as facilitating the visualization and understanding of data, reducing measurement and storage requirements, reducing the training time, and improving prediction performance. However, previous studies of credit risk modeling have generally emphasized the innovation of modeling algorithms. For example, Yu et al. (2008, 2009, 2010) and Zhou et al. (2010a) employ a multistage neural network ensemble learning approach, the fuzzy group decision making model, the least squares SVM ensemble model, and SVM-based multi-agent ensemble learning, respectively, for credit risk assessment. Zhou et al. (2010b) and Dong et al. (2010) propose the nearest-subspace method and a logistic regression approach with random coefficients, respectively. However, only a few studies have paid attention to the feature selection process in credit risk modeling. For example, Liu et al. (2005) and Wu et al. (2009) apply the SVM and the fuzzy integral SVM ensemble, respectively, for credit risk modeling and include the feature selection method of factor analysis in the modeling process. In addition, Tsai (2009) finds that different feature selection methods result in different feature extraction rates and corresponding classification performances. In this regard, optimizing the feature selection process and determining a suitable feature extraction rate are important issues in credit risk assessment.

Feature selection methods
This study focuses on three feature selection methods: B&B, the t-test, and the factor analysis. The study considers B&B because it is a representative optimal-search approach among feature selection methods and is widely used as a global optimization method (Lawler, Wood 1966). Chen (2003) proposes an improved B&B algorithm for feature selection that can search for the same optimal solution faster than traditional ones. Tsai (2009) compares several feature selection methods for bankruptcy prediction and finds that the t-test and the factor analysis show good performance.
The basic idea of B&B is to divide a given problem (e.g. candidate features for credit risk assessment) into several subproblems (e.g. potential feature subsets). Then the algorithm divides each subproblem continuously until it can no longer be decomposed or produce an optimal solution. B&B-based feature selection methods first construct a search tree of candidate features and then start searching from upper nodes to bottom ones through backtracking. They search the tree from top to bottom and from right to left until they identify an optimal feature subset. This study uses the sum of the estimated Mahalanobis distance as an optimization criterion.
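As a rough illustration of this pruning logic, the search can be sketched in Python. Here `criterion` stands in for the Mahalanobis-distance criterion used in this study; the sketch assumes it is monotone (removing a feature never increases it), which is what justifies cutting off branches. The toy weights below are illustrative, not real financial-ratio scores.

```python
def bb_select(n_features, criterion, d):
    """Branch and bound: find the size-d feature subset maximizing a
    criterion that never increases when a feature is removed."""
    best = {"score": float("-inf"), "subset": None}

    def search(subset, start):
        score = criterion(subset)
        if score <= best["score"]:      # bound: no descendant can do better
            return
        if len(subset) == d:            # leaf: candidate subset of size d
            best["score"], best["subset"] = score, subset
            return
        # Branch: remove one more feature; 'start' avoids revisiting the
        # same subset through different removal orders.
        for f in [x for x in subset if x >= start]:
            search(tuple(x for x in subset if x != f), f + 1)

    search(tuple(range(n_features)), 0)
    return best["subset"], best["score"]

# Toy monotone criterion: each feature contributes a fixed separability score.
weights = [3, 1, 4, 1, 5, 9, 2, 6]
subset, score = bb_select(8, lambda s: sum(weights[i] for i in s), d=3)
print(subset, score)
```

Because the criterion can only shrink along a branch, any branch whose current score already falls below the best complete subset found so far can be discarded without visiting its descendants.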
British psychologist Spearman (1904) proposes the factor analysis, a statistical technique for extracting common factors from multiple variables. It extracts fewer factors to describe relationships between multiple variables and classifies variables into various groups according to the degree of correlation between them. As a result, correlations between variables in the same group are strong and those between variables in different groups are weak. Here each group generates one factor, and therefore this analysis extracts fewer factors to reflect most of the information in the original data set. Factor extraction methods include the principal component analysis (PCA), the unweighted least squares method, the maximum likelihood method, and the image factor extraction method, among others. This study uses the PCA to extract important factors and determines the number of main factors and the corresponding eigenvector matrix based on the principle that the eigenvalue should exceed 1.
Based on test objects, the t-test can be classified into various types, including the single-sample t-test, the independent-sample t-test, and the paired-sample t-test. The single-sample t-test is used to test whether there is a significant difference between the mean of a single variable and a given constant. The independent-sample t-test examines the significance of mean differences between two independent samples and analyzes whether these two samples come from two populations with equal means. The paired-sample t-test is used to analyze the significance of mean differences between two paired samples. This study uses the independent-sample t-test to identify any significant mean differences between two groups for each feature and exclude uncorrelated features to reduce dimensionality.
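As an illustration of this filtering step, a minimal sketch in Python using SciPy (the study itself uses SPSS; the 0.05 significance level, the Welch variant, and the synthetic data are assumptions made here):

```python
import numpy as np
from scipy import stats

def ttest_filter(X_good, X_bad, alpha=0.05):
    """Keep feature indices whose group means differ significantly
    between the two credit classes (independent-sample t-test)."""
    keep = []
    for j in range(X_good.shape[1]):
        # Welch's variant (equal_var=False); SPSS reports both forms.
        _, p = stats.ttest_ind(X_good[:, j], X_bad[:, j], equal_var=False)
        if p < alpha:
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
good = rng.normal(size=(200, 3))
bad = rng.normal(size=(200, 3))
bad[:, 0] += 2.0            # only feature 0 separates the two groups
keep = ttest_filter(good, bad)
print(keep)                 # feature 0 survives the filter
```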

BBHFS-IOMCE methodology
We propose a credit risk assessment model based on BBHFS-IOMCE by comprehensively considering the following: 1) the importance of optimizing feature selection in credit risk modeling; 2) the importance of determining an appropriate feature extraction rate; and 3) the reality that a credit data set typically consists of imbalanced samples. The proposed model uses a multiple-classifier ensemble to treat imbalanced samples and employs hybrid feature selection based on the t-test and B&B to determine an appropriate feature extraction rate and retain useful information.

BBHFS algorithm
For a data set with N features, suppose that the objective of feature selection is to choose D features from the initial N features to compose an optimal feature subset. Heijden et al. (2004) provide a detailed description of B&B, which can search for a globally optimal feature subset. However, the value of D is often determined subjectively, and classification models based on different D values tend to perform differently. In this regard, it is important to determine an appropriate D value or a suitable feature extraction rate (D/N). BBHFS first sets the upper and lower bounds of D according to t-test results, and by using a tenfold cross-validation method to divide the original training data set into 10 parts, it generates 10 groups of training and validation sets. Then, in each group of training and validation sets, it searches for the feature subset that maximizes classifier performance. Finally, it calculates the frequency of each feature in the 10 selected feature subsets and reserves those features with frequency exceeding three to obtain a feature subset with an optimal D value. Figure 1 shows the BBHFS algorithm.
(1) Select features that have significant differences between the two classes in the t-test to represent the initial training data set, denoted as DS.
(2) Set the upper bound of D as the number of features selected by t-test.
(3) Set the lower bound of D as 1.
(4) Determine the intermediate value of upper and lower bounds of D and denote it as M.
(5) Use the tenfold cross-validation method to divide DS into 10 parts, denoted as DS_1, DS_2, …, DS_10.
(6) For K = 1, 2, …, 10:
1) Construct the training set TR_K = DS - DS_K and the validation set V_K = DS_K.
2) Search for the local optimal feature subset that makes the classifier trained on TR_K achieve the highest validation accuracy on V_K, by varying the value of D from M respectively to both the upper and lower bounds of D. Namely, implement the following processes ① and ② simultaneously:
① Search from the left side of M: M → the lower bound of D.
Initial setting: the left side's local optimal evaluation criterion C′ = []; the left side's local optimal feature subset F′ = [].
For i = M, (M-1), …, the lower bound of D:
a) Use B&B on TR_K to select i features that constitute the feature subset F_K_i;
b) Obtain a training data subset TR_K_i from TR_K according to F_K_i;
c) Train the classifier CF_K_i on TR_K_i;
d) Obtain a validation data subset V_K_i from V_K according to F_K_i;
e) Verify CF_K_i on V_K_i to obtain the evaluation criterion C_K_i;
f) Update C′ and F′ according to the following rule: if C′ < C_K_i, then C′ = C_K_i and F′ = F_K_i.
② Search from the right side of M: M → the upper bound of D.
Initial setting: the right side's local optimal evaluation criterion C″ = []; the right side's local optimal feature subset F″ = [].
For j = M, (M+1), …, the upper bound of D:
a) Use B&B on TR_K to select j features that constitute the feature subset F_K_j;
b) Obtain a training data subset TR_K_j from TR_K according to F_K_j;
c) Train the classifier CF_K_j on TR_K_j;
d) Obtain a validation data subset V_K_j from V_K according to F_K_j;
e) Verify CF_K_j on V_K_j to obtain the evaluation criterion C_K_j;
f) Update C″ and F″ according to the following rule: if C″ < C_K_j, then C″ = C_K_j and F″ = F_K_j.
3) Obtain the local optimal feature subset FF_K for the K-th iteration according to the following rule: if C′ > C″, then FF_K = F′; else FF_K = F″.
(7) After 10 iterations, collect the 10 feature subsets FF_1, FF_2, …, FF_10. Calculate the frequency of each feature in these feature subsets and reserve those features with frequency exceeding three.
As a result, we obtain a final optimal feature subset whose corresponding D value is optimal.

Technological and Economic Development of Economy, 2015, 21(3): 351-378

From the above algorithm, we can see that BBHFS has the following advantages. After excluding features with no significant differences between the two groups based on the t-test, BBHFS selects those features with higher frequency to compose an optimal feature subset by searching 10 groups of feature subsets that maximize classifier performance. In this way, it obtains an optimal D value that is not determined subjectively or arbitrarily. In addition, it uses a parallel search strategy to find an optimal D value. That is, it starts from the intermediate value of the upper and lower bounds of D and then searches toward both bounds simultaneously, reducing the computing time.
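The frequency-based aggregation in step (7) is straightforward; a minimal sketch with hypothetical fold-level subsets (the feature indices are illustrative, not the study's financial ratios):

```python
from collections import Counter

def aggregate_subsets(fold_subsets, min_freq=4):
    """Keep features appearing in more than three of the fold-level
    subsets, i.e. frequency exceeding three as in step (7) of BBHFS."""
    counts = Counter(f for subset in fold_subsets for f in subset)
    return sorted(f for f, c in counts.items() if c >= min_freq)

# Hypothetical feature subsets FF_1, ..., FF_10 from the 10 folds:
folds = [{1, 2, 5}, {1, 5}, {1, 2, 5, 7}, {1, 5}, {2, 5},
         {1, 5, 7}, {1, 2}, {5, 7}, {1, 5}, {1, 2, 5}]
print(aggregate_subsets(folds))   # feature 7 appears only 3 times and is dropped
```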

IOMCE
Data sets used for credit risk modeling are typically not in equilibrium, which means that samples of some categories account for a large portion of a data set, whereas others account for a small portion. If such an imbalanced data set is used for single-classifier modeling, then the model's recognition rate for the majority group is likely to be higher than that for the minority group. However, more attention should be focused on identifying the minority group in some real-world applications of credit risk assessment. Therefore, it is necessary to appropriately resolve the problem of imbalanced data. There are two major strategies for addressing this problem. A data-level strategy changes the distribution of data to reduce the degree of imbalance, and an algorithm-level strategy modifies algorithms so that they produce good modeling results even for imbalanced data (Chawla et al. 2004). From the data-level perspective, useful processes generally include oversampling, undersampling, and training-set partitioning. Oversampling improves classification performance for the minority group by adding minority samples (Maloof 2003). For instance, Chawla et al. (2002) propose the synthetic minority over-sampling technique. Undersampling deletes some majority samples to improve classification performance for the minority group. For example, Kubat and Matwin (1997) propose a one-sided selection algorithm. Oversampling and undersampling both have some limitations. Oversampling tends to increase minority samples by copying existing samples without adding any new information on the minority group and thus may lead to overfitting. Undersampling may cause the loss of some important information because it removes some majority samples.
The training set partition method addresses imbalanced data by dividing the training data set. It divides majority samples from the training data set into a series of disjoint subsets based on the ratio of majority samples to minority ones and then combines the latter with each of the above subsets separately to generate various training subsets. A combined classification method using multiple classifiers trained independently on various training subsets can outperform oversampling or undersampling methods (Yan et al. 2003). From the algorithm-level perspective, useful strategies include the cost-sensitive learning method and the classifier ensemble method. The cost-sensitive learning method sets misclassification costs that vary with different groups. That is, it sets higher costs for the minority group than for the majority group. In this way, it enhances the recognition rate for the minority group. However, it is difficult to determine misclassification costs for minority and majority samples in credit risk assessment. The classifier ensemble method combines a variety of base classifiers through a certain ensemble rule to generate combined classification results, enhancing overall classification performance for an imbalanced data set. Although there are other methods, including active learning (Constantinopoulos, Likas 2008), random forests, and the subspace method (Ahn et al. 2007), the training set partition method and the classifier ensemble method can better address imbalanced data sets (Yei et al. 2009). Therefore, the present paper uses the IOMCE method, which combines these two methods to address imbalanced data sets for credit risk assessment. Figure 2 shows the IOMCE process for binary classification.
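The partition step can be sketched as follows; the 1:3 imbalance ratio (hence k = 3), the class labels, and the synthetic data are illustrative assumptions. Each of the k resulting sets would then train one base classifier, whose votes are combined by the ensemble rule.

```python
import numpy as np

def partition_train(X_maj, X_min, k, rng):
    """Split the majority samples into k disjoint subsets and pair each
    with all minority samples, yielding k roughly balanced training sets."""
    idx = rng.permutation(len(X_maj))
    sets = []
    for part in np.array_split(idx, k):
        X = np.vstack([X_maj[part], X_min])
        y = np.concatenate([np.zeros(len(part)),   # 0 = majority (good credit)
                            np.ones(len(X_min))])  # 1 = minority (bad credit)
        sets.append((X, y))
    return sets

rng = np.random.default_rng(1)
X_maj = rng.normal(size=(300, 4))   # 1:3 imbalance, so k = 3
X_min = rng.normal(size=(100, 4))
sets = partition_train(X_maj, X_min, 3, rng)
print([len(y) for _, y in sets])    # three balanced sets of 200 samples each
```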

Fig. 2. IOMCE process
Let the ratio of majority samples to minority ones be θ. The partition ratio of majority samples k is set according to the value of θ. In detail, if θ is odd, then the partition ratio of majority samples is equal to θ, that is, k = θ, and simple majority voting can be used as the ensemble rule. On the other hand, if θ is even and k = θ, then simple majority voting may result in the same votes for the two groups, causing a decision dilemma. To avoid such a situation, we can employ two strategies. First, we can let k = θ - 1. In this case, the number of base classifiers in the ensemble is still odd, but the numbers of minority and majority samples in each training data set remain imbalanced to some extent, although this imbalance is much less pronounced (i.e. close to a balanced data set). Second, we can use weighted majority voting instead of simple majority voting as the ensemble rule. In this case, training accuracy can serve as the weight for the vote from each base classifier, which makes the weighted votes for the two groups differ in most cases. Even when the same weighted votes are obtained for the two groups, which rarely happens, the target sample can be recognized as having bad credit for the purpose of controlling credit risk.

Empirical experiments for the assessment of corporate credit risk

Experimental design
For feature selection, we consider no feature selection as well as the following four feature selection methods: pure B&B, the factor analysis, the pure t-test, and BBHFS. We randomly divide each group of samples into two parts so that two thirds of the samples are used for training and the rest for testing. Because any difference in the partition of a data set can lead to different experimental results, we repeat the whole experiment 100 times to avoid a biased analysis. That is, we partition the whole data set into two subsets (one for training and the other for testing) 100 times in total. We construct a credit risk assessment model for each training data set and test it using the corresponding testing data set.
Because the training and testing data sets are imbalanced, we use the IOMCE method for credit risk modeling. In the experimental data set, the ratio of highly risky firms to normal ones is 1:3. Therefore, for the training set, we divide the normal firms into three parts and combine each of them with the highly risky firms to generate three training subsets. We use them to train three base classifiers and then integrate their outcomes by majority voting.
We conduct empirical experiments using Matlab with LibSVM (Fan et al. 2005) and PRTools (Heijden et al. 2004) and use SPSS to replace missing data and to conduct the factor analysis and the t-test.

Sample data and feature selection
We collect publicly traded firms that became highly risky in the years 2001-2010 and describe the sample using financial ratios for year t-2. Here t denotes the year in which a publicly traded firm receives special treatment (ST) from the Shanghai Stock Exchange or the Shenzhen Stock Exchange. We define those publicly traded firms in China that receive ST for a bad financial condition in year t and once increased their long-term bank loans in year t-2 as highly risky samples. We match highly risky samples with low-risk ones. We select three matched samples for each highly risky sample based on the following criteria: 1) they belong to the same (or a similar) industry as the highly risky sample; 2) they have no ST experience; and 3) they show good operating performance. The matching ratio is thus 1:3. Finally, we obtain 107 highly risky firms and 321 normal ones for credit risk assessment (a total of 428 sample firms). All these firms are medium-sized or large firms because their registered capital is at least CNY 50 million. We use the credit risk assessment model to predict these two types of firms. If the model can successfully predict highly risky firms, then banks can adjust their credit decisions and reduce their losses from credit risk.
The initial feature set consists of 16 financial ratios. All these ratios are continuous features and indicate a firm's solvency, profitability, operating capability, or development capacity (Table 1). We collect values of the 16 financial ratios for year t-2 for each sample and obtain the original experimental data set. Missing values require careful handling in credit risk assessment (Florez-Lopez 2010). Missing values for each feature are replaced with the average value of all samples in the same group. The factor analysis extracts six factors from the original 16 features. The features selected by the other methods are marked in Table 1. The results of the independent-sample t-test show that 11 features have significant differences between the high- and low-risk groups, and only five features are excluded. With the number of selected features set to 5, the five features selected by pure B&B (B&B5) include the cash to current debt ratio, return on total assets, operating profit ratio, profit to cost ratio, and net profit growth rate. BBHFS automatically selects the following five features: cash to current debt ratio, return on total assets, profit to cost ratio, total assets turnover, and net profit growth rate, all of which show significant differences in the t-test. Furthermore, we use the Pearson correlation analysis (CA) to examine the correlation of each pair of financial ratios selected by the pure t-test, B&B5, and BBHFS. As a result, only three pairs of financial ratios selected by the pure t-test show correlation coefficients higher than 0.6: that of the current ratio (V1) and the debt to equity ratio (V4) is 0.65, that of return on equity (V6) and return on total assets (V7) is 0.61, and that of current assets turnover (V12) and total assets turnover (V13) is 0.72. However, no pair of financial ratios selected by B&B5 or BBHFS has a correlation coefficient over 0.6.
Therefore, the pure t-test tends to retain too many features with redundant information, and we can combine it with CA to form the feature selection method of t-test & CA to further eliminate redundant ratios. Although the pure B&B does not select features that are highly correlated with each other, it has difficulty in determining the number of selected features. Moreover, three of the five features selected by the B&B5 method measure profitability, and none covers operating capability, which may affect the performance of credit risk assessment. By contrast, BBHFS can not only select a feature subset with an optimal number of features but also remove redundant and relatively less important features that are highly correlated with other important features. The features selected by BBHFS cover all dimensions of a company's solvency, profitability, operating capability, and development capacity, which should be emphasized by firms for healthy development and a good credit rating.
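The redundancy check used in t-test & CA can be sketched as follows; the feature names echo the ratio labels above but the data are synthetic, and the 0.6 threshold follows the text:

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.6):
    """Flag feature pairs whose absolute Pearson correlation exceeds
    the threshold, as in the t-test & CA redundancy check."""
    r = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 2)))
    return pairs

rng = np.random.default_rng(2)
a = rng.normal(size=500)
# Two nearly collinear ratios plus one independent ratio (illustrative):
X = np.column_stack([a, a + 0.3 * rng.normal(size=500),
                     rng.normal(size=500)])
pairs = correlated_pairs(X, ["V1", "V4", "V6"])
print(pairs)   # only the (V1, V4) pair exceeds the 0.6 threshold
```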

Basic model
We apply the SVM and the MDA to train classifiers because they belong to the new machine learning classification algorithms and the classical statistical classification methods, respectively. The SVM, proposed by Vapnik (1996), is a machine learning algorithm based on statistical learning theory. The SVM, which is based on statistical VC-dimension theory and the structural risk minimization principle, outperforms other artificial intelligence methods in terms of modeling and generalization and can better deal with small samples. The SVM is a very useful tool for credit risk assessment (Crook et al. 2007; Bellotti, Crook 2008; Cimpoeru 2011). Previous studies have indicated that the SVM algorithm with the radial basis function (RBF) has better generalization ability (Huang et al. 2004), and therefore we employ this type of SVM algorithm. The RBF SVM has two parameters, namely the tuning parameter C and the kernel parameter γ, and the choice of parameter values is critical to its classification performance. We combine the fivefold cross-validation method with the grid search technique to select optimal parameter values. The optimal values for C and γ are searched in the ranges {2^-8, 2^-7, 2^-6, …, 2^9, 2^10} and {2^-10, 2^-9, 2^-8, …, 2^7, 2^8}, respectively. The MDA, proposed by Fisher (1936), is widely known as a fast and effective classification method. No parameters need to be set in advance for MDA modeling, and it can produce stable and acceptable performance in credit risk assessment (Chijoriga 2011; Hui, Sun 2006).
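The grid search over C and γ with fivefold cross-validation can be sketched as follows; this uses scikit-learn rather than the LibSVM/Matlab setup of the study, and the synthetic data set is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the credit data set (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Fivefold cross-validation over the exponential grids for C and gamma
# described in the text: C in {2^-8, ..., 2^10}, gamma in {2^-10, ..., 2^8}.
param_grid = {"C": [2.0 ** e for e in range(-8, 11)],
              "gamma": [2.0 ** e for e in range(-10, 9)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```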

Evaluation measures
Accuracy, sensitivity and specificity are used as evaluation measures of classification performance. To define these measures clearly, we use the confusion matrix as a tool for analyzing the recognition ability of a classifier. The confusion matrix for binary classification is illustrated in Table 2. Here, the positive group includes the high-risk minority samples, and the negative group the low-risk majority samples. In Table 2, TP (true positive) denotes the number of samples predicted as positive when they are truly positive; FN (false negative) denotes the number of samples predicted as negative when they are truly positive; FP (false positive) denotes the number of samples predicted as positive when they are truly negative; and TN (true negative) denotes the number of samples predicted as negative when they are truly negative. The performance evaluation measures are defined in Table 3.
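Using the notation of Table 2, the three measures of Table 3 can be computed directly from the confusion-matrix counts; the numbers below are a hypothetical example, not results from the paper.

```python
def measures(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from the confusion
    matrix of Table 2 (positive = high-risk minority class)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # recognition rate for minority samples
    specificity = tn / (tn + fp)   # recognition rate for majority samples
    return accuracy, sensitivity, specificity

# Example: a classifier on a 1:3 imbalanced test set of 100 samples.
acc, sens, spec = measures(tp=15, fn=10, fp=5, tn=70)
print(acc, sens, spec)  # 0.85 0.6 0.933...
```

Note how the majority class dominates accuracy: a classifier can reach high accuracy and specificity while its sensitivity stays low, which is exactly the imbalance effect discussed below.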
Accuracy measures a classifier's overall classification capacity, sensitivity measures its capacity to identify positive samples, and specificity measures its capacity to identify negative samples.

Table 4 shows the means of accuracy, sensitivity, and specificity for single-classifier and IOMCE models using different feature selection methods for the corporate credit risk data set. B&B5 and B&B10 indicate the use of pure B&B to select 5 and 10 features, respectively, for credit risk modeling. Table 5 shows the results of the mean comparison between BBHFS and the other feature selection methods for the corporate credit risk data set. In the table, Y (N) indicates that BBHFS has a higher (lower) mean for a given evaluation measure than the corresponding feature selection method. We conduct the nonparametric Wilcoxon test for two related samples to determine whether there are significant differences between BBHFS and the other methods. The Z-statistics and corresponding significance levels are also listed in Table 5.

Comparing the values in Table 4 across feature selection methods shows that no single method always outperforms the others for every evaluation measure and every classification algorithm. The purpose of feature selection research is therefore to find a satisfactory method for a specific domain rather than an optimal one. As shown in Table 5, in the 72 comparisons with the other feature selection methods, BBHFS performs significantly better in 46 and significantly worse in 12, with no significant difference in the remaining 14. This suggests that BBHFS is a satisfactory feature selection method for credit risk assessment modeling. For a more objective evaluation, more detailed comparisons between BBHFS and each of the other feature selection methods are given as follows.

Comparison between BBHFS and the other feature selection methods
(1) BBHFS vs. no feature selection: In the 12 comparisons with no feature selection, BBHFS performs significantly better in 9 and significantly worse in 3. Therefore, BBHFS significantly outperforms no feature selection.
(2) BBHFS vs. pure B&B: In the 12 comparisons with B&B5, BBHFS performs significantly better in 5 and significantly worse in 6, with no significant difference in 1. In the 12 comparisons with B&B10, BBHFS performs significantly better in 8 and significantly worse in 2, with no significant difference in 2. This suggests that the performance of pure B&B varies widely with the number of selected features and that properly setting this number can produce more satisfactory results. Because the optimal number of selected features is unknown beforehand, this is a key bottleneck when applying pure B&B to real-world problems. By contrast, BBHFS determines the number of selected features automatically in a data-driven manner. Although it cannot guarantee optimal performance, it performs very satisfactorily. Together, these results suggest the superiority of BBHFS over pure B&B.
(3) BBHFS vs. the factor analysis: In the 12 comparisons with the factor analysis, BBHFS performs significantly better in 9 and significantly worse in 1, with no significant difference in 2. Therefore, BBHFS significantly outperforms the factor analysis.
(4) BBHFS vs. the t-test: In the 12 comparisons with the t-test, BBHFS performs significantly better in 8, with no significant difference in 4. Therefore, BBHFS significantly outperforms the t-test.
(5) BBHFS vs. t-test & CA: In the 12 comparisons with t-test & CA, BBHFS performs significantly better in 7, with no significant difference in 5. Therefore, BBHFS significantly outperforms t-test & CA.
Stability comparisons among the feature selection methods are shown in Table 6. We calculate the coefficient of variation for the accuracy, sensitivity, and specificity of single-classifier and IOMCE models for the corporate credit risk data set and compare BBHFS with each of the other methods. In Table 6, Y indicates that BBHFS has a lower or equal coefficient of variation compared with the corresponding feature selection method, and N indicates that BBHFS has a higher one. The results show that the features selected by BBHFS yield more stable credit risk assessment than no feature selection, B&B5, B&B10, the factor analysis, and the pure t-test, and almost equal stability to t-test & CA. More specifically, BBHFS produces lower or equal coefficients of variation in at least 6 of the 12 comparisons with each of the other methods. By contrast, no feature selection, the factor analysis, and the pure t-test show relatively poor stability in credit risk assessment. These results suggest that BBHFS is also a desirable feature selection method in terms of performance stability.
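The stability measure used above can be sketched as follows. This is a minimal illustration with made-up scores; whether the paper uses the sample or population standard deviation is not specified, so the sample version here is our assumption.

```python
import numpy as np

def coef_variation(scores):
    """Coefficient of variation (standard deviation / mean) of an
    evaluation measure across repeated experiments; lower = more stable."""
    scores = np.asarray(scores, dtype=float)
    return scores.std(ddof=1) / scores.mean()

# Two hypothetical methods with the same mean accuracy: the second
# varies more across runs, so its coefficient of variation is larger.
stable = coef_variation([0.80, 0.82, 0.81, 0.79, 0.83])
unstable = coef_variation([0.70, 0.92, 0.75, 0.88, 0.80])
print(stable < unstable)  # True
```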

Comparison between IOMCE and single-classifier models
Based on Table 4, we calculate the difference between specificity and sensitivity for the corporate credit risk data set and show the results in Table 7. For a clearer view of the relationships between single-classifier and IOMCE models in terms of the performance evaluation measures, Figure 3 plots the results in Table 4. As shown in Table 4 and Figure 3, single-SVM and single-MDA models show relatively high accuracy and specificity but low sensitivity. In addition, Tables 4 and 7 show a large difference between specificity and sensitivity for single-SVM and single-MDA models under all feature selection methods, which can be explained by the imbalance of the experimental data set. More specifically, single-SVM and single-MDA models trained on an imbalanced data set tend to produce high recognition rates for majority samples, which results in high specificity and drives up the overall accuracy, but relatively low recognition rates for minority samples, corresponding to relatively low sensitivity. As shown in Table 7 and Figure 3, this phenomenon is more pronounced for single-SVM models than for single-MDA models: single-SVM models show even lower sensitivity and even larger differences between specificity and sensitivity.
When the IOMCE is employed to process the imbalanced data set, the differences between specificity and sensitivity shrink significantly, as shown in Table 7. These differences are significantly lower for IOMCE models than for single-classifier models, and the reductions are even more evident for SVM models than for MDA models.
As shown in Table 4, employing the IOMCE method does not increase overall accuracy for most SVM and MDA models under the various feature selection methods, and in some cases it even reduces accuracy slightly. However, it sharply increases sensitivity with acceptable reductions in specificity for all models. Table 8 shows the differences in accuracy, sensitivity, and specificity between IOMCE and single-classifier models for the corporate credit risk data set. The results clearly indicate that processing an imbalanced data set with the IOMCE method substantially improves the classifier's ability to identify minority samples (the high-credit-risk group) while incurring a smaller decrease in its ability to identify majority samples (the low-credit-risk group) for all models except MDA with the factor analysis and the t-test. These results suggest that the IOMCE method should be used to address imbalanced data sets in credit risk modeling. In particular, combining BBHFS with the IOMCE produces stable and satisfactory performance in credit risk modeling based on an imbalanced data set. Overall, the BBHFS-IOMCE method is more effective than all the other methods, suggesting that it is a useful credit risk assessment tool for publicly traded firms in China.
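The IOMCE training-set construction described in the text (splitting the majority class into subsets and pairing each with all minority samples) can be sketched as follows. This is an illustrative implementation; the function name, the subset-count heuristic, and the toy data are our own.

```python
import numpy as np

def iomce_train_sets(X, y, minority=1, rng=None):
    """IOMCE sketch: split the majority class into roughly
    minority-sized subsets and pair each with all minority samples,
    yielding several balanced training sets for an ensemble."""
    rng = rng or np.random.default_rng(0)
    min_idx = np.where(y == minority)[0]
    maj_idx = rng.permutation(np.where(y != minority)[0])
    n_subsets = max(1, len(maj_idx) // len(min_idx))
    for part in np.array_split(maj_idx, n_subsets):
        idx = np.concatenate([min_idx, part])
        yield X[idx], y[idx]

# A 1:3 imbalanced toy set (10 minority, 30 majority) yields three
# balanced training sets; one base classifier is trained on each, and
# their predictions are combined at test time.
X = np.arange(80).reshape(40, 2)
y = np.array([1] * 10 + [0] * 30)
sets = list(iomce_train_sets(X, y))
print(len(sets))  # 3
```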

Extended experiments for financial distress prediction and personal credit risk assessment

Data
We conduct extended experiments using two additional data sets to validate whether the proposed BBHFS-IOMCE approach is effective for other data sets. The first data set, for financial distress prediction (FDP), is 1:5 imbalanced. It includes 384 samples, of which 64 are distressed and 320 are healthy. This data set is an imbalanced version of the originally balanced data set in Sun and Li (2011) and has 42 initial features, all of which are continuous. It is similar to the corporate credit risk data set above in that all features are continuous, but it has many more initial features. The second data set is derived from the UCI German personal credit risk data set provided by Hofmann and Stat (1994). The original data set consists of 300 samples with bad credit and 700 samples with good credit. By randomly duplicating 200 good-credit samples, we obtain a new 1:3 imbalanced data set with a total of 1,200 samples. This data set differs from the corporate credit risk and FDP data sets in that it includes many discrete and nominal features.

Table 9 shows the means of accuracy, sensitivity, and specificity for single-classifier and IOMCE models using different feature selection methods for the FDP data set. B&B10 and B&B20 indicate the use of pure B&B to select 10 and 20 features, respectively, for FDP modeling. Table 10 shows the results of the mean comparison between BBHFS and the other feature selection methods for the FDP data set, together with those of the nonparametric Wilcoxon test.
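The 1:3 rebalancing of the German credit data described above amounts to a simple index manipulation, sketched below. Whether the 200 duplicated samples are drawn with or without replacement is not stated in the text, so drawing without replacement is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
good = np.arange(700)        # indices of good-credit samples
bad = np.arange(700, 1000)   # indices of bad-credit samples

# Randomly duplicate 200 good-credit samples (assumed: without replacement),
# giving 900 good vs. 300 bad, i.e. a 1:3 imbalance ratio.
dup = rng.choice(good, size=200, replace=False)
idx = np.concatenate([good, dup, bad])

print(len(idx), (len(idx) - len(bad)) / len(bad))  # 1200 3.0
```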

Comparison between BBHFS and the other feature selection methods
In the 72 comparisons with the other feature selection methods, BBHFS performs significantly better in 54 and significantly worse in 4, with no significant difference in the remaining 14. More specifically, BBHFS performs significantly better than no feature selection, B&B10, B&B20, the factor analysis, the pure t-test, and t-test & CA in 11, 4, 8, 11, 10, and 10 comparisons, respectively, and significantly worse in 0, 3, 1, 0, 0, and 0 comparisons. These results support the conclusion drawn from the corporate credit risk data set that BBHFS is a satisfactory feature selection method for continuous features.

Table 11 shows the coefficient of variation for accuracy, sensitivity, and specificity for single-classifier and IOMCE models using different feature selection methods for the FDP data set, as well as the results of the stability comparison between BBHFS and the other methods. BBHFS produces lower coefficients of variation in at least 7 of the 12 comparisons with each of the other methods. These results support the conclusion that BBHFS is more stable than the other feature selection methods.

Comparison between IOMCE and single-classifier models
Based on Table 9, we calculate the difference between specificity and sensitivity for the FDP data set and show the results in Table 12. Table 13 shows the differences in accuracy, sensitivity, and specificity between IOMCE and single-classifier models for the FDP data set. To visualize the relationships between single-classifier and IOMCE models in terms of the performance evaluation measures, Figure 4 plots the results in Table 9. Consistent with the results for the corporate credit risk data set, the differences between specificity and sensitivity are much larger for single-classifier models than for IOMCE models (Table 12), and the reductions in accuracy and specificity are generally smaller than the increases in sensitivity, except for MDA with no feature selection (Table 13). In addition, the relative locations of the curves in Figure 4 are very similar to those in Figure 3, further supporting the argument that the IOMCE method can dramatically improve the recognition rates for the minority group while incurring only small reductions in overall accuracy and specificity. Together with the results in Section 4.2.2, these findings suggest that the BBHFS-IOMCE method still produces satisfactory performance for a more imbalanced (1:5) data set with more initial features (42). They also indicate that processing an imbalanced data set with the IOMCE method is even more necessary for the SVM than for the MDA, consistent with the results for the corporate credit risk data set. This may be because the decision function of the SVM is determined only by support vectors, whereas that of the MDA is determined by all samples, so data imbalance affects the SVM more than the MDA.

Experimental results and analysis for the personal credit risk data set
The results for different feature selection methods for the personal credit risk data set indicate no significant superiority of BBHFS over the others and no significant advantage of any of the feature selection methods over no feature selection, which is inconsistent with the results for the corporate credit risk and FDP data sets. This may be because the personal credit risk data set has many more discrete and nominal features than continuous ones, and this study's feature selection methods are more suitable for the latter. For example, Dash and Liu (1997) note that nominal features require special handling in feature selection because it is not easy to assign real values to them, and B&B is not applicable to such features. In addition, the t-test may not be valid when discrete data sets are analyzed (McElduff et al. 2010), and the factor analysis may not be appropriate for non-orderable discrete indicators with more than two categories (Ender 2005). Accordingly, we provide only the results for no feature selection for the personal credit risk data set in Table 14 and chart the mean accuracy, sensitivity, and specificity for the different models in Figure 5. The results indicate that the IOMCE method can improve the recognition rates for minority samples while incurring acceptable reductions in accuracy and specificity when the data set contains many discrete and nominal features. In addition, the role of the IOMCE method in processing an imbalanced data set is more evident for the SVM than for the MDA, consistent with the results for the corporate credit risk and FDP data sets. This suggests that the MDA is more robust than the SVM when a single-classifier model is applied to an imbalanced data set.

Conclusions
This paper proposes an integrated credit risk assessment method that combines BBHFS with the IOMCE and uses the SVM and the MDA as base predictors. We use BBHFS to select a feature subset with an optimal number of features and the IOMCE to address an imbalanced data set. For comparative analysis, we also use no feature selection and four other feature selection methods (the pure B&B, the factor analysis, the pure t-test, and t-test & CA) in single-classifier and IOMCE modeling and compare their results with those of single-classifier and IOMCE models based on BBHFS. The empirical results for the corporate credit risk data set and the two extended data sets show that credit risk modeling based on the BBHFS-IOMCE method generally produces the best classification performance when all features are continuous.
BBHFS is able to retain useful information for credit risk assessment when reducing the dimensionality of continuous features, but it is not suitable for discrete and nominal features. With either continuous or discrete/nominal features, using the IOMCE method for an imbalanced data set can significantly reduce the differences between specificity and sensitivity by increasing sensitivity sharply and reducing specificity only slightly. Taken together, these results suggest that both BBHFS and the IOMCE are necessary for credit risk modeling and that the BBHFS-IOMCE method can produce competitive performance for imbalanced data sets with continuous features.