GROUTABILITY PREDICTION OF MICROFINE CEMENT BASED SOIL IMPROVEMENT USING EVOLUTIONARY LS-SVM INFERENCE MODEL

Permeation grouting is a widely used technique for soil improvement in construction engineering. Thus, predicting the outcome of a grouting activity is a particularly interesting topic that has drawn the attention of researchers from both academia and industry. Recent literature has indicated that artificial intelligence (AI) approaches for groutability prediction can deliver better performance than traditional formula-based ones. In this study, a novel AI method, the evolutionary Least Squares Support Vector Machine Inference Model for groutability prediction (ELSIM-GP), is proposed to forecast the result of grouting activities that utilize microfine cement grout. In the model, the Least Squares Support Vector Machine (LS-SVM), a supervised machine learning technique, is employed to learn the decision boundary for classifying high dimensional data. Differential Evolution (DE) is integrated into ELSIM-GP to automatically optimize its tuning parameters. 240 historical cases of the grouting process for sandy silt soil have been collected to train, validate, and test the inference model. Experimental results demonstrate that ELSIM-GP outperforms other benchmark approaches in terms of forecasting accuracy. Therefore, the proposed approach is a promising alternative for predicting groutability.


Introduction
In the construction industry, soil improvement by means of permeation grouting is a process often carried out to reduce water movement through soils (Zebovitz et al. 1989). Particularly for underground construction, inflow of groundwater has always been a critical issue for engineers (Butron et al. 2009). Incidents of inflow can bring about construction delays and even cause serious damage to the quality of structures. As a consequence, permeation grouting is a crucial task that must be accomplished in the majority of excavation and tunnel projects.
Among the grouts used for permeation grouting, microfine cement has been increasingly applied in the industry. The reason is that it usually improves the groutability of the target geomaterial and does not cause groundwater pollution in the surrounding environment (Perret et al. 2000; Zebovitz et al. 1989). Furthermore, microfine cement based grout is proven to have the capacity of filling cracks with small openings as well as penetrating fine soils with very low permeability (Perret et al. 2002).
Noticeably, a grouting activity is considered successful if the grout can be sufficiently injected into the soil. According to Liao et al. (2011), the grouting activity succeeds if the injected grout is at least two times the volume of the void space under the split pressure. In practice, an accurate prediction of this activity using microfine cement is by no means an easy task (Akbulut, Saglamer 2002). This is because conventional predictive formulas, which are mostly based on the grain sizes of the soil and the grout, are unreliable for semi-nanometer scale grout (Liao et al. 2011). Furthermore, grain size is not the sole factor that affects the outcome of the grouting process.
Various research works have investigated groutability prediction. Akbulut and Saglamer (2002) and Ozgurel and Cumaraswamy (2005) found that, in addition to the sizes of the soil and the grout, the water-to-cement ratio of the grout (w/c), the void size in the soil, and the fines content (FC) of the total soil should be taken into account. Liao et al. (2011) pointed out that inclusion of soil gradation information, namely the coefficient of uniformity (Cu), which measures the particle size range, and the coefficient of gradation (Cz), which characterizes the particle size curve, can boost the overall predictive performance. Needless to say, it is beneficial to take these factors into account when estimating the grouting process (Tekin, Akbas 2011).
Because construction projects are highly uncertain and inherently context-dependent, artificial intelligence (AI) methods may provide viable alternatives for the groutability prediction problem. An inference model, which is composed of various AI techniques, can be utilized to derive new facts from historical data (Cheng, Wu 2009). The inference process changes adaptively in response to alterations in the historical data. The nomenclature used is presented in Figure 1.

Notably, the problem of groutability prediction, in essence, can be modeled as a classification task with two class labels ("success" and "failure"). Therefore, AI based classifiers, such as Classification and Regression Trees (CART), Back-propagation Neural Network (BPNN), Radial Basis Function Neural Network (RBFNN), and Least Squares Support Vector Machine (LS-SVM), are feasible for coping with the aforementioned problem. The main motivation for selecting CART, BPNN, RBFNN, and LS-SVM is that they are popular and effective approaches for classifying data in high dimensional space (Brown et al. 1993; Olson et al. 2011). Furthermore, these techniques represent different learning mechanisms that are worth investigating in the case of groutability prediction.
CART (Breiman et al. 1984) is a popular machine learning technique which utilizes historical data to construct decision trees. One major advantage of a decision tree based model is its ability to mitigate the negative effect of outliers, because the model is capable of isolating outliers in a separate node. However, one disadvantage of CART is that it may produce unstable decision trees (Timofeev 2004), since an insignificant modification of the learning sample can result in radical changes in the model. Moreover, instead of applying stopping rules, CART generates a sequence of sub-trees by growing a large tree and pruning it back until only the root node is left (Loh 2011).
Artificial Neural Network (ANN) has been applied to deal with groutability prediction as well as with other problems in the construction industry (Kalinli et al. 2011; Liao et al. 2011; Tekin, Akbas 2011). This approach eliminates the requirement to find a mapping relationship that mathematically describes the problem of interest. Studies have shown that ANN is a viable substitute for traditional deterministic methods in coping with sophisticated problems (Cheng et al. 2010; Wong et al. 1997). When the influence factors and the structure of the ANN are all specified, the task boils down to collecting a reasonable number of data to train the ANN. There are two types of ANN that are commonly utilized for classification: Back-propagation Neural Network (BPNN) and Radial Basis Function Neural Network (RBFNN).
The implementation of BPNN for solving machine learning tasks is exposed to several drawbacks. The method suffers from the difficulty of selecting a large number of controlling parameters, such as the number of hidden layers, the number of neurons in a hidden layer, and the learning rate (Bao et al. 2005). Notably, the training process of BPNN is notoriously time-consuming. Furthermore, one major disadvantage of this approach is that training is achieved through a gradient descent algorithm on the error space, which can be very complex and may contain many local minima (Kiranyaz et al. 2009). Thus, the training process is likely to be trapped in a local minimum, which hinders the forecasting performance.
RBFNN is also an efficient learning algorithm for carrying out pattern classification. In this learning mechanism, radial basis functions (RBF) are embedded into the network structure (Pendharkar 2011). The training process of RBFNN comprises two stages which can be accomplished with low computational expense (Bishop 1995). In the first stage, an unsupervised learning algorithm, such as the orthogonal least squares (OLS) method, is used to determine a suitable set of RBF centers from a large set of training data points (Chen et al. 1991). The second stage requires the determination of the network's weights, which is achieved by solving a linear system. Notably, for this approach, back-propagation computation is not required for specifying the parameters of the hidden layer (Liao et al. 2011). Therefore, RBFNN can be trained much faster than BPNN (Bishop 1995; Wu, Liu 2012).
Proposed by Suykens et al. (De Brabanter et al. 2010; Suykens et al. 2002), LS-SVM is an advanced machine learning technique characterized by good generalization and fast computation. In LS-SVM's training process, a least squares cost function is used to obtain a linear set of equations in the dual space. Consequently, deriving the solution requires solving a set of linear equations, which can be done efficiently by iterative methods such as the conjugate gradient method (Wang, Hu 2005). Despite these advantages, the application of this approach in construction engineering is still very limited. Surprisingly, no studies have investigated the capability of LS-SVM in predicting groutability.
Furthermore, the mechanism for setting a model's control parameters is an important problem in the field of AI. This crucial issue has been widely recognized by scholars in a variety of disciplines (Lean et al. 2009; Zhou et al. 2011). In practice, identifying the most suitable set of model parameters, in essence, represents an optimization problem. Therefore, hybridizing AI based techniques with an evolutionary optimization algorithm is a prevalent research direction (Cheng, Wu 2009; Yu 2011).
Evolutionary computation is characterized by an iterative process used to guide a randomly initialized population to the final optimal solution. Among evolutionary optimization techniques, DE, proposed by Storn and Price (Price et al. 2005), is a population-based stochastic search engine which is efficient and effective for global optimization in the continuous domain. It uses mutation, crossover, and selection operators at each generation to move its population toward the global optimum. The superior performance of DE, in terms of accuracy and speed of operation, has been verified in many reported research works (Price et al. 2005; Storn, Price 1997).
Thus, this article proposes to fuse LS-SVM and DE to construct an inference model for groutability prediction using microfine cement. The remaining part of this paper is organized as follows. The second and third sections review related literature on LS-SVM and DE. The framework of the proposed ELSIM-GP model is depicted in the fourth section. The fifth section demonstrates the experimental results. Conclusions are presented in the final section.

Least Squares Support Vector Machine for classification
Given a training dataset \{(x_k, y_k)\}_{k=1}^{N}, where N is the number of training data points, x_k \in R^n (n is the data dimension), and y_k \in \{-1, +1\} denotes the corresponding class label, the LS-SVM formulation for classification is presented as follows (Suykens et al. 2002; Suykens, Vandewalle 1999):

\min_{w, b, e} J(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{N} e_k^2

subject to: y_k [w^T \varphi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, N,

where w \in R^n is the normal vector to the classification hyperplane and b \in R is the bias; e_k \in R are error variables; \gamma > 0 denotes a regularization constant; \varphi(\cdot) is the feature map associated with the kernel function.
The Lagrangian is given by:

L(w, b, e; \alpha) = J(w, e) - \sum_{k=1}^{N} \alpha_k \{ y_k [w^T \varphi(x_k) + b] - 1 + e_k \},

where \alpha_k are Lagrange multipliers. The conditions for optimality are given by:

\frac{\partial L}{\partial w} = 0 \rightarrow w = \sum_{k=1}^{N} \alpha_k y_k \varphi(x_k); \quad \frac{\partial L}{\partial b} = 0 \rightarrow \sum_{k=1}^{N} \alpha_k y_k = 0; \quad \frac{\partial L}{\partial e_k} = 0 \rightarrow \alpha_k = \gamma e_k; \quad \frac{\partial L}{\partial \alpha_k} = 0 \rightarrow y_k [w^T \varphi(x_k) + b] - 1 + e_k = 0.

After elimination of e and w, the following linear system is obtained:

\begin{bmatrix} 0 & y^T \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}, \quad \text{with } \Omega_{kl} = y_k y_l K(x_k, x_l),

where y = [y_1, \ldots, y_N]^T and 1_v = [1, \ldots, 1]^T. The resulting LS-SVM model for classification is expressed as:

y(x) = \mathrm{sign}\left[ \sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b \right],

where \alpha_k and b are the solution to the linear system above.
The kernel function that is often utilized is the Radial Basis Function (RBF) kernel, defined as follows:

K(x, x_k) = \exp\left(-\frac{\|x - x_k\|^2}{2\sigma^2}\right),

where \sigma is the kernel function parameter.
In the case of the RBF kernel, there are two tuning parameters (γ, σ) that need to be determined in LS-SVM. The regularization parameter (γ) weights the importance of classification errors, while the kernel parameter (σ) controls the kernel width. It is worth noticing that proper setting of these tuning parameters is required to ensure desirable performance of the prediction model (Suykens et al. 2002).
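To make the training procedure concrete, the following Python sketch solves the dual linear system above for (b, α) and classifies new points with the RBF kernel. It is a minimal illustration under the formulas of this section, not the authors' implementation; the function names (`rbf_kernel`, `train_lssvm`, `predict_lssvm`) are ours.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    # Gram matrix of K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_lssvm(X, y, gamma, sigma):
    # Solve the (N+1) x (N+1) dual system
    #   [0   y^T            ] [b]       [0]
    #   [y   Omega + I/gamma] [alpha] = [1_v]
    # with Omega_kl = y_k * y_l * K(x_k, x_l)
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # alpha, b

def predict_lssvm(Xnew, X, y, alpha, b, sigma):
    # y(x) = sign( sum_k alpha_k * y_k * K(x, x_k) + b )
    return np.sign(rbf_kernel(Xnew, X, sigma) @ (alpha * y) + b)
```

Because the dual problem reduces to one linear solve, no iterative quadratic programming is needed, which is the source of LS-SVM's fast training noted earlier.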

Differential evolution
This section describes the algorithm of Differential Evolution (DE) proposed by Storn and Price (Price et al. 2005; Storn, Price 1997). The algorithm (Fig. 2) consists of five main stages: initialization, mutation, crossover, selection, and stopping condition verification. Given that the problem at hand is to minimize a cost function f(X), where the number of decision variables is D, each stage of DE can be described in detail.

Initialization
The population consists of NP D-dimensional vectors X_{i,g}, where i = 1, 2, …, NP and g represents the current generation. In the DE algorithm, NP does not change during the optimization process (Storn, Price 1997). Moreover, the initial population (at g = 0) ought to cover the entire search space in a uniform manner. Thus, these individuals can simply be generated as follows:

X_{i,0} = LB + rand[0, 1] \cdot (UB - LB),

where X_{i,0} is the i-th individual at the first generation; rand[0, 1] denotes a uniformly distributed random number between 0 and 1; LB and UB are the vectors of lower and upper bounds of the decision variables.

Mutation
A vector in the current population (or parent) is called a target vector; hereafter, the terms parent and target vector are used interchangeably. For each target vector, a mutant vector is produced via the following equation (Storn, Price 1997):

V_{i,g+1} = X_{r1,g} + F \cdot (X_{r2,g} - X_{r3,g}),

where r1, r2, and r3 are three random indexes lying between 1 and NP; X_{r1,g}, X_{r2,g}, and X_{r3,g} are three randomly chosen vectors in the current generation g. These three randomly chosen indexes are also selected to be different from the index i of the target vector. F denotes the mutation scale factor, which controls the amplification of the differential variation between X_{r2,g} and X_{r3,g}; V_{i,g+1} represents the newly created mutant vector.

Crossover
The purpose of the crossover stage is to diversify the current population by exchanging components of the target vector and the mutant vector. In this stage, a new vector, named the trial vector, is created. The trial vector is also called the offspring. The trial vector is formed as follows:

U_{j,i,g+1} = \begin{cases} V_{j,i,g+1}, & \text{if } rand_j \le Cr \text{ or } j = rnb(i) \\ X_{j,i,g}, & \text{otherwise} \end{cases}

where U_{j,i,g+1} is the j-th element of the trial vector; j denotes the index of an element of a vector; rand_j is a uniform random number lying between 0 and 1; Cr is the crossover probability, which needs to be determined by the user; rnb(i) is a randomly chosen index from \{1, 2, \ldots, D\} which guarantees that at least one parameter from the mutant vector (V_{j,i,g+1}) is copied to the trial vector (U_{j,i,g+1}).

Selection
In this stage, the trial vector is compared to the target vector. If the trial vector yields a lower objective function value than its parent, then the trial vector replaces the target vector. The selection operator is expressed as follows:

X_{i,g+1} = \begin{cases} U_{i,g+1}, & \text{if } f(U_{i,g+1}) \le f(X_{i,g}) \\ X_{i,g}, & \text{otherwise} \end{cases}

where X_{i,g} denotes the target vector in generation g; X_{i,g+1} represents the target vector in the next generation g+1; U_{i,g+1} is the trial vector.

Stopping criterion verification
The optimization process terminates when the stopping criterion is met. The type of this condition can be specified by the user. Commonly, the maximum number of generations (Gmax) or the maximum number of function evaluations (NFE) is used as the stopping condition. When the optimization process terminates, the final optimal solution is readily available.
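The five stages above can be collected into a short, self-contained routine. The sketch below implements the classic DE/rand/1/bin scheme described in this section; the parameter defaults (NP, F, Cr, Gmax) are illustrative choices, not values taken from the paper.

```python
import random

def differential_evolution(f, bounds, NP=20, F=0.8, Cr=0.9, Gmax=100, seed=0):
    # f: cost function to minimize; bounds: list of (LB, UB) per decision variable
    rng = random.Random(seed)
    D = len(bounds)
    # Initialization: uniform random population covering [LB, UB]
    pop = [[lb + rng.random() * (ub - lb) for lb, ub in bounds] for _ in range(NP)]
    cost = [f(x) for x in pop]
    for g in range(Gmax):
        for i in range(NP):
            # Mutation: V = X_r1 + F * (X_r2 - X_r3), indices distinct from i
            r1, r2, r3 = rng.sample([j for j in range(NP) if j != i], 3)
            V = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            # Crossover: binomial; index rnb guarantees one component comes from V
            rnb = rng.randrange(D)
            U = [V[j] if (rng.random() <= Cr or j == rnb) else pop[i][j]
                 for j in range(D)]
            # Keep the trial vector inside the search bounds
            U = [min(max(U[j], bounds[j][0]), bounds[j][1]) for j in range(D)]
            # Selection: the trial vector replaces the target if it is no worse
            fU = f(U)
            if fU <= cost[i]:
                pop[i], cost[i] = U, fU
    best = min(range(NP), key=cost.__getitem__)
    return pop[best], cost[best]
```

For example, minimizing the sphere function f(x) = x1² + x2² over [-5, 5]² drives the best cost toward zero within a few hundred generations.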

Evolutionary Least Squares Support Vector Machine Inference Model for Groutability Prediction (ELSIM-GP)
This section describes the proposed model, named ELSIM-GP, in detail. The model (Fig. 3) is established by fusing LS-SVM and DE. ELSIM-GP employs LS-SVM as the supervised learning algorithm that learns the decision boundary for carrying out the classification task. Furthermore, the model incorporates DE to automatically identify the optimal values of the tuning parameters. The construction of the prediction model depends on two tuning parameters: the regularization parameter (γ) and the RBF kernel parameter (σ).

Fig. 3. Evolutionary Least Squares Support Vector Machine Inference Model for Groutability Prediction (ELSIM-GP)
(1) Input Data: The historical database used in this article contains 240 on-site permeation grouting data samples collected by Liao et al. (2011). 192 data cases are used for training (80%) and validating (20%). Meanwhile, 48 data cases are used for testing. All of the grouting activities were executed in the cities of Taipei and Kaohsiung, Taiwan. A mixture of microfine cement and micro-slag in equal proportions was utilized as the injected grout. The diameters through which 95%, 90%, and 85% of the total grout passes are 7.4 µm, 6.4 µm, and 4.5 µm, respectively. Moreover, the diameter through which 70% of the total grout passes is less than 1 µm. Thus, the grout is considered to be semi-nanometer material.
According to previous research (Liao et al. 2011; Tekin, Akbas 2011), utilizing only the grain-size ratio of the soil to predict groutability cannot completely describe the behavior of the grouting mechanism. Furthermore, experimental results have indicated that using additional parameters, including the w/c, the e, the FC, the Cz, and the Cu, can deliver superior prediction performance in both training and testing cases (Liao et al. 2011). Hence, in the current study, seven influencing factors (Table 1) are considered to determine the outcome of the grouting activity:

IF1: the diameter through which 10% of the total soil mass passes, D10 (µm);
IF2: the diameter through which 15% of the total soil mass passes, D15 (µm);
IF3: the void ratio, e;
IF4: the fines content of the total soil mass, FC (%);
IF5: the coefficient of gradation, Cz;
IF6: the coefficient of uniformity, Cu;
IF7: the water-to-cement ratio of the grout, w/c.

Moreover, in this research, the grout size is not taken into account as an influencing factor, because previous studies have demonstrated that the performance of AI based approaches is not affected by the size of the grouts (Liao et al. 2011; Tekin, Akbas 2011). Furthermore, for each data case, the corresponding output is either +1, which means that the grouting is successful, or -1, which indicates unsuccessful grouting. Table 2 provides descriptive statistics of the influencing factors of the historical data.
Before training the model, the data set has been normalized into the (0, 1) range, which helps prevent inputs with greater magnitudes from dominating those with smaller magnitudes. The historical data is illustrated in Table 3. The function used for normalizing the data is as follows:

X_n = \frac{X_o - X_{min}}{X_{max} - X_{min}},

where X_n is the normalized data; X_o is the original data; X_max and X_min denote the maximum and minimum values of the data, respectively.

(2) Tuning Parameter Initialization: The aforementioned tuning parameters of the model are randomly generated within the range of lower and upper boundaries (Table 4).

(3) LS-SVM Training: In this step, LS-SVM is deployed to learn the decision boundary that separates the input data into the two groutability classes (-1 and +1).
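The normalization formula above amounts to a one-line min-max scaling per input column; the function name below is illustrative:

```python
def min_max_normalize(column):
    # Xn = (Xo - Xmin) / (Xmax - Xmin), scaling raw values into [0, 1]
    xmin, xmax = min(column), max(column)
    return [(x - xmin) / (xmax - xmin) for x in column]
```

Applied per influencing factor, this puts D10 (a few µm) and FC (tens of percent) on the same numeric footing before training.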
(4) DE searching: At each generation, the optimizer carries out the mutation, crossover, and selection processes to guide the population toward the optimal solution.
(5) Fitness evaluation: In ELSIM-GP, in order to determine the optimal set of tuning parameters, the following objective function is used in the fitness evaluation step:

f = 1 - \frac{R_{TR} + R_{VA}}{2},

where R_TR denotes the classification accuracy rate for the training set and R_VA represents the classification accuracy rate for the validating set. The accuracy rate is calculated as the number of correct classifications divided by the total number of data instances within a data set; DE thus searches for the tuning parameters that minimize f.
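As a sketch of how this fitness evaluation might look in code (the equal weighting of the training and validation accuracy rates, and the helper names `accuracy` and `fitness`, are our assumptions rather than details given in the paper):

```python
def accuracy(y_true, y_pred):
    # classification accuracy rate = correct classifications / total instances
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def fitness(r_tr, r_va):
    # Objective for DE to minimize: average misclassification over the
    # training (r_tr) and validating (r_va) accuracy rates.
    return 1.0 - (r_tr + r_va) / 2.0
```

Including the validation accuracy in the objective discourages DE from selecting (γ, σ) pairs that merely over-fit the training folds.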
(6) Stopping condition: The DE optimization process terminates when the maximum number of generations is reached.
(7) Optimal prediction model: When the program terminates, the optimal set of tuning parameters has been identified, and ELSIM-GP is ready to predict new input patterns.

Experimental result
As stated earlier, ELSIM-GP uses 192 data cases for training and validating and 48 data cases for testing. This means that 20% of the historical data is reserved for the testing process. However, due to the randomness in selecting testing cases, the evaluation of model error can be biased (Bishop 2006). To avoid this issue, the whole dataset is divided into five subsamples, each of which in turn serves as the testing set; the model performance is then appraised via the average predictive results over the five subsamples.
The process depicted above is k-fold cross validation, which is commonly used for model selection. In this process, the value of k is not a fixed parameter; it depends on the size of the data set at hand (Geisser 1993). For a large data set, even a small value of k (e.g. 3) can bring about relatively accurate results. Conversely, for a very sparse data set, leave-one-out validation should be utilized in order to retain as many training data as possible.
In general, with a large number of data folds, we can obtain a very good estimate of model performance. Nevertheless, the computational expense can be very high, especially for a hybrid intelligence model like ELSIM-GP. Thus, it is beneficial to choose a value of k that can both estimate the model performance fairly and require an acceptable computing effort. Arlot (2010) suggested that the number of folds should be between 5 and 10, because the statistical performance does not improve much for larger values. Additionally, previous studies showed that 5-fold cross validation can bring about reliable estimates of prediction error (Breiman et al. 1984; Burman 1989).
Thus, considering the efficiency and effectiveness of evaluating model prediction performance, 5-fold cross validation is utilized. The whole data set, which includes 240 instances, is randomly separated into five folds. In each run, one data fold serves as the testing set and the rest of the data folds are used for training the model (Fig. 4). Since all of the subsamples are mutually exclusive, this approach diminishes the bias in model assessment and allows the generalization property of each model to be estimated.
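The fold construction described above can be sketched as follows; with n = 240 and k = 5 this yields five mutually exclusive testing sets of 48 cases each (the function names and the fixed shuffle seed are illustrative):

```python
import random

def kfold_indices(n, k=5, seed=42):
    # Shuffle the record indices, then deal them into k mutually
    # exclusive folds; each fold serves once as the testing set.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_split_for_fold(folds, fold_id):
    # The chosen fold is the testing set; the remaining folds form
    # the training (plus validating) set.
    test = folds[fold_id]
    train = [i for f, fold in enumerate(folds) if f != fold_id for i in fold]
    return train, test
```

Averaging the testing accuracy over the five runs gives the cross-validated estimate reported for each model.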
After the training process, the proposed model, ELSIM-GP, is utilized to predict new input patterns from the testing data set. Furthermore, in order to verify the capability of ELSIM-GP, its performance is compared to results obtained from other benchmark approaches: Classification and Regression Trees (CART), Backpropagation Neural Network (BPNN), and Radial Basis Function Neural Network (RBFNN).
In the learning process, CART (Breiman et al. 1984; Loh 2011) splits the data into two subsets so that the records within each subset are more homogeneous than in the previous subset. Therefore, the algorithm operates by choosing a split at each node such that each sub-node created by the split is purer than its parent node (Bevilacqua et al. 2003). Herein, purity refers to the similarity of values of the target field, and CART measures the impurity of a split at a node by defining an impurity measure. Since groutability prediction is a classification problem, Gini's diversity index is chosen in this study for measuring impurity. Another parameter that controls the tree structure is the tree depth. This parameter can be determined via the pruning algorithm (Timofeev 2004; Vega et al. 2009).
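Gini's diversity index for a candidate node can be computed directly from the class labels that fall into it; a pure node scores 0 and a 50/50 split of two classes scores 0.5:

```python
def gini_impurity(labels):
    # Gini diversity index: 1 - sum over classes c of p_c^2,
    # where p_c is the fraction of node records with label c.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))
```

CART evaluates candidate splits by the weighted impurity of the two resulting sub-nodes and picks the split that reduces it the most.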
When using BPNN, it is needed to specify the number of hidden layers, the number of neurons in the hidden layer, and the learning rate (Samarasinghe 2006).

Fig. 4. Illustration of 5-fold cross validation
These parameters of BPNN are selected via repetitive trial-and-error processes. The network configuration is as follows: the number of hidden layers is set to 1; the number of neurons in the hidden layer is 14; the learning rate is 1; and the number of training epochs is set to 2000. For RBFNN, several parameters need to be specified: the number of center points, the center locations, and the standard deviation of the RBF. Herein, the utilized RBF is the Gaussian function. In our study, at the first stage, the orthogonal least squares (OLS) method is used to determine the number and locations of the centers (Chen et al. 1991). The second stage requires the determination of the network's weights, which is achieved by solving a linear system (Liao et al. 2011).

Table 5 provides the prediction results for ELSIM-GP as well as the other approaches. The average prediction accuracies of RBFNN (90.83%), BPNN (89.58%), and CART (90.42%) are lower than that of ELSIM-GP (95%). CART seems to suffer from over-fitting, since it fits the training set very well but performs poorly on new data. Noticeably, all three benchmark models have testing results in which classification rates fall below 90%. This, to some degree, indicates unstable performance of these models.
Meanwhile, the average classification accuracies acquired from ELSIM-GP for the training and testing sets are 95.73% and 95%, respectively. It can be observed that the proposed model has successfully overcome the issue of over-fitting, since it yields relatively balanced performances between the training and testing data sets. Notably, in the third fold, the accuracy rate for testing data reaches roughly 98%. Furthermore, the predictive results obtained by ELSIM-GP surpass 90% for all data folds. These results demonstrate that the newly proposed approach is capable of delivering accurate predictive performance. The classification result of ELSIM-GP for one testing data fold is shown in Table 6.

Conclusions
This paper has presented a new prediction model, named ELSIM-GP, to assist construction engineers in assessing the outcome of a grouting process which utilizes microfine cement. The proposed model was developed by fusing LS-SVM and DE. ELSIM-GP utilizes LS-SVM to classify high dimensional input data so that the model can make a prediction whenever a new input pattern is available. Meanwhile, the DE searching algorithm is implemented to identify the most appropriate tuning parameters.
Since ELSIM-GP is a hybrid intelligence model, the approach can be quite complex for practical engineers. However, considering that permeation grouting is a complicated process, predicting its outcome is by no means an easy task; it is therefore very challenging to construct a simple model that yields highly accurate forecasting performance.
Although the proposed model is complicated to establish, ELSIM-GP has the advantage of operating autonomously, because it does not require any expertise in parameter setting; the model tuning parameters are all determined by DE. Hence, with more effort on software engineering, a user-friendly interface could be integrated into the model, enabling ELSIM-GP to become a promising tool for practical engineers dealing with the groutability prediction problem.
In the current study, the soil type subjected to permeation grouting is sandy silt. Although the simulation results have demonstrated that ELSIM-GP can deliver superior forecasting accuracy for this type of soil, more historical cases in which the targets of the grouting process involve different soil types should be incorporated to enhance the generalization of the prediction model.
Additionally, if another type of microfine grout is applied, the characteristics of the grouting process can change. Hence, the current inference model may encounter difficulty in groutability forecasting, because geotechnical engineering is inherently uncertain and highly complex. Thus, it is necessary to collect new observations and to construct new prediction models through training. Nevertheless, the procedure of collecting new data cases requires great effort and is time-consuming. Hence, we consider this a promising direction for future research.