Hybrid forecasting system based on case-based reasoning and analytic hierarchy process for cost estimation

Highly accurate cost estimation for highway projects at the early stage of project development is crucial for planning and feasibility studies. Various studies have attempted to develop cost prediction models for the early stage of the construction life cycle. This study uses a hybrid estimating tool to provide effective cost data management for highway projects and, on that basis, develops a realistic cost estimating system. It focuses on developing a more accurate early-stage estimating technique for highway projects in South Korea by combining the analytic hierarchy process (AHP) with case-based reasoning (CBR). Real case studies are used to demonstrate and validate the benefits of the proposed approach. The developed CBR system is expected to provide decision-makers with accurate cost information for assessing and comparing multiple alternatives, so that they can obtain the optimal solution and control cost.


Introduction
Successful management within a limited budget is an important concern in any construction project. A lack of information and of reliable methods to support the estimating process makes it difficult to prepare an estimating report during the project planning stage (Chou, O'Connor 2007). Controlling cost within an acceptable level requires appropriate and accurate measurement of the various project-related determinants and an understanding of the magnitude of their effects. As such, the importance of early estimating cannot be overemphasized. The number of cost estimating models for road and bridge construction, however, has been limited.
Several studies have focused on highway construction cost estimating. Owing to the lack of detailed design information and drawings during the early stages, several technical methods have been developed to estimate construction costs based on limited information (Chou 2009). Although MRA (Multiple Regression Analysis) has often been used for statistics-based cost estimating, it is not appropriate for describing non-linear, multidimensional relationships with multiple inputs and outputs (Tam, Fang 1999). Chou et al. (2005) suggested heuristic simulation models to improve the accuracy and efficiency of highway budgeting estimates based on data from the TxDOT (Texas Department of Transportation). To overcome such limitations, parametric cost estimating models were developed using ANNs (Artificial Neural Networks), and these models were shown to be very useful at the early stages of a project life cycle (Hegazy, Ayed 1998; Al-Tabtabai et al. 1999; Wilmot, Mei 2005). However, ANNs can lose their effectiveness when the patterns are very complicated or noisy, when knowledge representation and problem structuring are ill-defined, or when training is trapped in local minima (Hegazy et al. 1994).
CBR (Case-based reasoning) is a relatively recent problem-solving technique that is attracting increasing attention because it seems to resemble closely the psychological process humans follow when applying their knowledge to solve problems. CBR reuses past cases and experiences to find solutions to new problems. While other major AI (Artificial Intelligence) techniques rely on making associations along generalized relationships between problem descriptors and conclusions, CBR is able to utilize specific knowledge of previously experienced cases, evaluate the proposed solution, and update the system by learning from this experience (Kolodner 1993; Shin, Han 1999; Kim, K. J., Kim, K. 2010).
CBR systems in particular have been proposed as effective alternatives for decision-making support. Several studies have demonstrated potential applications of CBR in construction. K. J. Kim and K. Kim (2010) proposed a preliminary cost estimation model using CBR and GAs (Genetic Algorithms) for determining the importance weights of attributes. Kang et al. (2010) developed a quantity-based construction cost estimating system using CBR and GAs. Ji et al. (2010) suggested a CBR model for improving cost prediction accuracy in multifamily housing projects. Yau and Yang (1998) confirmed that CBR is quite effective for selecting a retaining wall system at the project planning stage. Dzeng and Tommelein (2004) developed a generic CBR system to facilitate schedule reuse, as well as a CBR-based decision-support system to aid project managers in seeking subcontractor registration. Luu et al. (2005) applied CBR to procurement criteria selection and modelling bridge deterioration (2002), and suggested ways to reduce the problem of hazard identification. CBR has also been used for cost estimating of construction projects (Doğan et al. 2008), bid mark-up estimation (Dikmen et al. 2007), and cost budgeting for pavement maintenance (Chou 2009). By learning from previous research and applications, CBR can produce very reasonable estimates without relying on specific experts and rules. For example, the cost of a construction project is influenced by a number of factors including the duration, the location, the year of construction, and the size of the project. The problem to be investigated is whether the realisation cost of future projects can be reliably estimated from the values of these factors collected from previously completed projects. Therefore, a new CBR approach was used to express the concept of the system developed in this study.
This paper discusses the problem of estimating future highway construction cost with regard to both the 4 main divisions and the total construction cost. The study is organized as follows. The next section describes the objectives and methodology of this study. The following section shows how the data were analyzed to verify their consistency and completeness and to obtain the knowledge required for the highway application. Then, 48 actual highway projects constructed in South Korea from 1996 to 2008 are used as the source of cost data and for developing a CBR application for systematic highway project cost estimation. The next section briefly presents the CBR system that was developed specifically to generate CBR applications for modelling cost estimates, and the steps followed in developing an application. Finally, the testing procedures and the validation results are discussed.

Objectives and methodology
The major aim of this study is to develop a hybrid CBR decision support system for estimating highway project costs. The study goals included: (1) estimating highway project costs at the early stage, both by the 4 main divisions and as a total cost; (2) extracting significant CFs (Cost Factors) based on previous studies and interviews with experts; (3) developing weight values for the CFs using AHP (Analytic Hierarchy Process). As a result, the developed system provides a useful benchmark that can assist in identifying the CFs which show a strong relationship with highway project costs.
CFs are very complicated, and intelligent processing is required to obtain a precise view of the effects of the cost attributes on project cost (Boussabaine, Elhag 1999). Data are required for all the CFs known from previous studies. First, this study reviewed the literature and identified significant CFs which affect highway project costs. Industrial interviews were then conducted to assist with selecting these factors. Once potential CFs had been identified, their weights were calculated by AHP. Finally, an appropriate CBR system was developed and examined, and preliminary testing of the developed system was carried out using a relatively small number of data sets. The system is implemented as an MS Excel-based Visual Basic application.

Case-based reasoning
CBR is a kind of computerized tool that imitates the analogical reasoning of the human brain in problem solving (Rivard et al. 1998). The principle of CBR is based on the assumption that similar problems have similar solutions. According to Riesbeck and Schank (1989), CBR solves problems by capturing previous experiences and matching the important features of a new problem to those of old cases that have been successfully solved. The main source of knowledge in CBR is the case, which can be reused even if it only partially matches the problem in hand (Yang, Yau 1996). In particular, CBR can deal efficiently with both numerical and nominal data, and can effectively handle cases that have incomplete data or variable data structures (Arditi, Tokdemir 1999). Furthermore, CBR has powerful learning capabilities that do not require time-consuming training and testing operations (Yang, Yau 1996). Table 1 lists CBR applications in various domains. Aamodt and Plaza (1994) describe the top-level task of CBR as problem solving and learning from experience, which corresponds directly to two phases, maintenance and application, as shown in Fig. 1. In the six-Re processes, changes initiated from outside of the CBR can be model-

Selection of cost factors
The factors affecting the project cost were selected as the attributes to be used as input data for the CBR prediction system. The data were obtained by applying the selection procedure presented in Fig. 2. First, a literature review was conducted to identify which CFs have been used for cost estimating of highway projects. While only a few studies are available on CFs which are suspected to influence construction cost, numerous studies have taken place in highway construction projects. They are listed in Table 2. A questionnaire survey was designed to obtain the primary data for this study. A pilot survey was first carried out to test the relevance and comprehensiveness of the questionnaires before a full-scale survey was conducted. The respondents were given a choice of being

Table 1. CBR applications in various domains
Marir and Watson (1995) | Estimates the cost of refurbishing houses | ELSIE
Planning and Scheduling
Lee et al. (1998) | Scheduling of apartment construction | FASTRAK-APT
Tommelein (1997, 2004) | Scheduling of power plant boilers | CasePlan

Table 3.

Time standardization
A cost index represents the relative scale of cost for a fixed quantity of goods or services between different periods, and provides a good means of forecasting future construction costs that change over time in response to changing demand, economic conditions, and prices (Ostwald 2001). The data collected for developing a CBR system have diverse characteristics and differences, such as when and where the projects were constructed. Such differences may cause incorrect prediction results (Kim, Kang 2003; Kim et al. 2005). A cost index ought to be a reliable tool for estimating future costs of construction activities that are conducted months or years after the costs were estimated (Huang 2007). The developed CBR cost estimation system therefore includes appropriate means to reflect the change in overall highway construction costs over time. The data used to establish the CBR system were collected from projects completed between 1996 and 2008, so they had to be converted to an identical time reference point defined by the Korea Institute of Construction Technology (KICT). The cost data of all the reference cases were converted to the May 2005 cost level using the road cost index provided by the KICT. This road cost index for South Korea, which is announced monthly, was used for the construction cost adjustments.
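The conversion to a common reference date can be sketched as a simple index ratio. This is a minimal illustration that assumes the adjustment is a plain ratio of index values; the function name and all numbers below are hypothetical and are not KICT data.

```python
def adjust_to_reference(cost, index_at_completion, index_at_reference):
    """Scale a historical cost to the reference date using a cost index ratio."""
    if index_at_completion <= 0:
        raise ValueError("index_at_completion must be positive")
    return cost * index_at_reference / index_at_completion


# Hypothetical figures for illustration only; not KICT data.
historical_cost = 1000.0   # cost recorded when the project was completed
index_then = 80.0          # assumed index value at completion
index_reference = 100.0    # assumed index value at the reference date

adjusted = adjust_to_reference(historical_cost, index_then, index_reference)
print(adjusted)
```

Under this assumption, every case in the library is expressed at the same price level before any similarity-based comparison of costs is made.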

Determining the weight
The "weight" indicates how much attention should be paid to a factor during the matching process in the CBR cycle (Kolodner 1993). It reflects the importance of that factor relative to other factors. The chosen weight values were found to have the strongest influence on the project cost prediction (Arditi, Tokdemir 1999; Chua et al. 2001; Luu et al. 2005; An et al. 2007; Kim, K. J., Kim, K. 2010). Determining an appropriate CF weighting method is a major issue for effective case retrieval and indexing in the CBR cycle (Park, Han 2002; An et al. 2007), because the aim in CBR is to retrieve not just a similar past case but a usefully similar one. Previous approaches used GAs, gradient search, and feature counting. The problem with GA applications is the difficulty of identifying an appropriate fitness function that successfully incorporates problem-specific information (GA 2008). Gradient search may stagnate at local optima and fail to find the global optimum for certain starting solutions (Albright, Windston 2007; Kim, K. J., Kim, K. 2010). With feature counting, it is very difficult to state reliably that one feature is more or less important than another based solely on human intuition (Arditi, Tokdemir 1999).
For this reason, integrating domain knowledge into the case retrieval and indexing process is highly recommended when developing a CBR system. This section applies a hybrid approach that uses AHP in the case base retrieval process in an attempt to increase overall cost accuracy. If this hybrid approach is carried out well, the CBR system can deliver better cost estimates (Shin, Han 1999). It can operate more accurately or at a lower cost level, and it can provide a better understanding of the effects of CF interaction and variation.

Analytic hierarchy process
AHP is a multi-factor decision-making method that uses hierarchical structures to represent a decision problem and then derives priorities for the decision-maker through judgments (Saaty 1986, 1987, 1990; Dyer 1990). Many previous studies (Dyer, Forman 1992) have found the AHP methodology to be well suited for decision-making due to its role as a synthesizing mechanism in decisions. For example, An et al. (2007) compared three different weighting methods and concluded that the AHP was more accurate, reliable, and explanatory than gradient descent methods for determining the relative importance weights for making preliminary estimates of new construction costs. Once the hierarchy is built, the decision-maker systematically evaluates its components, which represent the considered factors, by comparing their importance in a pair-wise manner. This study applies the AHP to calculate the weights of the aspects and the attributes within each aspect. Pair-wise comparisons of the factors at each level of the AHP hierarchy are made with respect to their relative importance (Zahedi 1986; Harker, Vargas 1987; Podvezko 2009; Medineckiene et al. 2010) (see Table 4).

Table 4. Scale of relative importance (Saaty 1990)
Intensity of relative importance | Definition | Explanation
1 | Equal importance | Two activities contribute equally to the objective
3 | Moderate importance of one over another | Experience and judgment slightly favour one activity over another
5 | Essential or strong importance | Experience and judgment strongly favour one activity over another
7 | Very strong importance | An activity is strongly favoured and its dominance is demonstrated in practice
9 | Extreme importance | The evidence favouring one activity over another is of the highest possible order of affirmation
2, 4, 6, 8 | Intermediate values between the two adjacent judgments | When compromise is needed
Reciprocals of above nonzero numbers | If activity i has one of the above nonzero numbers assigned to it when compared with activity j, then j has the reciprocal value when compared to i | -

The last step is devoted to measuring the overall consistency of the provided AHP judgments by means of the CR (Consistency Ratio) proposed by Saaty. The CR provides a way of measuring errors introduced during the elicitation of expert opinions. For this purpose the CI (Consistency Index) is calculated by Eq. (1) (Chen et al. 2010):

$$ CI = \frac{\lambda_{\max} - n}{n - 1} \quad (1) $$

where n is the number of compared factors, and $\lambda_{\max}$ is the maximum eigenvalue of the judgment matrix which corresponds to the group of compared factors. The CR value is obtained by dividing the CI value by the RI (Random consistency Index) value. The RI value depends on the number of compared factors. RI values for different numbers of factors are presented in Table 5. An appropriate CR value justifies extracting expert knowledge that can guide effective retrieval of useful weights. The weight values expressing the importance of each CF are presented in Fig. 3.
They will be assigned to the considered attributes for case-based retrieval of the most similar process plans by means of the similarity function in the proposed application area.

Table 5. Random consistency index (Saaty 1990)
n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
RI | 0 | 0 | 0.58 | 0.9 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49
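The weight derivation and consistency check can be illustrated with a short sketch. It assumes the weights are taken as the principal eigenvector of the pair-wise comparison matrix, approximated here by power iteration; the function name and the three-factor judgment matrix are hypothetical and are not the study's survey data. The RI values are those of Table 5.

```python
# Random consistency index values from Table 5 (Saaty 1990), keyed by n.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix, iterations=100):
    """Return (weights, lambda_max, CI, CR) for a pair-wise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalise to sum 1.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    # RI is 0 for n <= 2, where judgments are always consistent.
    cr = ci / RI[n] if RI[n] > 0 else 0.0
    return w, lambda_max, ci, cr


# Hypothetical judgments for three factors (not the study's survey data).
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights, lambda_max, ci, cr = ahp_weights(judgments)
print([round(x, 3) for x in weights], round(cr, 3))
```

A CR below 0.1 is the threshold commonly cited in the AHP literature for acceptable consistency; whether the study applied exactly this cut-off is not stated in the text.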

Case study
Expert knowledge can be applied to assess importance weights. The expert is expected to have the knowledge and experience required to decide which model or system makes good predictions. CBR applications can be created using the hybrid AHP-CBR application development tool. The CBR system searches for matching cases contained in the case base and summarises them into a set of acceptable solutions. Decision-makers then select one of the recommended solutions. The system's interface is organized following the basic process used to construct the AHP-CBR application. The reasoning structure of the proposed system is presented in Fig. 4. The following six steps are involved in the CBR application:
Step 1. Case base definition: the first step is used to define the initial components of the system. The names and value types of the CFs are defined. The selected CFs should provide the best description of the relevant attributes influencing construction cost, as known from prior experience. Table 2 presents an illustrative example of the contents of a case base library.

Fig. 4. Hybrid AHP-CBR system
Step 2. Similarity definition: this step deals with the way similarity between a new problem description and the items of the case base library is assessed. The methodology and the various metrics for determining similarity during case base retrieval are defined. The SI (Similarity Index) is assessed both at the case level (comparing cases against each CF) and at the CF level (comparing the value of each CF to the newly entered CF values). Weighted case similarities between the new problem and the cases included in the case base library are estimated according to the following formula:

$$ SI = \frac{\sum_{i=1}^{n} (W_i \times SS_i)}{\sum_{i=1}^{n} W_i} \quad (2) $$

In Eq. (2), SI is a calculated numerical value which expresses the degree of similarity between a case in the case base library and the investigated problem case (Yau, Yang 1998); $W_i$ and $SS_i$ are the weight and the similarity score of the i-th CF, and n is the number of CFs. SI is normalized to a scale from 0 to 1 for easy comparison. The weight (W) of each CF can be assigned either by the decision-makers or by AHP. The SS (Similarity Score) is calculated on the basis of the values of the CFs, which are either numerical or nominal. For a nominal factor, the SS equals 1 when the two values are identical and 0 otherwise.
For a numerical factor, SS is calculated by Eq. (3). In Eq. (3), $V_{case\ based}$ represents the value of a factor for the cases stored in the case base library, and $V_{problem}$ is the corresponding value for the target case whose highway costs are to be predicted. A more detailed classification method is applied in this study to improve accuracy when the decision-maker selects one of the retrieved cases, so that the best matching case can be selected from the case base library. Consequently, a new SS formula has been developed and is proposed here, which not only expresses the difference between the compared cases but also makes it possible to verify the minimum and maximum relationship of the cases. The similarity score in the developed formula is referred to as $SS_{revised}$ to distinguish it from the SS used to retrieve similar cases. Finally, SI is calculated by Eq. (2).
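The retrieval step described above can be sketched as follows. The nominal rule (1 if identical, 0 otherwise) and the weighted, normalized SI follow the text; the numerical score, however, is an assumed min/max ratio used only as a stand-in, since it is not the study's Eq. (3) or its revised formula. The factor names, weights, and case values are hypothetical.

```python
def similarity_score(case_value, problem_value):
    """SS for one cost factor.

    Nominal values: 1 if identical, 0 otherwise (as stated in the text).
    Numerical values: min/max ratio, an ASSUMED stand-in for Eq. (3).
    """
    numeric = (int, float)
    if isinstance(case_value, numeric) and isinstance(problem_value, numeric):
        low, high = sorted((abs(case_value), abs(problem_value)))
        return 1.0 if high == 0 else low / high
    return 1.0 if case_value == problem_value else 0.0


def similarity_index(case, problem, weights):
    """Weighted similarity over all factors, normalised to the range 0..1."""
    total_weight = sum(weights.values())
    weighted = sum(w * similarity_score(case[f], problem[f])
                   for f, w in weights.items())
    return weighted / total_weight


# Hypothetical factors, weights and cases for illustration only.
weights = {"length_km": 0.5, "lanes": 0.3, "terrain": 0.2}
problem = {"length_km": 10.0, "lanes": 4, "terrain": "mountain"}
library = {
    "case_a": {"length_km": 8.0, "lanes": 4, "terrain": "mountain"},
    "case_b": {"length_km": 20.0, "lanes": 6, "terrain": "plain"},
}

# Rank the stored cases by their similarity to the problem case.
ranking = sorted(
    ((similarity_index(case, problem, weights), name)
     for name, case in library.items()),
    reverse=True,
)
for si, name in ranking:
    print(name, round(si, 3))
```

Dividing by the sum of the weights keeps SI between 0 and 1 even when the weights themselves do not sum to one.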
Step 3. Case definition: this step is used to fill in the case information for each case to be stored in the case based library. A case collection interface is then applied for introducing data for the real highway project cases into the library. CF values which describe the cases must conform to the defined types. 48 highway cases are included in the case based library in the prepared CBR system.
Step 4. Rule definition: rules are used to compute SI and to adapt a retrieved similar case to better meet the needs of the new problem. Rules are used to address the differences that exist between a new problem case (target case) and the retrieved similar case. The rules are applied to account for the differences and advise on what the plausible outcomes of a comparison might be. Rules can be used to change CF values based on comparison.
Step 5. Application interface: after case retrieval is complete, the system returns a list of cases with SI values indicating their similarity to the target case. These scores indicate the relevance of each case to the problem at hand. The decision-makers can apply the selected case to help decide how to solve the current problem. The selected case can then be adapted to better assist in making a decision.
Step 6. System validation: to determine whether the project cost predicted by AHP-CBR is a good estimate for the problem case, three methods reported by Yau and Yang (1998), Arditi and Tokdemir (1999) and Koo et al. (2010) are used. Each of these methods makes use of the overall case SI for each retrieved case. The methods are as follows:
1. The problem case is compared to the characteristics of the retrieved case that has the highest overall SS;
2. The problem case is compared to the most frequent characteristics in the top ten retrieved cases, or fewer if ten are not available, that have an overall SS greater than or equal to 0.75 (75%);
3. The problem case is compared to the average characteristics of the top five retrieved cases, or fewer if five are not available, that have an overall SS greater than or equal to 0.75 (75%). The average of the predicted condition is weighted using the overall SS to magnify the importance of the retrieved cases which have a higher SS.
According to the CBR concept, the case with the highest SI in the case base library may be considered to have the project characteristics most similar to the test case in this study. Each of the 4 division costs (see Table 7) can also be identified from the selected case based on the retrieved SI. These results may likewise be used as references in the decision-making process.
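The third validation method can be sketched as a similarity-weighted average. The 0.75 threshold and the limit of five cases come from the text; applying the average directly to cost values, the function name, and the sample figures are assumptions made for illustration.

```python
def weighted_estimate(retrieved, threshold=0.75, top_n=5):
    """Similarity-weighted mean cost of the best-matching retrieved cases.

    `retrieved` is a list of (similarity, cost) pairs. Returns None when no
    case reaches the threshold.
    """
    eligible = sorted((r for r in retrieved if r[0] >= threshold),
                      reverse=True)[:top_n]
    if not eligible:
        return None
    total_similarity = sum(s for s, _ in eligible)
    # Cases with a higher similarity contribute more to the estimate.
    return sum(s * c for s, c in eligible) / total_similarity


# Hypothetical (similarity, cost) pairs; the third case falls below 0.75.
retrieved_cases = [(0.9, 100.0), (0.8, 120.0), (0.6, 300.0)]
print(weighted_estimate(retrieved_cases))
```

Returning None when no case reaches the threshold makes the absence of a sufficiently similar case explicit rather than producing an estimate from weak matches.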

Results for the sample CBR system application
As mentioned above, the research was carried out by employing the AHP method to assign importance weights to each CF. Different error calculation formulae have been used in previous studies. Here, the Mean Absolute Estimation Error (MAEE) calculated by Eq. (4) is applied to express the system performance:

$$ MAEE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{Cost_{CBR,i} - Cost_{ACT,i}}{Cost_{ACT,i}} \right| \times 100\% \quad (4) $$

where $Cost_{CBR}$ represents the output of the CBR application, $Cost_{ACT}$ is the actual cost, and n denotes the number of testing cases. n-fold cross-validation was adopted in the next phase to evaluate the performance of the AHP-CBR system and to reinforce the reliability of the results. The 6-fold approach can be considered an effective form of reliability analysis of the measurement system. For example, for the 1-fold application the AHP-CBR system yields an MAEE of 9.17%, with 62.5% of the estimates within 10% and 87.5% of the estimates within 15%. The results are listed in Table 6. The mean error (difference) rate of the 1-fold compared to the 6-fold results is equal to 9.09%. The corresponding output accuracy of the established AHP-CBR system meets the Class 5 requirements for project screening and feasibility studies as defined by the American Association of Cost Engineers (AACE). The test cases made it possible to predict not only the estimation error rate of the total construction cost but also the estimation error rate of each of the four division costs. The obtained results are shown in Table 7. The similarity index values retrieved for the selected case-based problems therefore constitute valid reference points for the decision-making process.

Table 6. Cost reasoning errors for 6-fold cross-validation of the AHP-CBR system
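The error measure and the fold structure can be illustrated with a short sketch. It assumes that the MAEE is the mean of the absolute percentage errors and that the folds are contiguous and equally sized; the function names and the sample costs are hypothetical and are not the study's results.

```python
def maee(predicted, actual):
    """Mean absolute estimation error, in percent."""
    errors = [abs(p - a) / a * 100.0 for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


def share_within(predicted, actual, tolerance_pct):
    """Percentage of estimates whose absolute error is within the tolerance."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(p - a) / a * 100.0 <= tolerance_pct)
    return hits / len(actual) * 100.0


def k_folds(n_cases, k):
    """Split case indices 0..n_cases-1 into k contiguous test folds."""
    size, extra = divmod(n_cases, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


# Hypothetical estimates and actual costs for one test fold.
cost_cbr = [108.0, 95.0, 130.0, 100.0]
cost_act = [100.0, 100.0, 100.0, 100.0]
print(round(maee(cost_cbr, cost_act), 2))
print(share_within(cost_cbr, cost_act, 10.0))

# With 48 cases and 6 folds, an even split leaves 8 test cases per fold.
print([len(fold) for fold in k_folds(48, 6)])
```

In each round one fold serves as the test set while the remaining cases form the case base library, so every case is used for testing exactly once.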

Conclusion
Cost estimating systems have become an integral part of advanced cost management modelling. Such systems make accurate project cost estimation and improved cost prediction rates possible. The presented research therefore focused on developing a hybrid AHP-CBR system which provides accurate predictions of the future cost of different highway projects. The contribution of this research pertains to four areas. First, it addresses higher predictive accuracy of cost estimates and guidance for decision-makers at the early planning stage. The developed AHP-CBR system reduces the time required to build a cost list for project activities and thereby reduces processing time and cost. Second, the extracted CFs for highway projects significantly improve system performance with regard to cost estimation. This finding contributes to the current body of knowledge on approximate cost estimating, and may serve as a useful guide for future data collection efforts and cost estimation system development. Third, this research proposes an alternative similarity score formula. The introduced similarity measure makes it possible to contrast the developed measure with the classical SS measures when CFs are used to describe a case. Finally, the weights of the CFs are calculated using AHP.
To enhance the capabilities of the CBR approach in cost estimating, numerous problems should be explored in future research. These include: developing proprietary indices for adjusting cost for differences in project location, developing better justified weights using different weight estimation methods, collecting more project cases in the case base library to improve accuracy, and identifying the important CFs for the different phases of project planning and realisation.