CONCEPTUAL COST ESTIMATIONS USING NEURO-FUZZY AND MULTI-FACTOR EVALUATION METHODS FOR BUILDING PROJECTS

During the conceptual phase of a construction project, numerous uncertainties make accurate cost estimation challenging. This work develops a new model to calculate conceptual costs of building projects for effective cost control. The proposed model integrates four mathematical techniques (sub-models), namely, (1) the component ratios sub-model, (2) the fuzzy adaptive learning control network (FALCON) and fast messy genetic algorithm (fmGA) based sub-model, (3) the regression sub-model, and (4) the multi-factor evaluation sub-model. While the FALCON- and fmGA-based sub-model is trained on the historical cost data, the three other sub-models assess the inputs systematically to estimate the cost of a new project. This study also closely examines the behavior of the proposed model by evaluating two modified models that omit fmGA and by undertaking sensitivity analysis. Evaluation results indicate that, with the ability to respond more thoroughly to the project characteristics, the proposed model has a high probability of achieving higher estimation accuracy than three conventional methods, i.e., the average unit cost, component ratios, and linear regression methods.


Introduction
Accurate cost estimation is a challenge for the project estimator during the conceptual phase of a construction project. Early cost estimates are necessary to compare design alternatives and select the most economical technical solution (Petroutsatou et al. 2012). An excessively low estimate can result in project overruns, while an excessively high estimate results in insufficient budget to address other critical user needs (Oberlender, Trost 2001; Wen 2010; Asmar et al. 2011).
This study defines the conceptual phase as the phase at which approximately 30% of the design is completed, and at which the cost estimate is generally considered a budget estimate or baseline cost for a construction project (Lai et al. 2008; Cheng et al. 2010; Petroutsatou et al. 2012). Furthermore, estimation at this phase is difficult because only conceptual design drawings and specifications are available, and the estimations involve numerous assumptions (Asmar et al. 2011).
The construction industry has used several conventional conceptual cost estimation methods, such as average unit cost, cost indices, cost-capacity factors, and parametric estimation, to rapidly calculate total project cost (Barrie, Paulson 1992; Hendrickson, Au 2003; Hong et al. 2011). These early cost estimations at the conceptual phase are reasonably precise, to within ±25% (Petroutsatou et al. 2012). However, their accuracy must be increased to support improved budget control (Wen 2010). Many project estimators thus would rather use a complex semi-detailed estimation method (i.e., a method for breaking down project cost into cost items as much as possible according to the available data) to enhance the estimation accuracy instead of applying the above conventional methods. Unfortunately, using the semi-detailed method is comparatively time-consuming, and may be ineffective in the early project phase, during which design changes frequently arise.
Since an easy-and-quick conventional method cannot provide reliable cost estimations, numerous more complex models for conceptual cost estimation have been designed, such as artificial intelligence techniques, statistics-based analysis techniques (Lowe et al. 2006; Trost, Oberlender 2003), and case-based reasoning techniques (Chou 2009; Koo et al. 2011; Hong et al. 2011; Jin et al. 2012; Kim 2013). Because of the uniqueness of each construction project, a generalized conceptual cost estimation method is not readily available for all construction projects. Additionally, a proposed conceptual cost estimation method should respond to the characteristics of a new project to gain management confidence (Clark, Lorenzoni 1997). Related research thus strives continuously to develop feasible models to improve both the evaluations of the cost effects of project characteristics and the early estimation accuracies for particular project types (Wen 2010).
Precise estimation of the project costs in the early phase must deal with three obstacles (Hsiao et al. 2012): (1) limited available data, (2) the difficulty of defining relationships between the available data and total project costs, and (3) the need to capture as many as possible of the unique characteristics of a new project. To systematically deal with these three cost-estimation obstacles, this work proposes a new model that estimates building project conceptual costs using the component ratios, fuzzy adaptive learning control network (FALCON), fast messy genetic algorithm (fmGA), regression, and multi-factor evaluation methods.

Textbook models for conceptual cost estimations
This section reviews the conventional conceptual cost estimation methods, including the average (project-level) unit cost, cost indices, cost-capacity factors, and parametric estimation methods. When the average (project-level) unit cost method is applied, the total cost of a project is the product of the average project-level unit cost and the total floor area of the project (Hendrickson, Au 2003). Cost indices focus on cost changes over time, while cost-capacity factors apply to changes in project size, scope, or the capacity of similar projects (Barrie, Paulson 1992).
The parametric estimation method adopts certain cost-relevant parameters (such as floor area, cubic volume, electricity generating capacity, steel production capacity, etc.) to describe a cost function in the screening estimate of a new facility (Hegazy, Ayed 1998;Hendrickson, Au 2003;PEH 2008). A cost function is developed according to one or more cost estimating relationships between the project cost (the dependent variables) and the cost-governing parameters (the independent variables) (Hegazy, Ayed 1998;PEH 2008). Several parametric estimation methods based on regression analysis and neural networks have also been suggested to increase the accuracy of conceptual cost estimates (Sonmez 2008;Gunduz et al. 2011). Notably, cost indices can also be incorporated into the above estimation method (Barrie, Paulson 1992).
Current textbook methods focus on total project cost, and usually do not examine cost divisions or cost item details. Although these methods can quickly obtain cost estimates by considering specific cost parameters, they often do not comprehensively deal with the three aforementioned obstacles to cost estimation.
For instance, Creese and Li (1995) designed a back-propagation neural network (NN) application for timber bridge cost estimation. Moreover, Boussabaine and Elhag (1997) devised a neuro-fuzzy system related to fuzzy logic (FL) for predicting construction project cost and duration. Meanwhile, Kim et al. (2005) established a cost approximation model for residential construction projects using genetic algorithms (GAs) to optimize the parameters and weights of the back-propagation NN. Yu and Lin (2006) combined NN and FL to develop a Variable Attribute Fuzzy Adaptive Learning Control Network (VaFALCON) able to handle cost estimation problems involving missing attributes. Cheng et al. (2009) integrated GAs, FL and NN technologies to establish a construction cost estimation model with high predictive power. Generally, NNs enable learning from past data and generalization of solutions for future applications; FL allows tolerance for real-world imprecision and uncertainty; and GAs can be applied to globally optimize certain parameters (Cheng et al. 2009).
Based on the advantages of artificial intelligence (AI) techniques, Hsiao et al. (2012) developed a neuro-fuzzy cost estimation model for semiconductor hookup construction. Their work combined FALCON and the fast messy genetic algorithm (fmGA) to build a training algorithm that deals with complicated relations among cost parameters in historical cost data, and applied a three-point cost estimation method to assess the uncertainties related to the inputs (for executing the training algorithm) in predicting new project cost. However, their study indicated that the three-point estimation method they proposed to deal with uncertainties involved in semiconductor hookup construction projects may not be suitable for capturing the characteristics of building construction projects.

Related techniques used in the proposed model
This section reviews the techniques related to the proposed model, including the component ratios method, FALCON, and fmGA.

Component ratios method
The component ratios method (also called the equipment installation cost ratios, plant cost ratios, or ratio estimating methods) assumes the existence of a ratio between major division costs and the total project cost (Barrie, Paulson 1992). Hence, when the major division costs and the ratio (= major division costs divided by total project cost based on historical data) are known, the total project cost can be calculated by dividing the major division costs by the ratio (smaller than 1.0). Focusing on certain major division costs can not only produce acceptably accurate estimates, but can also save estimation effort and time (Yu 2006).
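The calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name is an assumption, and the two input figures are borrowed from the case study reported later in this paper (the estimated cost of the four major divisions of project I and their 71.02% historical share).

```python
def total_cost_from_ratio(major_division_cost: float, ratio: float) -> float:
    """Component ratios method: scale the major division costs up to a total."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return major_division_cost / ratio


# Illustrative inputs taken from the case study later in the paper:
# estimated cost of the four major divisions and their historical share.
major_cost = 142_893_741
historical_ratio = 0.7102
estimate = total_cost_from_ratio(major_cost, historical_ratio)
print(f"Estimated total (before tax and cost index): {estimate:,.0f}")
```

Because the ratio is below 1.0, the division always scales the known costs upward; the smaller the share covered by the major divisions, the more the estimate depends on the stability of that historical ratio.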

FALCON
FALCON is a neuro-fuzzy system that uses a learning algorithm derived from neural network theory to determine its parameters by processing data samples (Lin, Lee 1991). Initially, FALCON was designed to solve system control problems in electronics and manufacturing engineering (Lin, Lee 1991). FALCON is currently utilized to obtain construction knowledge owing to its diverse features, including the ability to handle uncertainties and its trace-back functions in problem solving (Yu, Lin 2006; Yu 2007).
Although FALCON has an excellent ability to learn from historical data, it also suffers from a local optimality problem, namely, the solutions it obtains are optimal (either maximal or minimal) only within a neighboring set of solutions (Yu, Skibniewski 1999). A global optimum, which is the optimal solution among all possible solutions, is often preferred. Yu (2007) and Cheng et al. (2009) suggested adopting the messy GA (mGA) or fmGA to overcome such local optimality problems. Hsiao et al. (2012) applied the fmGA mutation and cut-splice operators to revise the fuzzy membership functions and fuzzy logic rules of FALCON to improve the cost estimation accuracy in semiconductor hookup construction projects. This investigation adopts the approach developed by Hsiao et al. (2012) to train on historical cost data.

fmGA
GAs are search algorithms that explore decision spaces for optimal solutions using methods based on natural selection and genetics (Holland 1975). Unlike simple GAs, which use fixed-length strings to represent possible solutions, the fmGA developed by Goldberg et al. (1993) applies messy chromosomes to form strings of various lengths. Allowing chromosome lengths to vary benefits the structural revision of FALCON, because the optimum structure of precondition and consequence links for the fuzzy rule base may be obtained through fmGA evolution (Hsiao et al. 2012).

Levels of project costs
The costs of a building construction project are generally organized based on different levels of detail (Hendrickson, Au 2003). The highest level is the total project cost, while the second level is the cost division (or cost category) level, which summarizes the various cost divisions (for example, foundation and structure). The total cost of a second-level cost division itself is the sum of the costs of several third-level cost items. The cost of an item equals its unit cost (called "item-level" unit cost) multiplied by the item quantity, where the item-level unit cost is the fourth-level cost. "Cost division" here means a second-level cost division.

Total project cost
In this investigation, total project cost, C Tot , is derived by

$$C_{Tot} = \left(\sum_{i=1}^{10} C_i\right) \times (1 + t) \qquad (1)$$

where C 1 -C 10 are the costs of cost divisions (1)-(10), respectively. The value t denotes tax, which is a percentage (constant value, usually 5% in Taiwan) of the construction cost. The cost of a cost division equals its "division-level" unit cost (that is, the cost required to complete a unit of work associated with a cost division) multiplied by the total floor area. That is:

$$C_i = U_i \times Q \qquad (2)$$

where U i is the division-level unit cost of cost division i, and Q represents total floor area. By integrating Eqn (2) into Eqn (1), Eqn (1) can be rewritten as:

$$C_{Tot} = \left(\sum_{i=1}^{10} U_i\right) \times Q \times (1 + t) \qquad (3)$$

Finally, Eqn (3) is rewritten as follows to reflect the effect of inflation on total construction costs:

$$C_{Tot} = \left(\sum_{i=1}^{10} U_i\right) \times Q \times (1 + t) \times \frac{CCI_{year}}{CCI_{base}} \qquad (4)$$

where CCI year represents the construction cost index of the analysis year for a new project, and CCI base denotes the construction cost index of the base year. In this study, CCI base is set to 100 for the year 2006 based on current practice in Taiwan. Notably, Eqn (4) incorporates cost indices so that it reflects cost changes over time. Alternatively, if the project-level unit cost (U total ) of the total project (i.e., the cost required to complete a unit of work associated with a project) is available, then Eqn (4) can be rewritten as follows:

$$C_{Tot} = U_{total} \times Q \times (1 + t) \times \frac{CCI_{year}}{CCI_{base}} \qquad (5)$$
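As a rough sketch of the project-level form of this calculation, the function below multiplies the unit cost by the floor area, grosses the result up for tax, and scales it by the ratio of the cost indices. The multiplicative treatment of tax and of the index ratio is an assumption read from the definitions of t, CCI year and CCI base in the text; the function and parameter names are hypothetical. The inputs are those reported for case project I later in this paper.

```python
def total_project_cost(u_total, floor_area, tax_rate, cci_year, cci_base=100.0):
    """Unit cost x floor area, grossed up for tax and adjusted by the cost index."""
    return u_total * floor_area * (1.0 + tax_rate) * (cci_year / cci_base)


# Inputs reported for case project I later in the paper.
cost_i = total_project_cost(u_total=34_492, floor_area=7_363,
                            tax_rate=0.05, cci_year=77.06)
print(f"{cost_i:,.0f}")
```

With these inputs the sketch lands within 0.01% of the $205,485,071 reported for project I; the small gap is plausibly due to rounding of the published U total.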

Cost database of historical projects
Forty-six residential building projects located in northern Taiwan provide a historical database. All projects were completed by a single general contractor during 1991-2004. The projects had the following major characteristics: (1) all were reinforced concrete (RC) structures; (2) average total cost (including markup) was about NT $363,728,004 (roughly US $12.12 million; US $1 ≈ NT $30); and (3) average total floor area was 12,615 m².

Framework of the proposed model
Figure 1 displays the framework of the proposed model, which integrates four main sub-models. While the data of historical projects are used to train the FALCON- and fmGA-based sub-model, three other sub-models (i.e. component ratios, regression, and multi-factor evaluation) are designed to systematically guide cost estimators in assessing the cost effects of project characteristics when estimating new project costs. The modelling steps are described as follows:
Step 1: During the conceptual phase of a building project, usually only limited data are available for new project estimation. Thus, using the component ratios method to focus on certain major cost divisions is suggested. This study uses 46 historical projects to indicate the major cost divisions of building projects. Furthermore, the division-level unit costs of the major cost divisions (U i s) and the project-level unit cost (U total ) for each historical project are identified in this step (i.e., output-1 in Fig. 1).
Step 2: The relationships between U i s and U total are complex. It is recommended that the FALCON-based training sub-model learn these relationships from historical projects. Since the FALCON operations may generate only a local optimum solution, the fmGA-based sub-model is used to optimize the FALCON solution to enhance the cost estimation accuracy. Following this step the training process for the proposed model is finished.
Step 3: New project estimation should reflect the characteristics of a new building project. A regression sub-model (built on the 46 historical projects) is proposed to determine the regressed U i (RU i ; output-2 in Fig. 1) of each major cost division of a new project where the total project floor area (Q) has been predicted.
Step 4: A multi-factor evaluation sub-model is utilized to assess the effects of the project characteristics on the RU i of each major cost division. A suggested U i (SU i ; output-3 in Fig. 1) thus is generated for each major division.
Step 5: The SU i s of the major cost divisions are then treated as inputs of the trained FALCON- and fmGA-based sub-model to forecast U total of the total project (namely, output-4 in Fig. 1). Finally, the total project cost (C Tot ) can be obtained using Eqn (5) since U total is available.
The proposed model significantly differs from the neuro-fuzzy cost estimation model of Hsiao et al. (2012) in that the former applies the regression method and multi-factor evaluation method (rather than the three-point cost estimation method used in Hsiao et al. 2012) to evaluate the cost effects of project characteristics for a new project. Using the multi-factor evaluation method facilitates the detailed assessment of project features to gain additional management confidence.
Table 1 lists the average cost and percentage contributions of individual cost divisions to total cost for the 46 historical building projects. Notably, the costs are given in New Taiwan dollars. Based on the component ratios method, four cost divisions are identified as having the highest cost percentages, namely 28.66%, 19.17%, 13.87% and 9.32% for the structure, internal finishes, MEP and foundation divisions, respectively. Since the sum of these percentages accounts for a high portion (about 71.02%) of the total project cost, these four divisions are termed the major cost divisions. The division-level unit costs of the four major cost divisions (inputs) and the corresponding project-level unit cost (output) of each historical project serve as the training data in the FALCON- and fmGA-based training sub-model.

Step 2-1: Applying FALCON
A FALCON network structure comprises five layers of nodes and two links (Lin, Lee 1991). This study applies FALCON as follows (see the right part of Fig. 2):
1. Layer 1 (input linguistic nodes): The nodes in this layer merely transmit the input values (i.e., division-level unit cost data or U i s) to the next layer directly. That is, the U i s of the four main cost divisions for each historical project are transmitted directly into the network.
2. Layer 2 (input term nodes): The nodes in this layer calculate the membership functions. That is, this layer fuzzifies the input values (namely, U i s) from Layer 1. Fuzzy partitions are determined based on the clustering relationships of both U i s and U total (i.e., the major division-level and project-level unit costs).
... links are represented as numeric values 0 (disconnected) or 1 (connected).
6. Layer 4 (output term nodes): The nodes in this layer perform two functions, right-left (only performed in the training stage) and left-right (performed in both the training and usage stages) transmissions.
7. Layer 5 (output linguistic nodes): The nodes in Layer 5 also perform right-left (only in the training stage) and left-right (performed in both the training and usage stages) transmissions. In right-left transmission the nodes in Layer 5 perform identically to those in Layer 1; that is, they feed training data (i.e., actual U total ) into the network. In left-right transmission the nodes at Layer 5 defuzzify fuzzy sets to provide a definite output value (i.e., estimated U total ).

Step 2-2: Applying fmGA
Following the FALCON operations, the fmGA mutation and cut-splice operators were used to optimize the FALCON parameters, including fuzzy membership functions and fuzzy logic rules, to improve cost estimation accuracy. This requires using the fmGA variable-length chromosome to revise the fuzzy partitions (namely, the number of input term nodes in Layer 2) and fuzzy decision rules (namely, the consequence links of Link 2) of FALCON.
The fmGA global search capability optimizes the parameters (means and spreads) of the membership functions in the FALCON input and output term nodes (Hsiao et al. 2012).
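To make the role of the means and spreads concrete, the following is a minimal sketch of what a term node and the output defuzzification might compute. It assumes Gaussian (bell-shaped) membership functions and a spread-weighted centroid for defuzzification; both forms, along with every name and number below, are illustrative assumptions and are not taken from this paper.

```python
import math


def membership(x: float, mean: float, spread: float) -> float:
    """Gaussian membership degree of x in a fuzzy term (assumed form)."""
    return math.exp(-((x - mean) ** 2) / (spread ** 2))


def defuzzify(firing, means, spreads) -> float:
    """Spread-weighted centroid of the fired output terms (assumed form)."""
    num = sum(m * s * f for f, m, s in zip(firing, means, spreads))
    den = sum(s * f for f, s in zip(firing, spreads))
    if den == 0.0:
        raise ValueError("no output term fired")
    return num / den


# Hypothetical fuzzy terms for one unit-cost input: "low" and "high".
low = membership(3_000.0, mean=2_500.0, spread=1_000.0)
high = membership(3_000.0, mean=4_500.0, spread=1_000.0)

# Hypothetical output terms for the project-level unit cost.
estimate = defuzzify([low, high],
                     means=[28_000.0, 36_000.0],
                     spreads=[2_000.0, 2_000.0])
print(round(low, 3), round(high, 3), round(estimate))
```

In this picture, tuning a mean shifts where a term is centred and tuning a spread changes how widely it applies, which is why a global search over these parameters can change the estimate noticeably.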
As presented in the left part of Figure 2, fmGA includes two operation loops, namely, an outer loop and an inner loop. The completion of each outer loop marks an epoch, while each inner loop marks an era. As suggested by Feng and Wu (2006), this study defines the maximum number of eras (era_max) as four. Additionally, the maximum number of epochs (epoch_max) is defined as a preset criterion for terminating the fmGA evolution process, and is set to thirteen in this study. Furthermore, an inner loop consists of three phases (Goldberg et al. 1993): (1) the initialization phase: a population with sufficient strings is created to contain all possible building blocks (BBs) of the order k, where BBs refer to partial solutions to a problem; (2) the primordial phase: bad genes are filtered out to maintain only chromosomes with good fitness (i.e., those containing only "good" alleles fitting to BBs); and (3) the juxtapositional phase: good alleles (BBs) are rebuilt using cut-splice and mutation operations to form a high-quality generation that generates optimal solutions. Figure 2 shows that the fmGA starts with the outer loop and generates a competitive template (CT). After completing one era, the CT is replaced by a new CT (with new alleles) that has the best fitness (i.e., the highest estimation accuracy) found in that era. The operational details for the three phases can be found in Hsiao et al. (2012).
After evolution, the chromosome with the highest fitness is fed back to FALCON to calculate the cost estimates of the new input data. Best-fit chromosomes (with optimum fitness) are also maintained via fmGA to provide the population and CT of the next epoch. Steps 2-1 and 2-2 (FALCON and fmGA operations) are then repeated iteratively until the fitness value converges or has reached a preset maximum era number. The total evolution process runs 52 generations (= 4 eras × 13 epochs). Finally, the fmGA operations stop and the final optimal solution is obtained.
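The variable-length chromosome and the cut-splice and mutation operators mentioned above can be illustrated with a small sketch. It follows the general messy GA idea of genes stored as (locus, allele) pairs that are overlaid on a competitive template; the binary alleles, the first-come-first-served rule for over-specified loci, and all names and values are assumptions for illustration, not details of the authors' implementation.

```python
import random


def decode(messy, template):
    """Overlay a messy chromosome on the competitive template.

    `messy` is a list of (locus, allele) genes. Over-specified loci keep the
    first allele seen; under-specified loci fall back to the template.
    """
    solution = list(template)
    seen = set()
    for locus, allele in messy:
        if locus not in seen:
            solution[locus] = allele
            seen.add(locus)
    return solution


def cut(messy, point):
    """Cut a chromosome into two variable-length pieces."""
    return messy[:point], messy[point:]


def splice(left, right):
    """Join two pieces; the child's length may differ from both parents."""
    return left + right


def mutate(messy, rate, rng):
    """Flip each binary allele with probability `rate`."""
    return [(locus, 1 - allele if rng.random() < rate else allele)
            for locus, allele in messy]


template = [0, 1, 1, 1]                  # hypothetical competitive template
parent_a = [(0, 1), (2, 0), (0, 0)]      # locus 0 is over-specified
parent_b = [(3, 0), (1, 0)]
head, _ = cut(parent_a, 2)
_, tail = cut(parent_b, 1)
child = splice(head, tail)               # [(0, 1), (2, 0), (1, 0)]
print(decode(child, template))           # [1, 0, 0, 1]

ERA_MAX, EPOCH_MAX = 4, 13               # settings stated in the text
print(ERA_MAX * EPOCH_MAX)               # 52 generations
```

Because a chromosome may name a locus twice or not at all, the template is what turns any string, however short, into a complete candidate solution that can be scored.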

Regression sub-model
The regression technique permits simple analysis to determine the influence of parameters on project cost (Lowe et al. 2006). A regression sub-model using the aforementioned 46 historical projects is proposed to consider the impact of total floor area during estimation. This sub-model generates the regressed division-level unit cost (RU i ) of each major cost division for a new project. Taking the foundation cost division as an example, Figure 3 displays the regression sub-model of its RU i . Table 2 lists the regression equations developed to calculate the RU i s of the four main cost divisions. In the equations, Q denotes the total floor area of a new project. Since the U i data of the 46 historical projects are widely scattered, the R-squared value of each regression equation is very low; for example, R-squared for the foundation cost division is only 0.0495. Nevertheless, the RU i derived from regression analysis captures an important project characteristic, namely the total floor area of a new project.
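A regression of unit cost on floor area, together with its R-squared value, can be sketched as follows. The straight-line form is an assumption (the equations of Table 2 are not reproduced here), and the data points are hypothetical values chosen only to show how scattered unit costs yield a low R-squared.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Hypothetical (floor area in m2, foundation unit cost in $/m2) pairs.
areas = [5_000, 8_000, 12_000, 15_000, 20_000]
unit_costs = [3_300, 2_700, 3_100, 2_600, 2_900]
a, b, r2 = fit_line(areas, unit_costs)
ru_new = a + b * 7_363      # regressed unit cost for a 7,363 m2 project
print(f"RU = {ru_new:,.0f} $/m2, R-squared = {r2:.3f}")
```

A low R-squared means that floor area alone explains little of the variation in unit cost, which is consistent with the paper's decision to adjust the regressed value using further project characteristics.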

Multi-factor evaluation sub-model
Many factors influence the division-level unit cost of a cost division. Based on the suggestions of Wang et al. (2012), Table 3 lists the factors that affect the RU i s of the four major cost divisions. Take the foundation cost division as an example. The RU i of this division is dominated by four factors: ground improvement (F1.1), retaining wall (F1.2), excavation method (F1.3), and soil type (F1.4). Each factor is classified into different factor conditions. For instance, the ground improvement factor (F1.1) indicates whether a project site requires ground improvement. Five factor conditions are identified: no ground improvement (no cost effect); improvement via compaction; improvement by well-point dewatering; improvement via consolidation; and improvement by soil replacement (high cost effect). This sub-model assumes the factors in a cost division i to be independent. The importance of each factor j is compared pairwise with that of the other factors to obtain the weight (W i(j) ) of each factor j. The evaluation result of a factor j for a given cost division i is a qualitative or quantitative value (e.g., factor condition) that is mapped to a corresponding effect value (E i(j) ) to represent the effect of the factor on the RU i of cost division i. Multiplying the effect value (E i(j) ) by its weight (W i(j) ) yields the weighted effect value (S i(j) = W i(j) × E i(j) ) of factor j in cost division i. Summing the weighted effect values of the factors gives the expected effect value (S i = ΣS i(j) ) of cost division i. This process is repeated for each major cost division.
Next, considering the factor effects on the RU i of a cost division i, a suggested division-level unit cost (SU i ) is calculated as follows:
where S i − 0.5 = 0 indicates that the factors have an average effect on the RU i , so that the SU i equals the regressed unit cost (namely SU i = RU i ). Similarly, S i − 0.5 above zero leads to an SU i higher than RU i , and vice versa.
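The weighted-sum part of this sub-model can be sketched briefly. The weights and effect values below are hypothetical (the values of Table 5 are not reproduced here), and the sketch assumes that the weights of a division sum to 1 and that effect values lie between 0 and 1, which is suggested by 0.5 marking an average effect. It stops at the direction of the adjustment, because the formula for SU i itself is not available in this text.

```python
def expected_effect(weights, effects):
    """S_i = sum over factors j of W_i(j) * E_i(j) for one cost division."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("factor weights are assumed to sum to 1")
    return sum(w * e for w, e in zip(weights, effects))


# Hypothetical weights and effect values for the four foundation factors
# F1.1-F1.4; these are NOT the paper's Table 5 values.
weights = [0.4, 0.3, 0.2, 0.1]
effects = [1.0, 0.5, 0.5, 0.0]
s_i = expected_effect(weights, effects)
direction = "above" if s_i > 0.5 else "below" if s_i < 0.5 else "equal to"
print(f"S_i = {s_i:.2f}, so SU_i is {direction} RU_i")
```

Treating the factors as independent is what allows a plain weighted sum here; interacting factors would need a different aggregation.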

Steps involved in new project estimation
The left side of Figure 4 illustrates the detailed steps of applying the proposed model to estimate the conceptual cost of a new project, while the right side shows the related sub-models involved in each step. For instance, the algorithms of multi-factor evaluation sub-model are applied in Steps 5.1-5.3 and Step 5.5 to calculate the influence of the expected effect value on the regressed division-level unit cost (RU i ).

Computer implementation
The

Description of case projects
The proposed method is applied to three residential housing projects (namely projects I, II, and III) that were also used in the study of Wang et al. (2012). These three projects and the aforementioned 46 historical projects shared a common contractor. Table 4 lists the main characteristics of the three case study projects. For example, project I, made of RC, has 14 floors and three underground floors, and a total floor area of roughly 7,363 m². Completion of project I, achieved in mid-1999, took 20 months. The following subsections examine the evaluation results.

Evaluation results for project I
First, Steps 5.1-5.3 are conducted, as shown in Figure 4. Table 5 lists the calculated weights, effect values, weighted effect values and expected effect values for each major cost division by applying the multi-factor evaluation sub-model. For instance, the table shows that the expected effect value (S i ) for the foundation cost division is 0.8002 for project I.
Next, Step 5.4 is performed. Restated, given the total floor area (7,363 m²) of project I, the regressed U i (RU i ) of each major cost division i is generated using the regression sub-model (see the left part of Table 6). For example, the RU i of the foundation cost division is $3,118/m², as listed in Table 6.
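Step 5.5 then calculates the suggested U i (SU i ) for each major cost division. See the right part of Table 6. The total project cost of project I is then estimated using Eqn (7):

$$C_{Tot} = U_{total} \times Q \times (1 + t) \times \frac{CCI_{year}}{CCI_{base}} \qquad (7)$$

where: U total = $34,492/m²; Q = 7,363 m²; CCI year = 77.06; CCI base = 100; t = 0.05. Notably, the actual cost for project I was $186,492,943. Thus, the total cost approximated by the proposed model is 10.18% (= (205,485,071 − 186,492,943) / 186,492,943) higher than the actual project cost. That is, the proposed model achieves an estimation accuracy of approximately 89.82% (= 1 − 10.18%), where Eqn (8) defines the estimation accuracy:

$$\text{Estimation accuracy (\%)} = \left(1 - \frac{\left|\text{Estimated } C_{Tot} - \text{Actual } C_{Tot}\right|}{\text{Actual } C_{Tot}}\right) \times 100\% \qquad (8)$$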
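The accuracy figure for project I can be checked with a short sketch. The definition used here, 100% minus the absolute percentage error, is inferred from the arithmetic in the text and from the later statement that the error rate equals 1 minus the estimation accuracy; the function name is an assumption.

```python
def estimation_accuracy(estimated: float, actual: float) -> float:
    """Accuracy in percent: 100 * (1 - |estimated - actual| / actual)."""
    return 100.0 * (1.0 - abs(estimated - actual) / actual)


# Estimated and actual total costs reported for case project I.
accuracy_i = estimation_accuracy(205_485_071, 186_492_943)
print(f"{accuracy_i:.2f}%")   # 89.82%
```

Because the absolute value is taken, an overestimate and an underestimate of the same size score the same accuracy.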

Evaluation results for projects II and III
Similarly, the evaluation steps are also applied to case projects II and III. Table 7 summarizes the evaluation results. Restated, the proposed model achieved estimation accuracies of 89.82%, 92.68%, and 92.96% for projects I, II, and III, respectively.

Sensitivity analysis
To more thoroughly elucidate the behavior of the proposed model, sensitivity analysis (Park 2011) is conducted to evaluate how much the estimation accuracy changes in response to a given change in an input variable (i.e. expected effect value; S i ) of the proposed model in the above case projects. Notably, S i value represents the combined cost effects of all factors (i.e. project characteristics) on each major cost division i. Figure 5 plots the sensitivity graphs based on the calculated estimation accuracies of various scenarios for the three case projects. The sensitivity analysis begins with a base-case scenario, which is developed using the initially-estimated values of S i of four cost divisions. For instance, Table 5 lists the S i values of the base-case scenario for project I, in which the resulting estimation accuracy is 89.82%. The value of S i is then changed by several specified percentage points (ranging from -40% to 40%) above and below the initially-estimated value for each of the four cost divisions; meanwhile, the other variables remain constant. Then, project cost and estimation accuracy based on the changed S i values of four cost divisions are calculated.
On average, when the S i value is underestimated, the accuracy diminishes (as shown in the left portion of Fig. 5). Generally, the slope of each line in Figure 5 indicates that the estimation accuracy is sensitive to changes in the S i value, implying that considering the effects of project characteristics (i.e. the multi-factor evaluation sub-model) is vital to the estimation accuracy of the proposed model.
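The one-variable-at-a-time procedure described in this section can be sketched as a simple loop. The scenario generator follows the description in the text (each S i is shifted by a percentage while the others stay at their base values), but the model being evaluated is a hypothetical stand-in, since the trained FALCON- and fmGA-based estimator is not available here; the base values and names are likewise illustrative.

```python
def one_at_a_time(base_inputs, model, deltas):
    """Perturb each input by each relative delta while holding the others fixed."""
    results = {}
    for name, base_value in base_inputs.items():
        for delta in deltas:
            scenario = dict(base_inputs)
            scenario[name] = base_value * (1.0 + delta)
            results[(name, delta)] = model(scenario)
    return results


def toy_model(s):
    """Hypothetical stand-in for the trained estimator; NOT the paper's model."""
    return 100.0 * sum(s.values())


# Hypothetical base-case S_i values for the four major cost divisions.
base = {"foundation": 0.8, "structure": 0.5, "finishes": 0.6, "mep": 0.4}
deltas = [-0.4, -0.2, 0.0, 0.2, 0.4]
table = one_at_a_time(base, toy_model, deltas)
print(table[("foundation", -0.4)], table[("foundation", 0.4)])
```

Plotting each division's results against the deltas would give one line per division, which is the form of graph the text describes for Figure 5.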

Analysis of the benefits of using fmGA
As mentioned earlier, despite its capability (based on neural network theory) to learn from historical cost data to predict U total , FALCON fails to optimize its learning parameters (including fuzzy membership functions and fuzzy logic rules) in order to improve U total . Thus, the proposed model incorporates fmGA (with its ability to optimize the learning parameters) into FALCON in order to resolve the local optimality problem of FALCON.

Fig. 5. Sensitivity graphs for the three case projects
Notably, the proposed FALCON-and fmGA-based sub-model forecasts U total (not the total project cost), which is then used to estimate the total project cost (C Tot ) by using Eqn (7). Hence, the merits of using fmGA in this study are validated from the perspectives of examining the accuracy of the resulting C Tot and assessing the training error rate (i.e. error rate during the training process) of the predicted U total .

Examining the accuracy of resulting C Tot
Two modified models (without fmGA) are designed. The first modified model uses only FALCON, while the second modified model incorporates simple GAs (i.e. with the training ability of sGA) into FALCON. Notably, the proposed and two modified models have the same three sub-models: component ratios, regression, and multi-factor evaluation. Table 8 compares the proposed model and the modified models in the three case projects with respect to the evaluation results of C Tot . Based on those results, the following observations are made:
In sum, the accuracy of the resulting total project costs when fmGA is incorporated into FALCON slightly exceeds that of using only FALCON and of using FALCON+sGA in the three case projects.

Assessing the error rate of the predicted U total
FALCON in the first modified model lacks training ability, which explains why its estimated U total for a new project is fixed and cannot be further improved. Thus, the advantage of the training ability of fmGA is illustrated by comparing the proposed model with the second modified model (with the training ability of sGA). Figure 6 displays the average error rates of estimated U total using the proposed model and the second modified model for the 46 historical projects during 200 training generations. Notably, the error rate equals 1 minus the estimation accuracy of U total , and it is represented as follows:

$$\text{Error rate (\%)} = \frac{\text{ABS}(\text{Estimated } U_{total} - \text{Actual } U_{total})}{\text{Actual } U_{total}} \times 100\% \qquad (9)$$

In each generation, the value of U total for each historical project is estimated and an error rate is derived using Eqn (9). The average error rate is the average value of the error rates for all 46 historical projects. Figure 6 indicates that the average error rates of the resulting U total for both models (either using fmGA or sGA) decrease as the models are trained for more generations. However, the proposed model using fmGA performs better than the second modified model (using sGA). Namely, its error rate is 0.32% (= 3.53% − 3.21%) lower than that of the second modified model after 200 training generations.
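The averaging step can be sketched as follows: an error rate is computed per project and the mean is taken over the project set. The unit costs below are hypothetical values for three projects, not figures from the historical database, and the function names are assumptions.

```python
def error_rate(estimated: float, actual: float) -> float:
    """Absolute percentage error of one estimate."""
    return abs(estimated - actual) / actual * 100.0


def average_error_rate(estimates, actuals) -> float:
    """Mean of the per-project error rates."""
    rates = [error_rate(e, a) for e, a in zip(estimates, actuals)]
    return sum(rates) / len(rates)


# Hypothetical estimated and actual project-level unit costs ($/m2).
estimated_u = [27_500.0, 30_600.0, 33_000.0]
actual_u = [27_500.0, 30_000.0, 30_000.0]
avg = average_error_rate(estimated_u, actual_u)
print(f"{avg:.2f}%")   # 4.00%
```

Recomputing this average after every training generation gives the kind of curve the text describes for Figure 6, where a flattening curve signals that further training adds little.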
Moreover, according to Figure 6, training the model for 33 generations can achieve a stable average error rate. Restated, conducting additional training generations negligibly affects the ability to further diminish the average error rates.
While Figure 6 examines the training error rate of U total for the 46 historical projects, Table 9 compares the error rates of U total for the new projects (i.e. the three case projects). Restated, according to Table 9, the average error rate of U total for the three case projects for the proposed model is 1.99% (= 9.72% − 7.73%) and 1.24% (= 8.97% − 7.73%) lower than that of the first modified model and the second modified model, respectively. In summary, the estimation accuracy of the resulting U total when fmGA is incorporated into FALCON exceeds that of using only FALCON and of using sGA+FALCON in the three case projects.

Comparisons with three conventional methods
The conventional average (project-level) unit cost method, the component ratios method, and the linear regression method are applied to the case projects for comparison with the proposed model. Project I is used to illustrate the workings of these three conventional methods.

The average unit cost method obtains an estimated average unit cost for the total project of $28,588/m² (calculated from the 46 historical projects). Given that the total floor area of project I equals 7,363 m², the estimated project cost (before considering the tax and cost index) equals $210,493,444 (= 28,588 × 7,363). The estimated project cost based on Eqn (5) then equals $170,315,060. Since the actual project cost was $186,492,943, the estimation accuracy of the average unit cost method is 90.58% using Eqn (8).
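The first step of the average unit cost method is a single multiplication, which the short Python sketch below reproduces for project I. The constant and function names are illustrative; the subsequent tax and cost index adjustment of Eqn (5) is defined earlier in the paper and is deliberately not reproduced here.

```python
AVERAGE_UNIT_COST = 28_588     # $/m^2, averaged over the 46 historical projects
FLOOR_AREA_PROJECT_I = 7_363   # m^2


def unit_cost_estimate(unit_cost: int, floor_area: int) -> int:
    """Project cost before the tax and cost index adjustment."""
    return unit_cost * floor_area


pre_adjustment_cost = unit_cost_estimate(AVERAGE_UNIT_COST, FLOOR_AREA_PROJECT_I)
print(f"${pre_adjustment_cost:,}")  # $210,493,444
```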
The component ratios method (Barrie, Paulson 1992) identifies the foundation, structure, internal finishes, and MEP as the major cost divisions. The averaged ratio between the sum of the costs of these major divisions and the whole project cost is 71.02% for the 46 historical projects. The average U i of each major cost division can then be used to calculate the estimated cost of each division, as shown in Table 10.
For instance, the average U i and estimated cost of the foundation cost division are $2,868/m² and $21,117,084 (= 2,868 × 7,363), respectively. Next, since the sum of the estimated costs of these four cost divisions is $142,893,741, the total cost (before considering the tax and cost index) [...]

[...] where the value of R² of the derived regression line is 0.8538. Following consideration of the tax and cost index for project I, the total project cost is calculated as follows: [...]

Since the total floor area (Q) of project I equals 7,363 m², the estimated project cost equals $166,201,535 according to Eqns (10) and (11), resulting in an estimation accuracy of 89.12% using Eqn (8).

Table 11 summarizes the evaluation results using the proposed model and the three conventional methods in the three case projects. Based on those results, the following observations are made:
1. The proposed model achieves an average estimation accuracy of 91.82% for the three case projects.
2. The average estimation accuracies of the three case projects are 90.58%, 86.57%, and 88.87% using the average unit cost, component ratios, and linear regression methods, respectively.
3. The proposed model increases the average estimation accuracy by approximately 1.24 percentage points (= 91.82% - 90.58%), 5.25 percentage points (= 91.82% - 86.57%), and 2.95 percentage points (= 91.82% - 88.87%) over that of the average unit cost method, component ratios method, and linear regression method in these three case projects, respectively.
4. The three conventional methods tend to underestimate project costs, as their estimated project costs are lower than the actual project costs in all three case projects.
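Two of the project I figures quoted before these observations can be checked with a short script. The sketch below is illustrative only: it assumes that Eqn (8), which is defined earlier in the paper, is the complement of the error rate in Eqn (9) (the text states that the error rate equals 1 minus the estimation accuracy), and the function names are hypothetical.

```python
FLOOR_AREA_PROJECT_I = 7_363          # m^2
ACTUAL_COST_PROJECT_I = 186_492_943   # $


def division_cost(unit_cost: int, floor_area: int) -> int:
    """Estimated cost of one cost division: average unit cost times floor area."""
    return unit_cost * floor_area


def estimation_accuracy(estimated: float, actual: float) -> float:
    """Assumed form of Eqn (8): 100% minus the percentage error rate."""
    return (1.0 - abs(estimated - actual) / actual) * 100.0


# Foundation division under the component ratios method.
foundation_cost = division_cost(2_868, FLOOR_AREA_PROJECT_I)
# Linear regression estimate after the tax and cost index adjustment.
regression_accuracy = estimation_accuracy(166_201_535, ACTUAL_COST_PROJECT_I)

print(f"${foundation_cost:,}")         # $21,117,084
print(f"{regression_accuracy:.2f}%")   # 89.12%
```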
To further verify the proposed model, a 16-fold cross-validation is performed. Specifically, the 49 projects (the abovementioned 46 historical projects plus the three case projects) are divided into 16 groups. Of the 16 groups, a single group (consisting of three projects) is retained as the validation projects for testing a particular method, and the remaining 15 groups (46 projects) are used as training projects. The cross-validation process is repeated 16 times, yielding 3 × 16 = 48 testing projects. Evaluation results of the 48 testing projects are then averaged.
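The fold construction described above can be sketched in Python as follows. This is a hedged illustration rather than the study's procedure: it assumes that the one project left over after forming 16 groups of three (49 - 48 = 1) stays in every training set, which reproduces the stated 46 training projects per fold. The project data are synthetic, the accuracy measure is assumed to be 1 minus the error rate, and the average unit cost method is "trained" here as the mean of the per-project unit costs.

```python
import random


def make_folds(n_projects=49, n_folds=16, fold_size=3, seed=0):
    """Return (train, test) index lists; leftover projects are always in train."""
    indices = list(range(n_projects))
    random.Random(seed).shuffle(indices)
    folds = []
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        folds.append((train, test))
    return folds


def accuracy(estimated, actual):
    """Estimation accuracy in percent (assumed: 100% minus the error rate)."""
    return (1.0 - abs(estimated - actual) / actual) * 100.0


def cross_validate_unit_cost(projects, folds):
    """Average accuracy of the average unit cost method over all test projects."""
    scores = []
    for train, test in folds:
        # Mean unit cost ($/m^2) of the training projects.
        unit_cost = sum(projects[i][1] / projects[i][0] for i in train) / len(train)
        for i in test:
            area, actual_cost = projects[i]
            scores.append(accuracy(unit_cost * area, actual_cost))
    return sum(scores) / len(scores), len(scores)


# Synthetic (floor area in m^2, actual cost in $) pairs, for illustration only.
rng = random.Random(1)
projects = []
for _ in range(49):
    area = rng.uniform(2_000, 20_000)
    projects.append((area, area * rng.uniform(22_000, 35_000)))

folds = make_folds()
mean_accuracy, n_tested = cross_validate_unit_cost(projects, folds)
print(n_tested, f"{mean_accuracy:.2f}%")
```

Any other estimation method can be substituted for the unit cost step inside the loop, as long as it is fitted on the training indices only.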
However, only the average unit cost method, component ratios method, and linear regression method are implemented in this cross-validation process, because only these methods can be applied with the available data. The proposed model is not adopted in this cross-validation process because the detailed data of the 46 historical projects needed to apply its multi-factor evaluation sub-model are unavailable.
The lower portion of Table 12 summarizes the evaluation results using the three conventional methods in this cross-validation process (i.e., the 48 testing projects). Based on those results, we conclude the following:
1. For the 48 testing projects, the average estimation accuracies are lower than those of the three case projects by 15.38 percentage points (= 90.58% - 75.20%), 13.81 percentage points (= 86.57% - 72.76%), and 15.18 percentage points (= 88.87% - 73.69%) when using the average unit cost, component ratios, and linear regression methods, respectively.
2. The three conventional methods yield an average estimation accuracy of around 73.88%, which is similar to that (±25%) reported by Petroutsatou et al. (2012).
3. Again, the three conventional methods tend to underestimate project costs in this cross-validation process.
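The differences and the overall average quoted in these conclusions follow directly from the reported accuracies, as the short Python check below shows. The dictionary and variable names are illustrative; the values are the accuracies stated in the text.

```python
# Average estimation accuracies (%) reported for the three case projects.
case_accuracy = {
    "average unit cost": 90.58,
    "component ratios": 86.57,
    "linear regression": 88.87,
}
# Average estimation accuracies (%) reported for the 48 cross-validation test projects.
cv_accuracy = {
    "average unit cost": 75.20,
    "component ratios": 72.76,
    "linear regression": 73.69,
}

# Drop in accuracy (percentage points) per method.
drops = {m: round(case_accuracy[m] - cv_accuracy[m], 2) for m in case_accuracy}
# Mean cross-validation accuracy across the three methods.
mean_cv_accuracy = round(sum(cv_accuracy.values()) / len(cv_accuracy), 2)

print(drops)
print(mean_cv_accuracy)
```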