TECHNICAL COMPARISONS OF SIMULATION-BASED PRODUCTIVITY PREDICTION METHODOLOGIES BY MEANS OF ESTIMATION TOOLS FOCUSING ON CONVENTIONAL EARTHMOVINGS

Construction planners have long tried to predict productivity, a significant criterion of construction performance, prior to the commencement of operations. Many methods based solely on deterministic calculations, simulation techniques, statistical methods, or other decision-making tools have been introduced so far. In application, however, each of these methods, depending on a single estimation tool, has several limitations. The present study presents new predictive models: 1) Model A, combining simulation and a multiple regression (MR) technique, a general estimation technique based on statistical concepts; and 2) Model B, combining simulation and an artificial neural network (ANN) technique, a powerful prediction tool in engineering. Quantified reliability comparisons between actual productivity data and the productivity predicted by the presented models were conducted. The predictive result of Model B was found to be closer to the actual productivity data than that of Model A. Model B, based on the ANN analysis, was however more difficult to implement from the viewpoint of practical application. These comparisons revealed the reliability of the predictive results and the implementation efficiency of each model. This study addresses the basic characteristics and technical comparisons of each methodology, that is, simulation-based MR and ANN techniques. The findings allow researchers to create or develop new predictive methodologies for specific operations for which few actual datasets can be collected from jobsites. Technical performance comparisons between MR and ANN, two representative estimation tools, enable users to select the more appropriate tool for a specific situation. The methodology suggested in this study can also be extended from earthworks to other construction operations.


Introduction
Productivity in construction is considered an important criterion for evaluating the operational performance of specific construction activities. Predicting productivity prior to the actual commencement of operations is a task that construction planners and managers have made a top management priority (Capachi 1987; Schaufelberger 1998; Kandil and El-Rayes 2005).
When basic planning is conducted, planners refer to their own experiences or historical data in order to predict productivity as accurately as possible prior to commencement of site work. Reference manuals representing historical data of cost and productivity provide basic information that allows planners to predict the productivity. However, the information, which is comprised of average values, provided by the reference manuals is not easily applied to various site conditions where numerous unexpected factors are at play (Schaufelberger 1998).
The need for reliable prediction of construction productivity has long motivated researchers to investigate appropriate methods. However, many methods created thus far have limitations such as unreliable prediction and difficult implementation (Han 2005; Han and Halpin 2005; Han et al. 2006). This study builds on the previous research by Han and Halpin (2005), Han et al. (2006), and Han et al. (2008) in order to resolve the problems and limitations of the suggested methodology combining simulation and multiple regression (MR) techniques.
This study suggests new methods for productivity prediction that use construction simulation as a tool for data generation, and MR analysis and artificial neural network (ANN) analysis as tools for easy and reliable prediction. This study also presents the different characteristics of, and technical performance comparisons between, the two estimation techniques, MR and ANN.
ISSN 1392-3730 print/ISSN 1822-3605 online
An earthmoving operation was chosen as the target operation in this study. The reason for selecting an earthmoving operation is that it is a fundamental operation of civil and architectural construction projects. In addition, data collection is simple, since earthmoving is composed of fewer distinct activities than other construction operations. Over the past 100 years, earthmoving operations have involved the same basic work procedures (i.e., surveying, staking, excavating with an excavator or other equipment, hauling by a hauler, filling, and compacting by a compactor or other equipment). These work procedures have not changed over time, although there have been minor updates to the specifications of some equipment. Although similar or even identical procedures have been used for a lengthy period of time, it remains difficult to predict the productivity of this simple operation (Han and Halpin 2005; Han et al. 2006; Han et al. 2008).
This study created and developed a new prediction methodology that combines several tools: construction simulation and either MR analysis or an ANN analysis. Several steps are carried out: construction data collection, data generation, and productivity prediction based on estimation tools. For generation of data that serves as input data for implementing an estimation tool, a construction simulation was used in both estimation tools. MR and an ANN were employed as estimation tools using the generated data. Quantified comparisons of the prediction accuracy between the MR and the ANN techniques were also presented. A diagram illustrating the research method employed in this study is presented in Fig. 1.

Method for productivity prediction
Planners have relied upon three kinds of methods to predict productivity, based on: 1) historical data; 2) references, such as RS Means cost data by Reed Construction Data, Inc. and equipment performance handbooks; and 3) analytical methods such as construction simulation or statistical analysis. Methods based on historical data or references are typically referred to as deterministic analysis (Kannan et al. 1997; Kannan 1999).

Deterministic analysis
Deterministic analysis was developed for simple calculation of the productivity of earthmoving operations based on equipment characteristics, equivalent grades, and the haul distance, using the performance handbooks published by most manufacturers. A deterministic model treats each task duration as a fixed or constant value, on the assumption that any variability in the task duration can be ignored (Halpin and Riggs 1992). Halpin and Riggs (1992) described an example of a simple deterministic model for earthmoving operations, consisting of a scraper for hauling and a pusher dozer for loading. Because it ignores this variability, deterministic analysis tends to overestimate actual field productivity.
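The fixed-duration calculation described above can be illustrated with a minimal sketch. It is not the handbook procedure itself: the function names, the excavator-truck setting, and all durations are hypothetical and serve only to show how constant cycle times lead to a single productivity value.

```python
def deterministic_productivity(n_trucks, load_min, haul_min, dump_min, return_min):
    """Truck-dumps per hour for one excavator when every duration is a constant.

    All durations are in minutes and are hypothetical illustration values.
    """
    truck_cycle = load_min + haul_min + dump_min + return_min
    # The fleet can deliver at most n_trucks loads per truck cycle.
    truck_limited = n_trucks * 60.0 / truck_cycle
    # The excavator can load at most one truck every load_min minutes.
    loader_limited = 60.0 / load_min
    # The slower of the two resources governs the system.
    return min(truck_limited, loader_limited)


def balance_point(load_min, haul_min, dump_min, return_min):
    """Number of trucks at which the excavator and the fleet are exactly balanced."""
    truck_cycle = load_min + haul_min + dump_min + return_min
    return truck_cycle / load_min


few_trucks = deterministic_productivity(3, 2.0, 6.0, 1.0, 3.0)
many_trucks = deterministic_productivity(8, 2.0, 6.0, 1.0, 3.0)
```

With few trucks the fleet governs the result, and with many trucks the excavator does. Because no waiting or bunching is represented, the value is an upper bound on what a field crew would be expected to achieve.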

Simulation techniques
With rapid advances in computer technologies, researchers have tried to create simulation models to help construction engineers predict construction productivity prior to commencing actual activities. Simulation models have been extensively developed and broadly used as management tools within manufacturing and business industries. The CYCLONE (CYCLic Operation Network) system approach was developed in the early 1970s. This system demonstrated potential for modeling and simulation of repetitive construction processes. In 1982, Lluch and Halpin developed a microcomputer version of CYCLONE named MicroCYCLONE. Many improvements to MicroCYCLONE have been developed in the past two decades. In general, a construction simulation is conducted in several steps (i.e., site observation, duration and resource data collection, modeling using CYCLONE, running the simulation, and sensitivity analysis) (Kannan 1999; Wang and Halpin 2004). Martinez and Ioannou created STROBOSCOPE (STate and ResOurce Based Simulation of COnstruction ProcESSes), which adopts CYCLONE modeling elements such as normal, queue, and combi activities (Martinez and Ioannou 1994; Ioannou and Martinez 1996). WebCYCLONE, another variation of the CYCLONE methodology, simplifies the simulation modeling process and makes it accessible to construction practitioners with limited simulation experience (Halpin and Riggs 1992).
Simulation techniques continue to be improved through research aimed at overcoming the practical limitations of applying them to real operations. Simphony, a simulation system developed by Hajjar and AbouRizk (2002), provided a unified modeling technique under an integrated development environment (Hajjar and AbouRizk 2002; Mohamed and AbouRizk 2005). Based on this technique, AbouRizk and his colleagues presented simulation methodologies based on intelligent decision support for easy use by practitioners in the field (AbouRizk 2005, 2006; van Tol and AbouRizk 2006). Another effort pursuing more reliable predictions took the form of situation-based simulation models based on cause-and-effect relationships (Choy and Ruwanpura 2006). All of these research accomplishments focused mainly on improving simulation techniques so that they can be applied to the construction field more efficiently. The basic elements used in the CYCLONE method are shown in Table 1.

Multiple regression analysis
Regression analysis is the most commonly performed statistical procedure for prediction of certain tendencies based on observed datasets. The ultimate goal of a regression analysis is not only to find the values of parameters, but also to determine what type of mathematical function fits best. Using this tool, researchers have been able to investigate and understand the relationships between explanatory variables and a result called a response variable (Devore 2000). Smith (1999) presented stepwise MR techniques to investigate the relationships between earthmoving operation conditions and productivity and to develop a deterministic model allowing earthmoving operations to be planned for many different situations. This MR model using input data taken from four different highway construction projects demonstrated that there is a strong linear relationship between operation conditions and productivity (Smith 1999;Han et al. 2008).

Artificial neural network technique
An ANN is an extremely powerful tool that provides a computing environment in the form of a highly interconnected network of many simple processing units capable of acquiring, representing, and applying mappings from one space of information (the inputs) to another space (the outputs). An ANN is composed of simple processing elements, called artificial neurons, an architecture comprised of connections between the elements, and weights associated with each connection. The ANN performs computations by propagating changes in activation between its processing elements over weighted connections (Tsoukalas and Uhrig 1997).

Shi (1999) demonstrated the use of an ANN to predict earthmoving production and presented an easy method by which a user without a background in computer simulation can predict the productivity of earthmoving operations. However, the results of the neural network system were not validated through a comparison with actual data collected from job sites. In addition, there is a lack of information about the detailed components, including the architecture of the network (Shi 1999). Schabowicz and Hola (2007) and Hola and Schabowicz (2010) recently investigated the productivity of earthworks using an ANN. These studies demonstrated that the ANN is a feasible tool for productivity estimation in construction. Unlike the studies mentioned above, the present study additionally suggests generating input data with a simulation technique when the collected construction data are insufficient.

Table 1. Basic elements used in the CYCLONE method
Function Node: It is inserted into the model to perform special functions such as counting, consolidation, marking, and statistics collection.
Accumulator: It is used to define the number of times the system cycles.
Arc: It indicates the logical structure of the model and the direction of entity flow.

Limitations of the conventional methods for productivity prediction
Many studies have presented the limitations of existing productivity prediction methods. A deterministic analysis does not reflect actual productivity under real conditions such as idleness and the loss of productivity due to random variation in activity durations (Halpin and Riggs 1992). While simulation methods are able to overcome these limitations, building models that reflect actual operational situations still involves considerable complexity. Mathematical relations between productivity and operating conditions can be determined through an MR analysis, and such relations are more easily applied than other techniques. A reliable regression model, however, requires a large number of input datasets covering various actual conditions, and in reality acquiring a large number of actual datasets from various construction job sites presents practical challenges. Implementation of an ANN is subject to the same practical limitation, caused by insufficient input datasets (Han et al. 2006; Han et al. 2008). It should be noted that the limitations of the conventional methods are mainly caused by the difficulty of collecting actual data from jobsites.

Data collection and data generation
In response to the need for a new methodology enabling straightforward prediction of productivity, this study suggests a methodology that combines a simulation method and an estimation tool, either an MR analysis or an ANN analysis. The simulation method is used for generating a large amount of data that is then used as input data in creating an MR model or an ANN model. The MR and ANN methodologies, each based on a construction simulation, provide a means of predicting productivity as well as establishing the relationship between operating conditions and productivity.

Data collection
As the first phase, actual raw datasets were collected from construction sites where earthmoving was conducted in West Lafayette and Lafayette, Indiana. Table 2 describes the six construction projects where data collection was conducted (Han 2005; Han et al. 2008). From the projects described in Table 2, raw datasets were collected for four or five hours on two or three consecutive days at each jobsite. A total of 23 separate hourly datasets, each including a series of multiple cycles, were collected. Each dataset represents a remarkable sample of earthmoving operations involving both a two-link system composed of an excavator and trucks and a three-link system composed of an excavator, a dozer, and trucks. Video of the earthmoving operations at the jobsites was recorded, providing consistent observations for the analysis of the event times of each piece of equipment. The event times analyzed in the video tapes made it possible to determine the cycle times of each activity using stopwatch analysis, interviews, and field measurement (Everett et al. 1998). Sieve analysis of soil samples taken from the jobsites provided basic information regarding the soil characteristics. The travel time, loading time, machine break time, and resurveying time were acquired through observations and analyses. Interviews with site personnel and field measurements provided the basic conditions of the jobsite, such as hauling distance, equipment capacity, the number of pieces of equipment, and the probabilities of machine break and resurveying (Han 2005; Han et al. 2008). Table 3 summarizes the data collected from the selected jobsites.

Simulation
WebCYCLONE, a construction simulation tool, was run using the collected raw datasets. The data obtained from the simulation are used as preliminary data that are expanded to a large number of datasets to be utilized as input datasets for implementing an MR or an ANN analysis. Fig. 2 demonstrates one of the simulation models based on a dataset collected from the construction site for Project A. This simulation model was designed to measure the productivity in terms of truck-dumps per hour. The simulation model reflects the observation that the on-site surveyor interrupted the excavation process with a probability of 4.55%. These interruptions were due to restaking knocked-down stakes. This kind of interruption is generally observed at all sites where earthmoving is conducted. The result of the simulation model, which reflects actual situations, indicates that this interruption delays the cycle time and eventually lowers productivity. The durations associated with the various cycle times, such as loading the earth onto trucks and the trucks' traveling and returning, were assumed to fit a beta distribution. According to a study by AbouRizk and Halpin (1992), these distributions can be used in modeling random input processes of construction durations for simulation studies.
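The kind of cyclic simulation described above can be sketched in a few lines. The following is not WebCYCLONE and does not reproduce the Project A model of Fig. 2; it is a simplified one-excavator queue in which every duration range and beta shape parameter is a hypothetical placeholder. Only the 4.55% interruption probability and the truck-dumps-per-hour measure are taken from the text.

```python
import heapq
import random


def scaled_beta(rng, low, high, a, b):
    """Draw a duration in [low, high] minutes from a rescaled beta(a, b)."""
    return low + (high - low) * rng.betavariate(a, b)


def simulate_hour(n_trucks, p_resurvey, seed=0, horizon=60.0):
    """Count truck-dumps completed within `horizon` minutes (one excavator)."""
    rng = random.Random(seed)
    arrivals = [0.0] * n_trucks          # times at which trucks join the queue
    heapq.heapify(arrivals)
    excavator_free = 0.0
    dumps = 0
    while True:
        arrival = heapq.heappop(arrivals)
        start = max(arrival, excavator_free)   # wait if the excavator is busy
        if start >= horizon:
            break
        if rng.random() < p_resurvey:
            # Hypothetical resurveying delay before loading can begin.
            start += scaled_beta(rng, 2.0, 6.0, 2.0, 3.0)
        load_end = start + scaled_beta(rng, 1.5, 3.0, 2.0, 3.0)
        excavator_free = load_end
        dump_time = load_end + scaled_beta(rng, 4.0, 8.0, 2.0, 2.0)  # haul and dump
        if dump_time <= horizon:
            dumps += 1
        # The truck returns and rejoins the queue at the excavator.
        heapq.heappush(arrivals, dump_time + scaled_beta(rng, 3.0, 6.0, 2.0, 2.0))
    return dumps


# Average over several replications, as is usual for stochastic simulation.
replications = [simulate_hour(3, 0.0455, seed=s) for s in range(30)]
mean_dumps_per_hour = sum(replications) / len(replications)
```

Because trucks occasionally arrive together and wait for the excavator, the simulated output falls below the fixed-duration estimate for the same mean durations, which is the bunching effect discussed in the next section.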

Comparison of actual data and simulated data
In order to establish a reliable prediction method, the collected raw data were replaced by the data obtained from the construction simulation, since it is difficult for users to collect a sufficient amount of data by actual measurement and site observation from jobsites.
The reliability and confidence of replacing the actual data with the simulated data could be verified by statistical analyses. The Wilcoxon signed-rank test is a method for checking the similarity of two samples. It tests the median difference between pairs of datasets in two samples where a normal distribution is not assumed. Since the difference between two samples is calculated, the simulated data can be measured on an interval scale that corresponds to the degree of difference from the actual data (Devore 2000).
When the data consist of pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, the differences $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$ are used to test hypotheses on the expected difference $\mu_D$ by applying the Wilcoxon signed-rank test to the $D_i$'s (Devore 2000).
A Wilcoxon signed-rank test of the difference between the actual data and the simulated data was conducted using the SAS program. Based on the test assumptions, the null and alternative hypotheses for a level $\alpha$ test are as follows: null hypothesis $H_0$: $\mu_D = 0$, where $D_i = X_i - Y_i$ is the difference between $X_i$, the actual measurement, and $Y_i$, the result of the simulation model; alternative hypothesis $H_a$: $\mu_D \neq 0$. The UNIVARIATE procedure provided by the SAS program was used to compute the test statistics. The P values were used to decide whether to accept or reject the null hypothesis. Halpin and Riggs (1992) illustrated that productivity values vary with the means by which those values are obtained. According to their study, the productivity value obtained through actual measurement is approximately 10 percentage points lower than the deterministic productivity because of bunching caused by random travel times. In contrast with deterministic productivity, simulated productivity is estimated with consideration of the bunching effect and variances in travel times, and it generally has a higher value than the productivity value obtained through actual measurement. The value of the simulated productivity in this study was between that of the deterministic productivity and that of the actual productivity (Halpin and Riggs 1992).
Based on the study by Halpin and Riggs (1992), simulated productivity is assumed to be five percentage points higher than actual productivity; this criterion lies midway between zero and 10%, the average range of the difference between deterministic productivity and actual productivity. A Wilcoxon signed-rank test was conducted to compare two groups of datasets: the simulated productivity values and the values 5% higher than the actual productivity (Han 2005; Han et al. 2008). Table 4 shows the results of the Wilcoxon signed-rank test for the pairs of data described above.
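The paired comparison described in this section can be sketched as follows. The study itself used the SAS UNIVARIATE procedure; this Python function is only an illustrative stand-in that computes the signed-rank sums and a two-sided P value from the large-sample normal approximation, without tie or continuity corrections. The productivity values are hypothetical and are not the data of Table 4.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, approximate two-sided P value) for paired data."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences and give each the mean rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation to the null distribution of the rank sum.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, w_minus, p_value


# Hypothetical productivities in truck-dumps per hour.
actual = [18.0, 21.5, 16.0, 24.0, 19.5, 22.0, 17.5, 20.0]
simulated = [19.1, 22.3, 17.0, 25.4, 20.1, 23.4, 18.0, 21.3]
adjusted_actual = [1.05 * a for a in actual]          # the 5% criterion in the text
w_plus, w_minus, p_value = wilcoxon_signed_rank(simulated, adjusted_actual)
```

A P value above the chosen level $\alpha$ means that the null hypothesis of no difference is not rejected, which is the condition under which the simulated data are treated as an acceptable substitute. For samples as small as this one, exact tables are preferable to the normal approximation.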

Data generation
The comparison of simulated data and actual data based on a statistical methodology presented in the previous section showed that the simulation data could be used as a substitute for the limited amount of raw data collected from jobsites. The next step is to generate datasets using a simulation methodology. The generated datasets by the simulation serve as input data in estimation tools such as a MR and an ANN analysis. A guideline must be established prior to input data generation (Han 2005).
Interviews were conducted with site personnel and site observations were carried out to identify the main factors, which varied depending on actual site conditions and influenced productivity significantly. The following four factors among 17 factors listed in Table 3 were selected: 1) the probability of resurveying, 2) the number of trucks, 3) the number of excavators, and 4) the resurveying time. All the other factors were assumed to have been invariable in a single dataset collected within one hour. Variable durations, such as the loading time and the travel time, were implemented using duration input modules in the simulation methodology. The probability of machine breakdown was excluded from the main variable factors, because the probabilities of this event were so low that they would not have influenced productivity (Han et al. 2008).
Several guidelines, listed below, for input data generation based on the simulation methodology were determined:
− The low and high levels of the numbers of trucks and excavators in each dataset were determined by analyzing the collected datasets and through site observations;
− The specific ranges of the low and high levels of the probability of resurveying/checking and the resurveying/checking time were determined from the actual values of the collected data and the mean values of the distribution of all datasets in each system; and
− The numbers of generated datasets derived from one actual dataset must be identical so that all the datasets are evenly reflected.
To determine the low and high levels of the probability of resurveying and the resurveying time, the best-fit distributions were investigated to find the mean value, which was assumed to function as the low or high level for data generation. The mean values, which were derived from the best-fit distributions shown in Figs 3 and 4, are listed in Table 5. The number of resources associated with the simulation methodology was determined from the range of availability of such resources at the jobsites. This information was determined through interviews with site personnel. The low or high levels of the number of pieces of equipment were determined depending on the minimum or maximum number of pieces of equipment available at the jobsites. Based on the guidelines described previously, one dataset collected from the actual jobsites generated 192 datasets (i.e., combinations of 2 × 2 × 3 × 16 for cases under the two-link system or 2 × 2 × 2 × 3 × 8 for cases under the three-link system). This process, therefore, generated 4,416 datasets (i.e., 23 actual datasets × 192 simulated datasets per actual dataset) (Han 2005; Han and Halpin 2005).
As stated previously, a total of 17 factors that were presumed to affect the productivity were determined by interviews and site observations, as listed in Table 3.
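The dataset counts quoted above follow from enumerating every combination of factor levels. The short sketch below checks that arithmetic; the text does not state which factor each level count belongs to, so the tuples carry only the counts, and the variable names are illustrative.

```python
from itertools import product

# Level counts as quoted in the text; their assignment to factors is not given.
two_link_levels = (2, 2, 3, 16)
three_link_levels = (2, 2, 2, 3, 8)


def count_combinations(levels):
    """Enumerate the full factorial design and count its runs."""
    return sum(1 for _ in product(*(range(k) for k in levels)))


per_dataset_two_link = count_combinations(two_link_levels)
per_dataset_three_link = count_combinations(three_link_levels)
total_generated = 23 * per_dataset_two_link   # 23 actual hourly datasets
```

Both systems yield the same number of runs per actual dataset, which satisfies the guideline that every collected dataset be reflected evenly in the generated data.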
During the interviews with site personnel, it was noted that, depending on actual site conditions, data for several of the 17 factors can seldom be collected. The 17 factors also included some that cannot be identified before actual operations commence. Because of these problems, the established methodology would not be appropriate for predicting productivity, the ultimate goal of this study. Three model types were therefore considered and investigated in order to resolve these problems: 1) Model I: a full model with 17 factors, 2) Model II: a reduced model with 10 factors, and 3) Model III: a reduced model with 7 factors (Han 2005).
Model I included all 17 factors that were regarded as affecting the productivity. Accordingly, Model I was expected to yield the most reliable prediction results. However, some factors included in Model I, such as the probability of resurveying and resurveying time, the probability of machine break time, machine break time, and so on, could not be identified before actual operations started or resumed. Thus, Model I was limited as a prediction tool. In contrast, the reduced models, Models II and III, were expected to be usable for prediction, because they were composed only of factors that could be identified prior to actual operations. The reduced models were separated into one model with sufficient information, named Model II, and one model with insufficient information, named Model III. The criterion determining sufficient or insufficient information was whether three specific factors were included in the model. These three factors, excavator operator experience, excavator age, and truck age, are considered in Model II but not in Model III. They may or may not be identifiable, depending on the management level (Han 2005; Han et al. 2008). The factors used in each model are shown in Table 6.

Modeling by MR analysis
An MR model provides the prediction of specific results, demonstrating the relationship between a response variable, which in the present study is the productivity of each dataset, and the explanatory variables, which are the factors (e.g., travel times, loading times, and hauling distance) affecting the productivity. In order to achieve the best-fitted regression model, three steps were conducted (Devore 2000; Neter et al. 1996; Han et al. 2008). Table 7 shows the finalized MR models (I, II, and III) obtained through the three steps mentioned above. They present mathematical relationships between the explanatory variables, denoted as predictors, and a response variable. These mathematical relationships allow the user to predict the productivity when input data reflecting actual situations are provided prior to the actual commencement of site work (Han et al. 2008).
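How a fitted MR model turns predictors into a productivity estimate can be shown with a minimal ordinary least squares sketch. It does not reproduce the models of Table 7 or the three fitting steps; the two predictors (number of trucks and hauling distance), the data, and the function names are hypothetical. The coefficients are obtained by solving the normal equations with Gaussian elimination.

```python
def fit_mr(rows, y):
    """Least squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + ..."""
    X = [[1.0] + list(r) for r in rows]            # prepend the intercept column
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    A = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    c = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (c[r] - sum(A[r][k] * b[k] for k in range(r + 1, p))) / A[r][r]
    return b


def predict(b, row):
    """Predicted response for one set of predictor values."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))


# Hypothetical generated data: (number of trucks, hauling distance in km).
rows = [(3, 2.0), (4, 5.0), (5, 1.0), (6, 8.0), (7, 3.0), (4, 2.5)]
productivity = [14.8, 15.6, 19.7, 17.9, 22.4, 16.9]
coefficients = fit_mr(rows, productivity)
estimate = predict(coefficients, (5, 4.0))
```

Once the coefficients are known, prediction is a single weighted sum, which is why an MR model is easy for a planner to apply in the field.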

Modeling by ANN analysis
A well-trained ANN with sufficient input data can provide appropriate estimation results (Tsoukalas and Uhrig 1997). The studies by Schabowicz and Hola (2007) and Hola and Schabowicz (2010) introduced the use of an ANN for productivity prediction based on a conjugate gradient algorithm (BPNN-CGB) with five inputs: number of excavators, number of trucks, excavator bucket capacity, truck loading platform capacity, and type of road surface.
As stated previously, the shortage of raw data, one of the problems in using an ANN, was resolved by data generation based on a simulation. The architecture of the network used in this study was a multi-layer "feedforward" network. Through numerous experiments, the ANN model in this study was designed with two hidden layers of 50 neurons and 20 neurons, respectively. Two "tansig" functions were adopted as the transfer functions of the two hidden layers and one "purelin" function was adopted as the transfer function of the output layer. As a training algorithm, "resilient backpropagation (trainrp)" was adopted as it provides useful functionality for multi-layer networks. "Sigmoid" transfer functions compress an infinite input range into a finite output range, so their slopes approach zero for large inputs. As a result, most backpropagation algorithms tend to make only small changes in the weights and biases even though the weights and biases are far from their optimal values. The purpose of the resilient backpropagation (Rprop) training algorithm is to eliminate this limitation (Demuth and Beale 2001). Resilient backpropagation allows the network to approach the goal, defined by the difference between a target value and the output, with a steep gradient.
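The idea behind resilient backpropagation can be sketched independently of any toolbox. The function below is not the MATLAB "trainrp" routine; it is a generic sign-based update, in the variant without weight backtracking, applied to a hypothetical two-parameter quadratic error. The step factors 1.2 and 0.5 are commonly used defaults and are assumptions here, as are all names.

```python
def rprop_minimize(grad_fn, w, steps=200, delta0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   delta_min=1e-6, delta_max=50.0):
    """Minimize a function using only the sign of each partial derivative."""
    w = list(w)
    delta = [delta0] * len(w)        # one adaptive step size per weight
    prev = [0.0] * len(w)            # gradient from the previous iteration
    for _ in range(steps):
        g = list(grad_fn(w))
        for i in range(len(w)):
            same_sign = g[i] * prev[i]
            if same_sign > 0:
                # Same direction as before: lengthen the step.
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif same_sign < 0:
                # The sign flipped, so a minimum was overshot: shorten the step
                # and skip this update.
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0
            # Move by the step size alone; the gradient magnitude is ignored.
            if g[i] > 0:
                w[i] -= delta[i]
            elif g[i] < 0:
                w[i] += delta[i]
            prev[i] = g[i]
    return w


def example_gradient(w):
    """Gradient of the hypothetical error (w0 - 3)^2 + 10 * (w1 + 2)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 2.0)]


solution = rprop_minimize(example_gradient, [0.0, 0.0])
```

Because only the sign of the gradient is used, a nearly flat sigmoid does not slow the update, which is the limitation the algorithm is meant to remove.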
In addition, pre-processing and post-processing functions, named "premnmx" and "postmnmx", were added in this study. These functions are useful for scaling the inputs and targets such that they always fall within a specified range (Han 2005). Fig. 5 shows a basic diagram of the network, which was optimally designed for accomplishing the goal of this study. For selection of datasets for training and validation, one-tenth of the datasets generated by the simulation models were used for validation and the remainder were used for training. For instance, a total of 4,416 datasets was divided into 3,975 datasets and 441 datasets for training and validation, respectively. Model I is reviewed in Figs 6 and 7, which show the procedures and results based on an ANN as an example. Model I was trained based on resilient backpropagation with pre-processing and post-processing with a 0.001 error goal and 20,000 maximum epochs (Han 2005). Fig. 6 shows a goal graph in which the error, that is, the difference between the optimal target value and the output, reaches the goal of 0.001. Fig. 7 shows that the validation results were well fitted to the optimal target value. The R value of 0.997, shown in Fig. 7, is close to 1, which also indicates that the trained model reliably estimates the optimal result.

Comparison of results by two prediction models

Comparison of results by the fitted predictive model A: MR analysis
The fitted predictive model A, a predictive model using a MR analysis, employs procedures based on the construction simulation, data generation, and a MR analysis. A comparison between raw data collected from jobsites and the results yielded by the fitted predictive model A is presented in this section. A comparison of these two values provides an assessment of the fitted predictive model. The comparison rates shown in Table 8 represent the percentage rates of the predicted productivity by the fitted predictive model A to the actual productivity measured directly from jobsites (Han et al. 2008).
According to Table 8, the average comparison rates of model I, model II, and model III were 99.06%, 91.23%, and 90.89%, respectively. The differences among the average comparison rates of each model also indicated that the factors that were included in model I and excluded in models II and III, i.e., the probability of resurveying and resurveying time, the probability of machine break time, machine break time, and so on, significantly influenced the predicted results. The factors that were included in model II and excluded in model III, such as experience of excavator's operator, age of excavator, and age of trucks, did not have a significant influence on the predicted results (Han et al. 2008).

Comparison of results by the fitted predictive model B: ANN analysis
The results by the fitted predictive model B using ANN analysis were compared to actual productivity calculated based on raw collected data. Table 9 presents a comparison of the actual productivity and the predicted productivity by the fitted predictive model B.
As listed in Table 9, the average comparison rates of model I, model II, and model III were 103.06%, 98.80%, and 99.28%, respectively. Unlike the fitted predictive model A, there were no significant differences among the average comparison rates of models I, II, and III. Focusing solely on the average comparison rates of each model, the average comparison rate of model III was closer to 100% than was that of model I. The standard deviation of model I, however, is clearly less than those of models II and III. This observation indicates that the predictive results of model I, which was composed of all 17 factors, were more precise and stable than those of the other models.
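The comparison rate used in Tables 8 and 9 is the predicted productivity expressed as a percentage of the actual productivity, summarized by its average and standard deviation. A minimal sketch of that calculation follows; the productivity values are hypothetical and are not taken from either table, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def comparison_rates(actual, predicted):
    """Predicted productivity as a percentage of actual productivity."""
    return [100.0 * p / a for a, p in zip(actual, predicted)]


# Hypothetical productivities in truck-dumps per hour.
actual = [20.0, 25.0, 40.0, 32.0]
predicted = [19.0, 25.0, 44.0, 31.0]

rates = comparison_rates(actual, predicted)
average_rate = mean(rates)           # closeness to 100% indicates low bias
rate_spread = stdev(rates)           # a smaller spread indicates more stable predictions
```

The two summaries answer different questions, which is why a model whose average is nearest 100% is not necessarily the most reliable one: individual over- and under-predictions can cancel in the average while still producing a large spread.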

Comparison of the predictive results between the fitted predictive models A and B
As presented previously, this study provided two comparisons of the fitted predictive models A and B. According to these comparisons, the predictive results of predictive model A in model I, which included all 17 factors, were more reliable than those of model B. The predictive results in models II and III, however, showed that predictive model B provides more reliable results than model A. This analysis indicates that model B would be more useful in productivity prediction, since models II and III, composed of factors that can be identified before commencing actual operation, could be used for productivity prediction under actual site situations. However, the standard deviations of the comparison rates in Tables 8 and 9 show that further improvement of both models is required.
There are performance differences between the two estimation tools in terms of implementation. The MR analysis included in fitted predictive model A ultimately provided a mathematical relationship between the factors and the predicted productivity. This model would enable a user to obtain the predictive result merely by inputting the factor values that describe the specific site conditions. However, implementation of fitted predictive model B, which includes an ANN analysis, is difficult compared to predictive model A, since professional skill in running the MATLAB program (Demuth and Beale 2001) is required for implementation (Han 2005).
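To illustrate why an MR-based model is easy to apply, the sketch below evaluates a fitted linear equation of the general form y = b0 + b1*x1 + ... + bk*xk. The intercept, coefficients, and factor names are hypothetical placeholders and are not the values fitted in this study; the assumption of a purely linear form is also made only for illustration.

```python
def predict_productivity(intercept, coefficients, factors):
    """Evaluate y = b0 + sum(b_i * x_i) for one set of site factors."""
    missing = set(coefficients) - set(factors)
    if missing:
        raise KeyError(f"missing factor values: {sorted(missing)}")
    return intercept + sum(b * factors[name] for name, b in coefficients.items())


# Hypothetical coefficients and factor names, not the values fitted in this study.
intercept = 50.0
coefficients = {
    "haul_distance_km": -4.0,
    "num_trucks": 6.0,
    "bucket_size_m3": 10.0,
}

# Factor values describing one hypothetical site condition.
site = {"haul_distance_km": 2.5, "num_trucks": 5, "bucket_size_m3": 1.2}
print(predict_productivity(intercept, coefficients, site))
```

Once the coefficients are known, prediction is a single arithmetic expression that could equally be evaluated in a spreadsheet, whereas a trained ANN generally has to be run inside the software that holds its weights.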

Conclusions
Productivity prediction is an important issue for construction managers and planners. A literature review conducted in this study revealed that many studies have been performed to date with the goal of improving productivity prediction results. Most methodologies developed thus far were based on a single method and present several limitations in terms of practical application. This study accordingly presented a new methodology that combines methods that function correlatively. The methods used in this study were actual data collection, data generation using construction simulation, and estimation tools, that is, MR and ANN analysis. These two estimation tools, which have been widely used for prediction in engineering, serve as the last step following data collection and data generation. This study also presented the differences in basic characteristics and the comparisons of technical performance between MR and ANN analysis.
The first step in producing the fitted predictive model was data generation, which was based on actual data collection from jobsites. This step enables the user to secure a sufficiently large quantity of input data to run the estimation tools, i.e., MR and ANN analysis. A construction simulation technique was used to overcome difficulties in acquiring raw data, and a Wilcoxon signed-rank test was conducted to confirm that the simulated productivity could replace the actual productivity calculated from raw data. The next step was implementation of the estimation tools using the generated data as input data. This study provided fitted predictive models A and B using MR and ANN analysis, respectively. Each predictive model was composed of models I, II, and III, which varied according to the factors included or excluded.
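The paired comparison behind a Wilcoxon signed-rank test can be sketched as follows. This is a minimal illustration rather than the procedure used in this study: the productivity values are hypothetical, the function name wilcoxon_signed_rank is introduced here, and the p-value uses the normal approximation without tie or continuity correction, which is rough for small samples. A statistics library routine (for example scipy.stats.wilcoxon) would normally be preferable in practice.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, two-sided p-value) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share their
    average rank. The p-value uses the large-sample normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation for the smaller rank sum.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical paired productivity values, for illustration only.
actual = [118.2, 131.5, 109.8, 127.0, 122.4, 115.9, 129.3, 120.1]
simulated = [119.0, 130.1, 111.2, 126.2, 123.9, 114.8, 130.0, 119.5]

w_plus, w_minus, p_value = wilcoxon_signed_rank(actual, simulated)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p_value:.3f}")
```

A large p-value means the test finds no evidence of a systematic difference between the paired values, which is the condition under which simulated productivity would be accepted as a substitute for actual productivity.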
Comparison between the actual productivity and the results yielded by fitted predictive model A showed that the average comparison rates were 99.06%, 91.23%, and 90.89% for models I, II, and III, respectively. In contrast, the average comparison rates of fitted predictive model B were 103.06%, 98.80%, and 99.28% for models I, II, and III, respectively. These results indicated that predictive model B was, overall, better fitted to the actual data than model A, most clearly for models II and III. Implementation of predictive model B, however, is difficult in that running the MATLAB program demands specific skill. Implementation of predictive model A was relatively easier than that of model B, since the user can obtain predictive results merely by inputting the information for each factor or explanatory variable.
The fitted predictive models suggested in this study enable planners who face insufficient actual datasets to carry out reliable productivity prediction by combining simulation with either MR or ANN analysis. This study also contributes to the research community by providing a new methodology that combines various methods and produces more reliable prediction results than conventional predictive methods.