BENCHMARKING PROJECT LEVEL ENGINEERING PRODUCTIVITY

The benchmarking of engineering productivity can assist in the identification of inefficiencies and thus can be critical to cost control. Recognizing the importance of engineering productivity measurement, the Construction Industry Institute (CII) developed the Engineering Productivity Metric System (EPMS), composed of a series of hierarchical metrics with standard definitions suitable for measuring engineering productivity at various levels. While the EPMS can be used to assess engineering productivity at multiple levels within a discipline, it cannot produce an overall project level productivity measurement due to the underlying method of defining productivity. Previous studies have attempted to develop other metrics to assess engineering productivity at the project level; however, these methods did not create metrics suitable for benchmarking. To overcome these limitations, this study developed a standardization approach using "z-scores" to aggregate engineering productivity measurements from actual data collected from 112 projects provided by CII member companies. This method produces a metric with a project level view of engineering productivity. It allows owners and engineering firms to summarize engineering productivity at both the discipline level and the project level. The method illustrates a comprehensive and innovative procedure for developing a metric that summarizes productivity metrics with different underlying outputs, thus laying the foundation for future analyses and studies.


Introduction
Reliable engineering productivity measurement is essential for monitoring engineering performance and assessing whether change in the current engineering approach is warranted (Shouke et al. 2010). Although engineering costs have increased in recent years, in some cases reaching as much as 20% of total project cost, engineering productivity remains less understood than construction labor productivity (Kim 2007). One reason for this gap is the difficulty of measuring engineering productivity. Compared to construction, engineering has many intangible outputs (models, specifications, etc.), making it even more difficult to assess and track.

Measuring engineering productivity
While challenges remain, there have been many attempts to define engineering productivity in various ways: hours per drawing (Chang, Ibbs 2006; Thomas et al. 1999), hours per engineered element (Song 2004; Song, AbouRizk 2005), and hours per engineered quantity (CII 2001, 2004; Kim 2007). The Construction Industry Institute (CII) suggests that engineering productivity measured in hours per engineered quantity (quantity-based measures) is superior to the other approaches due to its direct relationship to engineering activities and also because it requires less subjective manipulation of intermediate deliverables. As an added advantage, it is based upon similar definitions to construction productivity (CII 2001).
CII, an industry and academic consortium, worked closely with its industry members to develop a standardized system, the Engineering Productivity Metric System (EPMS) for benchmarking engineering productivity (Kim 2007). The EPMS uses engineering productivity metrics with standardized definitions developed through consensus reached between industry and academia. It broadly incorporates input from numerous workshops and other industry forums. Many CII engineering firms now employ the EPMS for benchmarking allowing them to better understand their competitive position and improve performance.

The engineering productivity metric system (EPMS)
The EPMS defines engineering productivity as the ratio of direct engineering work-hours to issued for construction (IFC) quantities. Although it does not capture all project engineering work-hours, the EPMS covers six major disciplines (concrete, steel, electrical, piping, instrumentation, and equipment), which together account for the majority of project engineering work-hours. The system tracks engineering productivity at three levels using the hierarchical metric structure shown in Fig. 1: Level II (discipline), Level III (sub-category), and Level IV (element). As noted previously, the current EPMS does not provide a means of aggregating measurement to the project level, which would be a Level I metric.

Fig. 1. The EPMS metric hierarchy
In the hierarchical EPMS, each discipline level metric is composed of underlying sub-category and element metrics. For instance, the discipline level metric "total concrete" includes three sub-category level metrics for "foundations", "slabs", and "concrete structures." The sub-category metrics are further divided into element metrics such as foundations ">= 5 cubic yards" and foundations "< 5 cubic yards". The major advantage of the hierarchical EPMS is that engineering productivity data can be collected flexibly at various levels of detail and can be aggregated from the element level to the discipline level because the metrics share identical units of measurement (Kim 2007). Concrete metrics, for instance, are all measured in hours per cubic yard. To make engineering productivity comparable across different organizations, standard metric definitions for engineering work-hours and quantities were established (Kim 2007).
Discipline level metrics include all hours directly charged to the discipline for engineering deliverables, including site investigations, meetings, planning, constructability activities, and requests for information (RFIs). For concrete, this includes all engineering hours for embedments for slabs, foundations, and concrete structures. Engineering hours and quantities for piling are not included.
The EPMS collects productivity data via a secure web-based system developed by CII for its member companies. Using this system, organizations can input their engineering productivity data and submit it to the EPMS for validation and benchmarking after completing training in use of the online system. The EPMS represents a major breakthrough for benchmarking engineering productivity; however, the different units of the various disciplines make it challenging to summarize metrics from the discipline level to the project level (Level I). Lacking a project level engineering productivity metric (PEPM), it is impossible to assess overall engineering productivity performance, an issue frequently noted in the feedback from CII companies. Furthermore, the lack of a PEPM hinders analyses of the relationships among engineering productivity, overall performance at the project level, and performance-improving best practices.

Historical background and development of the PEPM
Although many approaches have been adopted for performance evaluation in various industry sectors such as software, manufacturing, and construction (Bang et al. 2010; Benestad et al. 2009; de Aquino, de Lemos Meira 2009; Fleming et al. 2010; Issa et al. 2009; Niu, Dartnall 2010; Ren 2009; Yang, Paradi 2009), there are few approaches suitable for developing a project level engineering productivity metric. The Project-Level Productivity (PLP) index was developed by Ellis and Lee (2006) for monitoring multidiscipline daily labor productivity. They employed an equivalent work unit (EWU) to standardize and aggregate the outputs of different construction crafts. Nonetheless, this approach standardized installed quantities of different crafts without considering their variations. Therefore, it may penalize the productivity of some crafts and reduce the precision of project level productivity assessment.
A general approach which first standardizes metrics individually and then aggregates them has caught the attention of numerous researchers. Standardization generally involves rescaling the variables or removing their variations for aggregation. Maloney and McFillen (1995) standardized job characteristics with different scales into a range from 0 to 1 in order to make comparisons. The z statistic (z-score) is another common method to standardize variables with consideration for both the sample mean and standard deviation (Agresti, Finlay 1999). The standardized metrics are then aggregated by applying a weighted sum of the underlying metrics to develop a summarized measure. For example, Ibbs (2005) suggests a cumulative project productivity metric computed as the sum product of change-impacted and change-unimpacted productivity and their corresponding work hours, divided by total work hours (see Eq. 1), which can be written as P_cum = (P_imp × WH_imp + P_unimp × WH_unimp) / (WH_imp + WH_unimp).
These methods proposed feasible solutions to either standardization of different variables or aggregation of variables with the same measures; however, none of them provided a complete approach considering both aspects. Therefore, this study developed an approach addressing both standardization of the different variables and their aggregation to summarize engineering productivity at the project level.

Methodology
The CII Productivity Metrics (PM) team established criteria for the desired PEPM. These criteria required that any metric developed would have to first be comprehensible; it would also have to satisfy a condition termed homogeneity, and finally, it would have to be suitable for trending. To be comprehensible the PEPM should be readily interpretable by both industry and academia. Homogeneity refers to the accuracy of the PEPM for summarizing the underlying engineering productivity. And to be useful for tracking engineering productivity, the PEPM must be capable of being measured and reported over time.
Initially, three candidate approaches for the aggregation of engineering productivity across disciplines were considered. Using data collected via the EPMS, the three approaches were used to calculate a PEPM, and each PEPM was separately evaluated using the criteria above. Characteristics of the three PEPMs were compared and assessed systematically for comprehensibility, homogeneity, and trending ability. In this process, qualitative evaluations were converted into quantitative weightings using the Analytic Hierarchy Process (AHP) for approach selection. The approach with the highest overall score was considered the most suitable and thus ultimately selected for the PEPM.

The EPMS database
A total of 112 heavy industrial projects with engineering productivity data were submitted to the EPMS database from 2002 to 2008. The total installed cost of all projects is US$ 4.5 billion. Table 1 presents the distribution of these projects by respondent type, project type (process or non-process), project nature (addition, grass roots, or modernization), and also project size.
Contractors submitted the majority of data with a total of 92 projects, whereas owners submitted only 20. Based on the observation of the PM team, the data disparity by respondent is primarily because contractors are better staffed to track engineering productivity and have more ready access to the data. All projects submitted were heavy industrial projects, which are further classified into two major categories: process and non-process. Process projects include chemical manufacturing, oil refining, pulp and paper, and natural gas processing projects. Non-process projects include power and environmental remediation projects. This taxonomy was developed based on Watermeyer's definition, which defines non-process projects as those that yield products that cannot economically be stored (Watermeyer 2002). Process projects comprise the majority of the productivity dataset with a total of 77; the remaining 35 are non-process projects. An analysis of project nature reveals that 37 are additions, 53 are modernizations, and 22 are grass roots. In accordance with CII convention, a project with a budget greater than five million dollars is categorized as a large project. Accordingly, 68 projects were categorized as large (greater than five million dollars) and the remaining 44 as small (less than five million dollars). A distribution of direct engineering work hours by discipline was also produced and is presented in Fig. 2. The piping discipline accounts for the majority of work hours with 45%, a substantially higher percentage of the total hours than any other discipline. This distribution may not be typical of most projects but is reasonable given that these are industrial construction projects.

Software used in data preparation and analyses
Data preparation is the essential foundation for effective data analysis. In this research, engineering productivity data were first collected and stored in a secured Microsoft SQL Server 2000 ® database. Next, data tables were exported and saved as Microsoft Access ® files for ease of access and query. The data tables were further exported to Microsoft Excel ® because of its high compatibility with statistical packages. Minitab ® and SPSS ® were utilized to perform data analyses.

Development of the approaches to calculate PEPM
Three approaches were proposed to calculate PEPM; these included the earned-value method, the z-score method, and the max-min method (Liao 2008). The earned-value method calculates project level productivity with total actual work hours divided by the total predicted (baseline) work hours of all disciplines. The z-score method converts the raw engineering productivity metric of every discipline to a dimensionless measure for further aggregation. Work hours were subsequently employed to weight the z-scores of various disciplines during aggregation to the project level. The max-min method first standardizes discipline productivity metrics by subtracting the minimum productivity value of the discipline and then dividing it by the range of the metric (maximum-minimum). Similar to the z-score method, max-min standardized discipline productivity metrics are weighted by their work hours for aggregation.
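The two standardization schemes underlying the z-score and max-min methods can be sketched as follows; the productivity values are illustrative, not EPMS data:

```python
import statistics

def z_score_standardize(values):
    """Standardize by subtracting the sample mean and dividing by the
    sample standard deviation (the z-score method)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

def max_min_standardize(values):
    """Standardize by subtracting the minimum and dividing by the
    range (the max-min method)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative discipline productivity values (Wk-Hr per unit), not EPMS data
productivity = [2.1, 3.5, 1.8, 4.0, 2.9]
print(z_score_standardize(productivity))  # centered on 0, unit variance
print(max_min_standardize(productivity))  # rescaled into [0, 1]
```

Both rescale a discipline metric into a dimensionless form; the z-score variant additionally accounts for the spread of the reference data, which is why it lends itself to work-hour-weighted aggregation.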
The authors worked closely with the PM team for evaluation of the three approaches using the criteria previously presented. Applying the Analytical Hierarchy Process (AHP), the z-score method was selected and only this approach is presented in detail in this paper. The details of the other approaches as well as the entire assessment and selection procedure are documented in Liao (2008).

The Z-score method
The z-score approach uses a statistical procedure for standardization which allows comparison of observations from different normal distributions. It includes three main steps: transformation, standardization, and aggregation of the underlying metrics.

Transformation
At the outset, the authors assessed the normality of the data using quantile-quantile probability plots (Q-Q plots) to determine whether transformation was necessary before standardization. A Q-Q plot provides a visual goodness-of-fit assessment by comparing the distribution of the data against the pattern expected if the data were normally distributed. For instance, Fig. 3 shows a Q-Q plot for concrete engineering productivity (work hours per cubic yard, Wk-Hr/CY). If the data were normally distributed, the points would fall reasonably along a straight line. As a further check, the mean is compared to the median for evidence of skew, as shown in Table 2. Here the mean exceeds the median by 2.77 (156%), indicating that the distribution of concrete engineering productivity data is positively skewed. Engineering productivity metrics for the other disciplines in the table are likewise positively skewed, a finding consistent with previous research (Zener 1968). Because z-scores of the raw engineering productivity metrics are not normally distributed, a PEPM developed from a linear combination of such z-scores can be misleading when interpreting project level engineering productivity relative to the grand mean.
To address the skew of the data, natural log transformations were employed. After the natural log transformation, the data tend to scatter around a straight line, as shown in Fig. 3. As confirmation, the difference between the mean and median is reduced to only 0.07 (12%), as illustrated in Table 2. Consequently, all metrics were transformed using natural logarithms, after which the distributions were ready for standardization.
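The mean-versus-median skew check and the effect of the log transformation can be illustrated with simulated positively skewed data; the lognormal parameters below are arbitrary, not fitted to the CII dataset:

```python
import math
import random
import statistics

random.seed(42)
# Simulated positively skewed productivity data (lognormal), standing in
# for an EPMS discipline metric such as concrete Wk-Hr/CY
raw = [math.exp(random.gauss(0.5, 0.8)) for _ in range(200)]

def skew_gap(values):
    """Mean minus median: a quick indicator of positive skew."""
    return statistics.mean(values) - statistics.median(values)

# Natural log transformation pulls in the long right tail
transformed = [math.log(v) for v in raw]

print(f"raw mean-median gap:         {skew_gap(raw):.2f}")
print(f"transformed mean-median gap: {skew_gap(transformed):.2f}")
```

The raw gap is clearly positive (mean dragged upward by the tail), while the gap of the log-transformed data shrinks toward zero, mirroring the 2.77 versus 0.07 comparison in Table 2.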

Standardization and aggregation
All projects in the EPMS were categorized by the year of the midpoint of the project, though actual projects usually spanned two or more years. The year with the most projects was chosen as the base year because sample means theoretically converge to population means as the sample size increases. In these data, 2004 was selected as the base year with a total of 32 projects. Using this reference dataset, each individual metric was standardized by subtracting its mean and then dividing by its standard deviation. Thus, the variability of the different discipline metrics was neutralized and calibrated to the same scale, suitable for aggregation. Next, the standardized discipline productivity is aggregated using work hours as the weights, since "work hour" is the common parameter amongst different disciplines. In terms of workload, it represents each discipline's relative importance in engineering productivity performance.

An example for calculation of a PEPM
Using the z-score method, Table 3 presents an example of the steps for calculating a PEPM for a single project. For ease of understanding, only the concrete and steel disciplines are included in this example. The calculation begins with the natural logarithm transformation of the engineering productivity metrics to address data skew, thereby making the distributions more normal. For instance, concrete engineering productivity is 3.06 (Wk-Hr/CY), as shown in the column with note (2); the natural logarithm transformed value is shown in the column with note (3). Next, the transformed engineering productivity metrics are standardized using the mean (note 4) and standard deviation (note 5) of the reference dataset to account for the variations in their distributions. Lastly, the standardized concrete and steel engineering productivity metrics (note 6) are weighted by their work hours (note 1) and aggregated into a PEPM, shown in the formula at note (7). Given the concrete z-score of 0.38 with its 5200 engineering work hours and the steel z-score of -0.47 with its 7500 hours, their composite score is -0.12. This value, which incorporates both concrete and steel engineering productivity, indicates that the sample project outperforms the overall mean by 0.12 standard deviations. All discipline level productivity metrics can be aggregated into a PEPM using the same approach. In summary, the PEPM can be expressed mathematically with Eq. 2:

PEPM_p = Σ_i (WH_ip × z_ip) / Σ_i WH_ip,    (2)

where: WH_ip - work hours of the i-th underlying metric on the p-th project; z_ip - the z-score of the i-th underlying metric on the p-th project.
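The weighted aggregation in the Table 3 example can be reproduced directly; the z-scores and work hours below are the ones quoted above:

```python
# z-scores and work hours quoted in the Table 3 example
disciplines = {
    "concrete": {"z": 0.38, "work_hours": 5200},
    "steel":    {"z": -0.47, "work_hours": 7500},
}

total_hours = sum(d["work_hours"] for d in disciplines.values())
# Eq. 2: work-hour-weighted average of the discipline z-scores
pepm = sum(d["z"] * d["work_hours"] for d in disciplines.values()) / total_hours
print(f"PEPM = {pepm:.2f}")  # -0.12, better than the base-year mean
```

Because steel carries more work hours (7500 versus 5200), its below-average z-score dominates and pulls the composite below zero.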

Characteristics of a PEPM generated by the z-score approach
The authors and the PM team assessed the z-score approach using the pre-established criteria of comprehensibility, homogeneity, and trending ability. Compared with the other two candidates, the z-score approach performed best and thus was selected. Its characteristics are presented as follows.

Comprehensibility
The z-score method produces a PEPM whose mean value approximates zero and whose data range from -3 to 3 standard deviations, as depicted in Fig. 4. The Kolmogorov-Smirnov (K-S) test was performed to examine whether the distribution differs significantly from normal. The p-value was greater than 0.1, suggesting that the null hypothesis cannot be rejected and that the distribution of the PEPM approximates a normal distribution. A negative composite score, as illustrated in Table 3, indicates that the project is more productive than the norm of the base year; a positive PEPM score implies that the project is less productive than the norm of the base year. In addition, the PEPM is interpreted with the same convention as other CII performance metrics such as cost growth and schedule growth (i.e., the smaller the value, the better the performance); therefore, the authors and the PM team concluded that this approach creates compatibility between the PEPM and existing CII benchmarking metrics. The PEPM was presented to industry representatives in CII benchmarking workshops to gain broad feedback. The benchmarking users easily recognized the PEPM's value for assessing productivity performance, and it was thus deemed acceptable. In summary, the z-score approach produces an appropriate PEPM, easily comprehended by industry and, as later demonstrated, by academia.
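The K-S check against a fitted normal distribution can be sketched as follows. The scores are simulated stand-ins for the PEPM values in Fig. 4, and the 10%-level critical value is the classical one-sample approximation, which is only approximate here because the mean and standard deviation are estimated from the data:

```python
import math
import random
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu, sigma):
    """One-sample Kolmogorov-Smirnov statistic against N(mu, sigma)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical and theoretical CDFs
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

random.seed(7)
# Simulated stand-ins for the 112 PEPM scores, not the actual CII data
pepm_scores = [random.gauss(0.0, 1.0) for _ in range(112)]
mu, sigma = statistics.mean(pepm_scores), statistics.stdev(pepm_scores)

d = ks_statistic(pepm_scores, mu, sigma)
critical = 1.224 / math.sqrt(len(pepm_scores))  # approx. 10%-level value
print(f"D = {d:.3f}, critical (alpha = 0.1) = {critical:.3f}")
print("normality not rejected" if d < critical else "normality rejected")
```

A D statistic below the critical value corresponds to the paper's p > 0.1 outcome: the data give no grounds to reject normality.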

Homogeneity
Homogeneity of the PEPM indicates how accurately it summarizes overall engineering productivity at the discipline level. In this study, homogeneity is defined as the difference between the percentile of the PEPM and the percentile of the weighted average of the discipline level metrics in each project. The smaller the percentile difference (PD), the better the homogeneity.
To examine the homogeneity of the PEPM under various benchmarking scenarios, the dataset was first divided into subgroups by project characteristics, and homogeneity was then examined accordingly. For instance, projects were divided by the i-th characteristic into j subgroups, where the k-th project in the j-th subgroup has l disciplines. Percentile_ijkl was calculated for the l-th discipline level metric. A summarized percentile of all discipline level metrics was derived as the average weighted by work hours. This weighted-average percentile is called the overall expected performance of discipline level metrics (OEPDLM) for the k-th project (Eq. 3):

OEPDLM_ijk = Σ_l (WH_ijkl × Percentile_ijkl) / Σ_l WH_ijkl.    (3)

The percentile (P_ijk) of the PEPM of the k-th project is calculated directly against the other projects in the j-th subgroup. The percentile difference (PD_ijk) of the k-th project is defined as the absolute difference between OEPDLM_ijk and P_ijk (Eq. 4):

PD_ijk = |OEPDLM_ijk - P_ijk|.    (4)

Lastly, the PD for the j-th subgroup of the i-th characteristic, PD_ij, is calculated as the arithmetic mean because every project is considered equally important (Eq. 5):

PD_ij = (1/n_ij) Σ_k PD_ijk,    (5)

where n_ij is the number of projects in the subgroup. A grassroots project is used as an example to illustrate how the PD was calculated. First, compared against the other grassroots projects, the z-score of concrete engineering productivity (5200 work hours) of this project was converted into a percentile, 70%. Similarly, percentiles of the engineering productivity z-scores for the other disciplines were calculated and are presented with their work hours as follows: steel (40%, 7500 work hours), electrical equipment (51%, 1154 work hours), conduit (63%, 577 work hours), cable tray (59%, 981 work hours), wire and cable (55%, 3462 work hours), lighting (58%, 173 work hours), piping (53%, 23077 work hours), instrumentation (64%, 8654 work hours), and equipment (41%, 7500 work hours). The percentile of the PEPM (45%) was also calculated.
Second, the OEPDLM was derived as the average of all discipline level metric percentiles weighted by their work hours, 53.23%. The PD of this project (8.23%) was then obtained as the absolute value of the difference between the OEPDLM (53.23%) and the PEPM percentile (45%).
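The OEPDLM and PD arithmetic for this grassroots example can be verified with the percentiles and work hours quoted above:

```python
# Discipline percentiles and work hours quoted for the grassroots example
discipline_percentiles = {
    "concrete":             (70, 5200),
    "steel":                (40, 7500),
    "electrical equipment": (51, 1154),
    "conduit":              (63, 577),
    "cable tray":           (59, 981),
    "wire and cable":       (55, 3462),
    "lighting":             (58, 173),
    "piping":               (53, 23077),
    "instrumentation":      (64, 8654),
    "equipment":            (41, 7500),
}
pepm_percentile = 45

total_hours = sum(wh for _, wh in discipline_percentiles.values())
# Eq. 3: work-hour-weighted average of the discipline percentiles
oepdlm = sum(p * wh for p, wh in discipline_percentiles.values()) / total_hours
# Eq. 4: absolute difference from the PEPM percentile
pd_value = abs(oepdlm - pepm_percentile)
print(f"OEPDLM = {oepdlm:.2f}%, PD = {pd_value:.2f}%")  # 53.23%, 8.23%
```

Piping dominates the weighting with 23077 of the 58278 total work hours, so its 53% percentile pulls the OEPDLM close to its own value.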
Applying a similar logic for generating project level engineering productivity for grassroots projects, the PDs for addition and modernization projects were also calculated. As shown in Table 4, for 37 addition projects, the average PD is 7.2%; for 22 grassroots projects, the average PD is 8.5%; for 53 modernization projects, the average PD is 9.4%.
Besides project nature, the authors also examined PDs by other project characteristics such as project size, type, priority, contract type, and work involvement. Table 4 shows the average PD of the PEPM across 13 subgroups of different project characteristics; the overall average PD equals 8.4%. Compared with the 30% error rate found in previous research (Cha 2003; Jarrah 2007), the precision of the PEPM is reasonably acceptable. The results also indicate that the PEPM represents discipline engineering productivity metrics homogeneously.

Trending ability
One of the major advantages of the z-score approach is that it produces a PEPM suitable for trend-tracking because it uses a fixed reference dataset (2004). Data points from 1998-2000, 1999-2001, and 2007-2008 were excluded because each period included fewer than 30 projects and thus may not be statistically reliable. Even so, the average PEPM for each year varies considerably because of the limited sample size in each period. To smooth out fluctuations and observe trends more clearly, a three-year moving average (3-yr MA) was used to show the trend of engineering productivity. Fig. 5 depicts the 3-yr MA trend line. Given that engineering productivity is defined as input over output in this research, rising values reflect lower productivity. The trend shows declining productivity from 2000 to 2006. The authors discussed this finding with the experts of the PM team as well as other practitioners at industry forums and found that the trend illustrated by the PEPM is consistent with industry expectations of engineering productivity. A plausible explanation is the dissemination of 3D computer-aided design (CAD) technology.
In order to improve field productivity, engineers use 3D CAD intensively to address constructability and safety issues beforehand, and thus may consume more engineering time than with 2D CAD (Datatech 1994). Poor technology integration likely hampers engineering productivity improvement as well. For instance, an engineering firm may save work hours designing piping layouts with 3D CAD; however, a low degree of technology integration with other disciplines results in inefficient data mapping or transformation (Brynjolfsson 1993). The PEPM was developed for benchmarking engineering productivity and identifying problems at the project level. If engineering productivity at the project level is determined to be a major concern, problems at the discipline level can be identified by tracking the corresponding workload (engineering hours) and the benchmarking results. An oil refining plant is provided as an example in Figs 6-8.
An oil refining plant was engineered for modernization of equipment, piping, and instrumentation. The z-score (standardized engineering productivity, SEP) of the equipment discipline is -0.75 with 140 work hours, that of the piping discipline is 0.75 with 1140 work hours, and that of the instrumentation discipline is 0.52 with 342 work hours. To control for project characteristics and benchmark more meaningfully, 11 oil refining plants (also engineered for modernization) were selected as the comparison sample. Using the PEPM for benchmarking, as shown in Fig. 6, engineering productivity of the sample project appears to be worse (4th quartile) than most of the comparison sample.
When project engineering productivity is found to be low, the engineering hours (workload) of various disciplines can be prioritized to track engineering productivity because the engineering hours of each discipline represent relative impact on project productivity. For the sample project, piping engineering work hours account for 70% of total, instrumentation work hours account for 21%, and equipment work hours account for 9% (Fig. 7).
Therefore, piping engineering productivity should be of major concern because it accounts for the largest workload among all disciplines. As shown in Fig. 8, the standardized engineering productivity (SEP) of the piping discipline resides in the fourth quartile, equipment in the second quartile, and instrumentation in the third quartile. The figure demonstrates that piping engineering productivity is the worst among all disciplines. Considering the workload and performance of the sample project, the project manager should make the piping discipline the top priority and allocate major management resources to it in order to efficiently improve project engineering productivity.

Engineering productivity improvement raises project delivery efficiency and reduces cost. Benchmarking is an effective approach by which the project manager can identify problems. The following strategy is provided as guidance for project managers. First, determine the project characteristics by which the analysis is to be performed. Second, select a similar comparison dataset. Third, if the PEPM demonstrates a less than satisfactory result, track engineering productivity at the discipline level, prioritized by each discipline's work-hour percentage of the entire project. Lastly, develop improvement plans so that resources can be deployed effectively for productivity improvement.
Project managers should be aware that benchmarking results become more meaningful as more project characteristics are specified, implying that more project complexity is controlled. The size of the comparison sample, however, becomes smaller accordingly. If the sample size of a comparison dataset becomes too small to benchmark, fewer constraints (characteristics) should be used to gain more comparison samples, although the benchmarking result may then be less targeted. Users should select a benchmarking dataset prudently, balancing comparison sample size against project similarity, to obtain meaningful results.

Conclusions and recommendations
Using a dataset of 112 heavy industrial projects, this research developed a z-score approach to produce a Project Level Engineering Productivity Metric (PEPM). The approach consists of three steps: transformation, standardization, and aggregation of the various discipline level metrics. Because engineering productivity data are positively skewed, applying the natural logarithm yields transformed discipline level metrics with approximately normal distributions, making them suitable for standardization. Although the transformed metrics were approximately normally distributed, their different central tendencies and variances were recognized, and z-scores were therefore calculated for standardization. The standardized discipline metrics were weighted by work hours and aggregated into a PEPM for assessing engineering productivity at the project level. The PEPM is easily understood by industry; it represents the underlying metrics accurately and can be used to track engineering productivity trends.
From the benchmarking perspective, it is critical to have a PEPM which summarizes the various discipline engineering productivity metrics. It provides project managers a macro-view of engineering productivity. Project managers can either benchmark their engineering productivity against the CII database or historical information collected within their organizations.
In the hierarchical EPMS, the PEPM calculated with the z-score approach allows project managers to identify project productivity problems at a glance. When project engineering productivity appears to be low, the project manager can track problems of the underlying metrics prioritized by their workload (engineering hours). An informed decision can then be made for improving overall engineering productivity.
The development of the PEPM also opens new opportunities for engineering productivity analysis. Further research can be conducted to analyze impacts of best practice use such as front-end planning, constructability, or change management on project level engineering productivity. In addition, the relationship between engineering productivity and project performance can also be explored with the PEPM.