ESTIMATING A UNIT PRICE FOR ROADS MAINTENANCE ACTIVITIES USING EXPONENTIAL ROBUST REGRESSION

Good road maintenance schemes reduce costs and extend the service life of roads. There are several methods for planning maintenance projects, but all of them require input information about maintenance costs, which can vary widely with the geographical zone and the contracted volume. Agencies currently lack written documentation covering the entire process of preparing project estimates, and experience plays an important role in price estimation. In this paper, a structured methodology for estimating a unit price for road maintenance activities is proposed, modeling the exponential decay nature of economies of scale. Using an exponential robust regression procedure, curves are fitted and parameters generated. Additionally, a cost contingency analysis is performed in order to provide cost ranges associated with specific contracted volumes. The methodology was validated by using it to estimate real unit prices of historic road maintenance projects in Chile. The procedure may be used by road planners as well as by contractors looking for a more confident approach before participating in a bid. Furthermore, the methodology is not limited to road maintenance; it can be applied in any other field where economies of scale and exponential fitting are needed.


Introduction
An economic analysis allows highway agencies to identify, quantify, and value the economic benefits and costs of highway projects and programs over a multiyear timeframe, and thus to generate a cost-effective design and construction (Geiger 2003). Highways and roads are an important part of every country's public assets, allowing land occupation and communication among communities. Like other types of infrastructure, road assets require proper planning, which involves feasibility studies, design, construction, operation, preservation and rehabilitation.
Cost estimation is considered one of the most important and critical phases of a construction project (Alkass, Jrade 2007). Studies report that inaccuracy in cost estimation ranges from 20.4 to 44.7% (Liu et al. 2010), and despite decades of efforts to reduce project cost overruns, large infrastructure projects continue to be plagued by delays and large cost overruns (Liu et al. 2010). However, in road and highway projects the initial construction costs are not the only important ones: road management starts from the premise that the road network is an asset which needs to be maintained and improved so as to secure the best performance, value for money and the maximum service life (Danielson et al. 1998). Low resource allocation and poor maintenance planning lead to significant losses of road assets (Acevedo, Muñoz 2010).
Traditionally, road project alternatives are characterized by an initial investment followed by proper maintenance in subsequent years (Archondo-Callao 2011). There are several methods for planning this maintenance (Dakin et al. 2006), but all of them require input information about maintenance costs. These costs can be very different depending on the geographical zone and the contracted volume due to economies of scale (Tighe 2001).
There is no single approach to developing construction unit costs. Typically, agencies have developed their own processes for preparing project estimates to suit their requirements. As a result, highway construction projects employ a number of estimating procedures. Agencies do not have written documentation on the entire process for preparing project estimates (Anderson et al. 2009): unit prices are generally estimated from experience or by manually searching for similar projects in databases, which is time consuming and a source of possible errors.
At present, Chile has neither a reference set of unit prices for road project assessment nor a formal procedure for estimating them. However, the costs associated with road maintenance activities are essential for economic-technical evaluations, and those prices directly influence the final project portfolio that maximizes social benefits.
A structured methodology for estimating the unit prices (UP) of road maintenance activities (RMA) is proposed, modelling the exponential decay nature of economies of scale. Using an exponential robust regression procedure, curves are fitted and parameters are generated. Additionally, a cost contingency analysis is performed in order to provide probabilistic cost ranges associated with specific contracted volumes.
The methodology has been implemented in MATLAB. This stage produces a group of parameters, which are combined in a Microsoft Excel sheet to estimate unit prices. The methodology was validated by using it to estimate real unit prices of historic road maintenance projects in Chile.

Research description and methodology
This investigation tries to answer the following questions: which variables affect RMA unit prices, and how can these prices be estimated? There are some common factors that estimators need to consider when determining unit prices (Anderson et al. 2009): project location, project size, quantity of materials, time of year, current market conditions, constructability, price-volatile materials, sequence of construction, contractor's familiarity with the process, risks to contractors, and inflation. Economies of scale are related to quantity of materials and project size, and it has been suggested that unit prices of RMA present an exponential decay due to economies of scale (Tighe 2001). This paper focuses mainly on contracted volume and incorporates a contingency analysis to deal with the variability. Dismissing cost contingency analysis may cause project cost overruns (Lawrence 2007).
The methodology consists of a combination of an exponential regression with a robust regression. Exponential regression is used when input data (unit prices) tends to exponentially decay as the independent variable (contracted volume) increases. Robust regression is used when input data presents outliers that would distort the results: big databases inevitably contain mistakes that can affect regression performance. Then, with a combination of both procedures this problem can be addressed.
Historical input data is processed in MATLAB, which generates parameters for each RMA studied. These parameters are handled in an Excel sheet, where the unit price estimation and cost contingency analysis are performed. Figure 1 shows a flowchart of the methodology.

Data description
In Chile, Dirección de Vialidad (DDV) is the institution in charge of improving road connectivity by planning, building, and maintaining Chilean road assets. It reports to the Ministry of Public Works of Chile. The databases used in this research were provided by DDV and correspond to real maintenance activities performed between 2007 and 2011.
Manual de Carreteras is a normative document created by DDV and used as a guide for many different road actions. It establishes the policies, criteria, procedures and methods that road projects should comply with in relation to planning, assessment, design, construction, security, maintenance, quality and environmental impacts (Arriagada et al. 2001). Volume 7 corresponds to road maintenance and includes all RMA and their technical descriptions.
In this manual RMA are divided into nine groups: road strip, soil transportation, drainage, asphaltic pavements, concrete pavements, gravel and natural soils, bridges and structures, security, and snow control. For the methodology validation, 68 pavement maintenance activities were considered due to their importance for road pavements. They correspond to activities of asphalt road maintenance (codes starting with 7.304), concrete road maintenance (codes starting with 7.305), and gravel and natural road maintenance (codes starting with 7.306). The database contains extensive historical information: bid awarding information, date of contract, people in charge, road information, contracted volume, unit prices in Chilean pesos, and type of work. The data are stored in an Excel sheet with approximately 100 000 records. Some of these records contain null or incorrect data (e.g. negative values) and were dismissed from the database for the purposes of this research. Moreover, many of the activity codes are misspelled or written in a slightly different way (e.g. 7-304-1 b instead of 7.304.1 b). As a result, a significant amount of information is unusable unless it is corrected.
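The cleaning step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (code, volume, price), the function names `normalize_code` and `clean_records`, and the set of misspelling variants handled by the regular expression are hypothetical and not taken from the DDV database. Non-positive values are dropped because the later logarithmic transformation requires strictly positive volumes and prices.

```python
import re

# Assumed variants: any non-digit separator ("7-304-1 b", "7 304 1B", ...).
_CODE_RE = re.compile(r"^\s*(\d)\D+(\d{3})\D+(\d+)\s*([A-Za-z]?)\s*$")


def normalize_code(raw):
    """Return the activity code in the form '7.304.1b', or None if unparseable."""
    match = _CODE_RE.match(raw)
    if match is None:
        return None
    volume_no, group, item, letter = match.groups()
    return f"{volume_no}.{group}.{item}{letter.lower()}"


def clean_records(records):
    """Keep records with a parseable code and strictly positive volume and price."""
    cleaned = []
    for code, volume, price in records:
        canonical = normalize_code(code)
        if canonical is None or volume is None or price is None:
            continue
        if volume <= 0 or price <= 0:
            continue
        cleaned.append((canonical, volume, price))
    return cleaned


# Made-up records, for illustration only.
raw = [
    ("7-304-1 b", 1200.0, 0.35),  # misspelled separators: repaired
    ("7.304.1b", 800.0, -0.10),   # negative price: dropped
    ("7.305.1 B", None, 0.04),    # missing volume: dropped
    ("7.306.4a", 500.0, 0.90),
]
print(clean_records(raw))
```

Repairing the codes rather than discarding the records preserves data that would otherwise be lost to formatting differences alone.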
In order to be comparable, unit prices are converted into UF units (unidad de fomento, Chilean indexed monetary unit adjusted by inflation) considering the date of each record (Geiger 2003).

Exponential robust regression
The ordinary least squares (OLS) method is one of the simplest and most widely used methods for solving a multiple linear regression. It obtains the regression coefficients by minimizing the sum of the squared residuals.
In the presence of outliers, OLS may provide wrong models (Arslan 2011). An outlier is a piece of data that is suspected to be incorrect due to the remote probability that it is in fact correct (Knight, Wang 2009). In other words, outliers are atypical data lying outside the normal range of common values, produced by an external cause (e.g. abnormal price negotiation, typing errors, or urgent repairs). If an outlier is included in the regression, the output model will probably predict wrong results.
There are several robust regression methods to cope with this problem (Andersen 2008). Some of them are the Danish Method by Krarup et al. (1980) (purely heuristic method with no rigorous statistical theory), Least Absolute Values method or L1-norm by Edgeworth (1887) (minimizes the sum of the absolute weighted residuals), Least Median Squares by Rousseeuw (1984) (minimizes the median of the weighted residuals squared), Least Trimmed Squares by Rousseeuw (1984) (excludes the largest weighted squared residuals from the minimization), R-estimators by Jaeckel (1972) (minimizes the sum of the scored ranked weighted residuals), M-estimators by Huber (1964) (iteratively minimizes a weighted function of the residuals), IGGIII estimators by Yang (1999) (similar to M-estimators with a different weighting function), S-estimators by Yohai et al. (1984) (minimizes a robust measure of the scatter of the residuals), and MM-estimators by Yohai (1987) (combines the S-estimator and the M-estimator).
In 2009 Knight and Wang tested and compared many of these robust methods to identify which have the greatest ability to correctly exclude outliers. They found that no method correctly identifies 100% of outliers in all situations. According to their results, as the level of "contamination" increases, the MM-estimators and the L1-norm achieve the highest rates of correct outlier exclusion. However, these methods are more difficult to apply and can be more time consuming. On the other hand, the differences between the success rates are at most of the order of 10% (Knight, Wang 2009).
One of the most widespread and easy-to-use methods is the iteratively reweighted least squares method (IRLS), which belongs to the M-estimators and deals with the existence of outliers in the dependent variable. It consists of iteratively solving the following objective function:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \rho\left(y_i - f_i(\beta)\right)$$

where in each step a weighted least squares problem is solved:

$$\beta^{(t+1)} = \arg\min_{\beta} \sum_{i=1}^{n} w_i\left(\beta^{(t)}\right)\left(y_i - f_i(\beta)\right)^2$$

Definitions:
-$\beta^{(t)}$ is the vector of regression coefficients at step t;
-$y_i$ is the response or dependent variable;
-$f_i$ is the response regression function;
-$w_i$ is the weighting factor;
-$\rho$ is the loss function of the chosen M-estimator;
-$n$ is the length of the set of data.
In this research, the MATLAB routine robustfit.m was used to perform the calculations. By default, the algorithm uses iteratively reweighted least squares with a bisquare weighting function and a tuning constant of 4.685.
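The iteration can be illustrated with a minimal pure-Python sketch for a straight line y = b1 + b2 x. It is not the MATLAB routine: the function names are hypothetical, the residual scale is taken as the median absolute deviation divided by 0.6745, and the leverage adjustment applied by robustfit is omitted, so the numbers will not match MATLAB output exactly. The data are synthetic.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = b1 + b2 * x; returns (b1, b2)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b2 = sxy / sxx
    return my - b2 * mx, b2


def irls_bisquare(x, y, tune=4.685, max_iter=50, tol=1e-10):
    """IRLS with bisquare weights; returns (b1, b2, final weights)."""
    w = [1.0] * len(x)
    b1, b2 = weighted_line_fit(x, y, w)  # first pass is plain OLS
    for _ in range(max_iter):
        resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation / 0.6745.
        med = statistics.median(resid)
        scale = statistics.median(abs(r - med) for r in resid) / 0.6745
        if scale == 0.0:
            break  # at least half of the points are fitted exactly
        w = []
        for r in resid:
            u = r / (tune * scale)
            # Bisquare weight: smooth down-weighting, zero beyond the cutoff.
            w.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
        new_b1, new_b2 = weighted_line_fit(x, y, w)
        converged = abs(new_b1 - b1) + abs(new_b2 - b2) < tol
        b1, b2 = new_b1, new_b2
        if converged:
            break
    return b1, b2, w


# Synthetic line y = 2 - 0.5 x with small noise and one gross outlier at x = 7.
noise = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, 0.0, -0.01, 0.01]
x = [float(i) for i in range(10)]
y = [2.0 - 0.5 * xi + ni for xi, ni in zip(x, noise)]
y[7] += 5.0

ols_b1, ols_b2 = weighted_line_fit(x, y, [1.0] * len(x))
rob_b1, rob_b2, weights = irls_bisquare(x, y)
print("OLS:   ", ols_b1, ols_b2)
print("robust:", rob_b1, rob_b2)
print("weight of the outlier:", weights[7])
```

In this example the OLS slope is visibly pulled by the single outlier, while the robust fit assigns it zero weight and stays close to the underlying line.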
Although robust regression makes it easy to detect and remove unit prices that behave abnormally, it cannot be applied directly because of the exponential nature of economies of scale (Hernandez-Sancho et al. 2010). A common relationship is observed in most RMA: as contracted volume increases, unit prices tend to decrease, and in many cases this decrease appears to be exponential. A logarithmic transformation can therefore be applied to both axes (unit prices and contracted volume), transforming the exponential point distribution into a linear point distribution (Anja et al. 2010). In this new "space" robust regression can be used.
Figure 2 plots unit prices against contracted volumes for RMA 7.304.4c. The exponential decay with increasing contracted volume is easily observed, and a straight line clearly cannot describe the data. The final exponential robust regression is also plotted.
Applying a logarithmic transformation (natural logarithm) in both axes generates the graph shown in Figure 3. It is clearly observed that data tends to group in a straight line in this space. Here, linear robust regression can be performed without problems. Furthermore, four outliers are easily recognizable (top of the figure), and they were dismissed from the robust regression (line).
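The transformation and its inverse can be sketched as below. A straight line ln(UP) = b1 + b2 ln(V) in the transformed space corresponds to UP = exp(b1) V^b2 in the original space. For brevity this sketch fits the line with plain least squares; in the methodology the line is fitted with robust regression instead. The function names and the synthetic data are assumptions made for illustration.

```python
import math


def fit_log_log(volumes, prices):
    """Least squares line in log-log space: ln(UP) = b1 + b2 * ln(V)."""
    lx = [math.log(v) for v in volumes]
    ly = [math.log(p) for p in prices]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    b2 = sxy / sxx
    return my - b2 * mx, b2


def predict_unit_price(b1, b2, volume):
    """Back-transform the fitted line: UP = exp(b1) * V ** b2."""
    return math.exp(b1) * volume ** b2


# Synthetic prices following UP = 5 * V ** -0.3 exactly (not real data).
volumes = [100.0, 500.0, 1000.0, 5000.0, 10000.0]
prices = [5.0 * v ** -0.3 for v in volumes]

b1, b2 = fit_log_log(volumes, prices)
print(math.exp(b1), b2)
print(predict_unit_price(b1, b2, 2000.0))
```

Because the fit is done on logarithms, all volumes and prices must be strictly positive before the transformation is applied.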
An example of the influence of outliers on the regression is shown next. Figure 4 shows the unit prices of activity 7.304.2b, cold mix manual surface patching. Both the robust regression and the ordinary least squares (OLS) regression are performed and plotted.
The dashed curve is pulled downward by low-value outliers. There is a noticeable difference in unit prices over a wide range of contracted volume (approximately UF 0.20). For example, contracting 20 000 m² of cold mix manual surface patching could lead to a difference of UF 4000 (20 000 m² × UF 0.20 per m²; approximately US$ 185 000 in 2012), which is not a negligible amount.
In order to validate the regression model, two statistical tests were performed (Kutner et al. 1996; Montgomery, Runger 2002). The first is the regression significance test, which verifies significance by means of an analysis of variance. The second assesses the lack of fit of the regression model.

Cost contingency analysis
An economy of scale model provides the unit price as a function of the contracted volume. However, it provides a deterministic value, while in practice the unit price behaves like a random variable. Instead of assigning a single value to the cost estimate, the probabilistic approach associates a probability distribution with the cost parameter. This random variable presents higher variability for smaller contracted volumes and tends to stabilize for larger ones.
Cost contingency is defined as the amount that needs to be added to an estimate in order to reduce the risk of overrun to an acceptable level (Idrus et al. 2010). If unit prices are to be estimated without knowledge of the past performance of material suppliers and contractors, it is highly recommended to include a contingency item in the estimate. This is easily done with the regressed model described earlier.
The final purpose is to allow the user of this methodology to choose different probabilities that the unit price will not be exceeded, and thus determine the unit price associated with that level of confidence. Figure 5 shows a typical distribution of the unit price of an RMA.
The reason for the decrease in variability with volume is that, for small contracts, most providers use whatever sale price they want, and contractors have no interest in negotiating this price. For large volumes, however, it is possible to take advantage of economies of scale (Tighe 2001), and estimators are pushed to estimate more accurately.
To cope with this problem, the proposed methodology divides the contracted volume range into three levels of contingency. It assumes that data are normally distributed around the mean unit prices (Osman 2005). For a specific RMA, the proposed volume ranges depend on the total historic range and have the same length:

$$R_i = \max(V_i) - \min(V_i), \qquad \Delta R_i = \frac{R_i}{3} \qquad (3)$$

where: $R_i$ is the total historic volume range for activity i; $\Delta R_i$ is the total historic volume range for activity i divided by three; $V_i$ is the vector of volume data for activity i. The lower and upper volume bounds of each interval are defined as follows:

$$V^{low}_{ij} = \min(V_i) + (j-1)\,\Delta R_i, \qquad V^{up}_{ij} = \min(V_i) + j\,\Delta R_i$$

where j corresponds to the contracted volume ranges: low (1), medium (2) or high (3). Considering the unit prices that fall in each interval, the average unit price $\mu_{UP_{ij}}$ and its standard deviation $\sigma_{UP_{ij}}$ are computed. At the same time, the average volume $\mu_{M_{ij}}$ of each interval is calculated as the reference value for low, medium, and high contracted volume.
For each interval, a variation coefficient is defined as the ratio between the standard deviation and the mean value. It represents the fraction of a specific unit price to be taken as its standard deviation:

$$f_{UP_{ij}} = \frac{\sigma_{UP_{ij}}}{\mu_{UP_{ij}}}$$

where: $\sigma_{UP_{ij}}$ is the unit price standard deviation of interval j for activity i; $\mu_{UP_{ij}}$ is the average unit price of interval j for activity i; $f_{UP_{ij}}$ is the variation coefficient of interval j for activity i.
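The interval construction and the variation coefficient can be sketched as follows. The function name, the dictionary keys, the choice of the sample standard deviation, and the convention that a volume lying exactly on an internal bound belongs to the higher interval are all assumptions of this sketch; the input values are invented for illustration.

```python
import statistics


def contingency_intervals(volumes, prices):
    """Split the historic volume range into three equal-length intervals and
    return, for each, its bounds, mean volume, mean unit price and
    variation coefficient (standard deviation divided by mean)."""
    v_min, v_max = min(volumes), max(volumes)
    delta = (v_max - v_min) / 3.0
    intervals = []
    for j in (1, 2, 3):
        lower = v_min + (j - 1) * delta
        upper = v_min + j * delta
        if j < 3:
            members = [(v, p) for v, p in zip(volumes, prices) if lower <= v < upper]
        else:
            # Close the last interval on the right so the maximum volume is kept.
            members = [(v, p) for v, p in zip(volumes, prices) if v >= lower]
        vols = [v for v, _ in members]
        ups = [p for _, p in members]
        mean_price = statistics.mean(ups) if ups else None
        # A standard deviation needs at least two prices in the interval.
        variation = statistics.stdev(ups) / mean_price if len(ups) >= 2 else None
        intervals.append({
            "lower": lower,
            "upper": upper,
            "mean_volume": statistics.mean(vols) if vols else None,
            "mean_price": mean_price,
            "variation": variation,
        })
    return intervals


# Invented data: price scatter shrinks as the contracted volume grows.
volumes = [100.0, 200.0, 300.0, 1100.0, 1300.0, 2500.0, 3100.0]
prices = [1.0, 0.6, 0.8, 0.5, 0.4, 0.31, 0.29]

for interval in contingency_intervals(volumes, prices):
    print(interval)
```

Intervals with fewer than two records return no variation coefficient, which signals that the historic data are too sparse to support a contingency estimate in that range.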
After the robust regression parameters are generated and the mean unit prices calculated, a contingency analysis can be performed. Some variables are defined:
-$UP$: unit price random variable;
-$UP_{reg}$: mean unit price obtained from the regression;
-$UP_c$: unit price considering a contingency level.
Then, the unit price associated with a contingency level is calculated. As an example, consider a 15.9% probability of exceeding $UP_c$ (i.e. adding one standard deviation to the mean):

$$UP_c = UP_{reg} + f_{UP_{ij}}\,UP_{reg} = UP_{reg}\left(1 + f_{UP_{ij}}\right)$$

It is worth noting that using a high contingency level does not ensure that real unit prices will always be lower than the calculated $UP_c$. In other words, the confidence level describes what would be obtained if the "experiment" were repeated infinitely many times with different sample spaces.
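For confidence levels other than the one-standard-deviation case, the same calculation can be sketched with the standard normal quantile, in line with the normality assumption stated earlier. The function name and the input values are hypothetical; only the Python standard library is used.

```python
from statistics import NormalDist


def unit_price_with_contingency(up_reg, variation, confidence):
    """Unit price that is not exceeded with probability `confidence`,
    assuming UP ~ Normal(up_reg, (variation * up_reg) ** 2)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(confidence)  # standard normal quantile
    return up_reg * (1.0 + z * variation)


# One standard deviation above the mean corresponds to about 84.1% confidence.
one_sigma_confidence = NormalDist().cdf(1.0)
print(round(one_sigma_confidence, 3))
print(unit_price_with_contingency(1.0, 0.20, one_sigma_confidence))
print(unit_price_with_contingency(1.0, 0.20, 0.50))
```

A confidence of 0.5 returns the regressed mean itself, and higher confidence levels add a larger multiple of the standard deviation.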

Example cases
For the example cases, information provided by DDV is considered. These data consist of thousands of RMA unit prices collected between 2007 and 2010. The different types of RMA available in this database belong to one of the categories proposed in Manual de Carreteras, Volumen 7 (Arriagada et al. 2001), and they are similar to, but more specific than, the activities proposed in the HDM-4 software (Dakin et al. 2006). Chile is a very long and narrow country with very different climates and regions, and with places that are very difficult to access. In order to avoid variability due to different geographical areas, this study only considers unit prices from central Chile. Unit prices are shown in UF units (UF = unidad de fomento, Chilean indexed monetary unit adjusted by inflation). The activities considered in the example cases are:
7.304.4c: Asphalt slurry seal (m²). This maintenance activity belongs to the group of bituminous seals. It refers to an asphaltic covering applied by asphalt irrigation in combination with aggregates. The activity includes surface cleaning and preparation, application of the asphalt material, equipment, compaction, and other final actions. It is quantified in square meters (m²).
7.305.1b: Joint and crack sealing (m). This maintenance activity consists of sealing or resealing existing cracks in concrete pavements. The operation includes crack cavity conformation, cleaning, and sealing with all required procedures. It is quantified in linear meters (m).
7.306.4a: Gravel road platform reshape (m³). This maintenance activity belongs to the group of activities intended to reshape road platforms, and it includes replacing missing material. Its aim is to recover the initial geometry and serviceability of the road. The operation includes road preparation, material supply and placement, and other final actions. It is quantified in cubic meters (m³).
Table 1 summarizes the chosen RMA with their input data.

Results
The parameters obtained from the regression are: b1: first beta coefficient; b2: second beta coefficient; s1/m1: variation factor for the first volume range; s2/m2: variation factor for the second volume range; s3/m3: variation factor for the third volume range; Vp1: mean contracted volume for the first range; Vp2: mean contracted volume for the second range; and Vp3: mean contracted volume for the third range.
The following data were automatically dismissed by the robust regression: 7.304.4c: 3.96% of the data (13 outliers); 7.305.1b: 3.70% of the data (1 outlier); and 7.306.4a: 2.51% of the data (7 outliers). However, leaving just a few outliers in the regression can produce very poor models, because outliers differ from the rest of the data by orders of magnitude. In general, 5-25% of data is dismissed. Table 2 summarizes the parameters obtained from the robust regression, including the exponential regression coefficients, the variation factors for each volume range, and the mean contracted volume for each range. Figure 6 shows unit prices for asphalt slurry seal plotted against contracted square meters, together with the exponentially robust fitted curve.
Unit prices for joint and crack sealing plotted against contracted linear meters, together with the exponentially robust fitted curve, are shown in Figure 7. Figure 8 shows unit prices for gravel road platform reshape plotted against contracted cubic meters, together with the exponentially robust fitted curve. Then, equations for unit prices are obtained by performing an exponential transformation of the line fitted in logarithmic space:

$$\ln(UP) = b_1 + b_2 \ln(V) \quad \Rightarrow \quad UP = e^{b_1}\, V^{\,b_2}$$

where V is the contracted volume. The UP expressions for the studied maintenance activities follow by substituting the coefficients b1 and b2 of Table 2 into this form. Their accuracy can be verified by simply substituting example values and comparing the results with the graphs.
As an example of the contingency analysis, consider 5000 m of joint and crack sealing and a 15.9% probability of exceedance. According to Table 2 this volume corresponds to the midrange. From Table 2, the variation coefficient for the midrange is 35.12%, which means that the associated standard deviation is 35.12% of the mean. Thus the unit price associated with this contingency level is:

$$UP_c = UP_{reg}\,(1 + 0.3512) = 0.0422\ \text{UF}$$

which implies a regressed mean $UP_{reg}$ of approximately 0.0312 UF. This means that, for 5000 meters of sealing, the unit price that has a probability of 15.9% of being exceeded is 0.0422 UF. This confidence level (84.1%) was chosen arbitrarily because it simplifies the calculation (a standard deviation is simply added to the mean), but any value between 0.00 and 1.00 can be chosen. The fitted curves successfully model price variation within normal volume ranges (i.e. the historic data range included in the database). Applying these equations to extreme values (nearly zero volume or infinite volume) may cause anomalies.
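One simple way to respect this limitation in practice is to flag any estimate requested outside the historic volume range. The sketch below is only an illustration: the function name, the coefficients and the range bounds are hypothetical and are not the values of Table 2.

```python
import math


def guarded_unit_price(b1, b2, volume, v_min, v_max):
    """Evaluate UP = exp(b1) * volume ** b2 and report whether the volume
    lies inside the historic range [v_min, v_max] used for the fit."""
    if volume <= 0:
        raise ValueError("contracted volume must be strictly positive")
    inside_range = v_min <= volume <= v_max
    return math.exp(b1) * volume ** b2, inside_range


# Hypothetical coefficients and historic range, for illustration only.
b1, b2 = math.log(5.0), -0.3
for volume in (10.0, 1000.0, 50000.0):
    price, inside = guarded_unit_price(b1, b2, volume, v_min=100.0, v_max=3100.0)
    note = "" if inside else "  (outside historic range: treat with care)"
    print(f"V = {volume:>8.0f}  UP = {price:.4f}{note}")
```

The function still returns a value outside the range, so the decision on whether to trust an extrapolated price is left to the estimator.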

Conclusions
This paper presented a methodology for estimating unit prices of RMA. Economies of scale cause the unit prices of RMA to decrease exponentially, a behaviour that can be easily modelled with the methodology presented. Robust regression (IRLS) in combination with a logarithmic transformation successfully models the economies of scale of unit prices. The method proved to be an effective way to avoid price distortions caused by outliers in the data. Ignoring the presence of outliers can produce erroneous curve fitting, and performing this kind of analysis requires negligibly more effort than traditional ordinary least squares regression. Moreover, the fitted curves successfully model price variation within normal volume ranges. For extreme values of contracted volume (nearly zero or infinity), the model might provide abnormal unit prices. In order to correct this, the total volume range was divided into three intervals and a mean volume value was assigned to each one. This captures the price variation due to economies of scale and avoids anomalies at extreme values. When a specific low contracted volume is of interest, special care must be taken because these exponential functions tend to infinity as the volume approaches zero. For these cases, comparing with nearby estimations is recommended.
The unit prices of RMA should not be treated as deterministic values, because in practice they behave like random variables. This research showed that the variability of unit prices decreases as contracted volume increases. In other words, for small contracts price variability tends to be considerably high, whereas for big contracts unit prices tend to be more homogeneous. Results suggest this is explained by the ability to take advantage of economies of scale and due to a more detailed price estimation performed by contractors. Cost contingency analysis allows assigning a normal probability cost distribution for each of the three contracted volume intervals.
The model was tested with road maintenance projects carried out in Chile in recent years. Future research should consider applying the model in other countries to check its ability to adapt to other kinds of data. Furthermore, a study involving geographical data segregation should be performed to test its impact on unit price variability, and an update of unit prices for general inflation and other variables could be considered.
This research used robust regression to deal with the existence of outliers in the dependent variable. However, if the outliers occur in the explanatory variables, the performance of robust regression estimators might not be better than that of ordinary least squares estimators (Arslan 2011). To investigate this issue, a weighted least absolute deviation regression estimation could be tested.
The methodology works properly for the studied RMA (asphalt, concrete, and gravel roads). Its applicability to other RMA (e.g. signage, drainage) is not theoretically limited and should be a matter of future research. Furthermore, this procedure might be applied to model any other problem where exponential fitting is needed and outlier data are frequently found (e.g., mining, construction, or industry).
Ingenieros Consultores. Both supported, promoted and sponsored this research, providing information, resources and knowledge. The writers acknowledge the assistance of Priscila Hidalgo and Bárbara Rozas from DDQ, and Miguel Valdés who works as fiscal inspector at Dirección de Vialidad. The contents of this paper reflect the views of the writers, who are responsible for the facts and the accuracy of the data presented herein, and do not necessarily reflect the official views or policies of these institutions.