OPTIMAL SCHEDULING OF WATER NETWORK REPAIR CREWS CONSIDERING MULTIPLE OBJECTIVES

Water main breaks disrupt water services and impact traffic flow along congested city roads. Dispatching water pipe repair crews requires consideration of several factors, including: 1) the priority of the repair site; 2) the suitability and efficiency of the construction crew in repairing a particular break type; and 3) the time required for crews to travel between break sites. This paper presents a simulation-based multi-objective optimization model to schedule repair crews across water network break sites in an urban setting. Discrete-event simulation models for the water pipe repair process are developed to account for various repair methods. These models are subsequently integrated within a GA-based multi-objective optimization model that considers the following objectives: 1) minimizing the total repair time required to complete all breaks; 2) minimizing the total cost to complete the breaks; and 3) minimizing the cumulative impact of all breaks on road users and water customers. A case study of the water network of the City of Damietta, Egypt is used to demonstrate the capabilities of the model. Results show a 21% reduction in repair time and a 50% reduction in user impact compared to the heuristic crew allocation methods used by the water utility.


Introduction
The deteriorating state of infrastructure systems will continue to have a significant impact on the public for years to come. A recent study by the American Water Works Association (AWWA) estimates that more than one million miles of water pipes are nearing the end of their useful life and approaching the age at which they need to be replaced (AWWA 2012). These replacement costs, combined with projected expansion costs, will exceed $1 trillion over the next couple of decades. Water pipe infrastructure poses unique asset management challenges because there is no cost-effective and standardized mechanism to determine pipe condition. In many cases a run-to-failure or reactive management approach is utilized for non-critical water pipes, which translates into a large number of water main breaks. It is estimated that 850 major water main breaks occur in North America every day (Folkman 2012).
In addition to the direct costs of repairing a broken pipe, there are several indirect impacts that can be quite severe. Water main breaks are known to cause service disruption to water customers. Depending on where the break occurred, this disruption may range from a drop in the pressure delivered to customers to a total interruption in service. For some water customers (e.g. hospitals and industrial production facilities) this disruption can have significant consequences. In congested urban areas water main breaks also lead to significant disruption to traffic along already congested urban roadways. Depending on the depth and diameter of the pipe, the repair operations may require significant time and equipment to complete, which has a direct impact on the extent of traffic disruption.
With the expected aging of water pipes, water utility companies will be required to respond to an increasing number of water main breaks. In order to deliver high levels of service to their customers and minimize the impacts of these breaks on communities, utility companies need to improve their response plans to break events. The development of an optimal water break response plan is complicated by a large number of factors that include: 1) the geographical extent of service areas (especially in large cities); 2) the unpredictability of break event occurrence; 3) the availability of different types of repair methods and equipment that may not be well-suited to all types of breaks; 4) the uncertainty in repair time; and 5) the uncertainty in the transit time of a repair crew between break sites.
As such, this paper develops a comprehensive framework and working prototype for the optimal allocation of water pipe break repair crews that considers these challenges.

Background
Previous work on scheduling and allocation of repair crews has spanned several domains. Hegazy et al. (2004) proposed an approach for scheduling, resource planning, and cost optimization of large construction and/or maintenance programs that involve distributed sites. The model was applied to known school maintenance sites spanning a large urban area. The model was subsequently extended (Hegazy 2006) to consider the option of outsourcing the maintenance/repair operation instead of only depending on in-house resources. Orabi et al. (2009) developed an approach to deal with the challenge of the limited availability of the reconstruction resources that confront postdisaster recovery of damaged transportation networks. The model capabilities include: 1) optimizing the allocation of limited reconstruction resources to competing recovery projects; 2) assessing and quantifying the overall functional loss of damaged transportation networks during the recovery efforts; 3) evaluating the impact of limited availability of resources on the reconstruction cost; and 4) minimizing the performance loss of transportation networks and reconstruction cost. In the field of electrical utility repair, Weintraub et al. (1999) developed a system to support the dispatching of emergency services vehicles to support unplanned electrical problems. Their approach considered the following aspects: 1) service priorities: various service problems have different importance (e.g. dangerous fallen cable versus domestic loss of power); 2) travel time and transportation costs; and 3) probability of new requests as it is considered important to include information about possible breakdowns in the near future to avoid the assignment of vehicles which could leave an area with possible future breakdowns unprotected. Van Hentenryck et al. (2010) proposed an approach which considers the single commodity allocation problem (SCAP) for disaster recovery. 
SCAPs are complex stochastic optimization problems that combine the problems of: 1) resource allocation; 2) parallel fleet routing; and 3) warehouse routing. The challenge in solving these problems is that their computational complexity collides with the need to solve them under tight runtime constraints to be practical in real-world disaster situations. Their work introduced a novel multi-stage hybrid-optimization algorithm that utilizes the strengths of mixed integer programming, constraint programming, and large neighbourhood search to overcome this problem. Xu et al. (2007) proposed a strategic integer program to determine how to schedule inspection, damage assessment, and repair tasks so as to optimize the post-earthquake restoration of the electric power system. The objective of the optimization is to minimize the average time each customer is without power. Variables such as the damage state and functionality status of the entities collectively define the system status. As events take place, the values of these variables are updated, modifying the overall system status.
Research into operational repair optimization for water networks has generally not considered all factors and constraints that impact the repair process. Simão et al. (2004) developed a multi-objective optimization algorithm to locate the best set of isolation valves to close in case of a pipe break event such that user impact is minimized. Alfonso et al. (2010) utilized genetic algorithms to find sets of optimal operational interventions in a water supply network for flushing a contaminant that may occur in the network. The optimization model considered the minimization of both the adverse public health impact of the contaminant and the operational costs of the flushing operation.
Based on the literature review, several gaps were identified that require a tailored approach to the problem of dispatching water network repair crews. These gaps/needs can be summarized as:
- Considering the impact on traffic due to the infrastructure failure itself, and due to the need to occupy any portion of the right-of-way during the repair operation.
- Lack of models that specifically address the recurring problem of water pipe breaks and their emergency repair needs.
As such, the objective of this paper is to build on and extend these models through: 1) the consideration of multiple objectives (namely repair time, repair cost, and break impacts on water users/traffic); 2) capturing the various repair construction methods and equipment that are required for various break types, considering factors like pipe burial depth, material and available right-of-way; and 3) consideration of the uncertain nature of the repair process through the development of stochastic simulation models for the crew repair and relocation process.

System framework
The proposed system framework relies on several related components as shown in Figure 1 and explained in subsequent sections.

The pipe criticality model
This is a network-level assessment tool for water distribution networks. Pipe criticality is calculated based on the concept of "consequence of failure" which is a common driver of work planning and scheduling within infrastructure asset management guidelines (NAMS 2011). Criticality considers several possible impacts of pipe failure, and is considered in the prioritization of pipe repair crews. An analytical hierarchy process (AHP) was used to better define the concept and drivers of pipe criticality. A series of workshops were held with water utility staff to develop a pipe criticality model that can be used to prioritize pipe break repair. Workshop attendees included participants from operations and maintenance, capital planning, and engineering departments.
The model considers three main criticality impacts: 1) direct economic consequence of failure; 2) impact on water system users; and 3) impact on road users. The model relies on a series of 9 criticality variables shown in Table 1. All variables are readily available from the GIS or hydraulic model for the water network. AHP was used to identify the relative weights for each criticality category and variable through pair-wise comparison during the workshops with utility staff. Resulting weights are shown in Table 2.
The model produces an overall Pipe Criticality Index (PCI) that ranges from 0 to 100. The PCI can be considered an overall proxy for the impact a pipe break has on the water utility, water customers and road users. The developed model assumes that it is in the utility's best interest to give the highest priority in pipe break repairs to pipes with the highest PCI values.
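The aggregation behind the PCI can be illustrated with a minimal sketch. The paper does not state the aggregation formula, so a simple weighted sum over the three criticality categories is assumed here; the category names, the weights, and the function `pipe_criticality_index` are hypothetical placeholders and are not the AHP weights of Table 2.

```python
from typing import Dict

# Hypothetical category weights (placeholders, NOT the AHP weights from Table 2).
CATEGORY_WEIGHTS: Dict[str, float] = {
    "economic": 0.3,      # direct economic consequence of failure
    "water_users": 0.4,   # impact on water system users
    "road_users": 0.3,    # impact on road users
}


def pipe_criticality_index(scores: Dict[str, float],
                           weights: Dict[str, float] = CATEGORY_WEIGHTS) -> float:
    """Assumed aggregation: weighted sum of category scores, each on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("category weights must sum to 1")
    for name in weights:
        if not 0.0 <= scores[name] <= 100.0:
            raise ValueError(f"score for {name!r} must lie in 0-100")
    # Because the weights sum to 1, the result also lies in 0-100.
    return sum(weights[name] * scores[name] for name in weights)


example_pci = pipe_criticality_index(
    {"economic": 50.0, "water_users": 100.0, "road_users": 0.0}
)
```

Under this assumption, a pipe that scores highly on any heavily weighted category rises in the repair priority order, which matches the intent described above.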

Table 1. Pipe criticality variables
Pipe Material: Some pipe materials consume more time and money to fix. Some pipes are also known to fail catastrophically, causing severe surface damage. The most notable example is pre-stressed concrete pipe.
Road Type: Reflects the relative importance of the road within the overall transportation network.
Number of road lanes: Used as a proxy for expected traffic volume, as traffic count data was not available on all road segments. Pipe breaks on roads with heavier traffic volumes cause more disruption to traffic operations.
Land Use: Pipe breaks in certain types of land use are known to contribute to more severe social impacts. Examples include dense commercial, downtown CBD and high density residential areas.
Serving Critical Customer: Water utility staff identified hospitals, schools, industrial facilities, and large commercial customers as critical customers. The hydraulic model was used to flag pipe segments that would cause service disruption to these customers in case of breakage.
Service Impact: The hydraulic model was used to identify the number of customers that would have a disruption in service in the event of a pipe breakage. Disruption was defined as any drop in pressure below the minimum allowable service level (15 bar).
Operational Flag: Set in collaboration with the operations department to flag any pipes that are known to be difficult to repair. This factor was used to "override" any specific problem areas that were not directly captured through the criticality variables.
Pipe crossings: Failures of pipes crossing waterways, railroads, or highways cause much more damage and disruption, and are more costly and time consuming to repair.

Repair estimation model
This module allows the deterministic and stochastic estimation of repair time and cost. Time estimation considers the following aspects: 1) type of crew (and associated repair method); 2) pipe burial depth (deeper pipes will require more excavation, and requirements for excavation support will be more stringent); and 3) pipe material type and diameter (heavy pipes like large diameter concrete may require specialized lifting equipment). The typical pipe repair process proceeds as follows:
1. Site investigation and clearing: involves identifying the exact location of the break, valve isolation and shutoff, designation of any other buried utilities that may be adjacent to the break site, placing public warning signals, and clearing any obstacles that may be present on site (e.g. parked vehicles, trees, vegetation, etc.).
2. Excavation: involves removing the paved surface, excavation, installation of any needed side-support systems, and dewatering the excavation site.
3. Pipe repair: two main methods are used for pipe repair depending on the extent of pipe damage. Clamp installation is used for minor breaks/holes that are found in the pipe wall and is effective for cast iron and ductile iron pipes. Clamps allow repair to occur without the full depressurization of the water main. Where a pipe segment is severely damaged, a segment replacement must occur. In this case a partial/full pipe segment is removed and a new segment installed in its place. This technique is usually more time consuming and requires the full depressurization and disinfection of the line after repair (AWWA 2012).
4. Site backfill and restoration: this involves hydrant flushing to remove any debris, reopening valves, backfilling with appropriate fill material, compaction and surface restoration activities. Impacts on water system users cease to exist once valves are reopened, while impacts on road users continue until surface restoration works are completed.
Repair time is usually influenced by pipe diameter and the type of repair. A total of five interviews with operations and maintenance staff from the Cairo Water Company were undertaken. Respondents were asked to estimate the typical ranges for repair times based on diameter and repair method for different pipe types. An example is shown in Table 3.
It should be noted that the core contribution of the paper is not collecting the repair time durations but rather developing a comprehensive framework for optimizing the repair process across the network. The developed system allows the user to modify actual repair times based on local conditions and constraints.
Generally speaking, the water pipe repair process can be characterized by the following: 1) a large number of activities undertaken by different work crews; 2) significant uncertainty in the durations of many activities and tasks; and 3) an overall repair duration that is impacted by several external factors that cannot be completely foreseen. Hence, the use of a fully deterministic model to estimate total repair duration may not be suitable for optimal crew allocation. As such, the overall repair process is modeled using discrete event simulation via the STROBOSCOPE construction simulation package (Martinez, Ioannou 1994). STROBOSCOPE has been successfully used to model a wide range of construction operations like bridge construction (Marzouk et al. 2007), road paving operations (Nassar et al. 2003), and tunnel construction (Ioannou, Likhitruangsilp 2005).
The simulation models were created using standard repair sequencing and assuming that no interruption of work occurs. The results of the simulation runs are probabilistic repair times that take into account various factors that are known to impact repair duration. Figure 2 shows repair times for a segment replacement at shallow depth for 5,000 simulation runs. In order to allow for real-time support for crew allocation problems, the results of the simulation runs covering all cases of pipe diameter, burial depth and repair method were used to populate a database. During real-time crew allocation, the optimization model matches the existing break case to the database to obtain the repair duration (both deterministic and stochastic).
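The idea of pre-computing repair durations and storing them for lookup can be sketched as follows. This is not the STROBOSCOPE model: a plain Monte Carlo sum of the four repair activities is swapped in as a stand-in, with triangular distributions assumed for each activity. All duration ranges, the database key, and the function names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical activity durations in hours: (minimum, most likely, maximum).
ACTIVITIES: Dict[str, Tuple[float, float, float]] = {
    "investigation_clearing": (0.5, 1.0, 2.0),
    "excavation": (1.0, 2.0, 4.0),
    "pipe_repair": (1.0, 1.5, 3.0),
    "backfill_restoration": (1.0, 2.0, 3.5),
}


def simulate_repair_hours(runs: int, seed: int = 0) -> List[float]:
    """Sample total repair time by summing one triangular draw per activity."""
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode), in that order.
        total = sum(rng.triangular(lo, hi, mode)
                    for lo, mode, hi in ACTIVITIES.values())
        totals.append(total)
    return totals


def summarize(totals: List[float]) -> Dict[str, float]:
    """Reduce the sampled durations to a mean and an empirical 90th percentile."""
    ordered = sorted(totals)
    return {
        "mean": statistics.mean(ordered),
        "p90": ordered[int(0.9 * (len(ordered) - 1))],
    }


# Database keyed by (diameter in mm, burial depth class, repair method).
repair_database = {
    (300, "shallow", "segment_replacement"): summarize(simulate_repair_hours(5000)),
}
```

At dispatch time, a reported break would be matched to its key and the stored summary returned without re-running the simulation, which is the design choice that makes near real-time use plausible.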

Crew relocation estimation model
This module embeds the capabilities of the Google Maps API within Esri's ArcGIS software. This enables the following capabilities: 1) site routing and determination of the shortest travel time between any two sites; and 2) calculation of the travel times between pipe break sites. In some cities, Google Maps adjusts travel times to include expected traffic conditions, which is a vital influence in congested urban areas. This module calculates the relocation time and cost between pipe break sites. Output from the aforementioned modules is fed into the multi-objective optimization module. The following section describes this module in detail.

Optimization model
The optimization model takes into consideration three conflicting objectives: 1) total time to complete all break repairs; 2) total cost to complete all break repairs; and 3) total impact on system users caused by the breaks, as measured by the pipe criticality index. In many cases assigning crews based on only one objective may significantly impact the other objectives. For example, in order to minimize cost and time, critical breaks may be scheduled later in the day so that breaks can be repaired in a geographically sequential order. In some instances repair time and cost can conflict. This occurs when repair crews with large equipment that is typically used for large breaks are assigned to smaller breaks. In this case, repair time will be minimized but costs will be excessive. As such, any comprehensive optimization module needs to consider all objectives simultaneously. The optimization model is capable of performing both deterministic and stochastic optimization. In the case of stochastic optimization, the results of the discrete event simulation of the pipe repair estimation module are fed into the optimization. When stochastic optimization is undertaken, a distinct optimization is solved for each simulation run separately, and the overall dominating solution across all optimization trials is considered the preferred solution.
The time objective considers the total time taken by all repair crews to complete all breaks reported during the day. This time includes both repair and relocation times for all crews combined. The optimization model is built on a series of binary decision variables. The first binary variable is $X_{i,j,k}$, which takes a value of 1 when crew i is assigned to site j utilizing repair method k and a value of zero otherwise. As such, the total repair time (TRT) can be calculated as follows:
$$TRT = \sum_{i=1}^{CN} \sum_{j=1}^{SN} \sum_{k=1}^{MN} X_{i,j,k} \, RT_{i,j,k} \qquad (1)$$
where $RT_{i,j,k}$ is the repair time for crew i at site j using repair method k and is calculated via the repair estimation module. CN is the number of available crews, SN is the number of repair sites and MN is the number of available repair methods.
In building the optimization model, the concept of "repair steps" is utilized. A repair step is the position of a break site in the sequence of sites repaired by a crew. The number of repair steps O ranges from SN/CN, rounded up to a whole step (sites evenly distributed across crews), to SN (only one crew assigned to fix all sites and the other crews idled), i.e. ⌈SN/CN⌉ ≤ O ≤ SN.
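As a quick check of these bounds, the case study figures reported later in the paper (13 break sites and 3 crews) can be substituted. Rounding the lower bound up is an assumption made here, on the grounds that a step count must be a whole number:

```latex
\left\lceil \frac{SN}{CN} \right\rceil
  = \left\lceil \frac{13}{3} \right\rceil
  = 5 \;\le\; O \;\le\; SN = 13
```

The upper bound corresponds to the cost-only solution discussed in the case study, where a single crew repairs all 13 sites in sequence.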
The second decision variable is $Y^{n}_{i,j}$, a binary decision variable that takes the value of 1 when crew i is assigned to work site j during repair step n and zero otherwise. This variable tracks the movement of crews between sites and is used to calculate the relocation time and cost. As such, the total relocation time (TLT) can be calculated as follows:
$$TLT = \sum_{i=1}^{CN} \sum_{n=1}^{O-1} \sum_{j=1}^{SN} \sum_{k=1}^{SN} Y^{n}_{i,j} \, Y^{n+1}_{i,k} \, LT_{j,k} \qquad (2)$$
where $LT_{j,k}$ is the relocation time between sites j and k as calculated from the crew relocation estimation module. The total time to complete all repairs by all crews (TT) is the sum of the total repair and relocation times:
$$TT = TRT + TLT \qquad (3)$$
The total repair cost (TRC) and total relocation cost (TLC) can be calculated in a similar manner:
$$TRC = \sum_{i=1}^{CN} \sum_{j=1}^{SN} \sum_{k=1}^{MN} X_{i,j,k} \, RC_{i,j,k} \qquad (4)$$
$$TLC = \sum_{i=1}^{CN} \sum_{n=1}^{O-1} \sum_{j=1}^{SN} \sum_{k=1}^{SN} Y^{n}_{i,j} \, Y^{n+1}_{i,k} \, LC_{j,k} \qquad (5)$$
where $RC_{i,j,k}$ is the repair cost for crew i at site j using repair method k, calculated via the repair estimation module, and $LC_{j,k}$ is the relocation cost between sites j and k, calculated from the crew relocation estimation module. The total cost to complete all repairs by all crews (TC) is the sum of the total repair and relocation costs:
$$TC = TRC + TLC \qquad (6)$$
In order to include the objective of pipe criticality, the Cumulative Criticality Index (CCI) is calculated as the product of the pipe criticality index and the time between breakage and completion of repair:
$$CCI = \sum_{j=1}^{SN} PCI_{j} \, SRT_{j} \qquad (7)$$
where $SRT_{j}$ is the total time needed to complete all breaks and crew relocations prior to reaching site j, and $PCI_{j}$ is the pipe criticality index for the broken pipe at repair site j as calculated by the pipe criticality model.
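The three objectives can be evaluated for a candidate schedule with a short routine. The sketch below follows the definitions of TT, TC and CCI above, but replaces the binary variables with an ordered visit list per crew. It assumes that all breaks are reported at time zero and that travel to each crew's first site is ignored. The function name `evaluate_schedule` and all input dictionaries are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Visit = Tuple[Hashable, Hashable]          # (site, repair method)
Schedule = Dict[Hashable, List[Visit]]     # crew -> visits in repair-step order


def evaluate_schedule(schedule: Schedule,
                      repair_time: Dict[tuple, float],
                      repair_cost: Dict[tuple, float],
                      reloc_time: Dict[tuple, float],
                      reloc_cost: Dict[tuple, float],
                      pci: Dict[Hashable, float]) -> Tuple[float, float, float]:
    """Return (TT, TC, CCI) for one crew allocation."""
    total_time = 0.0
    total_cost = 0.0
    cci = 0.0
    for crew, visits in schedule.items():
        clock = 0.0        # elapsed time for this crew (assumed to start at zero)
        previous = None
        for site, method in visits:
            if previous is not None:
                # Relocation between consecutive repair steps of the same crew.
                clock += reloc_time[(previous, site)]
                total_time += reloc_time[(previous, site)]
                total_cost += reloc_cost[(previous, site)]
            # SRT_j: repairs and relocations completed before reaching site j.
            cci += pci[site] * clock
            clock += repair_time[(crew, site, method)]
            total_time += repair_time[(crew, site, method)]
            total_cost += repair_cost[(crew, site, method)]
            previous = site
    return total_time, total_cost, cci


# Hypothetical two-crew, three-site example.
demo = evaluate_schedule(
    schedule={1: [("A", 0), ("B", 0)], 2: [("C", 0)]},
    repair_time={(1, "A", 0): 2.0, (1, "B", 0): 3.0, (2, "C", 0): 4.0},
    repair_cost={(1, "A", 0): 100.0, (1, "B", 0): 150.0, (2, "C", 0): 200.0},
    reloc_time={("A", "B"): 1.0},
    reloc_cost={("A", "B"): 10.0},
    pci={"A": 80.0, "B": 20.0, "C": 50.0},
)
```

In the example, only site B waits (2 hours of repair at A plus 1 hour of travel), so the criticality term is 20 x 3 = 60, while the most critical site A is reached immediately and contributes nothing.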

Multi-objective optimization
The approach utilizes a goal-optimization based multi-objective optimization procedure for solving the problem at hand. Rather than utilizing the concept of Pareto-optimal non-dominated solutions (e.g. Non-dominated Sorting Genetic Algorithms), which yields a large number of Pareto-optimal solutions, a goal-optimization based approach is utilized. This has two main advantages: 1) it does not require a posteriori intervention by the decision maker to select the final solution after the Pareto-optimal solutions are generated; and 2) it decreases the computational complexity of the optimization problem, allowing real-time optimization of repair plans, which is vital when dealing with emergency infrastructure repairs such as water main breaks that are constantly being updated throughout the day. Goal optimization principles are used to structure the optimization problem such that deviations from set goals are minimized. The goal optimization formulation is able to consider multiple, conflicting and incommensurable objectives, which is the case with the time, cost and criticality objectives (Schniederjans 1995). Goal optimization, sometimes referred to as goal programming (GP), is a mathematical optimization technique quite similar to linear programming, although it has the capability to handle several conflicting goals. In GP terminology, a set of goals $G_i$, where i = 1, 2, 3, …, n, needs to be achieved simultaneously. The objective function is then formulated to minimize the sum of deviations from these prescribed goal values (Atef et al. 2012).
The optimization proceeds in two stages. First, three distinct single-objective optimization problems are solved, considering each objective separately. This is followed by a multi-objective optimization in which all objectives are considered simultaneously. Each single-objective optimization yields a different solution and a corresponding time, cost or criticality goal ($T_G$, $C_G$ and $R_G$). $T_G$ is the least possible total time that can be achieved and is calculated by minimizing TT in Eqn (3). $C_G$ is the least possible total cost that can be achieved and is calculated by minimizing TC in Eqn (6). $R_G$ is the least possible cumulative criticality that can be achieved and is calculated by minimizing CCI in Eqn (7). $T_G$, $C_G$ and $R_G$ are considered the best possible values that can be attained for each objective. In essence they are the "goals" that the multi-objective optimization formulation seeks to meet. The objective function is formulated such that normalized deviations from the goals are minimized as per the following equation:
$$\min Z = \frac{TT - T_G}{T_G} + \frac{TC - C_G}{C_G} + \frac{CCI - R_G}{R_G} \qquad (8)$$
Due to the computational complexity of the optimization problem, and in order to make the tool easily accessible to operations crews, the aforementioned optimization model is implemented in a programmable spreadsheet environment. The structure of the chromosome for each solution and the interaction between the genetic optimization algorithm and the other models in the system are shown in Figure 3. The model starts by generating N random solutions based on the number of repair sites (SN) and repair crews (CN). Each solution represents a particular crew allocation order. For each solution the total time (TT), total cost (TC) and cumulative criticality index (CCI) are calculated using the modules described in the previous section. Based on these calculated fitness values, the solutions are sorted and given a rank that represents the fitness of each solution compared to the other solutions. The best solutions are then selected to undergo the genetic operators of crossover and mutation in order to generate a new population of solutions. These steps are repeated until a set convergence criterion is reached. This approach has been adapted from that used by Orabi et al. (2009).
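The generate, rank, select, crossover and mutate loop described above can be sketched in a few lines. This is an illustrative genetic algorithm, not the spreadsheet implementation: the chromosome encoding (an ordered list of site and crew pairs), the operators, the equal weighting of the normalized deviations, and the fixed generation count standing in for the convergence criterion are all assumptions. The `objectives` callable is a hypothetical hook that returns (TT, TC, CCI) for a chromosome.

```python
import random
from typing import Callable, List, Sequence, Tuple

Gene = Tuple[int, int]        # (site index, crew index)
Chromosome = List[Gene]       # gene order is the repair priority order


def goal_deviation(values: Sequence[float], goals: Sequence[float]) -> float:
    """Sum of normalized deviations from the (non-zero) single-objective goals."""
    return sum((v - g) / g for v, g in zip(values, goals))


def random_chromosome(rng: random.Random, n_sites: int, n_crews: int) -> Chromosome:
    sites = list(range(n_sites))
    rng.shuffle(sites)
    return [(site, rng.randrange(n_crews)) for site in sites]


def crossover(rng: random.Random, a: Chromosome, b: Chromosome) -> Chromosome:
    """Keep a prefix of parent a, then add the missing sites in parent b's order."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    used = {site for site, _ in head}
    return head + [gene for gene in b if gene[0] not in used]


def mutate(rng: random.Random, c: Chromosome, n_crews: int, rate: float) -> Chromosome:
    c = list(c)
    if rng.random() < rate:                      # swap two repair positions
        i, j = rng.sample(range(len(c)), 2)
        c[i], c[j] = c[j], c[i]
    if rng.random() < rate:                      # reassign one site's crew
        i = rng.randrange(len(c))
        c[i] = (c[i][0], rng.randrange(n_crews))
    return c


def run_ga(objectives: Callable[[Chromosome], Sequence[float]],
           goals: Sequence[float], n_sites: int, n_crews: int,
           pop_size: int = 40, generations: int = 60, elite: int = 10,
           rate: float = 0.3, seed: int = 0) -> Chromosome:
    """Requires n_sites >= 2 and elite >= 2."""
    rng = random.Random(seed)

    def fitness(c: Chromosome) -> float:
        return goal_deviation(objectives(c), goals)

    population = [random_chromosome(rng, n_sites, n_crews) for _ in range(pop_size)]
    for _ in range(generations):                 # fixed count replaces convergence test
        population.sort(key=fitness)             # rank solutions, best first
        parents = population[:elite]             # best solutions survive unchanged
        children = [mutate(rng, crossover(rng, *rng.sample(parents, 2)), n_crews, rate)
                    for _ in range(pop_size - elite)]
        population = parents + children
    return min(population, key=fitness)
```

Because the best solutions are carried over unchanged each generation, the best fitness in the population never worsens. The order-preserving crossover guarantees that every child still visits each site exactly once.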

Case study
The model was tested on a portion of the water distribution network for the City of Damietta in Egypt. The total network length is 220 km and is composed of 1,250 pipe segments in GIS. Diameters ranged from 100 to 800 mm and material types included steel, PVC and ductile iron. The portion of the City that was studied had good variation in land use and road types so offered good variability in overall pipe criticality.
The case study area includes older areas of the City where aging water pipes have been known to break at increasingly high rates. The study area also included two major arterial roads and a busy downtown area. As such, the studied area offered a good sample to test how emergency repairs, and their impact on the water network and roadways, should be addressed when the multiple conflicting objectives of time, cost and community impact are considered.
In order to speed up the travel time calculations, the City was divided into 8 zones as shown in Figure 4. Travel times between each zone were calculated based on the Google Maps API. The case study assumed 13 breaks were reported and only 3 repair crews were available to address them. These figures were suggested by the City's water utility to mimic their maximum encountered break rate and minimum available crews. This constitutes a worst-case scenario and was used as the test case. As shown in Table 4, four out of the 13 break sites had relatively high criticality (pipe id 295, 24, 43, and 386). This was mainly due to the fact that they were located in high density areas, major roads, or were large diameter pipes.
Four different optimization problems were solved. First, single-objective optimization was conducted for time, cost and criticality. For each optimization problem the resulting time, cost and cumulative criticality index were calculated. After establishing the time, cost and criticality goals ($T_G$, $C_G$ and $R_G$), the multi-objective optimization problem was solved as per Eqn (8). Each pipe break site was assigned a crew and a step (order) as shown in Table 4. The following observations can be made regarding these results:
- Large variations were observed between crew assignments across the different optimization problems.
- When only cost was considered, crew 3 was selected to do all repairs due to its lowest unit cost rate, and the other crews were idled. This resulted in a very large total repair time and cumulative criticality index.
- Making decisions based solely on time or cost optimization can have a significant impact on the consequences of the pipe break. For example, pipe #295 was considered the most critical piece of infrastructure in the case study network. When considering the cost objective only, it was scheduled for repair in step 11 (more than 40 hours after the break occurred). When considering the time objective only, it was the last site to be repaired by crew 1 (more than 15 hours after the break occurred). Both cases are unacceptable.
- Making crew allocation decisions based solely on criticality tended to have an unfavourable impact on total time. This can be explained by the fact that crews may not have been allocated based on their suitability to address the type of break or their proximity to the break site, but rather to fix the most critical break first.
- The multi-objective optimization resulted in a 2.5% deviation from the total time target, a 5% deviation from the total cost target and a 50% deviation from the criticality target (Table 5). This suggests that the optimization problem will be highly sensitive to the weights assigned to each objective.
- The optimization model run time was in the order of 1-3 minutes for each of the four optimization models running on a 2.30 GHz computer. This performance would allow a water utility to obtain real-time optimal crew allocation plans to respond to break events. If a new break is reported during the day, the optimization model can be re-run with real-time crew locations and the plan adjusted if necessary. Scalability of the optimization model to address larger urban areas is still being studied.

Evaluation
In order to evaluate the results of the model, it is compared to system repair heuristics that are used by some water utilities. The Damietta Water Company utilized a break prioritization rule based on pipe diameter such that crews would always fix pipes in decreasing size. The rationale behind this heuristic is that pipe diameter plays an important role in determining the criticality of a pipe.
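The diameter-based rule can be written down as a short baseline for comparison. The source states only that crews fix pipes in decreasing size. The round-robin assignment of the sorted breaks to crews, the function name, and the pipe identifiers and diameters in the example are assumptions made for illustration.

```python
from typing import Dict, List


def diameter_first_schedule(diameters_mm: Dict[int, float],
                            n_crews: int) -> Dict[int, List[int]]:
    """Sort breaks by decreasing pipe diameter and deal them out to crews in turn."""
    ordered = sorted(diameters_mm, key=lambda pipe_id: diameters_mm[pipe_id],
                     reverse=True)
    schedule: Dict[int, List[int]] = {crew: [] for crew in range(1, n_crews + 1)}
    for rank, pipe_id in enumerate(ordered):
        # Assumed tie-in to crews: largest pipe to crew 1, next to crew 2, and so on.
        schedule[1 + rank % n_crews].append(pipe_id)
    return schedule


# Hypothetical pipe identifiers and diameters (mm).
heuristic_plan = diameter_first_schedule(
    {1: 300.0, 2: 800.0, 3: 150.0, 4: 600.0, 5: 100.0}, n_crews=2
)
```

A rule of this kind ignores crew suitability, travel time, and every criticality variable other than diameter, which is why it serves as a useful lower bar for the optimization model.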
Using this heuristic, total time, cost and cumulative criticality were calculated for the case study that was discussed in the preceding section. The optimization model showed significant improvement over the commonly used heuristic with regards to total repair time and impact on system users (as measured by the criticality index). Improvements in cost were marginal as shown in Table 6. Based on the case study data, utilizing a more comprehensive approach to assign repair crews has the potential to improve crew utilization and reduce community impacts of water main breaks.
In comparison to other models related to optimal resource allocation for infrastructure repair, the developed model offers several advantages:
- It captures infrastructure interdependencies that occur due to the failure event and repair process, as manifested in traffic impacts. Although this impact does not directly affect the water utility, the livelihood of congested urban areas can be severely impacted by water main break events and should be considered in the decision making process.
- It considers different infrastructure repair methods and their consequential cost and time impacts. By conducting this trade-off, the model is able to allocate resources in a manner that considers both the impact on water customers (repair time) and the cost of repair. This improves the ability of the utility to develop level of service standards within the context of service affordability.
- It is fully implemented within commercial software (MS-Excel) that is readily available to most water utilities, which increases the practicality of the developed model.
- It is capable of undertaking stochastic estimates of the repair process and including this information in the optimization process. The highly uncertain nature of emergency repair of buried infrastructure systems is captured and included in the subsequent decision making process.

Summary and conclusions
Water utilities that are faced with limited capital replacement budgets and aging infrastructure are expected to deal with an increasing number of water pipe breaks. Developing adequate response plans for these events is needed in order to minimize maintenance costs and continue to deliver the highest possible service to their customers. As such, this paper presented an optimization-based framework for allocating limited repair crews to break sites that can be scattered across large areas within cities. When compared to allocation heuristics commonly used by water utilities, the framework was shown to decrease cost, time and impact on users as measured by a criticality index.
The proposed model has several limitations, and future work is required to address them. First, for critical pipe infrastructure, the water utility's management approach should be proactive rather than reactive. Failures should not be allowed to occur, and hence the model developed in this paper is more applicable to medium- to low-criticality pipe infrastructure. Secondly, the criticality model that was used relies on a simplified weighting system rather than a more comprehensive analysis of the consequences of failure. Future work should integrate the capabilities of water network hydraulic modelling and traffic simulation models to provide a more reliable assessment of the true impacts of pipe failure on these larger systems. Also, integrating this work with pipe deterioration models can allow water utilities to forecast where their breaks are most likely to occur and hence provide a more strategic planning tool for repair crew allocation. Other future enhancements include extending the existing optimization model to be a Pareto-optimal multi-objective scheduling model.