SHORT-TERM TRAFFIC FLOW PREDICTION USING A METHODOLOGY BASED ON AUTOREGRESSIVE INTEGRATED MOVING AVERAGE AND GENETIC PROGRAMMING

Accurate short-term traffic flow forecasting is fundamental to both theoretical and empirical aspects of intelligent transportation systems deployment. This study aimed to develop a simple and effective hybrid model for forecasting traffic volume that combines the AutoRegressive Integrated Moving Average (ARIMA) and the Genetic Programming (GP) models. By combining different models, different aspects of the underlying patterns of traffic flow could be captured. The ARIMA model was used to model the linear component of the traffic flow time series. Then the GP model was applied to capture the nonlinear component by modelling the residuals from the ARIMA model. The hybrid models were fitted for four different time-aggregations: 5, 10, 15, and 20 min. The proposed hybrid methodology was validated using traffic data under both typical and atypical conditions from multiple locations on the I-880N freeway in the United States. The results indicated that the hybrid models had better predictive performance than the ARIMA model alone for different aggregation time intervals under typical conditions. The Mean Relative Error (MRE) of the hybrid models ranged from 4.1 to 6.9% for different aggregation time intervals under typical conditions. The predictive performance of the hybrid method improved with an increase in the aggregation time interval. In addition, the validation results showed that the predictive performance of the hybrid model was also better than that of the ARIMA model under atypical conditions.


Introduction
The development of dynamic freeway traffic management systems has prompted research into proactive traffic management strategies to mitigate traffic congestion on freeways. Toward this goal, a large number of studies have applied an extensive variety of time-series models to produce short-term forecasts of traffic variables, such as traffic volume, traffic speed, travel time, etc. (Hamed et al. 1995; Vlahogianni et al. 2005; Ghosh et al. 2005, 2007; Chandra, Al-Deek 2009; Chen et al. 2012; Hamad et al. 2009; Wang, Shi 2013). The short-term traffic-forecasting models were developed to extrapolate traffic variables into the near-term future based on the past observations of the same traffic variables measured with traffic surveillance systems (Smith et al. 2002; Vlahogianni et al. 2005, 2007; Turochy 2006; Zhang, Xie 2008; Zhang, Ye 2008; Dimitriou et al. 2008; Huang, Sadek 2009; Hamad et al. 2009; Min, Wynter 2011; Chen et al. 2012; Dunne, Ghosh 2012; Wei, Chen 2012; Wang, Shi 2013). One of the practical applications of the short-term traffic-forecasting models is to help travellers select their travel routes or plan their trips in advance based on real-time traffic information. These models can also help to develop proactive traffic management strategies for traffic congestion prevention and mitigation.
Over the past several decades, much effort has been devoted to the development and improvement of methods for forecasting short-term traffic variables. Of the conventional statistical methods, the AutoRegressive Integrated Moving Average (ARIMA) family of models has been extensively utilized in constructing the forecasting models (Hamed et al. 1995; Williams 2001; Smith et al. 2002; Williams, Hoel 2003; Ghosh et al. 2005, 2007; Chandra, Al-Deek 2009). For example, Hamed et al. (1995) employed ARIMA to develop a model for short-term prediction of traffic volume in urban arterials. Smith et al. (2002) compared the predictive performance of the ARIMA model and the nearest neighbour technique in forecasting traffic flow on highways. The results demonstrated that the ARIMA model produced better predictive performance than the nearest neighbour technique did. Ghosh et al. (2007) used the Bayesian ARIMA model in developing a short-term traffic flow-forecasting model. It was found that the Bayesian model could better match the traffic behaviour of extreme peaks and rapid fluctuation. However, the major limitation of the ARIMA model is the pre-assumed linear correlation structure among the time series values. The approximation of linear models to complex real-world problems is not always adequate (Zhang 2003; Aladag et al. 2009). Previous studies also suggested that the linear statistical algorithm was not adequate to capture the complicated process underlying traffic (Hamed et al. 1995; Williams 2001; Stathopoulos, Karlaftis 2003).
In response to the limitations associated with the conventional statistical methods, a number of studies have proposed non-parametric methods and artificial intelligence models for developing short-term traffic flow forecasting models. These models include the Artificial Neural Network (ANN) model (Smith, Demetsky 1997; Zhang 2000), recurrent neural networks (Van Lint et al. 2002), genetically optimized neural networks (Vlahogianni et al. 2005, 2007), the Support Vector Machine (SVM) prediction model (Vanajakshi, Rilett 2004; Zhang, Xie 2008), and the wavelet network model (Xie, Zhang 2006). Although these models could capture the nonlinear pattern of traffic flow and produce better predictive performance than conventional statistical methods, their major limitation is that they work as black boxes, which cannot be directly used to identify the relationships between input variables and the output variable by a mathematical equation.
This study aimed to propose a simple and effective hybrid model for forecasting traffic volume that combines the ARIMA model with Genetic Programming (GP). Combining these two models could make it easier to capture both the linear and nonlinear patterns within traffic flow data and to improve the predictive performance. Previous studies also suggested that combining different models could improve the prediction accuracy over the individual model (Zhang et al. 2011; Wang, Shi 2013). GP is a relatively new modelling technique, which was proposed to solve classification and regression problems. The GP model is an evolutionary computation method introduced by Koza (1992). In recent years, the GP model has gained considerable attention in transportation engineering for regression (Das et al. 2010) and classification analyses (Xu et al. 2013). The GP model has two major advantages over the traditional statistical regression and artificial intelligence models. First, with the GP model, there is no need to pre-specify any functional form. The solutions of the GP model can take any functional form describable by mathematics. The GP model could select the best functional form for the solution to the problem based on the features present in the data. Second, in contrast to the 'black box' solutions in artificial intelligence models, the solution of the GP model is an easily readable mathematical model, which defines the tangible relationship between input variables and the output variable. This allows the results of GP models to be easily applied in practical engineering applications. In addition, previous studies also suggested that the GP model could produce better predictive performance than the traditional methods (Ong et al. 2005; Lensberg et al. 2006; Etemadi et al. 2009; Lee, Tong 2011). So far, no applications of the GP model for short-term traffic flow forecasting have been identified by the authors.

Methodology
The basic principles and modelling process of the ARIMA and GP models are summarized in the following as the foundation to describe the hybrid model.

The ARIMA Model
The ARIMA model was introduced by Box and Jenkins (1976). The AutoRegressive Moving Average (ARMA) model has been widely used in forecasting time series. In an ARMA(p, q) model, the value of the time series in the next period is assumed to be a linear function of several past observations and random errors, as represented in the following:
$$y(t) = \theta_0 + \sum_{i=1}^{p} \phi_i\, y(t-i) + \varepsilon(t) - \sum_{j=1}^{q} \theta_j\, \varepsilon(t-j), \tag{1}$$
where: $y(t)$ and $\varepsilon(t)$ denote the actual value and random error at time period $t$, respectively; $\phi_i$ ($i = 1, 2, \ldots, p$) and $\theta_j$ ($j = 0, 1, 2, \ldots, q$) are the parameters of the model; $p$ and $q$ are integers and referred to as the orders of the autoregressive terms and moving average terms; $\varepsilon(t)$ are assumed to be white Gaussian noise. After calibrating the model parameters $\phi_i$ and $\theta_j$ using specific sampled data, the one-step forecast of $y(t)$ can be estimated as:
$$\hat{y}(t) = \hat{\theta}_0 + \sum_{i=1}^{p} \hat{\phi}_i\, y(t-i) - \sum_{j=1}^{q} \hat{\theta}_j\, \hat{\varepsilon}(t-j),$$
where: $\hat{\varepsilon}(t-j) = y(t-j) - \hat{y}(t-j)$; $\hat{\theta}_j$ ($j = 0, 1, 2, \ldots, q$) and $\hat{\phi}_i$ ($i = 1, 2, \ldots, p$) are the estimated parameters of the ARMA model; $y(t-i)$ are the known historical traffic volume data; and $\hat{y}(t-j)$ are the predicted volumes of the ARMA model.
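The ARIMA model is a generalization of the ARMA model. In an ARIMA(p, d, q) model, the parameters p and q are the same as those in the ARMA model. The parameter d represents the d-th order difference of the original data series, which aims to remove the trend from the data series. By introducing the backshift operator $B$ (that is, $B\,y(t) = y(t-1)$), Eq. (1) for ARMA(p, q) can be written as:
$$\phi(B)\, y(t) = \theta_0 + \theta(B)\, \varepsilon(t),$$
where: $\phi(B)$ is the autoregressive operator and $\theta(B)$ is the moving average operator, each represented as a polynomial in the backshift operator:
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p, \qquad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q.$$
Similarly, the ARIMA model can be written as:
$$\phi(B)\,(1 - B)^d\, y(t) = \theta_0 + \theta(B)\, \varepsilon(t).$$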
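As an illustration of how the differencing and one-step forecast fit together, the following minimal Python sketch may help. It assumes the Box-Jenkins sign convention, in which the moving-average terms enter with a minus sign; the function names, the coefficient values and the sample volumes are hypothetical and are not the models estimated in this study.

```python
def difference(series, d=1):
    """Apply d-th order differencing, i.e. (1 - B)^d applied to the series."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def arma_one_step(history, past_errors, phi, theta, theta0=0.0):
    """One-step ARMA(p, q) forecast (assumed convention: MA terms are subtracted).

    history     -- observed values, most recent last (length >= p)
    past_errors -- past one-step errors y - y_hat, most recent last (length >= q)
    phi, theta  -- coefficients phi_1..phi_p and theta_1..theta_q
    """
    ar = sum(phi[i] * history[-1 - i] for i in range(len(phi)))
    ma = sum(theta[j] * past_errors[-1 - j] for j in range(len(theta)))
    return theta0 + ar - ma


# Hypothetical ARIMA(1, 1, 1) example with made-up coefficients.
volumes = [100, 104, 110, 108, 115]
diffed = difference(volumes)                      # first-order differences
forecast_diff = arma_one_step(diffed, [0.0, 0.5], phi=[0.5], theta=[0.2])
forecast = volumes[-1] + forecast_diff            # undo the differencing
```

With d = 1, the forecast of the differenced series is added back to the last observed volume to obtain the forecast on the original scale.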

The GP Technique
The GP model is an evolutionary computation method introduced by Koza (1992). The GP model can be used to generate mathematical models, which represent approximate or exact solutions to a problem (Koza 1992). It can be considered an extension of genetic algorithms (GA). The main difference between GP and GA is the representation of individuals. The individuals in a GA model are numbers coded as fixed-length binary strings, while the individuals in a GP model are mathematical models coded as function trees (Koza 1992; Xu et al. 2013). An example of a function tree in a GP model is given in Fig. 1. The inner nodes represent mathematical functions such as '+' and '÷', and the leaf nodes represent the predictors and constants. The leftmost tree in Fig. 1 represents one such mathematical model. In a particular problem, the list of functions and predictors should be specified. The mathematical models in GP are generated from the pre-specified set of functions and predictors.
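To make the function-tree representation concrete, a minimal Python sketch is given below. The nested-tuple encoding, the reduced function set and the example tree are assumptions chosen for illustration; the tree is hypothetical and is not the one shown in Fig. 1.

```python
import operator

# Inner nodes hold a function symbol; leaves hold a predictor name or a constant.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, inputs):
    """Recursively evaluate a function tree for one set of predictor values."""
    if isinstance(node, tuple):                  # inner node: (symbol, left, right)
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, inputs), evaluate(right, inputs))
    if isinstance(node, str):                    # leaf: predictor looked up by name
        return inputs[node]
    return node                                  # leaf: numeric constant


def to_string(node):
    """Render the tree as a readable infix expression."""
    if isinstance(node, tuple):
        symbol, left, right = node
        return f"({to_string(left)} {symbol} {to_string(right)})"
    return str(node)


# Hypothetical tree encoding the model x1 * 2.0 + x2.
tree = ("+", ("*", "x1", 2.0), "x2")
value = evaluate(tree, {"x1": 3.0, "x2": 1.0})
expression = to_string(tree)
```

Because every individual can be printed as an explicit expression, the fitted model remains readable, which is the property that distinguishes GP from black-box models.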
In general, GP works on a population of mathematical models (individuals) based on evolution theory. In each generation, multiple models are stochastically selected based on their fitness, and modified to form a new population of models by genetic operations. The new population of models is then used in the next iteration of the algorithm. A GP model will stop when the predetermined maximum number of generations has been produced or the predetermined fitness level has been reached for the population. The evolution process is expected to produce continuously a better model for a problem.
The new models in a GP model are usually created by three genetic operators, including crossover, mutation, and reproduction. The reproduction operator simply selects a proportion of models and includes them into the next generation without any alterations. The creation of new or offspring models from the crossover operation is accomplished by combining information extracted from the selected parents. Two parent models are randomly selected based on their fitness level and sub-trees are chosen from both parent models. Then the crossover operator swaps the sub-trees from the two parent models. Fig. 1 illustrates an example of crossover operation.
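The sub-tree swap performed by the crossover operator can be sketched as follows. This is a simplified illustration under stated assumptions: trees are encoded as nested tuples of the form (symbol, child, child), the crossover points are drawn uniformly at random, and the fitness-based selection of parents is left out. All function names are hypothetical.

```python
import random


def subtree_paths(node, path=()):
    """List the path (sequence of child indexes) to every node in the tree."""
    paths = [path]
    if isinstance(node, tuple):
        for index, child in enumerate(node[1:], start=1):
            paths.extend(subtree_paths(child, path + (index,)))
    return paths


def get_subtree(node, path):
    """Follow a path of child indexes down from the root."""
    for index in path:
        node = node[index]
    return node


def replace_subtree(node, path, new_subtree):
    """Return a copy of the tree with the sub-tree at `path` replaced."""
    if not path:
        return new_subtree
    index = path[0]
    child = replace_subtree(node[index], path[1:], new_subtree)
    return node[:index] + (child,) + node[index + 1:]


def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen sub-tree between two parent trees."""
    path_a = rng.choice(subtree_paths(parent_a))
    path_b = rng.choice(subtree_paths(parent_b))
    child_a = replace_subtree(parent_a, path_a, get_subtree(parent_b, path_b))
    child_b = replace_subtree(parent_b, path_b, get_subtree(parent_a, path_a))
    return child_a, child_b


parent_a = ("+", "x1", ("*", "x2", 3.0))
parent_b = ("-", ("+", "x1", "x2"), 1.0)
child_a, child_b = crossover(parent_a, parent_b, random.Random(0))
```

A mutation operator could reuse `replace_subtree` by inserting a freshly generated random tree instead of a sub-tree taken from a second parent.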
The purpose of the mutation operator is to introduce new information into the population and avoid the premature convergence of a GP model. In the mutation operation, a single parent is randomly selected based on its fitness level. A random sub-tree on the parent model is selected and replaced with a new random tree created from the pre-specified set of predictors and functions (Fig. 2). In the procedure of generating a random tree, the node at the initial tree depth level is first randomly chosen from the set of functions. Then its children node(s) are randomly chosen between the functions set and the predictors set. The random tree will stop growing when reaching the maximum tree level. Readers may consult Koza (1992) for a full description of this procedure.
The fitness function of a GP model determines how well a model in the population is able to solve the problem. The fitness function varies greatly across different types of problems. The fitness function is usually developed based on the error between the values predicted by the model and the actual data. In this study, a fitness function for short-term traffic flow forecasting was developed based on the Mean Absolute Error (MAE). Assuming a dataset $S = \{(y_1, x_1), (y_2, x_2), \ldots, (y_m, x_m)\}$ of input variables $x_i$ for output variable $y_i$, the functional form of the fitness function is expressed as follows:
$$F(B_j) = \frac{1}{m} \sum_{i=1}^{m} \left| B_j(x_i) - y_i \right|,$$
where: $F(B_j)$ denotes the fitness of the j-th model $B_j$ in the population; $B_j(x_i)$ is the value calculated by the j-th model $B_j$ in the population; $m$ is the number of samples in $S$.
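An MAE-based fitness evaluation can be sketched in a few lines of Python. The sketch assumes that the raw MAE is used directly as the fitness value (lower is better) and that each candidate model is callable on a tuple of inputs; the candidate models and the data below are hypothetical.

```python
def mae_fitness(model, dataset):
    """Mean absolute error of a candidate model over (y, x) samples; lower is better."""
    errors = [abs(model(x) - y) for y, x in dataset]
    return sum(errors) / len(errors)


# Hypothetical samples (y_i, x_i), with x_i a tuple of input variables.
dataset = [(2.0, (1.0,)), (5.0, (2.0,)), (5.0, (3.0,))]

# Two hypothetical candidate models competing in one generation.
candidates = {
    "2 * x": lambda x: 2.0 * x[0],
    "x + 2": lambda x: x[0] + 2.0,
}
scores = {name: mae_fitness(model, dataset) for name, model in candidates.items()}
best = min(scores, key=scores.get)
```

Ranking the population by this score is what drives the selection of parents for the genetic operators.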
The GP model uses the following steps to solve problems: (a) initialization -create at random an initial population of M models; (b) execute each model in the current population on training dataset and evaluate the fitness of each model in the current population; (c) select the parent models, which will be used to produce offspring models; (d) select the reproduction, crossover, and mutation operators probabilistically; (e) generate a new model by performing one of the three genetic operators; (f) repeat step (c) to step (e) until the predetermined population size M has been reached; (g) replace the M old models by new generated M models; (h) repeat step (b) to step (g) until the predetermined maximum generation N has been reached. The model with the best fitness level in any generation is designated as the result of GPs.

The Hybrid Methodology Based on ARIMA and GP
Since it is difficult to completely know the characteristics of the traffic volume time series data, a hybrid methodology that has both linear and nonlinear modelling capabilities can be a good strategy. By combining different models, different aspects of the underlying patterns of traffic flow may be captured. This study proposed a hybrid model that combines ARIMA for modelling the linear component $L_t$ of the traffic flow time series and GP for modelling the nonlinear component $N_t$, as follows:
$$y(t) = L_t + N_t + \xi_t,$$
where: $y(t)$ represents the actual value at time period $t$; $L_t$ and $N_t$ denote the linear component and nonlinear component of the model, respectively; $\xi_t$ denotes the random error term. The residuals from the ARIMA model ($r_t$) were calculated as follows:
$$r_t = y(t) - \hat{L}_t,$$
where: $\hat{L}_t$ is the predicted value of $L_t$, which is estimated using the ARIMA model. By modelling the residuals from the ARIMA model ($r_t$) using the GP model, nonlinear relationships can be discovered. With n input variables, the GP model for the residuals $r_t$ can be written as:
$$r_t = f(r_{t-1}, r_{t-2}, \ldots, r_{t-n}) + \xi_{rt}, \tag{8}$$
where: $\xi_{rt}$ denotes the random error term; $f$ represents the nonlinear function constructed using the GP model. Unlike the ANN and SVM models, the GP model used to construct the nonlinear component of the time series generates an explicit mathematical equation. Thus, in practice, the predicted values using GP can be verified through the mathematical equation. The estimate of the residuals $r_t$, denoted $\hat{N}_t$, can be determined by Eq. (8). Then the predicted values of the time series are estimated as follows:
$$\hat{y}(t) = \hat{L}_t + \hat{N}_t.$$
The proposed hybrid approach uses the following steps to forecast traffic flow: 1) Model the linear component of the time series using the ARIMA model, and estimate $\hat{L}_t$ using the ARIMA model. 2) Calculate the residuals $r_t$ from the ARIMA model, and model them using the GP model to estimate $\hat{N}_t$. 3) Combine $\hat{L}_t$ and $\hat{N}_t$ to obtain the predicted traffic flow $\hat{y}(t)$.
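The data flow of the hybrid approach (linear forecast, residuals, lagged residual inputs, combined forecast) can be sketched in Python as follows. The linear forecasts and the residual model are placeholders: in this study they would come from the fitted ARIMA and GP models, whereas the numbers and the lambda below are purely hypothetical.

```python
def residuals(actual, linear_forecast):
    """r_t = y(t) - L_hat_t for each time period."""
    return [y - l for y, l in zip(actual, linear_forecast)]


def lagged_samples(res, n):
    """Build (target, inputs) pairs with inputs (r_{t-1}, ..., r_{t-n})."""
    return [(res[t], tuple(res[t - k] for k in range(1, n + 1)))
            for t in range(n, len(res))]


def hybrid_forecast(linear_next, recent_residuals, residual_model, n):
    """y_hat(t) = L_hat_t + N_hat_t, with N_hat_t given by the residual model."""
    lags = tuple(recent_residuals[-k] for k in range(1, n + 1))
    return linear_next + residual_model(lags)


# Hypothetical observed volumes and linear-model forecasts for the same periods.
actual = [100, 110, 120, 130]
linear = [98, 111, 117, 131]
res = residuals(actual, linear)
training_pairs = lagged_samples(res, n=2)     # samples a GP model would be fitted on

# Placeholder standing in for the equation evolved by GP.
residual_model = lambda lags: -0.5 * lags[0]
prediction = hybrid_forecast(140.0, res, residual_model, n=2)
```

The `training_pairs` list has the same (output, inputs) layout that a fitness function for the residual model would consume.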

Data Sources and Evaluation Criteria
Data were obtained from the highway Performance Measurement System (PeMS) maintained by the California Department of Transportation (Caltrans), US. The PeMS database provided 30-sec raw loop detector data, including vehicle count, vehicle speed, and detector occupancy. The traffic data were collected from the Detector 401561 (Site A) and Detector 401517 (Site B) located on the northbound freeway I-880 (Fig. 3). The freeway has five lanes at the selected sites. The 30-sec raw traffic data were collected from all the five lanes. As shown in Fig. 3, the selected two detectors are far away from each other and have a number of ramps in between. Thus, the traffic data collected at the two sites are considered to have low correlations. The PeMS database also provides the detailed traffic incident data, including incident type, starting time, location and duration. As discussed in Stathopoulos and Karlaftis (2003), Dunne and Ghosh (2012), and Chen et al. (2012), the traffic flow series recorded on weekdays were substantially different from those recorded on the weekends or holidays. The prediction models for weekday might produce unsatisfactory results for traffic data on weekends. Thus, for consistency purposes, this study only focuses on the weekday traffic flows.
Missing data problems are unavoidable in traffic flow data. Previous studies suggested that the missing data problem greatly affected traffic analysis (Zhong et al. 2004; Xin et al. 2006; Qu et al. 2009; Chen et al. 2003, 2012). The missing data should be imputed before developing the traffic-forecasting model. Different statistical methods and artificial intelligence models have been used for missing data imputation, such as the Bayesian networks (Chen et al. 2003), the Bayesian principal component analysis (Qu et al. 2009), the ANN (Zhong et al. 2004), and the Probabilistic Principal Component Analysis (PPCA) (Qu et al. 2009). Since the PPCA can quickly produce accurate imputations (Qu et al. 2009), the PPCA was used in this study to impute the missing values in the dataset. The PPCA also has the advantage of appropriately combining both neighbouring historical flow data and current-day flow data (Qu et al. 2009). The reader may consult Oba et al. (2003) and Qu et al. (2009) for a full description of the PPCA method.
The measurement noises and useless traffic fluctuations in high-resolution traffic data (intervals shorter than 1 min) can decrease the predictive performance of the prediction models (Castro-Neto et al. 2009; Chen et al. 2012). Accordingly, the 30-sec raw detector data were first aggregated into 5-min traffic data by summing up the 10 observations of the 30-sec traffic volumes:
$$y = \sum_{i=1}^{n} q_i, \tag{10}$$
where: $y$ denotes the aggregated traffic volume; $q_i$ represents the average traffic volume across different lanes for the i-th observation; $n$ represents the number of observations during the aggregation time interval. If there were any missing values of the 30-sec traffic volume during a 5-min interval, the traffic volume for this 5-min interval was labelled as a missing value. The PPCA method was conducted on the 5-min traffic data to impute all the missing values within it. The imputed 5-min traffic data were further aggregated into 10-min, 15-min and 20-min time intervals using Eq. (10). The proposed hybrid models were fitted for these four different time-aggregations: 5, 10, 15, and 20 min.
Previous studies suggested that a traffic flow prediction model developed with normal traffic data may produce poor predictive performance when incidents or atypical situations are present (Castro-Neto et al. 2009; Guo et al. 2013). Hence, the predictive performance of the proposed hybrid model was evaluated with traffic data under both normal conditions (Scenario 1) and incident conditions (Scenario 2). In Scenario 1, the traffic flow data used were not significantly affected by incidents, such as crashes. The traffic flow data at Sites A and B were collected from 1 May 2012 to 1 June 2012. To achieve more reliable and accurate estimations, a long period of traffic flows was selected as the training dataset (Zhang et al. 2011). The traffic flow data from the weekdays in May 2012 were used as the training dataset and the traffic flow data on 1 June 2012 were used as the validation dataset for Scenario 1.
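The aggregation rule described in this section (sum the n observations in each interval, and label the interval as missing if any observation is missing) can be sketched in Python as follows. Representing a missing value by `None` and dropping an incomplete trailing block are assumptions made for the illustration, and the sample values are hypothetical.

```python
def aggregate(volumes, n):
    """Sum each block of n consecutive observations; None marks a missing value."""
    out = []
    for start in range(0, len(volumes) - n + 1, n):
        block = volumes[start:start + n]
        # An interval with any missing observation is itself labelled as missing.
        out.append(None if any(v is None for v in block) else sum(block))
    return out


# Twenty hypothetical 30-sec volumes, the last one missing.
raw_30s = [1] * 10 + [2] * 9 + [None]
five_min = aggregate(raw_30s, 10)          # 10 observations per 5-min interval

# Coarser intervals are built from the (imputed) 5-min series in the same way.
imputed_5min = [10, 12, 11, 9]
ten_min = aggregate(imputed_5min, 2)
```

In the workflow of this study, the missing 5-min values would be imputed before the second call, so the coarser series contain no missing entries.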
Table 1 summarizes the descriptive statistics of the training and validation dataset for Scenario 1 based on the 30-sec traffic data.
In Scenario 2, the traffic data under incident conditions were collected to test the predictive performance of the proposed hybrid model under incident conditions. The only difference between Scenarios 1 and 2 was that the validation dataset for Scenario 2 contained the traffic flow data under incident conditions. The predictive performance of the models developed based on the training dataset in Scenario 1 was tested on the validation dataset for Scenario 2. Table 2 summarizes the descriptive statistics and characteristics of the traffic data under incident conditions in Scenario 2.
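To compare the predictive performance of the ARIMA and the proposed hybrid model, the following four performance indexes were used:
1) the Mean Absolute Error (MAE): $$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y(t) - \hat{y}(t) \right|;$$
2) the Mean Relative Error (MRE): $$\mathrm{MRE} = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| y(t) - \hat{y}(t) \right|}{y(t)};$$
3) the Mean Squared Error (MSE): $$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( y(t) - \hat{y}(t) \right)^2;$$
4) the Mean Squared Relative Error (MSRE): $$\mathrm{MSRE} = \frac{1}{n} \sum_{t=1}^{n} \left( \frac{y(t) - \hat{y}(t)}{y(t)} \right)^2,$$
where: $y(t)$ and $\hat{y}(t)$ denote the actual and predicted traffic volumes at time period $t$; $n$ is the number of predictions.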
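The four performance indexes named in this study (MAE, MRE, MSE and MSRE) can be computed with the short Python sketch below. The formulas are the standard textbook definitions and are an assumption here, since the exact expressions used by the authors are not restated; the sample values are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mre(actual, predicted):
    """Mean relative error; assumes every actual value is non-zero."""
    return sum(abs(y - p) / y for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def msre(actual, predicted):
    """Mean squared relative error; assumes every actual value is non-zero."""
    return sum(((y - p) / y) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted volumes.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
report = {name: fn(actual, predicted)
          for name, fn in (("MAE", mae), ("MRE", mre), ("MSE", mse), ("MSRE", msre))}
```

The relative indexes are scale-free, which makes them convenient for comparing aggregation intervals whose volumes differ in magnitude.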

Model Development
A statistical analysis of a time series requires that the time series are stationary. In other words, this time series should have the same statistical behaviour at each point in time. Forecast of statistical models, including the ARIMA model, based on non-stationary series usually exhibit large errors (Washington et al. 2003). Readers may consult Washington et al. (2003) for full explanation of the requirement of stationarity in the time series analysis. Thus, before modelling a time series, the data must be stationary. Fig. 4a illustrates the 5-min traffic data of the whole training dataset at Site A. Fig. 4b and 4c illustrate the AutoCorrelation Function (ACF) and the Partial AutoCorrelation Function (PACF) of the 5-min traffic data, respectively. The ACF plot indicates that the traffic volume series is non-stationary, since the ACF decays very slowly.
The 5-min traffic volume series become stationary after the first-order differencing. The first-order difference of 5-min traffic volume does not have a visible trend and its ACF and PACF decay quickly (Fig. 4d-f). The Augmented Dickey Fuller (ADF) test was further conducted to test the stationarity. The ADF test result indicates that the null hypothesis of non-stationarity can be rejected at the 0.01 significance level after the first differencing was performed. Thus, the first-order difference of 5-min traffic volume is stationary and can be used for the ARIMA model development.
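The visual stationarity check based on the ACF can be reproduced with a small Python sketch. It computes the sample autocorrelation function of a series and of its first-order difference; the series is hypothetical, and the ADF test itself is not implemented here.

```python
def acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Hypothetical trending series standing in for a non-stationary volume profile.
volumes = [2 * t + (t * t) % 17 for t in range(60)]
first_diff = [volumes[t] - volumes[t - 1] for t in range(1, len(volumes))]

acf_raw = acf(volumes, 5)        # a slow decay suggests non-stationarity
acf_diff = acf(first_diff, 5)    # inspected again after first-order differencing
```

A slowly decaying ACF of the raw series, followed by a quickly decaying ACF after differencing, is the pattern that motivates setting d = 1.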
To identify the best ARIMA model for the 5-min traffic data at Site A, ARIMA models were developed for different combinations of the parameters p and q. The parameters p and q were set from 0 to 10. Akaike's Information Criterion (AIC) was used to find the best ARIMA model. It was found that the AIC reached a minimum when p and q were set to 3 and 2, respectively. In addition, it was ensured that all the variables in the ARIMA model were statistically significant (Table 3). A residual analysis was further conducted for the developed ARIMA model to make sure that no pattern remained. Fig. 5 illustrates the graphical check of the residuals from the developed ARIMA model for the 5-min traffic data at Site A. As shown in Fig. 5a and 5b, the autocorrelations of the residuals from the ARIMA model are very small and insignificant. The partial autocorrelations (Fig. 5c) and inverse autocorrelations (Fig. 5d) of the residuals are also negligible. The white noise test was also conducted on the residuals. The results of the white noise test in Table 4 indicate that the residuals from the ARIMA model have no pattern remaining, and that the best ARIMA model for the 5-min traffic data at Site A has been identified. The other 7 ARIMA models for different time-aggregations were developed using the same procedure. Tables 3 and 5 summarize the estimation results of the ARIMA models at Sites A and B for different aggregation time intervals, including 5, 10, 15 and 20 min.
The GP models were developed to predict the nonlinear component of the traffic flow time series. The parameters used in the GP models are given in Table 6. The function set contained 8 standard arithmetic operators, including +, -, ×, ÷, protected square root, sin, cos, and pow(2, x). If A ≤ 0, the protected square root of A equals 0. When A > 0, the protected square root of A equals the square root of A. The function pow(2, x) represents two raised to the power x.
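The function set described above can be written down as a small Python table. The protected square root follows the definition given in the text; the handling of division by zero is not specified in this study, so the `protected_div` rule below (returning 1 for a zero denominator) is an assumption, as are all the names.

```python
import math
import operator


def protected_sqrt(a):
    """Square root that returns 0 for non-positive arguments."""
    return math.sqrt(a) if a > 0 else 0.0


def protected_div(a, b):
    """Division guarded against a zero denominator (assumed rule: return 1)."""
    return a / b if b != 0 else 1.0


def pow2(x):
    """Two raised to the power x."""
    return 2.0 ** x


# Symbol -> (callable, number of arguments) for the eight functions.
FUNCTION_SET = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "/": (protected_div, 2),
    "sqrt": (protected_sqrt, 1),
    "sin": (math.sin, 1),
    "cos": (math.cos, 1),
    "pow2": (pow2, 1),
}
```

Storing the arity next to each callable lets a random-tree generator know how many children to create for a node. In practice the exponent passed to `pow2` may need to be bounded, because very large arguments overflow floating-point arithmetic.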
The population size was set to 1000, and the maximum number of generations was 100. The reproduction probability was 0. The purpose of doing so was to let the crossover and mutation operations govern the evolutionary process (Xu et al. 2013). The probabilities of crossover and mutation were set to 0.4 and 0.6, respectively. Implementing a lower crossover probability and a higher mutation probability can avoid genetic drift (Das et al. 2010), which is the convergence of the population to a sub-optimal solution in the search space. The terminal set included the constant terminals (randomly generated floating point numbers between -10 and 10) and the residual lagged variables (i.e., $r_{t-1}, r_{t-2}, \ldots, r_{t-n}$).
To select an optimal number of residual lagged variables, the GP model was run successively with the number of residual lagged variables n set from 1 to 10. The upper limit of 10 is expected to cover the possible n that ensures the best prediction accuracy. The optimal number of residual lagged variables in previous studies that use a similar hybrid model is usually lower than 10 (Zhang 2003; Aladag et al. 2009; Lee, Tong 2011; Zhang et al. 2011). The value of n at which the prediction accuracy of the GP model reached a maximum was selected. After the development of the GP model for the 5-min traffic data at Site A, the residuals from the hybrid model for the 5-min interval were also analysed to ensure that there is no pattern left. The white noise test of the residuals from the hybrid model in Table 7 indicates that there is no pattern remaining in the residuals from the hybrid model for the 5-min interval. Thus, the best GP model for the 5-min interval at Site A has been identified. The other 7 GP models for different time-aggregations were developed using the same procedure. The white noise tests also indicate that there are no patterns left for these 7 hybrid models. Figs 6 and 7 illustrate the GP models for different aggregation time intervals at Sites A and B.

Predictive Performance under Normal Conditions
Tables 8 and 9 compare the predictive performance of the ARIMA models against that of the proposed hybrid models for Sites A and B under normal conditions. These two tables report four performance indexes on the validation dataset for Scenario 1 for different aggregation time intervals, including MAE, MRE, MSE and MSRE. As shown in Tables 8 and 9, the hybrid model produces better predictive performance than that of the ARIMA models for different aggregation intervals. By comparing the performance indexes for different aggregation time intervals, it can be found that the predictive performance of the hybrid method increases with an increase in the aggregation time interval. This may imply that data aggregation could suppress the effects of the measurement noises and useless traffic fluctuation information.
For further comparison of the predictive performance of the ARIMA and hybrid model, Figs 8 and 9 illustrate the predicted volumes of the models against the actual values for different time-aggregations at Sites A and B. In addition, Figs 8 and 9 also summarize the regression coefficients for the fitted linear relationship between the actual and predicted values. For different time-aggregations at both sites, the R-square values of the hybrid models are all greater than those of the ARIMA model, indicating that the predicted values of the hybrid method have higher correlation with the actual values.
The above results reveal that the hybrid models have better forecasting accuracy than the ARIMA model. This indicates the advantage and effectiveness of combining the GP model with the ARIMA model. The hybrid strategy can better capture the characteristics of the traffic flow time series data. Moreover, the hybrid model yields a mathematical equation which can be easily used to forecast traffic volume in practice. For example, the hybrid model for the 20-min interval at Site A is composed of a linear component and a nonlinear component. The linear component is estimated by the ARIMA model for the 20-min interval shown in Table 3, and the nonlinear component is obtained by the equations shown in Fig. 6.
For illustrative purposes, the prediction results of the hybrid model and the original observations for different aggregation time intervals at Sites A and B are shown in Figs 10 and 11. The hybrid model provides reasonably accurate forecasts of traffic volume. In general, the hybrid model has lower prediction errors for larger aggregation time intervals, and has higher prediction errors for greater traffic volumes.
Comparisons of prediction accuracy have also been made with several previous studies shown in Table 10. The prediction accuracy of the proposed model is relatively good compared with the models in previous studies (Table 10). Table 10 also gives the improvements of the proposed models in previous studies over traditional models. It can be concluded that the improvements of the proposed model in this study are relatively high.
Table 11 gives the Central Processing Unit (CPU) times needed for the estimation of the hybrid model parameters, and the CPU times needed for the application of the estimated hybrid models for one prediction using a desktop computer (3.4 GHz CPU and 8 GB RAM). Although calibrating a hybrid model needs a relatively long time, the estimated model needs a very short time to make a prediction. The CPU running time required for one prediction of the estimated models is less than 0.1 second. Thus, the developed models have the potential to be used for online traffic control and management.

Predictive Performance under Atypical Conditions
The predictive performance of the hybrid model and the ARIMA model on the validation dataset for Scenario 2 (incident conditions) was tested. Since most incidents on the I-880N freeway last less than 60 min, we only tested the predictive performance of the hybrid model for the 5-min interval. A prediction model for a long time interval, such as the 20-min interval, can only make 3 predictions for a 60-min period. This may lead to unstable estimates of the predictive performance of the hybrid model. Fig. 12a and 12b illustrate the traffic flow data under incident conditions and the traffic data under normal conditions (average volumes across the 23 weekdays in May 2012). Traffic volumes under incident conditions were significantly lower than those observed on the normal weekdays. Fig. 12c and 12d illustrate the actual values and predicted values from the ARIMA and hybrid models for the two sites. During the period of the incident, the predicted values of the hybrid models are closer to the actual values than those predicted by the ARIMA models for both sites, indicating that the predictive performance of the hybrid model is better than that of the ARIMA model even under incident conditions.
The predictive performance indexes of the hybrid model and the ARIMA model under incident conditions are given in Table 12. It should be noted that these performance indexes were calculated for the period that began about 20 min before the occurrence of the incident and ended about 20 min after the traffic flow returned to normal conditions. A previous study suggested that this could help evaluate the models' capability of responding to unexpected changes in traffic flow, as well as the ability of these models to recover their prediction performance when traffic flow returns to the normal patterns (Castro-Neto et al. 2009). As shown in Table 12, compared with the ARIMA model, the hybrid model can reduce the MRE by about 9% on the validation dataset for Scenario 2. Thus, combining the GP model with the ARIMA model can better capture the characteristics of the short-term traffic flow time series data under incident conditions.

Conclusions
This study proposed a hybrid methodology, which combines the ARIMA and GP models for short-term traffic flow forecasting. Compared with the models in previous studies, the proposed method has the following advantages. First, the hybrid model can better capture the linear and nonlinear patterns within traffic flow data and improve the predictive performance. Second, the GP technique in the hybrid model does not need pre-specified functional forms and can select the best functional form based on the training data. Finally, unlike the 'black box' solutions of artificial intelligence models, the hybrid model yields a readable mathematical equation. Thus, the proposed model can be easily applied in practical engineering applications. The major shortcoming of the proposed model is that the GP model is a computationally intensive algorithm that requires a great amount of machine running time. It usually takes a relatively long time to train a GP model when the number of observations in the training dataset is quite large. However, the calibrated model only needs an extremely short time to make predictions.
The hybrid models were fitted for four different time-aggregations: 5, 10, 15, and 20 min. The validations were performed by using traffic data under both normal and incident conditions obtained from multiple locations on the I-880N freeway in the United States. The results showed that the hybrid models have better predictive performance than the ARIMA model alone for different aggregation time intervals under normal conditions. The MRE of the hybrid models ranged from 4.1 to 6.9% for different aggregation time intervals under normal conditions. The predictive performance of the hybrid method improves with an increase in the aggregation time interval. In addition, the validation results also showed that the hybrid model can still produce satisfactory predictive performance under incident conditions. The predictive performance of the hybrid model is better than that of the ARIMA model under incident conditions.
With regard to the aggregation level, the hybrid model for the 5-min interval is more appropriate for practical application. The reasons are as follows. First, for incident traffic conditions, the hybrid model is expected to forecast traffic flow at high resolution, as the dynamic traffic management system needs to mitigate and minimize the adverse effects of incidents in a timely fashion. In addition, for the normal traffic conditions, the hybrid model for the 5-min interval can also achieve a relatively good prediction accuracy of 93%. The hybrid model for the 5-min interval can thus provide good prediction accuracy for both normal and incident traffic conditions. Second, 5-min traffic data are commonly used in practical engineering. The hybrid model for the 5-min interval can be easily applied in practical applications by using the 5-min traffic data. Finally, previous studies on short-term traffic forecasting also recommended developing prediction models for the 5-min interval.
The proposed hybrid model has the potential to be used for short-term traffic flow forecasting in practice. However, before the hybrid method is used in practical applications, additional research is still needed to further improve the model's predictive performance. First, the effects of other factors such as time of day and weather conditions could be considered. Incorporating these factors as input variables may further improve the model fitness. Second, this study only modelled the traffic data from a single isolated detector. By combining the traffic information from adjacent loop detectors, the predictive performance of the hybrid model may be further improved. Finally, additional traffic data from other freeways are needed to test the transferability of the proposed model. The authors recommend that future studies focus on these issues.