AN EMPIRICAL MODELLING FRAMEWORK FOR FORECASTING FREIGHT TRANSPORTATION

This paper presents a framework which includes empirical modelling methods to estimate freight transportation between defined zones. In this method, observed origin and destination matrices for each type of freight are constituted based on the link counts and the roadside truck survey data. The gravity method is selected to estimate origin and destination matrices by using observed link flows, gross domestic product by provinces and interzonal distances. Advanced statistical techniques and regression analyses are used to estimate the coefficients of the gravity method. The final freight transportation matrix is calibrated with the link flows data by using iterative techniques. The developed method was applied to find the origin and destination matrix of the total freight transportation in Turkey and successful results were obtained.


Notations
The following symbols are used in this paper: t -an overall scaling factor; θ -constant of proportionality; α, β, γ -constants; AADTT N -Annual Average Daily Truck Traffic in total for both directions for section N; A j -total trip attraction at j; A j -total trip attractions; ANN -Artificial Neural Networks; AT -total highway freight transportation matrix; AT f -average general total matrices; AVI -Automatic Vehicle Identification; C N -coefficient of enlargement or diminishment for section N; CT Nf -modified intermediary total matrices; DFBETAs -change in the regression coefficient that results from the deletion of the ith case; DFFITs -change in the predicted value when the ith case is deleted; DGTREN -Directorate-General for Transport and Energy; d ij -distance between province i and province j in kilometres; EM -Entropy Maximization; ESA95 -The European System of national and regional Accounts; f -freight types; GDP -Gross Domestic Product in Turkish Liras; GDP fi -sectored GDP by province for freight type f and province (zone) i in Turkish Liras; GDP j -total GDP by province ( -trips produced at i and attracted at j; T in -initial matrix; T N -total number of trucks and trailers in both directions for section N; T Nf -intermediary total freight transportation matrices; T Nnf -freight transportation matrices obtained from roadside surveys; UCT f -general total unit matrices; UCT Nf -unit matrices; V a -traffic volume on link a; X 1 , X 2 , X 3 -independent variables for regression analysis; Y -dependent variable for regression analysis.

Introduction
Passenger and freight carriage is a dynamic process since its parameters frequently change in the course of time.
The parameters of the carriage process are stochastic; however, their change is usually influenced by concrete conditions, the impact of which may be simulated and forecast. Many modelling concepts are applied to the estimation of freight transportation in the literature, although they were originally developed for passenger transport. De Jong et al. (2004b) presented a review of the literature on freight transport models developed since the 1990s for forecasting, policy simulation and project evaluation at the national and international levels. They state that there are 222 transport models in Europe: 65 of those are freight transport models and 29 are joint passenger and freight transport models. Crainic and Laporte (1997) identified some of the main issues in freight transportation planning and operations. They presented appropriate Operations Research models and methods, as well as computer-based planning tools. The presentation was organized according to the three classical decision-making levels: strategic, tactical and operational. Garrido and Mahmassani (2000) proposed a multinomial probit model with a spatially and temporally correlated error structure in order to carry out freight demand analysis for tactical/operational planning applications. The resulting model has a large number of alternatives, and the estimation was performed by using Monte Carlo simulation to evaluate the multinomial probit likelihoods. De Jong et al. (2004a) presented a fast and approximate meta-model for passenger and freight transport in Europe on the basis of the outcomes of five disaggregate national models for passenger transport, four national models for freight transport and two European transport models. This is the EXPEDITE meta-model for passenger and freight transport, which was developed in a project for the European Commission, Directorate-General for Transport and Energy (DGTREN) (European Commission 2001).
They stated that the model was a fast and relatively simple one that integrates results from a number of national and international models. Moschovou and Giannopoulos (2010) modelled freight modal choice behaviour in Greece on the basis of research carried out between 2004 and 2009. The research involved a large-scale survey of various freight transport actors in Greece; a full statistical analysis of the results and a presentation to determine priorities, preferences, and detailed rankings of mode choice criteria; and a modelling exercise to produce models depicting the mode choice behaviour of Greek firms.
Transport researchers generally agree on the fact that the four-step transport modelling structure adapted from passenger transport can be successfully applied to freight transportation as well. Nevertheless, there are some important differences within each of the four steps of passenger transport. These differences include the diversity of decision-makers in freight, the diversity of the items being transported and the limited availability of data (De Jong et al. 2004b).
The four-step freight transport modelling system is briefly presented below, and a multi-step freight transportation planning model is demonstrated in Fig. 1 (De Jong et al. 2004b; Goulias 2002):
- generation and attraction: the amounts of goods generated by and attracted to the defined zones are determined in tonnes;
- distribution: the flows of goods transported between the defined zones are designated in tonnes;
- modal split: the flows of goods are allocated to transportation modes, which are motorways, railways, waterways, combined transportation, etc.;
- assignment: freight flows are assigned to the transportation network after converting the flows in tonnes to vehicle units.
Although there are many methods available for estimating freight transportation from traffic counts, they are all rather complicated. Here, the intention is to give practical guidance to practising engineers on how to estimate freight transportation between the provinces. For this purpose, a framework including empirical modelling methods was developed, consisting of a fourteen-stage progressive evaluation for a new procedure. The collected data, such as network data, roadside surveys and freight types, were formulated, and the most crucial data, such as economic indicators (GDP and GVA) and the friction factor (distance), were used to set up the empirical model based on the gravity model. Advanced statistical techniques were used to determine the coefficients of the model. Finally, the freight transportation matrix was calibrated with the link flow data by using iterative techniques. The developed method was applied to find the origin and destination matrix of the total freight transportation in Turkey, and successful results were obtained. The method performed well while using fewer, but the most important, data items, analysed within a fourteen-stage progressive evaluation framework.

Estimation of Origin and Destination (O-D) Matrix
Transportation as a process can be described by a set of criteria; usually speed, safety and cost of transportation are considered. Each of these criteria describes the quality of transportation from a particular perspective, and therefore all of them should be used for selecting a particular traffic route. De Grange et al. (2010) state that trip distribution models are intended to produce the best possible predictions of travellers' destination choices on the basis of trip generation and attraction information for each travel zone and the level of impedance or generalized cost of travelling between each pair of zones. Many O-D (Origin and Destination) estimation models exist in the literature. This section presents a brief literature survey on the estimation of the O-D matrix. Bell (1983) described a model which would, under certain circumstances, yield the most likely O-D matrix consistent with measurements of link traffic volumes. The generalized least squares (GLS) approach to the estimation of O-D matrices permits the combination of survey and traffic count data in a way that allows for the relative accuracy of the two data sources. Bell (1991) also proposed an algorithm to solve the GLS problem subject to inequality constraints. Cascetta et al. (1993) suggested different dynamic estimators using time-varying traffic counts to obtain time-varying O-D flows or average O-D flows. The two proposed types of estimators were simultaneous estimators and sequential estimators. Cascetta and Russo (1997) examined Bayesian statistical inference techniques and evaluated the statistical performance of the NGLS estimators on a test network and on a real urban network. They found the results satisfactory in general, showing the capability of the proposed estimator to reduce errors in initial estimates significantly.
Ashok and Ben-Akiva (2000) examined two different approaches for real-time estimation/prediction of time-dependent O-D flows: a state vector defined in terms of deviations in O-D flows instead of the O-D flows themselves, and a state vector defined in terms of deviations of departure rates from each origin and the shares headed to each destination. Asakura et al. (2000) presented the formulation of an O-D matrix estimation model using data observed with the Automatic Vehicle Identification (AVI) system. The results of license plate matching between pairs of AVI cameras were used as the input variables. The formulated model was a least squares model and yielded a linear transformation of the partly observed O-D matrices. The model was applied to the Kobe corridor line in the Han-Shin expressway network. Timms (2001) presented a philosophical structure for classifying methods that estimate O-D matrices using link counts. The classification structure is built up by using concepts of realism, subjectivity, empiricism and rationalism. Ashok and Ben-Akiva (2002) presented a new set of models based on the explicit modelling and estimation of the dynamic mapping between time-dependent O-D flows and link volumes. Celik (2004) modelled inter-regional commodity flows for the 48 continental states of the US with three different Artificial Neural Networks (ANN). Chen et al. (2005) examined the capability of the Path Flow Estimator (PFE) in capturing the total demand of the study network as well as individual O-D demands when proper observations, in terms of their number and locations, were provided. Dixon and Rilett (2005) used the information from AVI systems to help estimate short-term trip O-D matrices in an urban environment. [...] design to obtain efficient and high-quality solutions by using a small number of demand samples, reducing the computational effort without much compromise on solution quality. The application and the performance of these alternative approaches were reported.
They concluded that the results from their study would help in deciding on suitable approximation techniques for network design under demand uncertainty. Among the commonly used calculation models, the gravity model is often chosen for obtaining the O-D matrix because it gives transport researchers more practical results. Levine et al. (2009) developed an optimization model to estimate route flows and a corresponding multi-modal origin-destination table for containers by synthesizing data on international trade and railcar movements with a gravity model for the demand of container traffic. Veenstra et al. (2010) introduced a new trip distribution model for destinations that are not homogeneously distributed. The model is a gravity model which incorporates the spatial configuration of destinations in the modelling process.
Travel time or travel distance can be used as the impedance parameter in the gravity model. The gravity model is analogous to Newton's law of gravitation: it states that the trips between an origin and a destination, $T_{ij}$, depend directly on the total trip productions $P_i$ and the total trip attractions $A_j$, and inversely on the friction factor $d_{ij}$, which may be distance, travel time or cost (Rogers 2008). The formula of the gravity model is given in Eq. 1:
$$T_{ij} = \theta \frac{P_i A_j}{d_{ij}^{\beta}} \qquad (1)$$
where: θ and β are the proportionality and calibration constants, respectively; the indices i and j denote the origin and destination provinces.
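As a minimal sketch of the gravity formula of Eq. 1 (and not the VBA implementation used in the study), the following Python function evaluates $T_{ij} = \theta P_i A_j / d_{ij}^{\beta}$ for every O-D pair. The function name, the three-zone data and the parameter values are illustrative assumptions; intra-zonal cells are set to zero here because the distance on the diagonal is zero.

```python
def gravity_matrix(productions, attractions, distance, theta=1.0, beta=2.0):
    """Return T[i][j] = theta * P_i * A_j / d_ij**beta, with zero intra-zonal cells."""
    n = len(productions)
    trips = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # skip the diagonal, where the distance is zero
                trips[i][j] = (theta * productions[i] * attractions[j]
                               / distance[i][j] ** beta)
    return trips


# Hypothetical three-zone example; all numbers are made up for illustration.
P = [100.0, 200.0, 50.0]   # trip productions P_i
A = [80.0, 120.0, 150.0]   # trip attractions A_j
d = [[0.0, 10.0, 20.0],
     [10.0, 0.0, 5.0],
     [20.0, 5.0, 0.0]]     # interzonal distances d_ij in km
T = gravity_matrix(P, A, d, theta=1.0, beta=2.0)
```

A larger β penalises distance more strongly, so long-haul cells shrink relative to short-haul ones.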

The Empirical Modelling Method
In order to determine the observed highway freight matrices, the roadside surveys may be used. These roadside surveys contain the O-D information. Highways administrations perform the roadside surveys on different highway sections every year. The roadside surveys are carried out by stopping the trucks and filling in the specially designed questionnaire for every truck. In the questionnaire, some questions such as type of freight carried, O-D points of transportations, travelled distance and type of vehicle are asked, and the answers obtained are recorded. The freight transportation classification scheme of the Republic of Turkey General Directorate of Highways (KGM) consists of the recorded freight types which are agricultural products, ores, construction materials, animal products, manufactured materials, livestock and forest products (Vitoşoğlu 2006). In addition to the roadside surveys, the traffic count data giving the number of trucks on the highway sections are also useful information for O-D estimations. This paper presents a framework which includes empirical modelling methods to estimate freight transportation between defined zones. The developed framework for estimation of freight transportation is illustrated in Fig. 2. In this study, Visual Basic (VBA) macros in Microsoft Excel were written for implementation of the developed model. Several macros were written for each stage presented in Fig. 2. The produced trip matrices were transferred to the next stages step by step to obtain the final trip matrices.
There are several software packages available to help develop a four-step travel demand model. The forecasting packages widely used worldwide include Visum, TransCAD, Emme, Cube, QRS II (Quick Response System II), TMODEL (Travel Demand Model), FSUTMS (Florida Standard Urban Transportation Model Structure), TRANPLAN (Transportation Planning) and Synchro. These software packages are macroscopic simulation tools, and they distribute the trips using either theoretically based methods, e.g. the standard gravity model, or growth factor methods, e.g. the Fratar method (Ullah et al. 2011). In this study, the standard gravity model was preferred, but some important economic indicators were additionally included in the model. The detailed explanations are given in Fig. 2.
The roadside surveys generally include the data in Table 1. The information contained in the roadside surveys, performed every year on different highway sections, is written in matrix format in order to obtain the matrices for each freight type. Consequently, from each roadside interview, the matrices for each freight type $[T^{Nnf}]$ are obtained for the zones (Eq. 2):
$$[T^{Nnf}] = [t_{ij}^{Nnf}] \qquad (2)$$
where $t_{ij}^{Nnf}$ is the cell for origin i and destination j obtained from the roadside surveys for each freight type, and the indices N, n and f denote the road section, the roadside survey and the freight type, respectively. The procedure is presented in Fig. 2, item 1.
Normally the roadside surveys are carried out on the same highway sections in different years. Therefore, the roadside surveys performed on the same highway sections may be combined. Combining surveys from different years carries the risk that significant changes in the flows during the period are obscured, so data from a single year would be preferable. However, the number of surveys available for a single year is not enough to give meaningful results, since the problem requires much more data. Data from different years are therefore used to overcome this difficulty. As a result, the matrices obtained from the roadside surveys performed on the same sections in different years are added up, and the intermediary total matrices $[T^{Nf}]$ are formed (Eq. 3):
$$[T^{Nf}] = \sum_{n=1}^{n_a} [T^{Nnf}] \qquad (3)$$
where $n_a$ is the number of roadside surveys performed on a given highway section. This procedure is presented in Fig. 2, item 2.

Fig. 2. A framework including empirical approach modelling methods
These intermediary total matrices obtained for each highway section are then enlarged or diminished by multiplying them by a coefficient $C_N$. This coefficient is the ratio of $AADTT_N$ on the highway section studied to the total number of trucks surveyed in both directions, $T_N$ (Eq. 4):
$$C_N = \frac{AADTT_N}{T_N} \qquad (4)$$
This procedure is presented in Fig. 2, item 3.
The total number of trucks and trailers surveyed in the combined roadside survey for a given highway section, $T_N$, is calculated by first summing the cells $t_{ij}^{Nf}$ of the intermediary total freight transportation matrix $[T^{Nf}]$ obtained for each freight type f, and then adding up these sums over all freight types (Eq. 5):
$$T_N = \sum_{f} \sum_{i,j} t_{ij}^{Nf} \qquad (5)$$
This procedure is presented in Fig. 2, item 3.
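The modified intermediary total matrices $[CT^{Nf}]$ are obtained for every highway section and freight type by multiplying the intermediary total matrices by the coefficient $C_N$ (Eq. 6):
$$[CT^{Nf}] = C_N \, [T^{Nf}] \qquad (6)$$
This procedure is presented in Fig. 2, item 4.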
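The scaling step described for Eqs 4 to 6 can be sketched as follows. This is a hedged illustration, not the authors' macro: the function name, the freight-type keys and the survey numbers are hypothetical, and the matrices are plain nested lists holding surveyed truck counts for one highway section.

```python
def scale_to_aadtt(section_matrices, aadtt):
    """Scale the survey matrices of one highway section N to its AADTT.

    section_matrices: dict mapping freight type f -> matrix [T^Nf] (nested lists).
    Returns (C_N, dict of scaled matrices [CT^Nf]).
    """
    # T_N: all surveyed trucks on the section, summed over freight types and cells
    t_n = sum(cell for m in section_matrices.values() for row in m for cell in row)
    if t_n == 0:
        raise ValueError("no surveyed trucks on this section")
    c_n = aadtt / t_n  # coefficient of enlargement or diminishment
    scaled = {f: [[c_n * cell for cell in row] for row in m]
              for f, m in section_matrices.items()}
    return c_n, scaled


# Hypothetical two-zone section with two freight types (made-up counts).
surveyed = {
    "agricultural": [[0, 30], [10, 0]],
    "ores": [[0, 5], [5, 0]],
}
c_n, scaled = scale_to_aadtt(surveyed, aadtt=200)
```

After scaling, the cells of all freight types on the section add up to the AADTT value, which is the purpose of the coefficient.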
After the modified intermediary total matrices $[CT^{Nf}]$ belonging to the highway sections are obtained, the general total matrices for the different freight types $[T^{f}]$ are determined by adding up the O-D matrices obtained for all road sections N (Eq. 7):
$$[T^{f}] = \sum_{N} [CT^{Nf}] \qquad (7)$$
This procedure is presented in Fig. 2, item 5.
When an O-D pair is observed at two or more survey stations, the truck counts obtained for that pair at the different stations are averaged. The purpose is to prevent a simple accumulation from making the freight volumes transported between distant regions, which can be recorded at several stations, much larger than in reality. To this end, the unit matrices $[UCT^{Nf}]$ are formed by writing '1' for filled cells and '0' for empty cells of the intermediary total matrices multiplied by the coefficients $C_N$ (Eq. 8):
$$uct_{ij}^{Nf} = \begin{cases} 1 & \text{if } ct_{ij}^{Nf} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$
This procedure is presented in Fig. 2, item 6. Then, the general total unit matrices $[UCT^{f}]$ are calculated from the unit matrices (Eq. 9):
$$[UCT^{f}] = \sum_{N} [UCT^{Nf}] \qquad (9)$$
This procedure is presented in Fig. 2, item 7.
Finally, the average general total matrices $[AT^{f}]$ are obtained for the freight types by dividing the cells of the general total matrices $[T^{f}]$ by the corresponding cells of the general total unit matrices $[UCT^{f}]$ (Eqs 10 and 11):
$$at_{ij}^{f} = \begin{cases} t_{ij}^{f} / uct_{ij}^{f} & \text{if } uct_{ij}^{f} > 0 \\ 0 & \text{if } uct_{ij}^{f} = 0 \end{cases} \qquad (10, 11)$$
In this way, when an O-D pair is observed at two or more survey stations, the average of the observed flows is used. The average general total matrices obtained for all freight types are the observed matrices that give the number of trucks carrying freight between provinces. This procedure is presented in Fig. 2, item 8.
Then, as the last step in this phase of the study, the total highway freight transportation matrix $[AT]$ is found by adding up the average general total matrices $[AT^{f}]$ obtained for all freight types (Eq. 12):
$$[AT] = \sum_{f} [AT^{f}] \qquad (12)$$
This procedure is presented in Fig. 2, item 9.
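The averaging over survey stations and the final summation over freight types (Fig. 2, items 5 to 9) can be sketched as follows. This is a minimal illustration under the reading given in the text, namely that a cell is averaged over the sections where it is filled; the function names and the numbers are hypothetical.

```python
def average_over_sections(scaled_by_section):
    """Average the scaled matrices [CT^Nf] of one freight type over sections.

    Each O-D cell is divided by the number of sections where it is filled,
    i.e. the general total matrix divided by the general total unit matrix.
    """
    n = len(scaled_by_section[0])
    total = [[0.0] * n for _ in range(n)]  # general total matrix [T^f]
    count = [[0] * n for _ in range(n)]    # general total unit matrix [UCT^f]
    for matrix in scaled_by_section:
        for i in range(n):
            for j in range(n):
                if matrix[i][j] > 0:       # filled cell: unit matrix entry is 1
                    total[i][j] += matrix[i][j]
                    count[i][j] += 1
    return [[total[i][j] / count[i][j] if count[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def total_matrix(average_by_type):
    """Add the averaged matrices [AT^f] of all freight types to obtain [AT]."""
    n = len(average_by_type[0])
    return [[sum(m[i][j] for m in average_by_type) for j in range(n)]
            for i in range(n)]


# Hypothetical data: freight type 1 seen on two sections, type 2 on one.
type1 = average_over_sections([[[0, 120], [40, 0]], [[0, 80], [0, 0]]])
type2 = average_over_sections([[[0, 20], [20, 0]]])
AT = total_matrix([type1, type2])
```

In the example, the pair (0, 1) of the first freight type is seen at two stations and is therefore averaged to 100, whereas a simple accumulation would give 200.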
The initial matrices for each type of freight are constituted on the basis of the matrices obtained from the link counts and the roadside truck survey data. For this purpose, the principles of the gravity method can be applied by using the observed flows $at_{ij}^{f}$, GDP by province and the interzonal distances $d_{ij}$. GDP by province, also referred to as regional GVA estimates, uses national accounts definitions and concepts. GVA measures the contribution to the economy of each individual sector in an area. GDP and GVA are linked: GVA plus taxes on products less subsidies on products equals GDP.
The ESA95 collects comparable, up-to-date and reliable information on the structure and developments of the economy of the Member States of the European Union and their respective regions (source: Eurostat, http://epp.eurostat.ec.europa.eu). The GDP series have been compiled according to ESA95, which is a comprehensive and integrated set of accounts (Council Regulation (EC) No 2223/96). The procedure is summarised in Eqs 13-16. This procedure is presented in Fig. 2, item 10. The procedure begins with the first multivariate statistical analyses. The regression model obtained is tested by using hypothesis tests. Analysis of variance is performed to determine whether there is a linear relationship between the dependent and independent variables; the F-test is used for this purpose. The significance of the parameters in the analysis is tested with the Student test (t-test). This procedure is presented in Fig. 2, item 11.
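Eqs 13-16 are not reproduced in this text, so the following sketch rests on an assumed functional form: a gravity-type relation $at_{ij}^{f} = k_f \, GDP_{fi}^{\alpha} \, GDP_{j}^{\beta} / d_{ij}^{\gamma}$, linearised by taking logarithms and fitted by ordinary least squares. The function names and the synthetic observations are hypothetical; the hypothesis tests (F-test, t-test) mentioned above are not included.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gravity(observations):
    """Fit flow = k * gdp_o**alpha * gdp_d**beta / dist**gamma in log space.

    observations: list of (flow, gdp_origin, gdp_destination, distance), all > 0.
    Returns (k, alpha, beta, gamma, r_squared of the log-linear regression).
    """
    rows = [[1.0, math.log(go), math.log(gd), math.log(d)]
            for _, go, gd, d in observations]
    y = [math.log(f) for f, *_ in observations]
    p = 4
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    coef = solve_linear(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return math.exp(coef[0]), coef[1], coef[2], -coef[3], 1.0 - ss_res / ss_tot


# Synthetic observations generated from known parameters (illustration only).
inputs = [(100, 200, 50), (300, 150, 120), (80, 400, 300),
          (500, 90, 75), (250, 250, 400), (60, 700, 150)]
obs = [(0.5 * go ** 0.8 * gd ** 0.6 / d ** 1.5, go, gd, d) for go, gd, d in inputs]
k, alpha, beta, gamma, r2 = fit_gravity(obs)
```

Because the synthetic flows contain no noise, the fit recovers the generating parameters; with survey data the coefficient of determination would be lower, and the outlier analysis described in the paper becomes relevant.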

Calibration of O-D Matrix from Traffic Counts
Using conventional methods based on home surveys or roadside interviews disrupting traffic in order to estimate O-D matrices is generally expensive, time consuming, and labour intensive. The life of data is very short in developing countries, where rapid changes occur in land use and demographic structure. It is therefore necessary to revise frequently the data obtained by using relatively inexpensive methods. Various methods that are cheaper and do not require intensive labour have been developed in order to form and to revise present and future O-D matrices (Chen et al. 2005).
Vehicle counts on highways can be viewed as a function of a trip matrix and a route-choice pattern. Therefore, they provide information about all O-D pairs that use the counted links. In addition, traffic counts are very attractive data sources because they could be obtained in a relatively inexpensive and automatic way without disrupting traffic. As a result, since the beginning of 1980s, the idea of estimating trip matrices and developing demand models from traffic counts has attracted serious attention of researchers, and various methods have been suggested on this subject.
If it is assumed that N zones are interconnected by a road network consisting of a series of nodes and links, the trip matrix is made up of $N^2$ cells. If intra-zonal trips can be disregarded, the number of cells in the trip matrix is $N^2 - N$. In order to estimate these $N^2 - N$ cells of the O-D matrix from traffic counts, it is necessary first to identify the paths followed by trips from each origin to each destination. If $p_{ij}^{a}$ is defined as the probability or proportion of trips from zone i to zone j travelling through link a, the flow on this link, $V_a$, is the sum over all O-D pairs of the trips that use link a (Eq. 17):
$$V_a = \sum_{i} \sum_{j} T_{ij} \, p_{ij}^{a}, \qquad 0 \le p_{ij}^{a} \le 1 \qquad (17)$$
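The relationship between the trip matrix and the link flows can be illustrated with a short sketch. The data layout (a dictionary from link to the O-D pairs using it) and the three-zone corridor are assumptions made for this example only.

```python
def link_flows(trips, proportions):
    """Compute V_a as the sum over O-D pairs of T_ij * p_ij^a for every link a.

    trips: matrix T as nested lists.
    proportions: dict mapping link a -> dict mapping (i, j) -> p_ij^a.
    """
    return {a: sum(trips[i][j] * p for (i, j), p in od.items())
            for a, od in proportions.items()}


# Hypothetical corridor 0 -> 1 -> 2: trips from 0 to 2 use both links.
T = [[0, 10, 5],
     [0, 0, 7],
     [0, 0, 0]]
p = {
    "0>1": {(0, 1): 1.0, (0, 2): 1.0},
    "1>2": {(0, 2): 1.0, (1, 2): 1.0},
}
V = link_flows(T, p)
```

Each counted link therefore yields one linear equation in the unknown cells of the trip matrix.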
The proportions $p_{ij}^{a}$ can be obtained by using various trip assignment techniques, whose complexity ranges from all-or-nothing assignment to equilibrium assignment. As a result, when all the proportions $p_{ij}^{a}$ and all the observed traffic counts $V_a$ are given, there are $N^2 - N$ unknown $T_{ij}$ values to be estimated from a set of L simultaneous linear equations, where L is the total number of traffic counts.
In principle, $N^2 - N$ independent and consistent traffic counts are necessary to determine the trip matrix T uniquely. In practice, however, the number of traffic counts is much smaller than the number of unknown $T_{ij}$ values. Therefore, it is impossible to find a unique solution to the problem of estimating an O-D matrix. In general, more than one trip matrix will be found that is consistent with the observed traffic counts when assigned onto the network. Two basic approaches can be used to resolve this problem. In the first approach, the set of feasible solutions for the matrix to be estimated is restricted by imposing a particular structure, which is provided by a gravity or a direct demand model. In the second approach, general principles such as maximum likelihood or entropy maximisation are used to provide the minimum additional information required for estimating an O-D matrix.
Assignment methods used for estimating the trip matrix from traffic counts are classified into two main groups. In the methods belonging to the first group, it is assumed that the proportion of drivers choosing each route does not depend on the flow levels on the links. The most common example in this group is all-or-nothing assignment, for which the proportions $p_{ij}^{a}$ are defined as follows:
$$p_{ij}^{a} = \begin{cases} 1 & \text{if trips from origin } i \text{ to destination } j \text{ use link } a \\ 0 & \text{otherwise} \end{cases}$$
Pure stochastic methods are also included in the first group; in these cases, however, the proportions $p_{ij}^{a}$ can take any value between 0 and 1.
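As an illustration of all-or-nothing assignment, the sketch below assigns every O-D pair to a single cheapest path found with Dijkstra's algorithm and records the resulting 0/1 proportions. The graph representation, the function names and the link costs are assumptions made for this example; the paper does not state which path algorithm was used.

```python
import heapq


def shortest_path_links(graph, origin, destination):
    """Return the links (u, v) on one cheapest path, using Dijkstra's algorithm.

    graph: dict mapping node -> dict mapping neighbour -> non-negative cost.
    """
    best = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if destination not in best:
        return []  # destination unreachable
    links = []
    node = destination
    while node != origin:
        links.append((prev[node], node))
        node = prev[node]
    return links[::-1]


def all_or_nothing(graph, zones):
    """Return dict link -> {(i, j): 1.0} for every O-D pair routed over the link."""
    proportions = {}
    for i in zones:
        for j in zones:
            if i != j:
                for link in shortest_path_links(graph, i, j):
                    proportions.setdefault(link, {})[(i, j)] = 1.0
    return proportions


# Hypothetical network: the direct link A-C is more costly than going via B.
network = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "A": 5.0},
}
p = all_or_nothing(network, ["A", "B", "C"])
```

Links that lie on no cheapest path, such as the direct A-C link here, receive no entry, which corresponds to a proportion of zero for every O-D pair.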
Assignment methods in the second group, on the other hand, take account of congestion effects. Therefore, the probability of trips made between each O-D pair using any link also depends on traffic flow in that link. Equilibrium and stochastic user equilibrium assignment methods are included in this group (Ortúzar, Willumsen 2011).
Generally, the IM model (Snickars, Weibull 1977; Van Zuylen, Willumsen 1980), the GLS model (Cascetta 1984), the EM model (Wilson 1970), the Bayesian model (Mahmassani, Sinha 1981), the LSE model (Cochran 1963) and the PFE (Bell, Shield 1996) are used for O-D matrix estimation. Al-Deek and Eman (2006) state that the reliability of travel time can be used as a tool to provide travellers with accurate information about the most reliable paths connecting origins and destinations. They developed and applied a new methodology to estimate the travel time reliability of a transportation network and its paths during the peak period, in which links can degrade in a multimode, statistically dependent manner. Each model has its own characteristics and application conditions based on the applied theory. The model developed by Bell is fundamentally similar to the principle of the modified IM model. The IM model, based on information minimization theory, was developed by Van Zuylen and Willumsen (1980). With this model, the most likely O-D flows can be estimated through an iterative process until all flow constraints are satisfied, i.e. until the estimated traffic flows are equal to the respective detected traffic flows. If the route choice proportions are not completely known and partially duplicated, the IM model is not suitable and the respective estimation results are not stable. For that reason, Van Zuylen and Willumsen modified the IM model to adjust for the difference between the total number of historical trips and the actual trips. The form of the modified IM model used by Bell is shown in Eq. 18 (Van Zuylen, Willumsen 1980; Wang, Friedrich 2009).
The two parameters, $t$ and $X_a$, and the most likely O-D matrix can then be solved for in an iterative process, subject to the link flow constraints. The vector of parameters $X_a$ is initially set to unity and, unless another value of $t$ has been defined by the user, the value of $t$ is calculated by using Eq. 19. The value of $t$ remains at the level defined or calculated in this way during the later phases. The solution procedure then improves upon the initial estimates of $X_a$ over many iterations.
For every count site, an adjustment factor $h_a$ is calculated in each iteration. This adjustment factor is then added to the prior estimate of $X_a$ in order to obtain the updated value $X'_a$ (Eq. 20):
$$X'_a = X_a + h_a \qquad (20)$$
Eq. 21 is used to calculate the value of $h_a$.
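The formulas for $t$ and $h_a$ (Eqs 19 and 21) are not reproduced in this text, so the sketch below substitutes a different, simpler scheme: the classic multiplicative balancing associated with Van Zuylen and Willumsen, in which each link factor $X_a$ is rescaled by the ratio of the observed to the modelled flow. It assumes the form $T_{ij} = t_{ij} \prod_a X_a^{p_{ij}^{a}}$ with all-or-nothing proportions, where $t_{ij}$ is the prior matrix; it is not the Newton-Raphson update of the Vmat program. All names and numbers are hypothetical.

```python
def adjust_to_counts(prior, proportions, counts, max_iter=200, tol=1e-6):
    """Adjust a prior O-D matrix so that modelled link flows match the counts.

    prior: prior matrix t_ij (nested lists).
    proportions: dict link a -> dict (i, j) -> p_ij^a (0/1 values assumed).
    counts: dict link a -> observed flow V_a.
    Returns (adjusted matrix, dict of link factors X_a).
    """
    x = {a: 1.0 for a in counts}  # link factors X_a start at unity

    def current_matrix():
        t = [row[:] for row in prior]
        for a, od in proportions.items():
            if a in x:
                for (i, j), p in od.items():
                    t[i][j] *= x[a] ** p
        return t

    for _ in range(max_iter):
        worst = 0.0
        for a, observed in counts.items():
            t = current_matrix()
            modelled = sum(t[i][j] * p for (i, j), p in proportions[a].items())
            if modelled > 0:
                worst = max(worst, abs(observed - modelled) / observed)
                x[a] *= observed / modelled  # multiplicative correction
        if worst < tol:  # all links matched before this sweep's updates
            break
    return current_matrix(), x


# Hypothetical corridor 0 -> 1 -> 2 with two counted links (made-up numbers).
prior = [[0.0, 10.0, 5.0],
         [0.0, 0.0, 7.0],
         [0.0, 0.0, 0.0]]
p = {
    "0>1": {(0, 1): 1.0, (0, 2): 1.0},
    "1>2": {(0, 2): 1.0, (1, 2): 1.0},
}
observed = {"0>1": 30.0, "1>2": 18.0}
adjusted, factors = adjust_to_counts(prior, p, observed)
```

Cells that are zero in the prior matrix remain zero, so the structure supplied by the gravity-based initial matrix is preserved while the link flows are matched.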
The process of iteratively calculating the values of $h_a$ for each count site continues until the difference between the observed and estimated values falls below a threshold defined by the user. After the final values of $X_a$ are determined for every link, the cells of the trip matrix are calculated by Eq. 22. Finally, the trip matrix is formed by determining all its $T_{ij}$ elements (Halcrow Fox & Associates 1986). This procedure is presented in Fig. 2, item 12 and Fig. 3.
The total freight transportation matrix was obtained as explained in Fig. 2, items 1 to 9. The gravity method was selected to model freight transportation between the defined provinces. For this purpose, the first multivariate statistical analysis was carried out, and the results are presented in Table 2. The coefficient of determination was found to be 0.628. After the first multivariate statistical analysis, case statistics and case analyses were performed to determine outliers and influential points. Standardized values, Studentized values and related case statistics were examined (Montgomery et al. 2012). Studentized values were plotted against the estimated values. After the high leverage points were determined, these points were tested to see whether they were influential. The DFBETAs (change in the regression coefficient that results from the deletion of the ith case), Cook's distance, DFFITs (change in the predicted value when the ith case is deleted) and covariance ratio tests showed that there were no influential points on the parameters. The outlier points were excluded from the analyses, as suggested and supported by Montgomery et al. (2012) and Draper and Smith (1998). Then, the second multivariate statistical analyses were performed for all freight types. The F-tests were significant, with p-values below the chosen significance level. The significance of the parameters in the analysis was tested with the Student test (t-test), and the confidence level was found to be high because the p-values were lower than 0.05. The results are summarised in Table 2.
The coefficient of determination was found to be 0.712. Finally, the calibration coefficients $k_f$, α, β and γ were obtained from the second multivariate statistical analyses. These coefficients were inserted into the gravity-type equations to determine the initial matrix $[T^{in}]$.
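The case statistics mentioned above can be illustrated with a short sketch that computes the leverage, the internally Studentized residual and Cook's distance for every case of an ordinary least squares fit, using the standard textbook definitions. The function names and the data are hypothetical; DFBETAs, DFFITs and covariance ratios are not included.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def case_diagnostics(x_rows, y):
    """Return (leverage, studentized residual, Cook's distance) for each case.

    x_rows: predictor rows without the intercept column; y: responses.
    """
    rows = [[1.0] + list(r) for r in x_rows]  # add the intercept column
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    coef = solve_linear(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual mean square
    stats = []
    for r, e in zip(rows, resid):
        z = solve_linear(xtx, r)
        h = sum(ri * zi for ri, zi in zip(r, z))  # leverage h_ii
        student = e / math.sqrt(s2 * (1.0 - h))   # internally Studentized residual
        cook = student ** 2 * h / (p * (1.0 - h))  # Cook's distance
        stats.append((h, student, cook))
    return stats


# Hypothetical data: the last case lies far above the trend of the others.
xs = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.0, 27.0]
diagnostics = case_diagnostics(xs, ys)
most_influential = max(range(len(ys)), key=lambda i: diagnostics[i][2])
```

In the example the last case combines a high leverage with a large residual and therefore has the largest Cook's distance, which is the kind of point that the case analysis is meant to flag.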
In order to determine the final intercity freight transportation matrix for the year of 2010, the gravity based freight transportation matrix considered as initial matrix was adjusted with the link flows data by using iterative techniques as already explained in Chapter 3. Vmat Subprogram, which is a part of TRANSPORT software, was used to determine the final matrix (Coombe 1989). The main purpose of this program is to identify the most likely trip matrix from input files of traffic counts at individual sites or groups of sites and from route probabilities, using Newton-Raphson Technique (Halcrow Fox & Associates 1986). Consequently, the final O-D matrix of the total freight transportation [T fin ] was obtained for 81 provinces.

Conclusions
Although many methods are used for obtaining O-D matrices, the suggested method, which uses basic traffic information (network data, roadside surveys and freight types) and economic indicators (GDP and GVA), is neither expensive, time consuming nor labour intensive. It is mainly an O-D estimation method for road freight transport based on road counts. The O-D matrices may be easily updated at any planning phase by using the relevant traffic information. In addition, the coefficients of the gravity model, which includes economic indicators and distance, may be easily calculated by using advanced statistical techniques and regression analyses. The suggested empirical modelling method proved its performance in estimating freight transportation between the provinces within a fourteen-stage progressive evaluation framework. This framework enables practising engineers to understand how the methods work and to evaluate the efficiency of the existing methods, and gives them specific data they can use to improve their estimates. In this study, the developed model using empirical modelling methods is suggested for determining the O-D matrices for a wide diversity of freight types. The coefficient of determination for the total freight transportation was found to be at an acceptable level ($R^2 = 0.712$). The suggested method proved to be useful for determining the O-D matrices for a wide diversity of freight types when road survey data are available over time. The method also has potential for extension to estimating passenger O-D flows for road transport, as well as freight and passenger O-D flows for other modes of transport such as railways and airways. Consequently, freight transportation models that are fast and easy to use can be developed on the basis of relevant data. In addition, these types of models can include different kinds of freight for intermodal freight transportation and logistics activities.