DISTRIBUTION OF TRAFFIC SPEED IN DIFFERENT TRAFFIC CONDITIONS: AN EMPIRICAL STUDY IN BUDAPEST

The fundamental diagram, a graphical representation of the relationship among traffic flow, speed, and density, has been a foundation of traffic flow theory and transportation engineering for many years. Underlying the fundamental diagram is the relation between traffic speed and density, which serves as the basis for understanding system dynamics. Empirical observations of traffic speed versus traffic density show a wide scattering of speeds at any given level of density. The speeds observed at that density therefore form a distribution. The main aim of this research is to study the distribution of traffic speed in different traffic conditions on urban roads. The speed distribution is needed in many traffic engineering applications, including the generation of traffic in microsimulation systems. To this end, the traffic stream was videotaped at various locations in the city of Budapest (Hungary). Traffic engineering experts analysed the recorded videos and extracted different traffic conditions from them on the basis of predefined scenarios. The speeds in the corresponding time intervals were then estimated with the so-called “g-estimator method”, using the outputs of the loop detectors available at the videotaped locations. Several parametric candidate distributions were then fitted to the speeds by the Maximum Likelihood Estimation (MLE) method. The fitted distributions were compared using three goodness-of-fit tests together with two penalized criteria (Akaike Information Criterion – AIC and Bayesian Information Criterion – BIC), the latter to guard against over-fitting.
The results showed that traffic speed follows the exponential, normal, lognormal, gamma, beta and chi-square distributions under over-saturated congestion, under-saturated flow, free flow, congestion, accelerated flow and decelerated flow, respectively.


Introduction
The fundamental diagram, a graphical representation of the relation among traffic flow, speed, and density, has been a foundation of traffic flow theory and transportation engineering for many years. For example, the analysis of traffic dynamics relies on input from the fundamental diagram to find when and where congestion builds up and how it dissipates. Traffic engineers also use a fundamental diagram to determine how well a road facility serves its users and how to plan new facilities when capacity is expanded. Underlying a fundamental diagram is the relation between traffic speed and density, which roughly corresponds to drivers' speed choices under varying car-following distances. In fact, this relation serves as the basis for understanding traffic system dynamics in various scientific areas, including traffic flow (Wang et al. 2009). Empirical observations show a wide scattering of traffic speeds at a given level of density. This scattering is due to the randomness of drivers' speed choices: because of the stochastic nature of traffic flow, the observed speed varies over a certain range, forming a distribution. On highways, or in uninterrupted traffic flow in general, these distributions follow the normal distribution at a given level of density, as shown in Figure 1 (Wang et al. 2009). Jun (2010) noted that changes in the speed distribution on a given roadway over a certain period may explain trends or patterns in how the characteristics of traffic on that roadway vary. Conditions that produce a speed distribution other than the normal distribution are often found on non-highway or urban roads, where the traffic stream is in general much more complicated (Maghrour Zefreh et al. 2016). This view is supported by IMAGINE (2006), which relates different speed distributions to different traffic conditions.
The development of mathematical tools for modelling the speed distribution in a traffic flow is widely reported in the scientific literature (Castro et al. 2008; Dey et al. 2006; Fitzpatrick et al. 2000; Trozzi et al. 1996). In general, the speed distribution is needed in many traffic engineering applications. For instance, an appropriate speed distribution model is a fundamental input for generating vehicles in traffic microsimulation systems (Park, Schneeberger 2003; Llorca et al. 2015). It is also needed in other applications such as activity-travel scheduling simulation (Liao et al. 2013; Liao 2016). Moreover, the speed distribution can be used in theoretical analyses of traffic flow characteristics and to plan appropriate traffic operational measures (Yu, Abdel-Aty 2014).
To sum up, the literature reports a normal distribution of traffic speed at each level of density in so-called uninterrupted traffic flows, e.g. traffic flow on highways (Wang et al. 2013). Furthermore, various studies have shown that the distribution of traffic speed may deviate from the normal distribution in interrupted traffic flows, e.g. urban road traffic (Maurya et al. 2015). However, to the best of the authors' knowledge, no empirical study in the literature assigns best-fitting parametric distributions to interrupted traffic speeds according to different traffic conditions. The current research attempts to fill this gap. It studies the distribution of traffic speed in so-called interrupted traffic flows, where traffic is more complicated than on highways because of traffic lights, intersections, etc.
The remainder of this paper is organized as follows. Section 1 describes the methodology in detail. This covers the extraction of the desired traffic conditions from the videotaped traffic flow on the basis of predefined scenarios, speed estimation, distribution fitting and goodness-of-fit testing. Section 2 first presents, for illustration, a case study in which traffic flow followed an acceleration process, and then presents and discusses the results for the other traffic conditions. The last section concludes the paper.

Methodology
To study the distribution of speed in so-called interrupted traffic flow, traffic was videotaped extensively at various locations in the city of Budapest (Hungary). Table 1 shows the investigated sites, and Figure 2 shows the speed-density relationship of the traffic flow in the investigated area.

Extracting different traffic conditions from the traffic flow
Traffic engineering experts analysed the recorded videos and extracted different traffic conditions from them on the basis of predefined scenarios. The conditions were under-saturated flow, free flow, congestion, over-saturated congestion, accelerated flow and decelerated flow.

Defining different scenarios
The scenarios are explained here using the theoretical characteristic diagram of traffic moving at a traffic light when it turns green, shown in Figure 3. Over-saturated congestion is the condition in which the queue does not clear within the cycle, because demand exceeds capacity for a significant period. This is the region $x_j \le x < x_L$ in Figure 3, where the density is close to the jam density $k_j$. Under-saturated flow is the condition in which traffic is close to capacity but is still discharged within one cycle. It lies in the region $x < x_j$, where traffic flows with a density lower than the optimal density, $k_i < k_m$, as the flow is not unimpeded. Free flow is the condition in which there are only a few vehicles in the street (far below capacity). It also lies in the region $x < x_j$, where traffic flows with a density lower than the optimal density, $k_i < k_m$, as the flow is not unimpeded. The deceleration process is the condition in which traffic flow approaches $x_j$. Here the shockwave (black curve) travels backward through the traffic in Phase 1 of Figure 3. When the traffic light turns green, vehicles are able to leave the light entirely unimpeded. The density then equals the optimal density, $k = k_m$, and the flow is at its maximum. This condition is considered the acceleration process, in which the shockwave slows down and starts to move back towards the traffic light (Phase 2). Finally, congestion is the condition in which the traffic light turns green but the intersection has not yet completely cleared (individual cycle failures).

Extracting different traffic conditions from the recorded videos
In the current research, traffic flow was videotaped at various intersections in the city of Budapest throughout the day. Traffic engineering experts then analysed the videotaped traffic flow to extract different traffic conditions, based on the scenarios defined in Section 1.1.1. For instance, Figure 4a shows a series of frames from the video recorded when traffic flow followed the deceleration process at one of the videotaped locations (Hamzsabegi Road) during the day. Figure 4b shows a series of frames from the video recorded when traffic flow followed the acceleration process at another location (Szent Gellert Road) at night.
It should be highlighted that the literature has shown that night-time driving behaviour influences car-following conditions and can generate instability in traffic flow (Jiang, Wu 2007; Bella et al. 2014; Bella, Calvi 2013). This instability underlies the emergence of different traffic conditions, which in turn lead to different speed profiles and distributions. Therefore, taking traffic conditions as the basis of the speed distribution analysis appears to be a reasonable assumption for both daytime and night-time traffic.

Speed estimation by g-estimator method from loop detector outputs
Data from loop detectors have been primary sources of traffic information, and single loops are the predominant loop detector type in many places. Unfortunately, the single loop detector cannot measure speed directly. Traffic speed therefore has to be calculated from the detector outputs, i.e. traffic volume and occupancy time. Loop detector outputs may contain incorrect or missing values because of equipment malfunctions and communication faults. The detector outputs were therefore validated first, and incorrect or missing values were imputed with the algorithms proposed by Maghrour Zefreh and Török (2018a). After validation and imputation, traffic speed was estimated from the loop detector outputs by the so-called "g-estimator method", as shown in Equation (1):

$$S_i = \frac{N_i}{T \cdot O_i \cdot g} \qquad (1)$$

where: $i$ is the time interval index; $S_i$ is the speed in interval $i$; $N_i$ is the number of vehicles in the interval (volume); $O_i$ is the percentage of time during which the loop is occupied in the interval; $T$ is the length of the interval in hours; $g$ is a constant based on the mean vehicle length and the detector size.
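The following minimal sketch of Equation (1) illustrates how the g-estimator turns single-loop counts and occupancy into a speed. It rests on two assumptions not stated in the text. First, occupancy is expressed as a fraction of the interval. Second, $g$ is taken as the reciprocal of the mean effective vehicle length (vehicle length plus detection-zone length), which is one common interpretation. The function name and all numbers are hypothetical, and the values of $g$ actually used in this study are not reproduced here.

```python
def g_estimator_speed(volume, occupancy, interval_hours, g):
    """Mean speed for one detector interval, S_i = N_i / (T * O_i * g).

    volume: N_i, vehicles counted in the interval.
    occupancy: O_i as a fraction (0-1]; if O_i is given in percent, the
        factor of 100 has to be absorbed into g (assumption of this sketch).
    interval_hours: T, interval length in hours.
    g: taken here as 1 / (mean effective vehicle length) in 1/km, i.e. vehicle
        length plus detection-zone length (assumed interpretation).
    """
    if occupancy <= 0:
        raise ValueError("speed is undefined when the loop is never occupied")
    return volume / (interval_hours * occupancy * g)


# Hypothetical 1-minute interval: 30 vehicles, 20 % occupancy, and an assumed
# 6.5 m effective length (e.g. a 4.5 m car over a 2 m loop).
g = 1 / 0.0065  # 1/km
speed = g_estimator_speed(volume=30, occupancy=0.20, interval_hours=1 / 60, g=g)
print(f"{speed:.1f} km/h")  # 58.5 km/h
```

The formula follows from occupancy being the share of time a vehicle of effective length $1/g$ spends over the loop. This is also why a wrong $g$ biases every estimated speed by the same factor.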
It should be noted that, in this research, the parameter $g$ in Equation (1) … (Maurya et al. 2015, 2016; Bassani et al. 2016; Hustim, Fujimoto 2012; Chandra, Bharti 2013; Du et al. 2017). The main aim of the current research is to investigate the variation of traffic speed in different traffic conditions on urban roads. This assumption is supported by Jun (2010), who noted that changes between different speed distributions reflect the pattern of traffic variations in a roadway system. In this paper, the distribution of speeds within the desired time intervals (the traffic conditions extracted from the recorded videos) is investigated with the Maximum Likelihood Estimation (MLE) method. The analysis uses the "fitdistrplus" package in the R programming language (Delignette-Muller, Dutang 2015). The procedure is explained in detail in the following subsections.

Choice of candidate distributions
Before fitting one or more distributions to a data set, it is generally necessary to choose good candidates from a predefined set of distributions. The first step in choosing candidate distributions for our data (vehicle speeds in different traffic conditions) was to plot the histogram of the speeds within the desired intervals. The empirical distribution function of these speeds was also plotted, based on Equation (2):

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\{X_i \le x\}} \qquad (2)$$

where: $X_1, \ldots, X_n$ are independent and identically distributed random variables with Cumulative Distribution Function (CDF) $F$, and $\mathbf{1}_{\{X_i \le x\}}$ is the indicator function, equal to 1 if $X_i \le x$ and 0 otherwise. In addition to the empirical plots, descriptive statistics of the data set (speeds) help to choose candidates from a set of parametric distributions. The most useful are the skewness $sk$ and kurtosis $kr$, which are linked to the third and fourth moments. The skewness and kurtosis of a sample $(x_i)_{i=1,\ldots,n}$ are given by Casella and Berger (2002):

$$\widehat{sk} = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{m_3}{m_2^{3/2}}, \qquad \widehat{kr} = \frac{n-1}{(n-2)(n-3)}\left((n+1)\frac{m_4}{m_2^{2}} - 3(n-1)\right) + 3 \qquad (3)$$

where: $m_2$, $m_3$, $m_4$ denote the empirical moments defined by

$$m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^k \qquad (4)$$

with $x_i$ the $n$ observations of variable $x$ and $\bar{x}$ their mean value.
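Equations (2)-(4) can be made concrete with a short sketch that computes the empirical CDF and the bias-corrected skewness and kurtosis of a speed sample. The function names are illustrative, and the five speeds are hypothetical toy values rather than data from this study.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF F_n of Equation (2): share of observations <= x."""
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(data, x) / n

    return F_n


def central_moment(sample, k):
    """Empirical central moment m_k of Equation (4)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** k for x in sample) / n


def skewness_kurtosis(sample):
    """Bias-corrected sample skewness and kurtosis of Equation (3)."""
    n = len(sample)
    if n < 4:
        raise ValueError("need at least 4 observations")
    m2, m3, m4 = (central_moment(sample, k) for k in (2, 3, 4))
    sk = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    kr = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2 ** 2 - 3 * (n - 1)) + 3
    return sk, kr


# Toy speeds in km/h (hypothetical, not data from this study).
speeds = [40.0, 50.0, 60.0, 70.0, 80.0]
F = ecdf(speeds)
print(F(60.0))  # 0.6
sk, kr = skewness_kurtosis(speeds)
print(round(sk, 3), round(kr, 3))  # 0.0 1.8
```

The symmetric toy sample has zero skewness. Its kurtosis of 1.8 matches that of a uniform distribution, which is what evenly spaced values resemble.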
The estimated skewness and kurtosis of the empirical distribution are then examined on a skewness-kurtosis plot to choose candidates from a set of parametric distributions. For each distribution, the plot shows the locus of skewness-kurtosis pairs that the distribution can take as its parameter values vary. In other words, it shows the possible range of skewness-kurtosis combinations. This locus can be a single point (e.g. the normal distribution, with skewness 0 and kurtosis 3). It forms a curve if the skewness and kurtosis depend on a single distribution parameter (e.g. the gamma distribution). It forms a two-dimensional region if they depend on more than one distribution parameter (e.g. the beta distribution). It should be noted that, for any distribution, the kurtosis must be greater than or equal to the square of the skewness plus one. Pairs below this threshold lie in the so-called impossible region, which no distribution can reach.
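The shapes of the loci described above can be checked numerically. The sketch below uses standard closed-form results:

- The normal, uniform, logistic and exponential distributions are fixed points.
- The gamma family traces a curve through its single shape parameter $k$.
- The lognormal family traces a curve through its log-scale standard deviation.

It also tests the impossible-region bound $kr < sk^2 + 1$. The function names are illustrative only. fitdistrplus draws this plot with its `descdist()` function.

```python
import math

# Single points on the skewness-kurtosis plane (standard results).
NORMAL = (0.0, 3.0)
UNIFORM = (0.0, 1.8)
LOGISTIC = (0.0, 4.2)
EXPONENTIAL = (2.0, 9.0)


def in_impossible_region(sk, kr):
    """True if kr < sk**2 + 1, which no distribution can attain."""
    return kr < sk ** 2 + 1


def gamma_point(shape):
    """Gamma curve: sk = 2/sqrt(k), kr = 3 + 6/k (one shape parameter k)."""
    return 2 / math.sqrt(shape), 3 + 6 / shape


def kurtosis_on_gamma_line(sk):
    """Eliminating k from gamma_point gives the gamma line kr = 3 + 1.5 * sk**2."""
    return 3 + 1.5 * sk ** 2


def lognormal_point(sigma):
    """Lognormal curve for log-scale standard deviation sigma (w = exp(sigma**2))."""
    w = math.exp(sigma ** 2)
    return (w + 2) * math.sqrt(w - 1), w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 3


# Trace the gamma and lognormal loci for a few parameter values.
for k in (0.5, 1.0, 4.0, 16.0):
    print("gamma", k, gamma_point(k))
for s in (0.1, 0.3, 0.5):
    print("lognormal", s, lognormal_point(s))
```

Because the gamma family has a single shape parameter, it collapses to a curve: with $k = 1$ it reduces to the exponential point $(2, 9)$. As $\sigma \to 0$, the lognormal curve starts at the normal point $(0, 3)$.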

Fit of distributions by MLE method
Once the candidate parametric distributions have been selected, the distribution parameters $\theta$ are estimated by maximizing the likelihood function:

$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (5)$$

where: $x_i$ is the observed traffic speed; $f(\cdot \mid \theta)$ is the density function of the candidate parametric distribution.
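The investigated parametric distributions fitted to the empirical distributions include the following:
- normal: $f(x) = \frac{1}{\alpha\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\alpha^2}\right)$
- exponential: $f(x) = \lambda e^{-\lambda x}, \; x \ge 0$
- uniform: $f(x) = \frac{1}{B-A}, \; A \le x \le B$
- logistic: $f(x) = \frac{e^{-(x-\mu)/\alpha}}{\alpha\left(1+e^{-(x-\mu)/\alpha}\right)^2}$
- gamma: $f(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \; x > 0$
- Weibull: $f(x) = \frac{k}{\alpha}\left(\frac{x}{\alpha}\right)^{k-1} e^{-(x/\alpha)^k}, \; x \ge 0$

where: $\alpha$ is the scale parameter; $\mu$ is the location parameter; $\lambda$ is the rate parameter; $A$ and $B$ are the minimum and maximum; $k$ is the shape parameter; $\omega$ is the second shape parameter; $\tau$ is the degrees of freedom parameter; $\eta(\cdot)$ is the beta function; $\Gamma(\cdot)$ is the gamma function. The parameters of the candidate distributions were estimated by Equation (5). The candidate distributions were then fitted to the data set so that they could be compared graphically with the empirical distribution (goodness-of-fit plots).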
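The sketch below applies Equation (5) to the candidates whose maximum-likelihood estimates have closed forms. These are the normal distribution, the lognormal distribution (the normal MLE on the log-speeds) and the exponential distribution. The gamma, Weibull and beta candidates have no closed-form MLE and need numerical optimisation; fitdistrplus delegates this to R's general-purpose optimiser. The parameter names mirror R's conventions (`mean`/`sd`, `meanlog`/`sdlog`, `rate`). The speeds are hypothetical.

```python
import math


def fit_normal(x):
    """Normal MLE: sample mean and the 1/n (not 1/(n-1)) standard deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return {"mean": mu, "sd": sigma}


def fit_lognormal(x):
    """Lognormal MLE: the normal MLE applied to log(x); requires x > 0."""
    p = fit_normal([math.log(v) for v in x])
    return {"meanlog": p["mean"], "sdlog": p["sd"]}


def fit_exponential(x):
    """Exponential MLE: rate = 1 / sample mean."""
    return {"rate": len(x) / sum(x)}


def loglik_normal(x, mean, sd):
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2) - (v - mean) ** 2 / (2 * sd ** 2) for v in x)


def loglik_lognormal(x, meanlog, sdlog):
    # log f(v) = -log(v) + normal log-density of log(v)
    return sum(-math.log(v) + loglik_normal([math.log(v)], meanlog, sdlog) for v in x)


def loglik_exponential(x, rate):
    return len(x) * math.log(rate) - rate * sum(x)


speeds = [40.0, 50.0, 60.0, 70.0, 80.0]  # hypothetical km/h
for name, fit, loglik in [
    ("normal", fit_normal, loglik_normal),
    ("lognormal", fit_lognormal, loglik_lognormal),
    ("exponential", fit_exponential, loglik_exponential),
]:
    params = fit(speeds)
    print(name, params, round(loglik(speeds, **params), 3))
```

The maximised log-likelihoods printed here are the $\ln\hat{L}$ values that enter the AIC and BIC comparison.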

Comparison of fitted distributions by goodness-of-fit tests
The candidate parametric distributions for traffic speed in each traffic condition were estimated by Equation (5). They were first compared graphically and then compared with each other by goodness-of-fit tests, to find the best-fitting speed distribution for each traffic condition. Goodness-of-fit statistics measure the distance between the fitted parametric distribution and the empirical distribution (the distance between the CDFs). In the current research, three goodness-of-fit statistics (Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling) are considered, following D'Agostino (2017):
- Kolmogorov-Smirnov (KS): $KS = \sup_x \left|F_n(x) - F(x)\right|$ (15)
- Cramér-von Mises (CvM): $CvM = n\int_{-\infty}^{\infty}\left(F_n(x) - F(x)\right)^2 dF(x)$ (16)
- Anderson-Darling (AD): $AD = n\int_{-\infty}^{\infty}\frac{\left(F_n(x) - F(x)\right)^2}{F(x)\left(1 - F(x)\right)}\, dF(x)$ (17)

where: $F_n$ is the empirical CDF of the vehicle speeds; $F$ is the fitted theoretical parametric CDF. Apart from these statistics, two classical penalized criteria based on the log-likelihood are also considered to tackle over-fitting:
- Akaike Information Criterion (AIC): $AIC = 2k - 2\ln\hat{L}$ (18)
- Bayesian Information Criterion (BIC): $BIC = k\ln n - 2\ln\hat{L}$ (19)

where: $k$ is the number of estimated parameters in the model; $\hat{L}$ is the maximum value of the likelihood function for the model; $n$ is the number of observations.
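The integrals in Equations (15)-(17) are usually evaluated with the standard computational forms based on the fitted CDF at the ordered observations, $u_i = F(x_{(i)})$. The sketch below uses those forms and adds AIC and BIC. The function names are illustrative. The small clamp in the AD term is an assumption added so that the logarithms stay finite when a fitted CDF returns exactly 0 or 1.

```python
import math


def gof_statistics(sample, cdf):
    """KS, CvM and AD statistics for a fitted CDF, via u_i = F(x_(i))."""
    n = len(sample)
    u = [cdf(x) for x in sorted(sample)]
    ks = max(max(i / n - ui, ui - (i - 1) / n) for i, ui in enumerate(u, start=1))
    cvm = 1 / (12 * n) + sum((ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1))
    eps = 1e-12  # assumed clamp: keeps log(u) and log(1 - u) finite
    uc = [min(max(ui, eps), 1 - eps) for ui in u]
    ad = -n - sum(
        (2 * i - 1) * (math.log(uc[i - 1]) + math.log(1 - uc[n - i])) for i in range(1, n + 1)
    ) / n
    return {"KS": ks, "CvM": cvm, "AD": ad}


def aic(loglik, k):
    """Equation (18): k estimated parameters, loglik = ln(L-hat)."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Equation (19): n observations."""
    return k * math.log(n) - 2 * loglik


# Hypothetical check: four points at the uniform "plotting positions".
sample = [0.125, 0.375, 0.625, 0.875]
print(gof_statistics(sample, cdf=lambda x: min(max(x, 0.0), 1.0)))  # uniform(0, 1) CDF
print(aic(loglik=-120.0, k=2), bic(loglik=-120.0, k=2, n=50))
```

Points placed exactly at $(2i-1)/(2n)$ under a uniform CDF give the smallest attainable CvM value, $1/(12n)$. This is a convenient sanity check for the implementation.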

Case study and results
The main aim of the current research was to study the variation of traffic speed in different traffic conditions. To do so, the traffic flow was disaggregated visually with the help of video recordings, based on predefined traffic conditions (see Section 1.1.1). In the current research, these traffic conditions are under-saturated flow, free flow, congestion, over-saturated congestion, the acceleration process and the deceleration process. In this section, the proposed methodology is applied to the speeds extracted for a scenario in which traffic flow followed the acceleration process. These are hereinafter called the sample time interval speeds, and the scenario the accelerated flow condition. Before fitting one or more distributions to the speed data, it is necessary to choose good candidates from a predefined set of distributions.

Choice of the candidate distributions for the sample time interval speeds in accelerated flow condition
The first step in choosing the candidate distributions was to plot the histogram and the empirical distribution of the speeds, based on Equation (2). Figure 5 shows the histogram on the density scale together with the CDF of the sample time interval speeds, where traffic flow followed the acceleration process.
In addition to the empirical plots, the third and fourth moments of the empirical distribution of the sample time interval speeds were estimated based on Equations (3) and (4). Table 2 shows the summary statistics of these sample time interval speeds, where traffic flow followed the acceleration process.
Once the skewness and kurtosis had been estimated, a skewness-kurtosis plot of the empirical distribution, following Cullen and Frey (1999), was used to choose candidate distributions from a set of parametric distributions. The choice was guided by the estimated skewness and kurtosis (last two rows of Table 2). It should be emphasized that a non-zero skewness reveals a lack of symmetry in the empirical distribution. The kurtosis quantifies the weight of the tails in comparison with the normal distribution, for which the kurtosis equals 3. Figure 6 shows the skewness-kurtosis graph of the speeds within the sample time interval (traffic flow followed acceleration). It was plotted with the "fitdistrplus" package in R (Delignette-Muller, Dutang 2015). To account for the uncertainty of the skewness and kurtosis estimated from the calculated speeds, a nonparametric bootstrap procedure was performed. Skewness and kurtosis were computed on 1000 bootstrap samples and reported on the skewness-kurtosis plot, as shown in Figure 6.
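The bootstrap step can be sketched as follows. The procedure resamples the speeds with replacement 1000 times and recomputes the bias-corrected skewness and kurtosis of Equations (3)-(4) on each resample, giving a cloud of points around the observed pair. This is conceptually what `descdist(..., boot = 1000)` in fitdistrplus adds to the plot, but the code is an independent illustration and not that package's implementation. The fixed seed, the rule of skipping resamples whose values are all identical, and the speeds are assumptions for illustration.

```python
import math
import random


def skew_kurt(x):
    """Bias-corrected skewness and kurtosis (Equations (3)-(4)); None if all values are equal."""
    n = len(x)
    if len(set(x)) == 1:
        return None
    mean = sum(x) / n
    m2, m3, m4 = (sum((v - mean) ** k for v in x) / n for k in (2, 3, 4))
    sk = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    kr = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2 ** 2 - 3 * (n - 1)) + 3
    return sk, kr


def bootstrap_skew_kurt(speeds, n_boot=1000, seed=1):
    """Nonparametric bootstrap of (skewness, kurtosis): resample with replacement."""
    if len(speeds) < 4 or len(set(speeds)) < 2:
        raise ValueError("need at least 4 observations and 2 distinct values")
    rng = random.Random(seed)
    points = []
    while len(points) < n_boot:
        point = skew_kurt(rng.choices(speeds, k=len(speeds)))
        if point is not None:  # skip resamples in which every value is identical
            points.append(point)
    return points


speeds = [12.0, 15.5, 18.0, 19.5, 21.0, 22.5, 25.0, 27.5, 31.0, 36.0]  # hypothetical km/h
cloud = bootstrap_skew_kurt(speeds)
print(len(cloud), cloud[0])
```

A wide bootstrap cloud warns that several candidate families remain plausible. This is why the plot is used only to shortlist candidates, not to pick the final distribution.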
Based on Figures 5 and 6 and the descriptive statistics of the sample time interval speeds (Table 2), the normal, lognormal, beta, gamma and Weibull distributions were selected as candidate distributions for the distribution fitting process.

Fitting candidate distributions by MLE method to the sample time interval speeds in accelerated flow condition
Once the candidate distributions had been selected (normal, lognormal, beta, gamma and Weibull in this case), their parameters were estimated by the MLE method using Equation (5) and the corresponding density functions. The candidates could then be fitted to the data set and compared graphically with the empirical distribution (goodness-of-fit plots).
The estimated parameters of the candidate distributions for the sample time interval speeds are shown in Table 3. The related goodness-of-fit plots (density plot, CDF plot, Q-Q (quantile-quantile) plot, P-P (probability-probability) plot) are presented in Figure 7. It should be highlighted that the beta distribution was among the candidate distributions for this time interval. The speeds were therefore rescaled to the (0, 1) interval for parameter estimation and distribution fitting.
All four plots in Figure 7 compare the candidate distributions with the empirical distribution of the sample time interval speeds, each from a different perspective. For instance, the density plot shows the density function of each fitted distribution together with the histogram of the empirical speeds. Apart from the two basic classical goodness-of-fit plots (density plot and CDF plot), the Q-Q plot emphasizes lack of fit in the distribution tails. The P-P plot emphasizes lack of fit in the distribution centre. The Q-Q plot of the sample time interval speeds shows that the beta and normal distributions describe the tails of the empirical distribution better. According to the related P-P plot, however, the beta distribution could be preferred because it describes the centre of the empirical distribution better.
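The two diagnostic plots can be reduced to coordinate pairs. The P-P plot pairs empirical probabilities with fitted CDF values. The Q-Q plot pairs fitted quantiles with the ordered speeds. The sketch uses `statistics.NormalDist` as the fitted distribution because it provides both `cdf()` and `inv_cdf()`. The plotting positions $p_i = (i - 0.5)/n$ are an assumption, since other conventions shift the points slightly. In fitdistrplus the corresponding plots are drawn by `ppcomp()` and `qqcomp()`. The speeds are hypothetical.

```python
import math
from statistics import NormalDist


def pp_qq_points(speeds, dist):
    """P-P and Q-Q coordinates for a fitted distribution.

    dist needs cdf() and inv_cdf() methods, as statistics.NormalDist has.
    The plotting positions p_i = (i - 0.5) / n are an assumption of this sketch.
    """
    x = sorted(speeds)
    n = len(x)
    p = [(i - 0.5) / n for i in range(1, n + 1)]
    pp = [(pi, dist.cdf(xi)) for pi, xi in zip(p, x)]      # (empirical, fitted) probability
    qq = [(dist.inv_cdf(pi), xi) for pi, xi in zip(p, x)]  # (theoretical, empirical) quantile
    return pp, qq


speeds = [40.0, 50.0, 60.0, 70.0, 80.0]  # hypothetical km/h
fitted = NormalDist(mu=60.0, sigma=math.sqrt(200.0))  # normal MLE for these speeds
pp, qq = pp_qq_points(speeds, fitted)
# Points close to the 1:1 line indicate a good fit in the centre (P-P) or tails (Q-Q).
worst_pp = max(abs(a - b) for a, b in pp)
print(round(worst_pp, 3))
```

Departures at the ends of the Q-Q list reveal tail misfit. Departures in the middle of the P-P list reveal misfit around the centre, which is the distinction used when judging the beta and normal fits.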

Goodness-of-fit test comparison of the sample time interval speeds in accelerated flow condition
After the graphical comparison with the empirical distribution, the candidate distributions were compared with each other using three goodness-of-fit statistics (KS, CvM, AD) and two penalized criteria (AIC and BIC), based on Equations (15)-(19). The aim was to find the best-fitting distribution for this traffic condition (the acceleration process in this case). Table 4 gives the computed values of these three goodness-of-fit statistics and two log-likelihood-based penalized criteria for the distributions fitted to the sample time interval speeds in the accelerated flow condition. As mentioned above, goodness-of-fit statistics measure the distance between the fitted parametric distribution and the empirical distribution. Therefore, the lower a value in Table 4, the better the fit. Based on these statistics (Table 4), the beta distribution is the best-fitting distribution for the case study sample time interval speeds, where traffic flow followed the acceleration process. That traffic speed can follow different distributions is well recognized in the literature. For instance, Leong (1968) and McLean (1979) studied lightly trafficked two-lane roads where most vehicles travel freely. They found that car speeds measured over time are approximately normally distributed, with a coefficient of variation ranging from about 0.11 to 0.18. In addition, Minh et al. (2005) found that the speed distribution on an urban road followed the normal distribution. Wang et al. (2012) introduced truncated normal and lognormal distributions for modelling speeds and travel times. Zou (2013) proposed that the skew-t distribution can reasonably account for heterogeneity in vehicle speed data.
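Because every statistic in Table 4 is read as "lower is better", the selection step can be written as a small helper. It picks the lowest-valued candidate for each criterion and counts how many criteria each candidate wins. The counting rule is an assumption of this sketch, since the text does not state how disagreements between criteria would be resolved. The numbers in the usage example are made up for illustration only and are not the values of Table 4.

```python
def rank_candidates(stats):
    """Pick the lowest-valued candidate for each criterion and count wins.

    stats maps distribution name -> {criterion: value}; every criterion
    (KS, CvM, AD, AIC, BIC) is treated as 'lower is better'.
    """
    criteria = sorted({c for per_dist in stats.values() for c in per_dist})
    best = {c: min(stats, key=lambda d: stats[d][c]) for c in criteria}
    wins = {d: sum(best[c] == d for c in criteria) for d in stats}
    return best, wins


# Made-up values for illustration only (not the values of Table 4).
stats = {
    "normal": {"KS": 0.09, "CvM": 0.21, "AD": 1.40, "AIC": 310.2, "BIC": 315.1},
    "beta": {"KS": 0.06, "CvM": 0.08, "AD": 0.55, "AIC": 301.7, "BIC": 306.6},
}
best, wins = rank_candidates(stats)
print(best)
print(wins)
```

When all criteria agree, as in this toy case, the choice is unambiguous. When they disagree, the win count only flags the conflict, and the graphical diagnostics remain the tie-breaker.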
Zou and Zhang (2011) stated that a single normal distribution cannot accurately accommodate the excess kurtosis present in the speed distribution. They proposed skew-normal and skew-t distributions to fit speed data. Haight and Mosher (1962) considered that speed data could be well represented by either a gamma or a lognormal distribution. Gerlough and Huber (1975) proposed the use of the lognormal distribution, which resembles the normal distribution but is skewed, with a longer tail to the right. It has the advantage that the same functional form is retained when the time-speed distribution is transformed into a space-speed distribution. It also avoids the theoretical difficulty of negative speeds implied by the infinite tails of the normal distribution. This is consistent with IMAGINE (2006), where different speed distributions are related to different traffic conditions. Recent literature has also noted that there is a distribution of speed at each level of density in traffic flow, which is not necessarily normal (Qu et al. 2017). The distribution fitting results of the current research on urban road traffic (interrupted traffic flow) show that traffic speed on urban roads may follow different distributions in different traffic conditions.
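A short derivation confirms the invariance noted by Gerlough and Huber (1975). It assumes the standard relation between the time-mean (spot) speed density $f_t$ and the space-mean speed density $f_s$, in which each vehicle is weighted by $1/v$:

$$f_s(v) = \frac{f_t(v)/v}{\int_0^\infty f_t(u)/u \, du}$$

If $f_t$ is lognormal with parameters $\mu$ and $\sigma$, set $y = \ln v$ and complete the square:

$$\frac{f_t(v)}{v} = \frac{1}{\sigma\sqrt{2\pi}\,v}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2} - y\right) = \frac{e^{-\mu+\sigma^2/2}}{\sigma\sqrt{2\pi}\,v}\exp\left(-\frac{\left(y-(\mu-\sigma^2)\right)^2}{2\sigma^2}\right)$$

Apart from the constant $e^{-\mu+\sigma^2/2}$, the right-hand side is a lognormal density with parameters $\mu - \sigma^2$ and $\sigma$. The normalizing integral therefore equals $e^{-\mu+\sigma^2/2}$, and

$$f_s(v) = \frac{1}{\sigma\sqrt{2\pi}\,v}\exp\left(-\frac{\left(\ln v-(\mu-\sigma^2)\right)^2}{2\sigma^2}\right)$$

The space-speed distribution is thus again lognormal, with the same $\sigma$ and with $\mu$ shifted to $\mu - \sigma^2$.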
Table 5 shows the best-fitting distributions for traffic speed in all the traffic conditions defined in Section 1.1.1, together with the statistical characteristics of traffic speed in each condition.
It should be noted that the minimum boundary of the acceleration process is taken as the situation in which vehicles are almost stopped and ready to increase their speed. Likewise, the minimum boundary of the deceleration process is taken as the situation in which vehicles decrease their speed until they are almost stopped (reaching $x_j$ in Figure 3). Furthermore, the possible presence of Heavy Goods Vehicles (HGVs) on urban roads was not considered in this study, since HGVs have been prohibited from entering Budapest since 2016 (neither by day nor at night). Hence, the share of HGVs has not been integrated into the approach.

Sensitivity analysis
In this section, a sensitivity analysis of the distribution fitting was performed, considering different levels of error in speed estimation. It covers under-saturated flow, free flow, decelerated flow, accelerated flow, over-saturated congestion and congestion, respectively. Tables 6-11 show the results of the sensitivity analysis for these traffic conditions. The results for the under-saturated flow condition (Table 6) show that the normal distribution remains the best-fitting distribution for under-saturated traffic flow (the lower the goodness-of-fit values, the better the fit). The same holds for:
- the lognormal distribution in free flow (Table 7)
- the beta distribution in accelerated flow (Table 9)
- the exponential distribution in over-saturated congestion (Table 10)
- the gamma distribution in congestion (Table 11)

For the decelerated flow condition, however, the lognormal distribution fits better than the chi-square distribution at the +30% and -20% error levels in speed estimation (see the lowest values in the +30% and -20% rows of Table 8). At the other error levels, the chi-square distribution remains the best fit.
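The text does not specify how the speed-estimation errors were modelled. The sketch below therefore uses one hypothetical error model: every speed in a condition is scaled by $(1 + e)$ for $e$ between -30% and +30%. Under this model, the lognormal and chi-square candidates are refitted and compared by AIC. The chi-square degrees of freedom have no closed-form MLE, so they are found by golden-section search. This works because the chi-square log-likelihood is concave in the degrees of freedom. Under pure scaling, a scale family such as the lognormal only shifts its AIC by $2n\ln(1+e)$. Any change in ranking under this model must therefore come from the chi-square candidate, which has no scale parameter. All names and speeds are illustrative.

```python
import math


def lognormal_loglik(x):
    """Maximised lognormal log-likelihood, using the closed-form MLE on log(x)."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mu = sum(logs) / n
    s2 = sum((y - mu) ** 2 for y in logs) / n
    # At the MLE the quadratic term sums to n/2, hence the final -0.5 * n.
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * s2) - 0.5 * n


def chisq_loglik(x, df):
    """Chi-square log-likelihood for degrees of freedom df (x > 0)."""
    half = df / 2
    return ((half - 1) * sum(math.log(v) for v in x) - sum(x) / 2
            - len(x) * (half * math.log(2) + math.lgamma(half)))


def fit_chisq_df(x, lo=1e-3, hi=1000.0, tol=1e-6):
    """Golden-section search for the df maximising chisq_loglik (concave in df)."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if chisq_loglik(x, c) > chisq_loglik(x, d):
            b, d = d, c  # maximum lies in [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + invphi * (b - a)
    return (a + b) / 2


def sensitivity(speeds, error_levels=(-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3)):
    """AIC of lognormal (2 parameters) vs chi-square (1 parameter) after scaling
    every speed by (1 + e), a hypothetical systematic estimation error."""
    results = {}
    for e in error_levels:
        x = [v * (1 + e) for v in speeds]
        aic_ln = 2 * 2 - 2 * lognormal_loglik(x)
        aic_chi = 2 * 1 - 2 * chisq_loglik(x, fit_chisq_df(x))
        results[e] = {"lognormal": aic_ln, "chi-square": aic_chi,
                      "best": "lognormal" if aic_ln < aic_chi else "chi-square"}
    return results


speeds = [18.0, 21.5, 24.0, 26.5, 28.0, 30.5, 33.0, 35.5, 39.0, 44.0]  # hypothetical km/h
for e, r in sensitivity(speeds).items():
    print(f"{e:+.0%}", r["best"])
```

A random, per-vehicle error model would instead perturb each speed independently, and would need several repetitions to average out. The systematic model shown here is only the simplest variant.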

Conclusions
The fundamental diagram, a graphical representation of the relationship among traffic flow, speed, and density, has been a foundation of traffic flow theory and transportation engineering for many years. Underlying a fundamental diagram is the relation between traffic speed and density, which roughly corresponds to drivers' speed choices under varying car-following distances. Empirical observations show a wide scattering of traffic speeds at a given level of density. This scattering forms a distribution of speed at that level of density (see Figure 2). The literature often states that these distributions follow the normal distribution on highways, where traffic flow is uninterrupted (Wang et al. 2013). Conditions that produce a different speed distribution are often found on urban roads, where the traffic stream is in general much more complicated.
The main aim of the current research was to investigate the distribution of traffic speed on urban roads in different traffic conditions. To do so, the distribution of traffic speeds at various locations in the city of Budapest (Hungary) was examined using recorded videos and the outputs of loop detectors at the investigated sites. Traffic speed was found to follow the exponential, normal, lognormal, gamma, beta and chi-square distributions in the over-saturated congestion, under-saturated flow, free flow, congestion, accelerated flow and decelerated flow scenarios, respectively.
Apart from the distribution fitting analysis, a sensitivity analysis was performed to investigate how potential errors in loop detector speed estimation affect the proposed distributions. The results showed that at the +30% and -20% error levels in speed estimation, the best-fitting distribution for decelerated traffic flow changes from the chi-square distribution to the lognormal distribution (see the +30% and -20% rows of Table 8).

Disclosure statement
The authors declare that they do not have any competing financial, professional, or personal interests from other parties.