VEHICLE TRAJECTORY BASED CONTROL DELAY ESTIMATION AT INTERSECTIONS USING LOW-FREQUENCY FLOATING CAR SAMPLING DATA

Control delay is an important parameter used in the optimization of traffic signal timings and in the estimation of the level of service at signalized intersections. However, it is also very difficult to estimate. In recent years, floating car data have emerged as an important data source for traffic state monitoring because of their high accuracy, wide coverage and availability regardless of meteorological conditions, but they have seen little use for control delay estimation. This article proposes a vehicle trajectory based control delay estimation method using low-frequency floating car data. Considering the sparseness and randomness of low-frequency floating car data, we use historical data to capture the deceleration and acceleration patterns. Combined with the low-frequency samples, the spatial and temporal ranges where a vehicle starts to decelerate and stops accelerating are calculated. These are used together with the control delay probability distribution function, obtained from the geometric probability model, to calculate the expected value of the control delay for each vehicle. The proposed method and a reference method are compared with the ground truth. The results show that the proposed method has a root mean square error of 11.8 s compared to 13.7 s for the reference method for the peak period. The corresponding values for the off-peak period are 9.3 s and 12.5 s. In addition to better accuracy, the mean and standard deviation statistics show that the proposed method outperforms the reference method and is therefore more reliable. This successful estimation of control delay from sparse data paves the way for a more widespread use of floating car data for monitoring the state of intersections in road networks.


Introduction
Traffic control delay (the difference between the actual travel time influenced by traffic signals and the reference travel time under free-flow conditions) is an important performance indicator for evaluating signal control systems and the Level Of Service (LOS) of traffic operations at intersections. However, with current traffic data detection infrastructure, control delay is not directly measurable. A variety of theoretical models have been developed to estimate control delay at signalized intersections. Cheng et al. (2016) reviewed the development of these estimation models and classified it into three stages. Stage 1 covers the 1920s to the 1970s, and the approaches proposed in this stage largely considered random arrivals. These models failed to provide accurate results under a high degree of saturation. To improve the accuracy of delay estimation at high saturation levels, the coordinate transformation technique and time-dependent models were derived, and progression factors accounting for the filtering impact of the upstream intersection were introduced, from the 1970s to the 2000s (Stage 2). Because specific traffic conditions were still approximated inaccurately, modified approaches and supplementary terms were derived from 2000 onwards (Stage 3). The drawback of the theoretical delay estimation models is that, although they all achieve satisfying accuracy when traffic is undersaturated, their performance declines to varying degrees under a high degree of saturation. Although some modified models give acceptable estimation results by introducing additional factors, these models are more complicated and more parameters need to be calibrated. Besides, these models need signal timing information and traffic volumes collected by fixed sensors such as loop detectors as input. Therefore, the theoretical models can only provide control delay estimates for intersections equipped with fixed sensors.
In recent years, probe vehicle technologies able to register vehicle trajectories have created an opportunity to address the limitations of current systems in estimating traffic control delay. In theory, probe vehicle (floating car) data have the potential to provide high accuracy vehicle position, time and their derivatives over a wide spatial-temporal coverage. Although probe vehicle data are spatially and temporally sparse because of storage and transmission limitations, they have been widely used for the estimation of various traffic parameters (Comert, Cetin 2009; Rahmani et al. 2015; Shi et al. 2017). However, little research to date has focused on delay estimation based on sparse probe data. The aim of this article is to contribute to the estimation of delays at signalized intersections, making use of low-frequency trajectory data.
As early as 1991, researchers explored the plausibility of using floating car data to estimate control delay at intersections. Quiroga and Bullock (1999) proposed a forward-and-backward-acceleration method for detecting critical delay points and then estimating control delays. Colyar and Rouphail (2003) improved the prediction accuracy of the Quiroga and Bullock (1999) method by accounting for the influence of traffic conditions. Ko et al. (2008) estimated delay components based on speed profiles. Čelar et al. (2018) developed an algorithm based on the average acceleration rate, the average deceleration rate and the phase duration. The method aims to eliminate the delay that is not caused by traffic signals. Li et al. (2018) developed a virtual detection box methodology to generate control delay measures from high fidelity commercial probe vehicle trajectory data. The method did not encounter privacy issues, because no actual trajectory data would be transferred to the computer. However, these methods assume that the Global Positioning System (GPS) data are sampled at 1 Hz, which is not always the case in reality.
Applying these methods to low-frequency GPS data results in low accuracy delay estimation. Liu et al. (2006) assessed the sensitivity of delay to the sample frequency. The results show that delays measured from data at a sampling interval of 10 s are consistent with the values from an interval of 5 s in 74% of the cases. However, when the sampling interval is 60 s, the level of consistency drops to 37%. The methods above are therefore not suitable for low-frequency data.
To accurately obtain control delay from low-frequency floating car data, the main challenge is to detect where and when a vehicle starts to decelerate and stops accelerating. He and Ye (2014) proposed a method that delimits the affected area of the intersection on the basis of queue length and calculates the times when a vehicle enters and leaves the affected area from low-frequency sample points. Although this method is simple and computationally efficient, the affected area of the intersection is assumed in that study to be fixed, whereas in reality it is related to the queue length and varies from cycle to cycle. Wang et al. (2016) developed a piecewise model representing vehicle motion as it passes an intersection. An optimization method is used to determine the locations and times of the initiation of deceleration and the end of acceleration. Their model assumes that a vehicle travels at free-flow speed before deceleration. However, when traffic is congested this assumption may not hold. Some researchers have proposed methods for reconstructing the trajectory from low-frequency floating car data, so that the critical points can be inferred from the reconstructed trajectory. Hao et al. (2014) proposed a model investigating all possible driving mode sequences between two consecutive GPS updates. With the likelihood quantified using an a priori distribution, a detailed trajectory is reconstructed and used to calculate delay. In principle, this should work well even for floating car data whose sample interval is 60 s. However, the distribution of each scenario's likelihood is difficult to obtain a priori. Wan et al. (2016) proposed an Expectation-Maximization (EM) algorithm to reconstruct the maximum likelihood trajectory. However, the method has low computational efficiency and poor real-time performance. These methods also need signal timing information, which is not always available.
Several somewhat different approaches have also been developed. Liu et al. (2013) addressed the sparseness problem by introducing the Principal Curve method into the calculation of turn delay. In their study, the sampling interval of the floating car data is 10 s. Ban et al. (2009) employed piecewise linear interpolation to obtain traffic delays. However, traffic delay is different from control delay. Neumann et al. (2010) computed turn-dependent delay times by introducing a simple linear model, which arises from the superposition of two types of turn-dependent delays and the free-flow travel time. Free-flow speed and delay are estimated as model parameters. However, because of a lack of reference values, the results were not verified. Turn delay and traffic delay are different from control delay, so the performance of these methods is uncertain if they are used to estimate control delay.
The limitations of the methods above can be classified into three categories: (1) they do not account for the randomness of low-frequency sampling caused by the dynamic nature of traffic flow, and hence the reliability of the results is not provided; (2) the data some models use, such as high-frequency probe vehicle data and signal timing information, are not always available; (3) some models are designed to estimate turn delay or traffic delay and may not be applicable to control delay. This article aims to contribute to the estimation of control delay at signalized intersections with common low-frequency floating car data. It addresses data sparseness by introducing the Principal Curve method and by using the expected value instead of a single estimated value for high accuracy vehicle control delay estimation. By using historical data, the deceleration and acceleration patterns of vehicles through an intersection are constructed with the Principal Curve method and combined with low-frequency data to compute the spatial and temporal ranges of the deceleration onset points and acceleration end points. Using this information together with the control delay probability distribution function obtained from the geometric probability model, the expected value of the control delay is calculated.
The main contributions of this article are as follows: 1) the proposed method tackles the control delay estimation problem at signalized intersections without fixed sensors; 2) historical data are used to capture vehicle motion (dynamics) at signalized intersections, which reduces sensitivity to the randomness of sampling, and no assumption is introduced; 3) the method calculates the expected value of vehicle control delay, which improves both accuracy and stability.

Preliminaries
As illustrated in Figure 1, control delay comprises three parts: (1) deceleration delay $d_b$; (2) stop delay $d_s$; (3) acceleration delay $d_a$. $T_d$ is the time when the vehicle begins to decelerate; $T_{s1}$ is the time when the vehicle stops decelerating; $T_{s2}$ is the time when the vehicle starts to accelerate; $T_a$ is the time when the vehicle stops accelerating; $L_d$ is the location where the vehicle's deceleration process starts; $L_s$ is the location where the vehicle stops; $L_a$ is the location where the vehicle's acceleration starts; $v_f$ is the free-flow speed; $d_b$ is the delay caused by the vehicle decelerating from $T_d$ to $T_{s1}$; $d_s$ is the stop delay when the vehicle is stationary; $d_a$ is the acceleration delay caused by the vehicle accelerating from $T_{s2}$ to $T_a$; $d$ is the control delay the vehicle experiences through the intersection and is the sum of the deceleration, stop and acceleration delays.
Each delay component is the time actually spent in the corresponding phase minus the time needed to cover the same distance at the free-flow speed (for the stop delay, no distance is covered). It follows from these definitions that the times and locations at which a vehicle starts to decelerate and stops accelerating are the most important quantities for control delay estimation (i.e. the stoppage period is irrelevant in control delay estimation). Hence, the two are often referred to as critical points. From low-frequency floating car data, it is not always possible to obtain a complete picture of the vehicle's passage through the intersection. Hence, the objective is to detect the critical points from the sparse floating car data.
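As a hedged sketch of how the three components follow from the definitions in Figure 1, they can be written as below. The symbol $L_e$ (the location at which the vehicle stops accelerating) is an assumption introduced here for illustration and does not appear in the text; locations are assumed to increase in the direction of travel.

```latex
\begin{align}
d_b &= (T_{s1} - T_d) - \frac{L_s - L_d}{v_f}, \\
d_s &= T_{s2} - T_{s1}, \\
d_a &= (T_a - T_{s2}) - \frac{L_e - L_s}{v_f}, \\
d   &= d_b + d_s + d_a = (T_a - T_d) - \frac{L_e - L_d}{v_f}.
\end{align}
```

In the sum, $T_{s1}$, $T_{s2}$ and $L_s$ cancel, which is why only the deceleration onset and the acceleration end need to be located.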

Methodology
In order to address the limitation of data sparseness, historical data are used to explore the deceleration and acceleration patterns. This helps to capture the changes in vehicle motion or dynamics through intersections to obtain the space-time ranges of the critical points. From these data and based on the geometric probability model, the distribution function of the delay values is obtained, from which the expected value of delay is calculated.

Travel pattern analysis in the spatial dimension
Detecting the "critical points" is the first prerequisite in control delay estimation. However, as Figure 2 shows, only a few sample points are obtained for a vehicle passing through an intersection, providing an incomplete trajectory. From such data the locations of the critical points are unknown, making it impossible to accurately estimate control delay. Hence, more trajectory data are required, and we use historical data to capture the vehicle travel (motion or dynamics) patterns through intersections. When a vehicle passes through an intersection, the dwell time depends on the arrival time of the vehicle and the signal timing scheme, and is therefore random. Deceleration and acceleration, in contrast, are relatively stationary processes that are largely not affected by the environment. Therefore, the travel patterns of the deceleration and acceleration processes are explored by mining historical data. To serve as a reference, the historical sample points must meet two conditions: their instantaneous speed must be above 0, and they must be located close to the trajectory being estimated.
After extracting historical data according to the aforementioned conditions, the historical sample points are divided into two parts: (1) before a vehicle stops; (2) after a vehicle stops. The deceleration and acceleration patterns are investigated separately. Considering that the historical data consist of discrete, sparse points of uncertain quantity, the Principal Curve method is adopted to fit a curve from which the additional sample points needed to determine the critical points can be generated. In practical terms, the Principal Curve method is used to deal with raw data noise and non-uniform distribution, which are common phenomena in traffic data. As illustrated in Figure 3, the horizontal axis represents the distance from the centre of the intersection, and the vertical axis represents the instantaneous speed. Points A and B represent the critical points whose locations are to be estimated. The stars represent the current sampling points. The dots are the historical sample points. From Figure 3, it can be seen that adding historical data better captures the travel patterns. The fitting function represents the vehicle's speed distribution at different locations on the road. Hence, the function helps to increase the number of sample points at each stage of the vehicle's movement, and the distance and speed ranges of the critical points can also be determined. When new data arrive, the fitting function can be updated.
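To illustrate how pooled historical points can yield a speed-versus-distance profile, the following is a minimal sketch that averages speeds within fixed-width distance bins. This binned-mean smoother is a deliberately simple stand-in and is not the Principal Curve method used in the article; the function name, the 20 m bin width used as a default and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def binned_speed_profile(points, bin_width_m=20.0):
    """Average historical speeds within fixed-width distance bins.

    points: iterable of (distance_m, speed_kmh) pairs, with distance
    measured from the centre of the intersection.
    Returns a list of (bin_centre_m, mean_speed_kmh) sorted by distance.
    """
    bins = defaultdict(list)
    for distance_m, speed_kmh in points:
        index = int(distance_m // bin_width_m)  # bin the point falls into
        bins[index].append(speed_kmh)
    return [((index + 0.5) * bin_width_m, mean(speeds))
            for index, speeds in sorted(bins.items())]


# Invented historical samples on the approach to a stop (illustrative only).
history = [(95.0, 42.0), (88.0, 40.0), (70.0, 33.0), (65.0, 31.0),
           (45.0, 22.0), (30.0, 14.0), (25.0, 12.0), (8.0, 4.0)]
profile = binned_speed_profile(history)
```

A smoother of this kind can be recomputed whenever new historical points arrive, which mirrors the updating of the fitting function described above.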

Spatial-temporal range delineation
To determine the critical points automatically from the profile of speed versus distance, we adopt the forward and backward average acceleration method proposed by Quiroga and Bullock (1999). It should be noted that this method was originally proposed to detect the critical points automatically from the acceleration profile. In this article, the fitting function based on the historical data is the speed-distance curve.
The acceleration is defined as the derivative of velocity with respect to distance: $a_i = (v_i - v_{i-1}) / (s_i - s_{i-1})$, where $a_i$ is the acceleration associated with point $i$; $v_{i-1}$, $v_i$ are the speeds at points $i-1$ and $i$; $s_{i-1}$, $s_i$ are the positions at points $i-1$ and $i$. It should be noted that $a_i$ is different from the common acceleration $a = \Delta v / \Delta t$, which is the derivative of velocity with respect to time. Next we show that when a vehicle decelerates or accelerates, $a_i$ follows the same trend as $a$. Assume that a vehicle travels at a constant deceleration rate $a$, that its initial speed is $v_1$ and its initial position is $s_1$, and that after $\Delta t$ its speed is $v_2$ and its position is $s_2$. Then $v_2 - v_1 = a \Delta t$ and $s_2 - s_1 = (v_1 + v_2) \Delta t / 2$, so that $a_i = (v_2 - v_1) / (s_2 - s_1) = 2a / (v_1 + v_2)$. When the vehicle travels at a constant speed, $a_i = 0$; when the vehicle decelerates, the magnitude of $a_i$ becomes non-zero and increases gradually as the speed becomes smaller. Similarly, when the vehicle accelerates, the magnitude of $a_i$ reduces gradually, and once the vehicle restores free-flow speed, $a_i = 0$. Thus $a_i$ shows the same variation tendency as the common acceleration $a$.
The expression above can be used to detect when the acceleration becomes significantly different from zero, and hence to locate the deceleration onset point. However, the expression only applies to the deceleration process. For the acceleration process, the backward method is adopted, where $i = n + 1, n + 2, \ldots, N + 1$.
Unlike the forward average acceleration method, the backward average acceleration algorithm is used to determine when the acceleration is essentially zero, enabling the acceleration end point to be detected. Therefore, using the forward and backward acceleration methods, the spatial ranges of the critical points are determined. For convenience, as shown in Figure 3, let A and B represent the critical points identified by the method, each with an associated spatial range and speed range. $s_a$ is the distance from point A to the centre of the intersection and $s_b$ is the distance from point B to the centre of the intersection. Points 1, 3 and 4 are the sample points of the trajectory to be estimated. On the basis of the definition of control delay, if $t_{a1}$ (the travel time between A and 1) and $t_{b3}$ (the travel time between B and 3) are known, the control delay can be calculated. Following the literature (Clement et al. 2004), it is assumed that a vehicle travels between A and 1, and between B and 3, at a constant acceleration. The acceleration rate is bounded using prior knowledge from historical deceleration and acceleration values; its lower and upper bounds are $a_1$ and $a_2$, respectively.
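The detection of the two critical points on a fitted speed-distance curve can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the curve is sampled at positions measured along the direction of travel, and the threshold of 0.02 1/s for "significantly different from zero" is a hypothetical value chosen only for the example.

```python
def distance_based_acceleration(positions_m, speeds_ms):
    """a_i = (v_i - v_{i-1}) / (s_i - s_{i-1}) for each consecutive pair."""
    return [(speeds_ms[i] - speeds_ms[i - 1])
            / (positions_m[i] - positions_m[i - 1])
            for i in range(1, len(positions_m))]


def deceleration_onset(positions_m, speeds_ms, threshold=0.02):
    """Forward scan: start of the first segment that clearly decelerates."""
    rates = distance_based_acceleration(positions_m, speeds_ms)
    for k, rate in enumerate(rates):
        if rate < -threshold:
            return positions_m[k]  # segment k runs from point k to k + 1
    return None


def acceleration_end(positions_m, speeds_ms, threshold=0.02):
    """Backward scan: end of the last segment that clearly accelerates."""
    rates = distance_based_acceleration(positions_m, speeds_ms)
    for k in range(len(rates) - 1, -1, -1):
        if rates[k] > threshold:
            return positions_m[k + 1]
    return None


# Invented curves sampled every 20 m (speeds in m/s), for illustration only.
approach_s = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
approach_v = [12.0, 12.0, 12.0, 10.0, 6.0, 0.0]
departure_s = [100.0, 120.0, 140.0, 160.0, 180.0]
departure_v = [0.0, 6.0, 10.0, 12.0, 12.0]

onset = deceleration_onset(approach_s, approach_v)
end = acceleration_end(departure_s, departure_v)
```

With a 20 m discretization, each detected point is only known to within one segment, which is one way to read the spatial range attached to A and B.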
In Figure 3, if point 3 does not exist and only point 4 is known, $t_{b4}$ (the travel time between B and 4) can be calculated in the same manner. The number of sample points does not affect the calculation of the time range, so the method works under different sampling scenarios.
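One way to turn the constant-acceleration assumption and the bounds $a_1$ and $a_2$ into a travel time range is sketched below. The relation used, time equals speed change divided by acceleration rate, is an assumption made for this illustration and may differ from the expression in the original article; the function and parameter names are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def travel_time_range(speed_start_ms, speed_end_ms, rate_min, rate_max):
    """Bounds on the time needed to change speed at a constant rate.

    rate_min and rate_max are the lower and upper bounds (m/s^2) on the
    magnitude of the acceleration or deceleration rate.
    Returns (shortest_time_s, longest_time_s).
    """
    if not 0 < rate_min <= rate_max:
        raise ValueError("rates must satisfy 0 < rate_min <= rate_max")
    delta_v = abs(speed_end_ms - speed_start_ms)
    # The highest rate gives the shortest time, the lowest rate the longest.
    return delta_v / rate_max, delta_v / rate_min


# Example: slowing from 12 m/s to 6 m/s with rates between 0.1 g and 0.2 g.
t_short, t_long = travel_time_range(12.0, 6.0, 0.1 * G, 0.2 * G)
```

For this example the range is roughly 3.1 s to 6.1 s, so the uncertainty in the rate translates directly into an interval for the travel time between a critical point and the nearest sample.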

The expected value of delay
Based on the existing information, the control delay $d$ of the trajectory can be calculated as $d = t_{a1} + t_{13} + t_{b3} - (s_a + s_b) / v_f$, where $t_{13}$ is the time between sample points 1 and 3 and $v_f$ is the free-flow speed.
To compute the expected value of control delay, the geometric probability model is used. In this model, the random experiment has infinitely many possible outcomes, and every basic outcome is equally likely. The magnitude of the probability is reflected by the length of the line segment along which the line of the objective function intersects the feasible area. For convenience, the range of $t_{a1} + t_{b3} + t_{13}$ is expressed as an interval, as shown in Figure 4.
The shaded area represents the feasible region. The length of the line segment formed by the intersection of the objective function and the square represents the probability that $d$ is equal to a certain value. When the objective function passes through the corner of the feasible region at which the delay is largest, it is equal to $d_4$ and the probability is 0. As the value of the objective function decreases from $d_4$, the probability increases gradually. When the delay (the value of the objective function) is equal to $d_3$, the probability reaches its maximum. When the delay is between $d_3$ and $d_2$, the probability stays fixed at this maximum. When the delay reduces from $d_2$ to $d_1$, the probability decreases linearly to 0. The probability density curve is shown in Figure 5.
The expected value of delay is then obtained by integrating the delay, weighted by this probability density, over the interval from $d_1$ to $d_4$.
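The geometric probability argument can be checked numerically with a short sketch. It assumes that the total travel time $T$ and the total distance $L$ between the critical points are uniformly distributed over a rectangle and that the delay is $d = T - L / v_f$; the interval bounds, names and example values are hypothetical.

```python
def delay_breakpoints(t_min, t_max, l_min, l_max, v_free):
    """Delays at the four corners of the rectangle, sorted as d1..d4."""
    corners = [t - l / v_free for t in (t_min, t_max) for l in (l_min, l_max)]
    return tuple(sorted(corners))


def expected_delay(t_min, t_max, l_min, l_max, v_free):
    """Closed-form mean of d = T - L / v_free for uniform (L, T)."""
    return (t_min + t_max) / 2.0 - (l_min + l_max) / (2.0 * v_free)


def expected_delay_numeric(t_min, t_max, l_min, l_max, v_free, steps=200):
    """Midpoint-rule average of d over the rectangle, as a cross-check."""
    total = 0.0
    for i in range(steps):
        t = t_min + (i + 0.5) * (t_max - t_min) / steps
        for j in range(steps):
            l = l_min + (j + 0.5) * (l_max - l_min) / steps
            total += t - l / v_free
    return total / (steps * steps)


# Example: 50-60 s of travel over 180-220 m with a 50 km/h free-flow speed.
V_FREE = 50.0 / 3.6  # m/s
d1, d2, d3, d4 = delay_breakpoints(50.0, 60.0, 180.0, 220.0, V_FREE)
mean_delay = expected_delay(50.0, 60.0, 180.0, 220.0, V_FREE)
check = expected_delay_numeric(50.0, 60.0, 180.0, 220.0, V_FREE)
```

Because the trapezoidal density is symmetric, the expected value coincides with the midpoint of $d_1$ and $d_4$, which gives a quick sanity check on any implementation.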

Experimental tests
In order to capture the deceleration and acceleration dynamics of vehicles through the intersection, five months of historical low-frequency trajectory data were used. The data contain longitude, latitude, speed, time and direction. The historical data are sampled in time at a frequency of 1/30 Hz. The historical data are processed in the following steps:
1) the trajectories in which the vehicle makes an obvious stop are selected. This is judged from the speed of the sample points: if there is a point whose speed is lower than 5 km/h, the vehicle is considered to have stopped. The trajectory is divided into two parts: deceleration and acceleration;
2) the distance between the location where the vehicle stops and the centre of the intersection is calculated for all the selected trajectories. The trajectories are grouped according to an equal distance interval. In our study, the interval is 20 m;
3) the speed before deceleration and the speed after acceleration are estimated for each trajectory. For the deceleration part, if there is one sample point, its speed is taken as the speed before deceleration. If there are two or more sample points, the speed before deceleration is the maximum of the speeds of the first two sample points and the average speed between these two points. For the acceleration part, if there is one sample point, its speed is taken as the speed after acceleration. If there are two or more sample points, the speed after acceleration is the maximum of the speeds of the last two sample points and the average speed between these two points. The deceleration trajectories are classified according to the speed before deceleration, and the acceleration trajectories according to the speed after acceleration;
4) the deceleration and acceleration trajectories are divided into multiple subsets according to the speed and the stop position. Each subset is fitted with the Principal Curve method to capture vehicle dynamics;
5) for a new trajectory, the distance between the location where the vehicle stops and the intersection is calculated, and the speeds before deceleration and after acceleration are estimated. Combined with the corresponding curve, the expected value of control delay is calculated with the proposed method.
The parameters of 5 km/h and 20 m were chosen on the basis of field experience.
A field experiment was conducted to validate the proposed method. The study site is the intersection of Songshan and Huanghe roads in Harbin (China). Both roads are arterials. There is a large shopping centre nearby, and therefore the traffic conditions are complex, with obvious changes at different times of the day. The speed limit is 50 km/h on both roads.
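The first three preprocessing steps for the historical data can be sketched as below. This is a minimal sketch under stated assumptions: each sample is a hypothetical tuple of time, position along the road and instantaneous speed, and the helper names are invented for illustration. Only the 5 km/h and 20 m parameters come from the text.

```python
STOP_SPEED_KMH = 5.0  # stop threshold quoted in the text


def split_at_stop(samples):
    """Split a trajectory into the parts before and after the stop.

    samples: list of (time_s, position_m, speed_kmh) tuples in time order.
    Returns (deceleration_part, acceleration_part), or None when no sample
    is slower than the stop threshold.
    """
    stopped = [i for i, (_, _, speed) in enumerate(samples)
               if speed < STOP_SPEED_KMH]
    if not stopped:
        return None
    return samples[:stopped[0]], samples[stopped[-1] + 1:]


def speed_before_deceleration(deceleration_part):
    """Step 3 rule for the speed before deceleration (km/h)."""
    if not deceleration_part:
        return None
    if len(deceleration_part) == 1:
        return deceleration_part[0][2]
    (t0, s0, v0), (t1, s1, v1) = deceleration_part[:2]
    average_kmh = abs(s1 - s0) / (t1 - t0) * 3.6  # m/s converted to km/h
    return max(v0, v1, average_kmh)


def stop_bin(stop_position_m, centre_position_m, bin_width_m=20.0):
    """Index of the 20 m distance group for the stop location."""
    return int(abs(centre_position_m - stop_position_m) // bin_width_m)


# Invented 30 s samples for one vehicle (illustrative only).
track = [(0.0, 0.0, 45.0), (30.0, 340.0, 38.0), (60.0, 395.0, 0.0),
         (90.0, 400.0, 3.0), (120.0, 520.0, 30.0)]
decel_part, accel_part = split_at_stop(track)
v_before = speed_before_deceleration(decel_part)
group = stop_bin(395.0, 430.0)
```

The speed after acceleration would follow the same rule applied to the last two samples of the acceleration part.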
In order to evaluate the accuracy and reliability of the proposed method, a set of high-frequency trajectory data was collected on 13 September 2017. Eight probe vehicles were equipped with GPS receivers to collect GPS data at 1 Hz. The vehicles were driven in the north-south direction, traversing straight through the intersection repeatedly during the periods 07:00…10:00 and 16:00…19:00, capturing the morning and evening peaks respectively. The morning peak was from 07:00…09:00 and the evening peak from 17:00…19:00. The process lasted for six hours, generating 144 valid high-frequency GPS tracks.
Low-frequency floating car data were subsequently generated from the high-frequency data at the typical interval of 30 s. It should be noted that the trajectory data consist of the GPS points generated from the time the probe vehicle enters the intersection to the time it departs the intersection. The discretization of distance is at the 20 m level. This is based on a series of tests undertaken on the sensitivity of control delay accuracy to distance accuracy and computational efficiency. In field observations, 95% of the vehicles' acceleration and deceleration rates were less than 2.8 m/s². In Haas et al. (2004), for speeds ranging from 20 to 25 mph, the average deceleration rate is 0.1⋅g, but when the speed ranges from 35 to 40 mph, the average deceleration rate is 0.18⋅g. To make the estimation results insensitive to the parameters and to guarantee the robustness of the assumption, the acceleration and deceleration range is selected to be between 0.1⋅g and 0.2⋅g. Control delays were computed from the low-frequency floating car data using both the method proposed in this article and the reference method.
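Generating a low-frequency track from a 1 Hz track can be sketched as follows. The random starting offset is an assumption made here to mimic the randomness of low-frequency sampling; the function name and the toy track are hypothetical.

```python
import random


def resample(track_1hz, interval_s=30, rng=None):
    """Keep every interval_s-th sample of a 1 Hz track.

    track_1hz: list with one sample per second.
    rng: optional random.Random instance, so that results are repeatable.
    The first retained sample is chosen at a random offset in
    [0, interval_s), so repeated calls yield different sample sequences.
    """
    rng = rng or random.Random()
    offset = rng.randrange(interval_s)
    return track_1hz[offset::interval_s]


# A toy 100 s track whose samples are just their timestamps.
toy_track = list(range(100))
low_frequency = resample(toy_track, interval_s=30, rng=random.Random(0))
```

Calling the function several times on the same high-frequency track produces the different sample point sequences needed for a repeated-sampling stability test.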

Accuracy of individual probe vehicle control delay
Given that vehicles travelling at different speeds behave differently, the GPS tracks were classified into two categories according to the speed at which the vehicle starts to decelerate. Tracks with a speed above 30 km/h are classified as the high-speed pattern; all other tracks are classified as the low-speed pattern.
The observed control delay value is calculated from the high-frequency floating car data. The low-frequency floating car data are generated by resampling the high-frequency floating car data. The low-frequency data are then processed to generate control delay values for each of the classes using both the method proposed in this article and the reference method. Figure 6 compares the observed and estimated control delay values for the different speed patterns from the proposed and reference methods. The horizontal axis and the vertical axis represent the observed and estimated control delay values, respectively. The black dotted line is the 45-degree line: the closer the points are to the dotted line, the more accurate the estimation method. As shown in Figure 6, the proposed method performs better than the reference method for both the low-speed and the high-speed patterns. For both methods, the accuracy for the high-speed pattern is higher than for the low-speed pattern. For the low-speed pattern, the vehicle's speed is relatively low and the traffic volume is large. Under this circumstance, vehicle movement is complex. For example, a vehicle may experience a second stop delay, which means that it does not pass the intersection within one signal cycle.

Estimation accuracy of the control delay of the intersection
To further demonstrate the effectiveness of the method proposed in this article, a reference approach developed by He and Ye (2014) is adopted. This method was chosen as the reference because it is not sensitive to the low-frequency sample points and can achieve a satisfying accuracy, with 85% of the control delay estimation results within 10 s in terms of absolute error. The reference method delineates the affected area of the intersection according to the historical queue length. It is assumed that the vehicle travels at free-flow speed outside the affected area. When the vehicle enters the affected area, it starts to decelerate, and it restores its free-flow speed on exiting the area. As shown in Figure 7, a trajectory through points $P_0$, $P_1$, $P_2$ is generated as a vehicle travels through the intersection. The area between S and E is the affected area of the intersection. It is assumed that the vehicle travels at a uniform speed outside the area. The real travel time from S to E is calculated from the sample times after removing the time spent outside the area at that uniform speed, where $L_1$ is the distance between point $P_0$ and point S. The control delay is then calculated as the real travel time from S to E minus the free-flow travel time $L / v_f$, where $L$ is the distance between points S and E and $v_f$ is the free-flow speed.
To quantitatively evaluate the accuracy of the estimation results, the Root Mean Square Error (RMSE) is selected as the evaluation indicator. It provides an estimate of the goodness of fit between the estimated and observed values according to $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2}$, where $x_i$ is the $i$-th estimated control delay value of the intersection; $\hat{x}_i$ is the $i$-th observed delay value of the intersection; $n$ is the number of values compared.
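The RMSE indicator described above can be computed with a few lines of code. The sketch below uses only the standard library; the function name and the sample values are illustrative and are not taken from the experiment.

```python
import math


def rmse(estimated, observed):
    """Root mean square error between paired estimated and observed values."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented delay values in seconds, for illustration only.
example_rmse = rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Because the errors are squared before averaging, a few large deviations weigh more heavily on the RMSE than many small ones.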
Table 1 lists the RMSE of the proposed method and the reference method against the ground-truth control delay in the peak and off-peak hours.
For low-frequency probe car data, the sampling is random, which means that many different sample point sequences may exist for the same trajectory. To test the stability of the proposed and reference methods, for each observation interval, all high-frequency trajectories were resampled 10 times at low frequency, and the control delay value of the intersection was calculated 10 times by the proposed and reference methods respectively. The results are presented in Table 2. The results show that the proposed method improves on the accuracy of the reference method by 14% in the peak hour and 26% in the off-peak period.
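The quoted improvements can be reproduced from the reported RMSE values (11.8 s versus 13.7 s in the peak period, 9.3 s versus 12.5 s off-peak). The sketch below assumes the improvement is measured relative to the reference method's RMSE; the function name is hypothetical.

```python
def relative_improvement(rmse_reference, rmse_proposed):
    """Fractional reduction in RMSE relative to the reference method."""
    return (rmse_reference - rmse_proposed) / rmse_reference


peak_gain = relative_improvement(13.7, 11.8)      # about 0.14
off_peak_gain = relative_improvement(12.5, 9.3)   # about 0.26
```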
In Table 2, it can be seen that the standard deviation of the estimated control delays using the method proposed is smaller than that with the reference method. This shows that the method is more reliable.
To better demonstrate the reliability of the proposed method, a box plot is adopted to show the distribution of the control delay estimation results from 16:00…19:00. Figure 8 presents the distribution of the estimated control delay values of the intersection for the proposed and reference methods. The blue point is the ground-truth control delay value of the corresponding time period. P represents the proposed method and R the reference method. Twelve time periods of 15 min each were generated between 16:00…19:00. For each time period, the control delay of the intersection was estimated ten times by resampling the high-frequency trajectories at a 30 s interval. As shown in Figure 8, for most time periods, the distribution of the control delay values estimated by the proposed method is more concentrated than that estimated by the reference method. Besides, in general, the mean value of the results obtained by the proposed method is closer to the ground-truth value than that of the reference method. This shows that the proposed method has better reliability.
Computational efficiency is another factor to be evaluated. The computer used in data processing and analysis had an Intel® Core™ I5-8250 4-core 1.6 GHz CPU, 4 GB of memory, a 1 TB hard disk and the Windows 10 64-bit operating system. The test case is an arterial with five intersections and a link length of about 800 m. Five months of historical data were used to capture vehicle dynamics through each signalized intersection. This step takes from 5…10 min, but it only needs to be carried out once. Calculating the control delay of each intersection takes about 7 s. These figures show that our approach has satisfying computational efficiency and could be used in real time.
Given that the sample interval may affect accuracy and reliability, we analyse the sensitivity of the accuracy to the sample interval. For all the time periods, the control delay of the target intersection was estimated with low-frequency trajectory data at different sample intervals, and the RMSE was calculated. The results are shown in Figure 9.
As shown in Figure 9, although the RMSE increases as the sample interval becomes longer, the rate of growth is low and hence, at least for the range of sampling interval analysed (from 30 to 60 s), the accuracy of control delay estimation is largely insensitive to the sample interval.

Conclusions and recommendations
This article presents a novel method to estimate control delay at road intersections from low-frequency floating car data. In order to address the limitations of data sparseness, historical data are used to explore the deceleration and acceleration patterns. This helps to capture the changes in vehicle motion or dynamics through intersections to obtain the space-time ranges of the critical points. From these data and based on the geometric probability model, the distribution function of the delay values is obtained, from which the expected value of delay is calculated.
Both the proposed method and a reference method are compared against the ground-truth control delay value of the target intersection for different time periods. The results show that the proposed method has an RMSE of 11.8 s compared to 13.7 s for the reference method for the peak period. The corresponding values for the off-peak period are 9.3 s and 12.5 s. In addition to better accuracy, the mean and standard deviation statistics show that the proposed method outperforms the reference method.
The method proposed in this article could be used to estimate the control delay at road network intersections from sparse data (e.g. from floating cars), which is important for traffic management and control, and hence for improving the overall operational efficiency of a road network. However, some limitations of the research methodology should be highlighted in order to enhance its applicability and transferability. First, as observed control delay is not easy to obtain, the statistical analysis is based on observations at one selected intersection over six hours. More data are needed to validate our conclusions. Second, the sample size of the trajectory data is not considered in our research, although it may affect the performance of our method. The relationship between the sample size and the performance of our method will be investigated in the future.