A ROBUST METHOD FOR REAL TIME ESTIMATION OF TRAVEL TIMES FOR DENSE URBAN ROAD NETWORKS USING POINT-TO-POINT DETECTORS

Data collection for the provision of real time traveller information services is a key issue, both for travellers and for traffic managers. This paper presents a methodology for estimating travel times in dense urban road networks using point-to-point detectors. The aim is to fill the gap in existing travel time estimation methodologies that are based on point-to-point detection devices. Bluetooth (BT) is considered one of the less expensive technologies for estimating travel times. Data filtering and data correction require rigorous methodologies, which, if not correctly applied, may yield inaccurate results compared to other methods. The main difficulty of data processing is to identify the correct set of Media Access Control (MAC) addresses for estimating travel times, especially in dense urban road networks, where three main error sources exist: the co-existence of various transport modes (private vehicles, buses, pedestrians, bicycles etc.), the existence of more than one possible path between two BT detectors and the existence of stops or trips ending between two BT detectors. These error sources create outliers that need to be identified and taken into account. The results of the proposed methodology confirm that outliers are eliminated, as shown by a case study with 10 BT detectors installed at major intersections of Thessaloniki’s Central Business District (CBD).


Introduction
The provision of accurate real time information services is essential for assisting and improving the decisions of travellers in dense urban road networks (Mitsakis et al. 2015), within the framework of the advances provided by Intelligent Transport Systems (ITS) (Giannopoulos et al. 2012). Urban traffic characteristics may vary due to time-dependent demand variations, various traveller profiles, congestion and its effects, and inhomogeneous fleets of vehicles. Travel time estimation in urban road networks relies on multiple data sources, such as Bluetooth (BT) detectors, Automated Number Plate Recognition (ANPR) cameras, cumulative counts and GPS enabled probe vehicles (Floating Car Data) or personal smart devices. BT detectors can scan and record travellers' BT devices, providing direct travel time monitoring by matching the MAC addresses of BT devices along pre-defined paths between two detectors. They hence constitute a valuable data source for urban traffic monitoring. However, since urban travel times show significant variations (e.g. due to different types of drivers' behavior, variations in traffic flow states etc.), the provision of average travel times may result in lower accuracy levels.
In the context of the research presented herein, BT detectors are used for estimating point-to-point travel times for private vehicles on major paths in the Central Business District (CBD) of Thessaloniki, Greece.
The scientific literature on the use of BT detectors for travel time estimations indicates that data filtering is important, since several error sources may be contained. It is therefore necessary to effectively handle outliers, which may be a result of en-route stops, trip ends between detectors or detours deviating from the predefined paths. Data collected by BT detectors may also include records from other transportation modes, such as pedestrians, bicycles, users and vehicles of the public transport system as well as atypical vehicles, such as couriers or delivery vehicles. These result in significant scattering of the collected data. However, data cleaning (filtering and correction) methods for datasets coming from BT detectors can still be improved, since network particularities are significantly affecting their accuracy. In this context, the aim of the current research is to define a robust method for the estimation of travel times in dense urban road networks. Since most of the existing methodologies have focused on freeways, the present paper aims to fill in the identified gap related to the estimation of travel times in dense road networks within urban environments using BT detectors.
The paper is structured as follows: A critical literature review is presented in the first section, discussing existing studies on the estimation of travel times by using point-to-point detectors, especially BT detectors. The second section highlights the identified problems. The proposed methodology is presented in detail in the third section. The fourth section presents the results of the application of the methodology in the CBD of the city of Thessaloniki, followed by technical issues that are discussed in section five. Finally, conclusions and future research directions are presented in the final section of the paper.

Literature Review
Research on traffic monitoring based on data collected by BT detectors has mainly been reported in recent years. The idea of applying BT technology for monitoring traffic conditions traces its roots back to the seminal paper of Pasolini and Verdone (2002). Their experiments constituted a proof of concept for the dynamics of the data sources. Welsh et al. (2002) worked on improving BT connection times between moving devices and proposed the creation of a mesh network of BT connected devices. Ahmed et al. (2008) further investigated the concept, envisioning the use of BT to create a static mesh network for ITS data collection. The BT detection technology has been shown to be effective in several research studies (Ahmed et al. 2008; Sharifi et al. 2011). Analyses for freeways and arterials demonstrated that methodologies based on data recorded by BT detectors are capable of capturing traffic conditions as accurately as other data sources, including intrusive methods, such as loop detectors, as well as non-intrusive ones, such as Floating Car Data (Haghani et al. 2010; Quayle et al. 2010; Wasson et al. 2008). Young (2008) performed experiments that verified the validity of BT technology for the collection of traffic data (see also Young et al. 2014). Wasson et al. (2008) indicated that traffic signals and route diversions have a significant impact on arterial traffic data collected by BT detectors.
Other research focused on analyses of traffic under adverse weather conditions (Martchouk et al. 2011) and road works (Haseman et al. 2010). The results confirm that data collected by BT detectors can be used for monitoring non-recurrent events (Hainen et al. 2011).
However, only limited research results exist on data filtering methods, despite the fact that the significant presence of outliers in the distribution of travel time measurements has already been identified by various researchers. Haghani et al. (2010) created a four step filtering approach to eliminate travel time outliers. They showed that filtered data were not significantly different from Floating Car Data. Tsubota et al. (2011), after applying a filtering technique with an upper bound for travel times and for the number of bypassing detected devices, applied a third filter every 5 minutes that eliminates values within the fourth quartile (larger than the 75-percentile value). This approach provided accurate travel time estimations along arterials. Mei et al. (2012) investigated bicycle travel time estimation with BT detectors on a short corridor, suggesting an offline median filtering algorithm to eliminate the outliers. Outliers derived from the three types of errors contained in the distribution of travel times (spatial, temporal and sampling errors). Puckett and Vickich (2010) focused their research on urban travel time estimation and proved the viability of BT-based traffic data collection technologies. In their experiments they justified the reduced validity of initial filtering techniques, where travel times that differ by 25% are labelled as invalid and not considered. They concluded that the proposed technique performs better on freeways with high traffic volumes and less speed variation than on arterials with significantly varying speeds.
However, research on filtering techniques for dense urban road networks, comprising shorter (100-300 meters) and narrower segments, is limited, at least to the authors' knowledge.
Other types of point-to-point collection methods presented in the literature propose the use of mobile cell phones and probe vehicles. Van Zuylen et al. (2010) introduced and compared two methods for decomposing travel times collected by probe vehicles along individual paths. A technique to incorporate probe measurements from mobile cell phones for calibrating traffic flow models has also been presented. The methodology was tested with data from the Mobile Century experiment, showing that data coming from mobile cell phones can successfully be used for the estimation of traffic conditions. Barceló et al. (2010) considered the combination of BT data with historical traffic data, in order to forecast travel times. They applied a Kalman filtering technique with a predicted to actual travel times R-square value of 98.6%. Li et al. (2011) compared the performance of various traffic data collection methods for urban road network environments, showing similarities, advantages and disadvantages. The authors examined BT detectors, Automated Number Plate Recognition cameras, cumulative counts and GPS enabled probe vehicles. BT detectors, ANPR and cumulative counts are techniques based on two or more point measurements and do not monitor vehicles between the observation points. This may be considered a disadvantage, since in urban road networks a relatively large percentage of vehicles may follow detours instead of pre-defined paths between two points. Antoniou et al. (2011) provided an overview of data collection technologies and their impacts on traffic management applications. They argued that, on the one hand, point-to-point based data collection methods are more appropriate for measuring traffic characteristics, such as speeds and route choice fractions, while on the other hand traffic counts may not be captured correctly, since the vehicle-to-device correspondence is not necessarily one-to-one (drivers may not have a BT-equipped device on-board, or alternatively a vehicle may carry more than one BT device on-board).
Recently, Nantes et al. (2014) further investigated the usability of BT data sources for city scale management of urban transport networks. Their research focused on both aspects of travel time estimation and route choice modelling.
The interested reader is referred to the work of Rescot (2011), Sintonen (2012) and Haghani and Hamedi (2013) for extended literature reviews on BT traffic monitoring and travel time estimation.

Problem Identification
The major problem associated with data collected by BT detectors in dense urban road networks, as opposed to highways or freeways, is related to the identification of invalid values, mostly due to:
- users stopping between the detectors (short or long duration stops);
- users passing by the second detector after long detours (with intermediate stops);
- existence of alternative modes used along the path between the detectors;
- existence of alternative paths, different from the pre-defined path, between two detectors;
- atypical driving behaviors.
In order to qualitatively illustrate the above, the travel times of one pre-defined path for a typical weekday between 9:00 and 10:00 am are represented in Fig. 1.
The outliers observed in the upper part of Fig. 1 are also identified in the trajectories shown in the lower part. Outliers are identified as measurements with higher travel times, while they can also be observed as trajectories with significantly different slopes in the lower part of the figure.
The proposed methodology aims to dynamically exclude these values and provide representative travel time values for each path.

Statistical Methods Justification
An outlying observation (outlier) is one that appears to deviate significantly from other values of the analysed sample. The choice of how to deal with an outlier should depend on its cause. In the case of dense urban road networks, the distribution of travel times may vary significantly, because it may incorporate one or more qualitative variables that represent different paths and different modes. This specificity of the distributions does not allow the data to be distinguished correctly, because extreme (travel time) values of one transport mode may be valid values of another mode. The aim is to provide travel time estimations for private vehicles, which are assumed to be the most frequently used transport mode. This is however an assumption that needs to be validated on a case-by-case basis. Fig. 2 illustrates the Probability Distribution Function (PDF) of travel time observed by BT detectors in relation to travel times of alternative modes and paths.
Robust statistics provide methods that are not affected by outliers. Trimmed estimators are general methods for robust statistics. Based on the filtering approach of Tsubota et al. (2011), who used the 75-percentile value of the observed data and applied the median estimator, the trimmed mean is considered an adequate estimator for travel time estimation in urban areas. By trimming the data, the distribution of observed values can be transformed from a skewed PDF into an almost centred one. Fig. 3 illustrates the relationship among the stated estimators. It also provides the rationale for the approach included in the proposed methodology: the retention of all values (as opposed to the exclusion of outliers), combined with the use of robust estimators such as the trimmed mean and the mode.
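The effect of trimming can be illustrated with a minimal sketch. The travel time sample below is hypothetical, and the interquartile trimming rule (keeping values between the 25th and 75th percentiles) is one possible choice of trimmed estimator, assumed here for illustration only.

```python
import statistics


def interquartile_trimmed_mean(values):
    """Mean of the observations lying between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    kept = [v for v in values if q1 <= v <= q3]
    return statistics.mean(kept)


# Hypothetical travel times in seconds for one path; the last two values
# mimic a stop and a detour between the two detectors.
travel_times_s = [95, 100, 102, 105, 108, 110, 112, 115, 120, 480, 900]

plain_mean = statistics.mean(travel_times_s)        # pulled upwards by the two outliers
median = statistics.median(travel_times_s)          # robust
trimmed = interquartile_trimmed_mean(travel_times_s)  # robust

print(f"mean={plain_mean:.1f} s, median={median} s, trimmed mean={trimmed} s")
```

In this sample the plain mean is roughly twice the median, while the trimmed mean stays close to the bulk of the observations without any explicit outlier deletion.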

Definitions
The following definitions are proposed and used in the remainder of the methodology. The defined terms are schematically represented in Fig. 4:
- Link: A link is the single, directed and unique road segment between two intersections. It is characterized by the free flow travel time and its source and sink nodes (intersections);
- Path: A path is the unique group of directed links between two intersections equipped with BT detectors;
- Route: A route is a unique group of paths passing by various BT-equipped intersections;
- Itinerary: An itinerary is the group of intersections equipped with BT detectors followed by each traveller;
- Trip: A trip is the path followed by each traveller along his/her itinerary, characterized by the path and the timestamps (trip start and trip end);
- Detour: A detour is the path followed by a traveller, which is different than the pre-defined one between two equipped intersections.

Preparatory Procedures
Prior to the methodology itself, it is crucial to point out some critical aspects related to necessary preparatory procedures.
The BT detector network that will provide the highest quality of data for the real time analyses needs to be carefully designed. A key issue is the selection of the locations for installing the detectors and the analysis of the characteristics of the pre-defined paths. The length of the paths should be carefully selected. Long paths may present several alternatives for drivers, and the probability of identifying drivers along longer paths may be lower compared to shorter ones. On the other hand, short paths can be associated with errors, due to the significance of the detectors' range (e.g. overlapping of two subsequent detectors' coverage areas). Once the paths are selected, the characteristics of each path need to be analysed in terms of:
- existence of public transport routes passing by the path;
- existence of alternative routes used for the same path;
- free flow conditions and historical traffic data of the path.
Once all paths between two detectors have been defined, longer routes can be formed by combining selected paths (by adding the travel times of each path), in order to cover larger areas.
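The combination of paths into routes can be sketched as a simple sum. The function name, the dictionary layout and the travel time values are hypothetical; returning no estimate when a path is missing is an assumption of this sketch, not a rule stated in the methodology.

```python
def route_travel_time(route, path_travel_times_s):
    """Sum the estimated travel times of the paths that form a route.

    route: ordered list of path ids, e.g. [("A", "B"), ("B", "C")].
    path_travel_times_s: {path id: estimated travel time in seconds}.
    Returns None if any path of the route has no current estimate.
    """
    total = 0.0
    for path in route:
        travel_time = path_travel_times_s.get(path)
        if travel_time is None:
            return None
        total += travel_time
    return total


# Hypothetical path estimates in seconds.
estimates = {("A", "B"): 120.0, ("B", "C"): 95.0}
print(route_travel_time([("A", "B"), ("B", "C")], estimates))
```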
Another important issue of the preparatory procedures is to identify fleets of vehicles using BT devices. Some of these fleets may use the same Media Access Control Identities (MAC-ID) for all users and must be added to a 'taboo MAC-IDs list'. Common MAC-IDs (e.g. 00:00:00:00:00:00, 11:11:11:11:11:11, etc.) must be added to this list as well. This list is used for ensuring that BT devices detected by their MAC-IDs are unique, random and representative.
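A minimal sketch of how such a taboo list could be applied to raw detections is given below. The record layout (MAC-ID, detector, timestamp) and the normalisation of the MAC-ID notation are assumptions made for illustration; only the two common MAC-IDs named above are taken from the text.

```python
# Common MAC-IDs named in the text; fleet MAC-IDs would be added per deployment.
TABOO_MAC_IDS = {"00:00:00:00:00:00", "11:11:11:11:11:11"}


def normalise_mac(mac_id):
    """Bring a MAC-ID to upper-case, colon-separated notation (assumed convention)."""
    return mac_id.strip().upper().replace("-", ":")


def drop_taboo(records, taboo=TABOO_MAC_IDS):
    """Remove every record whose MAC-ID appears in the taboo list."""
    taboo_normalised = {normalise_mac(mac_id) for mac_id in taboo}
    return [r for r in records if normalise_mac(r[0]) not in taboo_normalised]


# Hypothetical records: (mac_id, detector_id, unix timestamp in seconds).
raw = [
    ("00:00:00:00:00:00", "A", 1000),
    ("a4:5e:60:01:02:03", "A", 1002),
    ("11-11-11-11-11-11", "B", 1010),
]
clean = drop_taboo(raw)
print(clean)
```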

Real Time Methodology
Once the network of detectors is defined, installed and calibrated, the collected data is analysed in real time. This processed data is then provided as information to travellers. The proposed methodology comprises seven steps:
- STEP 1 - Data collection: all MAC-IDs are detected by the BT detectors as soon as BT-enabled devices come into the detection range and are collected in the local databases, recording the MAC-ID and the timestamp of each record;
- STEP 2 - Data transmission: all local databases of the BT detectors transfer the dataset to a central database (e.g. through GPRS communications), adding to the MAC-ID and timestamp the identity of the local BT detection unit;
- STEP 3 - Taboo entries: MAC-IDs included in the taboo list are deleted from the database;
- STEP 4 - Itinerary creation: the itinerary (e.g. A-B-C) followed by each MAC-ID is created using the timestamps. The travel times of the independently followed paths (A-B and B-C, not A-C) are calculated;
- STEP 5 - Trip starts and trip ends definition: all travel times related to each path are stored in the database, using the path id and two timestamps (trip start and trip end);
- STEP 6 - Identification of the potential subset of travel times: the potential subset of travel times for the final analysis is selected by applying physical and statistical bounds to the trip starts and trip ends. Valid trips are identified based on one of the following conditions:
  - the sample size within the time windows T1 (trip ends) and T2 (trip starts) reaches an adequate minimum. If this condition is fulfilled, the method continues at Step 7;
  - trip ends reach a maximum time (T1 in Fig. 5) without collecting an adequate sample. If this condition is fulfilled, free flow traffic conditions are assumed if the last travel time is below a threshold;
- STEP 7 - Travel time estimators: the trimmed mean and the mode are the two estimators for travel times. The trimmed mean is calculated within the second and third quartile of the final set.
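Steps 4 to 6 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the record layout, the function names, the interpretation of T1 and T2 as look-back windows from the current time, and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_itineraries(records):
    """Step 4: order the detections of each MAC-ID by timestamp.

    records: iterable of (mac_id, detector_id, timestamp) tuples.
    """
    itineraries = defaultdict(list)
    for mac_id, detector_id, timestamp in records:
        itineraries[mac_id].append((detector_id, timestamp))
    for visits in itineraries.values():
        visits.sort(key=lambda visit: visit[1])
    return dict(itineraries)


def path_trips(itineraries, predefined_paths):
    """Steps 4-5: one trip per consecutive detector pair (A-B and B-C, never A-C)."""
    trips = []
    for visits in itineraries.values():
        for (origin, t_start), (destination, t_end) in zip(visits, visits[1:]):
            if (origin, destination) in predefined_paths:
                trips.append({
                    "path": (origin, destination),
                    "trip_start": t_start,
                    "trip_end": t_end,
                    "travel_time_s": (t_end - t_start).total_seconds(),
                })
    return trips


def candidate_subset(trips, path, now, t1, t2, min_sample):
    """Step 6 (simplified): trips on `path` ending within t1 and starting within t2 of `now`.

    Returns the subset and whether it reaches the minimum sample size.
    """
    subset = [
        trip for trip in trips
        if trip["path"] == path
        and now - t1 <= trip["trip_end"] <= now
        and trip["trip_start"] >= now - t2
    ]
    return subset, len(subset) >= min_sample


# Hypothetical detections of two devices at detectors A, B and C.
day = datetime(2012, 9, 11)
records = [
    ("mac-1", "A", day.replace(hour=9, minute=0)),
    ("mac-1", "B", day.replace(hour=9, minute=2)),
    ("mac-1", "C", day.replace(hour=9, minute=5)),
    ("mac-2", "A", day.replace(hour=9, minute=1)),
    ("mac-2", "B", day.replace(hour=9, minute=4)),
]
trips = path_trips(build_itineraries(records), {("A", "B"), ("B", "C")})
subset, adequate = candidate_subset(
    trips, ("A", "B"), now=day.replace(hour=9, minute=10),
    t1=timedelta(minutes=15), t2=timedelta(minutes=20), min_sample=2,
)
print(len(trips), len(subset), adequate)
```

When the subset is not adequate, the second condition of Step 6 (free flow assumed if the last travel time is below a threshold) would apply; it is left out of this sketch because the threshold is path specific.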
The trimmed mean aims to eliminate the outliers' effect on the calculated average. Different percentiles can be used, in order to achieve easier adaptability of the method (T3 and T4 in Fig. 6). The selection of the exact values of the T3 and T4 percentiles is related to the traffic characteristics of the path, the transport modes used more frequently and the number of alternative paths used. Another approach is the mode estimation, which works sufficiently well, since mode estimation is insensitive to outliers. In some cases, the mode value is not necessarily unique. This however needs to be validated, since it may suggest the existence of alternative transport modes or paths. The existence of significantly different travel time values with high frequency can be related to various transport modes or alternative paths. Both issues can be tackled during the preparatory procedures, by identifying paths used by public transport vehicles or by using detectors at intermediate pass-points along the paths.
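Both estimators can be sketched with the Python standard library. The percentile arguments play the role of T3 and T4; the fallback to the median for very small samples and the 10 second binning used before taking the mode are assumptions of this sketch, and the sample values are hypothetical.

```python
import statistics


def percentile_trimmed_mean(values, lower_pct=25, upper_pct=75):
    """Mean of the values between two percentiles (integers from 1 to 99)."""
    if len(values) < 2:
        return statistics.mean(values)
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    kept = [v for v in values if low <= v <= high]
    # Assumed fallback: with very few observations the band may be empty.
    return statistics.mean(kept) if kept else statistics.median(values)


def binned_modes(values, bin_width_s=10):
    """Most frequent travel time bin(s); more than one result signals a non-unique mode."""
    bins = [int(v // bin_width_s) * bin_width_s for v in values]
    return statistics.multimode(bins)


# Hypothetical travel times in seconds.
car_like = [95, 100, 102, 105, 108, 110, 112, 115, 120, 480, 900]
two_groups = [100, 101, 300, 302]

print(percentile_trimmed_mean(car_like))  # default quartile band
print(binned_modes(car_like))
print(binned_modes(two_groups))           # two modes: candidate second mode or path
```

A result with several modes does not say which group is the private vehicle traffic; it only flags the path for the validation discussed above.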
Auxiliary detectors can be used for validating that the used route is the desired one (A-a-B means that the route A-B contains an auxiliary detector a, the valid route is A-a-B, not A-B).
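The auxiliary detector check can be sketched as a sequence match on the detectors visited by one MAC-ID. The function names are hypothetical, and collapsing repeated detections at the same detector is an assumption about how raw records would be pre-processed.

```python
def collapse_repeats(detector_sequence):
    """Merge consecutive detections at the same detector into one visit."""
    visits = []
    for detector in detector_sequence:
        if not visits or visits[-1] != detector:
            visits.append(detector)
    return visits


def uses_validated_route(detector_sequence, required=("A", "a", "B")):
    """True if the required detectors are visited consecutively and in order."""
    visits = collapse_repeats(detector_sequence)
    length = len(required)
    return any(
        tuple(visits[i:i + length]) == tuple(required)
        for i in range(len(visits) - length + 1)
    )


print(uses_validated_route(["A", "a", "B"]))  # passes the auxiliary detector
print(uses_validated_route(["A", "B"]))       # skipped detector a: not validated
```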
If paths are used by public transport vehicles, then all the MAC-IDs of travellers using such modes will be concentrated in time, and they can be un-weighted as follows:
- trip starts are grouped in 15-30 second intervals and one unique trip start and trip end is calculated for each group;
- the optimum time for grouping the data is related to the frequency of the public transport services and the traffic signal phases.
Steps 6 and 7 are repeated for the new dataset. The methodological flowchart is shown in Fig. 7.
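The grouping of trip starts can be sketched as below. Timestamps are given in seconds for brevity; representing each group by the mean trip start and mean trip end is an assumption, since the text only states that one unique trip start and trip end is calculated per group. All values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def unweight_trips(trips, interval_s=20):
    """Replace all trips starting in the same interval by one representative trip.

    trips: iterable of (trip_start_s, trip_end_s) pairs in seconds.
    """
    groups = defaultdict(list)
    for start_s, end_s in trips:
        groups[int(start_s // interval_s)].append((start_s, end_s))
    merged = []
    for key in sorted(groups):
        starts, ends = zip(*groups[key])
        merged.append((mean(starts), mean(ends)))
    return merged


# Three devices on the same bus plus one later private vehicle (hypothetical).
observed = [(0, 100), (5, 110), (10, 105), (40, 150)]
print(unweight_trips(observed))
```

After this reduction the bus contributes one observation instead of three, so Steps 6 and 7 are no longer dominated by its passengers.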

Use Case
Ten BT detectors have been installed at major intersections in the CBD of Thessaloniki (Fig. 8), within the framework of Thessaloniki's Urban Mobility Management System (Morfoulaki et al. 2011). Five of them are located next to Variable Message Signs that inform citizens about the traffic conditions along ten selected routes towards the city centre.
The methodology presented in Section 4 has been used for calculating travel times along 60 routes: 22 are paths whose travel times are calculated directly, and the 38 additional routes are combinations of these 22 paths.
Results from the application of the proposed methodology in the CBD of Thessaloniki are presented below. The data refer to the time period: 11-22 September 2012 (Fig. 9). For the specific date, statistics of travel times are presented in Fig. 11.
In order to provide an indication of the number of detected devices used for the estimation of travel times, it is mentioned that at one major intersection in Thessaloniki's CBD where traffic counts during the morning peak hour indicated 3765 vehicles, the total number of detected BT-enabled devices was 936. Thereof, 667 valid trips were produced by the proposed travel time estimation method.
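The figures above translate into the following ratios. This is plain arithmetic on the reported numbers; reading the first ratio as a penetration rate is only indicative, because a vehicle may carry no BT device or several.

```python
vehicles_counted = 3765   # traffic count, morning peak hour
devices_detected = 936    # BT-enabled devices detected
valid_trips = 667         # trips kept by the estimation method

detected_share = devices_detected / vehicles_counted
valid_share_of_detected = valid_trips / devices_detected
valid_share_of_count = valid_trips / vehicles_counted

print(f"devices per counted vehicle: {detected_share:.1%}")
print(f"valid trips per detected device: {valid_share_of_detected:.1%}")
print(f"valid trips per counted vehicle: {valid_share_of_count:.1%}")
```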
In order to validate the methodology, real measurements of travel times during a typical day were carried out and compared to the estimated values. The validation was performed on a Monday, in two time windows (morning peak hour 8:00-9:00 and afternoon peak hour 16:00-17:00). Ten probe vehicles were used for measuring travel times along 10 of the 22 paths. Fig. 12 presents the estimated (trimmed mean) and the measured (tester TT) travel times for one of the paths.
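A comparison of estimated and measured travel times can be scored with an absolute percentage error. The sketch below assumes the conventional definition, with the probe vehicle measurement as the reference; the exact indicator definitions used in the evaluation are not reproduced here, and the sample pairs are hypothetical.

```python
def absolute_percentage_error(estimated_s, measured_s):
    """|estimated - measured| relative to the measured value, in percent."""
    return 100.0 * abs(estimated_s - measured_s) / measured_s


def mean_absolute_percentage_error(pairs):
    """Average of the absolute percentage errors over (estimated, measured) pairs."""
    errors = [absolute_percentage_error(est, meas) for est, meas in pairs]
    return sum(errors) / len(errors)


# Hypothetical (estimated, measured) travel times in seconds for three paths.
pairs = [(110, 100), (190, 200), (150, 150)]
print(mean_absolute_percentage_error(pairs))
```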
Three indicators are used for the evaluation of the obtained results, as shown in Table. The absolute percentage error is 15.4% for the morning and 9.38% for the afternoon peak hour. However, in 40% and 70% of the predefined paths, for morning peak and afternoon peak hours respectively, the estimation of travel time was absolutely accurate (with insignificant variation).
[...] two users having the same MAC-ID without being part of any fleet, but this possibility is considered remote; such a record will be treated as one of the outliers and deleted by the proposed filters;
- characteristics of the MAC-IDs: all MAC-IDs have the same weight and are treated in the same way; since part of the MAC-ID indicates the producer of the BT chipset, a list containing the MAC-IDs related to car users can be utilized to weight the detections accordingly;
- since travel times depend on driving behavior, travel times for the same paths are not unique. The proposed methodology could be extended to provide a range of travel times rather than a unique value;
- due to the large effects of delays at signalized intersections, travel times along the same paths may vary. This issue can be solved by selecting a time interval (for the time analysis) that is an integer multiple of the signal cycle time, using in this way average travel times of platoons of vehicles passing at different stages of the traffic lights;
- extended detection range of the devices: BT detectors have an extended detection range (more than 100 meters), detecting the vehicle before it reaches an intersection; this detection distance depends on the network geometry and the surrounding environment, with different values for the different approaches to the intersection. This can be solved by using long paths, where this difference will have limited influence on the results;
- low utilization of BT: since the methodology is based on the measurement of travel times of some devices with active BT, this 'random sample' must be representative of the population;
- although fixed upper bounds for travel time estimation are proposed in previous studies, varying bounds defined from a statistical point of view are preferable.
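The choice of an analysis interval that is an integer multiple of the signal cycle time can be sketched as a rounding rule. Rounding a desired interval up to the next multiple is an assumption of this sketch, and the cycle length used in the example is hypothetical.

```python
import math


def cycle_aligned_interval(target_s, cycle_s):
    """Smallest integer multiple of the signal cycle that covers the target interval."""
    if target_s <= 0 or cycle_s <= 0:
        raise ValueError("target_s and cycle_s must be positive")
    return math.ceil(target_s / cycle_s) * cycle_s


# A 5 minute target with a hypothetical 90 second cycle becomes 4 full cycles.
print(cycle_aligned_interval(300, 90))
```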

Conclusions
A robust method for estimating travel times with the use of point-to-point Bluetooth (BT) detectors in dense urban road networks has been presented. The method comprises preparatory procedures that need to be executed in advance, followed by seven steps, which aim to cope with the influence of outliers and provide accurate travel time estimates. These can be used for the provision of real time traveller information services.
The proposed method uses trimmed mean and mode for the estimation of travel times.
A case study with the application of the proposed method has been presented, highlighting the applicability and the accuracy of the method in an actual road network in the Central Business District (CBD) of Thessaloniki, Greece.
Finally, remarks related mostly to technological issues for the use of BT detectors for traffic data collection have been presented.
The proposed methodology can have a wide range of applications and can be extended to other similar technologies, such as automatic toll collection systems or Wi-Fi sensors.
Since the majority of previous research and related studies focused on freeways, the proposed method fills in the gap of the existing level of knowledge related to BT detectors utilization for travel time estimations in dense urban road networks.
Future work will include the un-weighting of observations coming from public transport users, the identification of different transport modes and the study of cases of unplanned incidents.