A TRAFFIC FUNDAMENTAL DIAGRAM CALIBRATING METHODOLOGY TO AVOID UNBALANCED SPEED–DENSITY OBSERVATIONS

The traffic fundamental diagram is essential for analysing traffic flow and traffic capacity, and its central task is calibrating the speed–density relationship. However, because speed–density observations are unbalanced, calibration using the Least Square Method (LSM) with all speed–density points leads to inaccurate results. This paper therefore proposes a method for selecting a data sample, and LSM is then used with that sample to calibrate four well-known single-regime models. The results of LSM with all speed–density points are compared with those of LSM with the selected data sample. The results indicate that the proposed selecting data sample method calibrates the single-regime models well and overcomes the inaccuracy caused by unbalanced speed–density observations. Data from different highways validate the results. The contribution of this paper is that the proposed method can help researchers determine a more precise traffic fundamental diagram.


Introduction
The traffic fundamental diagram is important for analysing traffic flow and capacity (Zhang et al. 2019) and for managing traffic operation (Alonso et al. 2019). It is also the basis for establishing traffic flow models (Fiems et al. 2019) for traffic control (Zhu, Li 2019). The main part of the traffic fundamental diagram is the speed-density relationship. Therefore, it is very important to model the traffic fundamental diagram (Baer et al. 2019) and to calibrate the corresponding models precisely according to speed-density observations.
Researchers have done much work (Del Castillo, Benítez 1995a, 1995b; Jiang, Huang 2009; Lam et al. 2013; Wang et al. 2013) to analyse the speed-density relationship, and many models have been developed, including single-regime and multi-regime models. For single-regime models, Greenshields et al. (1935), working from limited data, took the speed-density relationship to be a straight line; Greenberg (1959) assumed traffic to behave like a continuous fluid and developed a speed-density relationship model, shown in Table 1. The Greenberg model fails to remain finite at zero density; Underwood thought that the infinity asymptote may be along the density scale (Drake et al. 1967), and the Underwood model was proposed as illustrated in Table 1. The speed-density data from the Eisenhower Expressway exhibited concavity at low densities, and Drake et al. (1967) suggested a bell-shaped curve model, which is the Northwestern model shown in Table 1. Newell (1961) considered a nonlinear car-following model and derived a speed-density model; Wang et al. (2011) established the logistic model of the equilibrium speed-density relationship, motivated by the success of logistic curves in modelling growth phenomena such as plant growth in agriculture, population dynamics, market growth in economics and epidemic growth in biology. For multi-regime models, Edie (1961) thought that very light traffic and very dense traffic might show different behaviours, so that different models should be used; Sun and Zhou (2005) used cluster analysis to specify the number of regimes and showed that the k-means algorithm with original data worked well and could be conveniently used in practice. Although multi-regime models can reflect the speed-density relationship accurately to some extent, they lack the mathematical elegance of single-regime models, so single-regime models are more often used.
The four well-known single-regime models of Greenberg, Underwood, Northwestern and Newell are shown in Table 1. Although almost all researchers (Poole, Kotsialos 2016; Zhong et al. 2016; Knoop, Daamen 2017) calibrate traffic fundamental diagram models using LSM, it has been verified that single-regime models calibrated in this way are not accurate under congested/jam conditions because of unbalanced observations of speed-density points (Qu et al. 2015). In other words, since most real speed-density observations are located in the uncongested condition (Maghrour Zefreh, Török 2020), models calibrated using LSM reflect the uncongested conditions precisely but not the jam conditions, which can lead to significant errors for congested conditions. To overcome this problem, Qu et al. (2015, 2017) introduced WLSM to calibrate single-regime models. The weights were related to the density distance between adjacent data points, so that speed-density data points at congested conditions, where observations are few, had large weights, and the results improved. However, the weight determination process of WLSM by Qu et al. (2015) is very complicated, and the best weights may differ between models. Zhang et al. (2017) used five weight determination methods of WLSM to calibrate five one-regime speed-density models, and the results showed that different models have different best weight determination methods. Bhouri et al. (2019) proposed a data-driven approach for estimating the fundamental diagram, and an unbiased fundamental diagram was obtained for both congested and uncongested observations. However, it is not suitable for single-regime models. To avoid using WLSM and to calibrate single-regime models precisely under both congested and uncongested conditions, this paper proposes a new method to calibrate single-regime models.
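For reference, the four single-regime models named above can be sketched as plain functions. Table 1 is not reproduced in this text, so the functional forms below are the ones commonly given in the literature, and the parameter names (v_f for free-flow speed, v_m for optimal speed, k_j for jam density, k_m for optimal density, lam for the Newell slope parameter) are assumptions about the notation rather than a copy of Table 1.

```python
import math

# Commonly cited forms of the four single-regime speed-density models.
# Speeds in km/h, densities in veh/km; parameter names are assumed notation.

def greenberg(k, v_m, k_j):
    """Greenberg: v = v_m * ln(k_j / k). Not finite at k = 0."""
    return v_m * math.log(k_j / k)

def underwood(k, v_f, k_m):
    """Underwood: v = v_f * exp(-k / k_m)."""
    return v_f * math.exp(-k / k_m)

def northwestern(k, v_f, k_m):
    """Northwestern: v = v_f * exp(-0.5 * (k / k_m)^2), a bell-shaped curve."""
    return v_f * math.exp(-0.5 * (k / k_m) ** 2)

def newell(k, v_f, k_j, lam):
    """Newell: v = v_f * (1 - exp(-(lam / v_f) * (1/k - 1/k_j)))."""
    return v_f * (1.0 - math.exp(-(lam / v_f) * (1.0 / k - 1.0 / k_j)))
```

Under these forms the Greenberg and Newell models give zero speed at the jam density k_j, while the Underwood and Northwestern models give the free-flow speed v_f at zero density.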
The remainder of this paper is organized as follows. The methods and data used to calibrate single-regime speed-density models are presented in Section 1. Results and discussion are presented in Section 2, followed by results validation in Section 3. Conclusions are given at the end.

Methodologies and data
In this section, the methodologies, including the method to select the data sample, LSM, and the evaluation indicators, are presented. The selecting data sample method is first designed to determine the data used, so as to avoid unbalanced speed-density observations. Then, LSM is presented to calibrate the fundamental diagrams with the selected data. The calibration effectiveness is evaluated using RE and MSE. The original data are described at the end of this section.

Selecting data sample method
In order to avoid using the unbalanced speed-density observations, which lead to inaccurate calibration, the selecting data sample method is proposed to balance the speed-density distribution used to calibrate single-regime models. The specific steps of the method are as follows and are shown in Figure 1:
»» Step 1: Rank the speed-density observations considering their densities (Qu et al. 2015) and speeds. Data points become (v_i, k_i), i = 1, ..., n;
»» Step 2: Denote k_A = 0, k_B = 0 veh/km, i = 1. Data points (v_1, k_1) and (v_n, k_n) are selected;
»» Step 3: Divide the density region from k_A to k_B into m parts and denote j = 1;
»» Step 4: If [...], data points with the same k_i are candidates, as shown in Figure 2a, and the middle point considering speed among these candidates is finally selected¹, then go to Step 5; otherwise, data points with the same k_{i+1} are candidates, as shown in Figure 2b, and the corresponding point is selected; otherwise start Step 5.
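The idea behind the steps above can be illustrated with a simplified sketch. It is not the exact procedure of Figure 1: it assumes fixed-width density bins (the hypothetical `interval` parameter, in veh/km) and keeps, from each non-empty bin, the point whose speed is in the middle of the candidates. The function name and the tuple layout (speed, density) are assumptions made for illustration.

```python
def select_balanced_sample(points, interval=1.0):
    """Keep one (speed, density) point per density bin of width `interval`.

    Simplified illustration: within each bin the candidates are ranked by
    speed and the middle one is kept (the lower middle for an even count).
    """
    bins = {}
    for v, k in points:
        idx = int(k // interval)          # index of the density bin
        bins.setdefault(idx, []).append((v, k))

    sample = []
    for idx in sorted(bins):              # walk bins from low to high density
        candidates = sorted(bins[idx])    # rank candidates by speed
        sample.append(candidates[(len(candidates) - 1) // 2])
    return sample
```

Because every non-empty bin contributes exactly one point, heavily observed low-density bins no longer dominate the sample, which is the balancing effect the method aims for.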

LSM
LSM is used in many fields to determine parameters, such as electrics (Zheng et al. 2015), magnetic gradient tensor systems (Yin et al. 2015), inductance estimation of electrically excited synchronous motors (Jeong et al. 2015), rating curves (Kim et al. 2014), electrochemical degradation of three reactive dyes (Djafarzadeh et al. 2014), laminar flow and heat transfer, thermal and flow analysis of microchannel heat sinks, electrohydrodynamic flow (Ghasemi et al. 2014), as well as transportation research (Qu et al. 2015). LSM seeks a solution that minimizes the function Q (Washington et al. 2020), as shown in Equation (2). By setting the partial derivative(s) of Q with respect to b equal to zero, the least square estimate(s) of the parameter(s) b are obtained (Washington et al. 2020), as shown in Equation (3). Solving Equation (3) gives the value(s) of the parameter(s) b.
¹ If there is only one candidate, that candidate is the middle point; if there is an odd number of candidates S, the (S-1)/2th point of the candidates considering speed is the middle point; if there is an even number of candidates S, the S/2th one is the middle point.
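As a worked illustration of the least-squares step, the sketch below fits the Greenberg model in closed form. It assumes the commonly cited form v = v_m ln(k_j / k), which is linear in ln k, so setting the partial derivatives of the sum of squared speed residuals to zero yields the usual simple-regression formulas. The function name and the (speed, density) tuple layout are hypothetical, and the other three models would generally need a numerical optimizer instead.

```python
import math

def fit_greenberg(points):
    """Least-squares fit of v = v_m * ln(k_j / k) to (speed, density) points.

    Rewrites the model as v = a + b * ln(k) with b = -v_m and
    a = v_m * ln(k_j), then applies ordinary linear least squares.
    All densities must be positive.
    """
    xs = [math.log(k) for _, k in points]
    ys = [v for v, _ in points]
    n = len(points)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # estimate of -v_m
    intercept = y_bar - slope * x_bar    # estimate of v_m * ln(k_j)
    v_m = -slope
    k_j = math.exp(intercept / v_m)
    return v_m, k_j
```

The same function can be applied to the full data set or to a selected sample; only the list of points passed in changes.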

RE and MSE
To evaluate the effectiveness of the proposed method, RE (Qu et al. 2015) and MSE (Washington et al. 2020) were used, as shown in Equations (4) and (5), where the parameters are the same as above.
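Equations (4) and (5) are not reproduced in this text. The sketch below therefore uses common definitions as an assumption: RE is taken as the relative error of an estimated speed against an observed speed, and MSE as the mean of the squared speed differences. The function names are hypothetical.

```python
def relative_error(v_observed, v_estimated):
    """Assumed RE: |observed - estimated| / observed, for observed > 0."""
    return abs(v_observed - v_estimated) / v_observed

def mean_squared_error(observed, estimated):
    """Assumed MSE: average squared difference over paired speed values."""
    n = len(observed)
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n
```

Evaluating both indicators separately for each density range, as in Figures 5 and 6, shows where along the density axis a calibrated model departs from the observations.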

Data information
The original ITS data of GA400 were aggregated every 5 minutes; such data are often used to calibrate speed-density relationship models. One year of continuous observations, 44787 in total, was obtained in 2003, a period long enough to calibrate the speed-density relationships. The specific distribution of the data is shown in Figure 3a and in Table 2, and the unbalanced distribution of the observations can be seen easily.

Results and discussion
According to the method in Section 1.1, m is set to 10, 20 and 50, and the three resulting data samples contain 125, 238 and 552 speed-density points, respectively, as shown in Figure 3b and Table 3, Figure 3c and Table 4, and Figure 3d and Table 5. Compared with Figure 3a, the speed-density points in the samples are distributed in a balanced way. The four well-known single-regime models in Table 1 are calibrated with LSM using all speed-density points (LSM), 125 sample points (LSM125), 238 sample points (LSM238) and 552 sample points (LSM552); the results are shown in Figure 4 and Table 6.
From Figure 4, the calibration results of all single-regime models with LSM125, LSM238 and LSM552 are better than those with LSM, except for the Northwestern model, especially under congested/jam conditions. To analyse the calibration performance in detail, the REs and MSEs of the four well-known single-regime models with the different methods are calculated, as shown in Figures 5 and 6.
From Figures 5 and 6, for the Greenberg model, the REs of LSM with the selected samples are less than 0.4 when the traffic density is more than 30 veh/km, which is lower than the RE of LSM with all speed-density points. The RE of LSM with all speed-density points is even larger than 1 when the traffic density is more than 60 veh/km. The REs of LSM with the selected samples are very similar, which indicates that all of these methods can calibrate the Greenberg model well. Among them, the RE of LSM125 is the lowest on the whole, followed by LSM552 and then LSM238, which indicates that LSM125 gives the best calibration in terms of RE. The MSEs behave similarly: the MSEs of LSM with the selected samples are much lower than that of LSM with all points when the traffic density is more than 30 veh/km, and the MSE of LSM125 is the lowest on the whole, followed by LSM552 and then LSM238, which indicates that LSM125 also gives the best calibration in terms of MSE. In short, the Greenberg model calibrated by LSM with the selected samples is reasonable and much better than that calibrated by LSM with all speed-density points.
Figure 3. Speed-density data of GA400: a - all data; b - data sample of 125 points; c - data sample of 238 points; d - data sample of 552 points
For the Underwood model, the REs of LSM with the selected samples are less than 0.3 when the traffic density is more than 30 veh/km. These REs are very similar, which indicates that all of these methods can calibrate the Underwood model well. Among them, the RE of LSM552 is the lowest on the whole, followed by LSM238 and then LSM125, which indicates that LSM552 gives the best calibration in terms of RE. The MSEs behave similarly: the MSEs of LSM with the selected samples are much lower than that of LSM with all points when the traffic density is more than 30 veh/km, and the MSE of LSM552 is the lowest on the whole, followed by LSM238 and then LSM125, which indicates that LSM552 also gives the best calibration in terms of MSE. In short, the Underwood model calibrated by LSM with the selected samples is reasonable and much better than that calibrated by LSM with all speed-density points.
For the Northwestern model, the REs of LSM with the selected samples are lower than that of LSM with all speed-density points when the traffic density is more than 50 veh/km. In addition, the REs of LSM with the selected samples are very similar to one another. Among them, the RE of LSM238 is the lowest on the whole, followed by LSM552 and then LSM125. The MSEs behave similarly: the MSEs of LSM with the selected samples are much lower than that of LSM with all points when the traffic density is more than 50 veh/km, and the MSE of LSM238 is the lowest on the whole, followed by LSM552 and then LSM125. Although this indicates that the Northwestern model calibrated by LSM with the selected samples is better than that calibrated with all speed-density points, the model is not recommended, as its REs are even greater than 0.4 when the density is more than 70 veh/km.
For the Newell model, the REs of LSM with the selected samples are lower than that of LSM with all speed-density points when the traffic density is more than 70 veh/km or between 30 and 50 veh/km. In addition, the REs of LSM with the selected samples are very similar, which indicates that all of these methods can calibrate the Newell model well. Among them, the RE of LSM552 is the lowest on the whole, followed by LSM238 and then LSM125, which indicates that LSM552 gives the best calibration in terms of RE. The MSEs behave similarly: the MSEs of LSM with the selected samples are much lower than that of LSM with all points when the traffic density is more than 70 veh/km or between 30 and 50 veh/km, and the MSE of LSM552 is the lowest on the whole, followed by LSM238 and then LSM125, which indicates that LSM552 also gives the best calibration in terms of MSE. In short, the Newell model calibrated by LSM with the selected samples is reasonable and much better than that calibrated by LSM with all speed-density points.
From the above, the results of LSM125, LSM238 and LSM552 differ but are very similar, so for convenience we recommend a density interval of 1 veh/km for selecting data.
Table 6. Calibrating results of four well-known single-regime models with different methods

Results validation
To validate the proposed method on other highways, we used data from I-80 in California to cross-check the fitting results. The original data and the data sample selected with a density interval of 1 for I-80 are shown in Figure 7a and 7b, respectively. In total, 76 speed-density points were selected, as shown in Figure 7b. The unbalanced distribution of speed-density observations can be seen easily in Figure 7a, whereas the data sample selected by the proposed method is almost evenly distributed, as shown in Figure 7b. The fitting curves using LSM with the original data and with the data sample are shown in Figure 8 and Table 7. The results indicate that all models calibrated with the data sample are better than those calibrated with the original data, except for the Northwestern model.
To further analyse the calibration, RE and MSE were used to evaluate the effects, and the results are shown in Figures 9 and 10. When the original data were used to calibrate the speed-density models, the results at low density (0…20 veh/km) were very good, because most observations are located between 0 and 20 veh/km, as shown in Figure 8. At high density, the calibration with the original data is not appropriate, and the REs are even greater than 0.5. In contrast, the calibration with the data sample selected using the proposed method is suitable whatever the density is, and the REs are lower than about 0.3, except for the Northwestern model. The MSEs also indicate that the calibration with the data sample is better than that with the original data, except for the Northwestern model. This verifies that the proposed data selection method is effective. For the Northwestern model, the calibration with either the original data or the data sample is not suitable, as the REs are even greater than 0.5 under high density, as shown in Figure 9c. This is consistent with the result for GA400 shown in Figure 5c, and shows that the Northwestern model is not suitable for some specific speed-density data.

Conclusions
On highways and freeways, traffic flow operates under uncongested conditions most of the time, and few of the measured speed-density data points are located in the congested area. If LSM is used to calibrate fundamental diagrams with the original speed-density data, it can lead to imprecise results. To overcome this problem, this study proposed a new method to calibrate fundamental diagrams.
The main contribution of this study is a selecting data sample method that avoids the unbalanced speed-density observations, which can lead to inaccurate calibration of single-regime speed-density models under congested/jam conditions when LSM is used. The specific steps of the method were explained and illustrated in a figure. Real speed-density data from different highways were used to test the proposed method, and the results indicated that, in general, the method can calibrate single-regime speed-density relationship models precisely under both congested and uncongested conditions. The proposed method can help to establish traffic flow models precisely and is useful for traffic control in practice. In the future, multi-regime speed-density models may be tested using the proposed method.

Author contributions
Chunbo Zhang conceived the study and was responsible for the design and development of the data analysis.
Zhaoguo Huang and Yonggang Wang were responsible for data interpretation.

Disclosure statement
The authors declare that there is no conflict of interest regarding the publication of this paper.