AIR TRAFFIC FORECAST IN POST-LIBERALIZATION CONTEXT: A DYNAMIC LINEAR MODELS APPROACH

The process of air transport liberalization in Colombia began in 1991. Liberalization entailed the entry of private capital into the airport sector, which subsequently led, in several successive phases, to the privatization of the country's main airports. Simultaneously, new air operators entered the market. This new market situation, supported by the complete deregulation of airfares, generated dynamic and sustained growth of air transport in Colombia for two decades. Within this post-liberalization context, this article presents a medium-term (5-year) forecast of air traffic at the country's main airport using DLMs (Dynamic Linear Models). Compared with the usual forecasting methodologies, DLMs have the following advantages: they detect stochastic trends hidden in the time series, and they detect structural changes, which allows the time-varying effect of exogenous shocks to be estimated without increasing the number of parameters. The application of DLMs yields MAPE (Mean Absolute Percentage Error) values of about 1% or less (between 0.24% and 1.08%), which indicates highly accurate predictions and thus introduces a new alternative model for developing reliable forecasts in air transport, at least in the medium term.


Introduction
Air transport in Colombia has been developing at an accelerated pace for about two and a half decades. This period coincides with the beginning of the continued implementation of public policies designed specifically to drive and promote the air transport sector. Since the 1990s, the growth of air traffic in Colombia has been strengthened by the public policy of airspace liberalization in both domestic and international markets. In addition, it has been reinforced by the reorientation of public and private investment toward modernizing and updating airport infrastructure through concessioning the busiest airports in the country (Díaz Olariaga & Ávila, 2015). The first generation of airport concessions was implemented in the mid-1990s, and three additional generations have taken place since then (Díaz Olariaga, 2017). In the commercial aviation sector, the national airline was privatized during the same period and new (private) air carriers entered the market, including low-cost carriers (LCC). Since 2012, airfares have been completely deregulated (Díaz Olariaga & Zea, 2018).
As a result of public policies of privatization and public investment in airport infrastructure (together with deregulating policies in the commercial aviation sector), passenger transport in Colombia has increased 863% (Aerocivil, 2019) during the last 15 years. This significant growth has been boosted and led by the main airport in Colombia, Bogotá-El Dorado International Airport (hereafter BOG), in the country's capital city. However, the master plans for BOG (as well as several questionable studies and reports) predict growth in demand, which implies that the airport's present capacity (as it has no plans to expand) will not meet the anticipated demand. This situation prompted the public sector to approve the construction of a new airport on the outskirts of the city, which is supposed to start operating in 2025/2026.
Forecasts are a crucial aspect of airport planning for determining future capacity requirements. Since airport infrastructure projects are expensive and involve many resources, a data-based understanding of future demand provides planners with the information needed for decision-making in the short, medium and long term. Such data include the expected number of aircraft movements, passenger traffic, and air cargo volumes. Therefore, despite unforeseen circumstances, the airport and aviation industry (airlines, aircraft and engine manufacturers, air navigation service providers, etc.) requires forecasts to anticipate future scenarios (Kazda & Caves, 2015; de Neufville & Odoni, 2013; Horonjeff et al., 2010). Regarding forecast horizons, it is common in the airport industry to produce short-term (next season within the same year, or 1 year), medium-term (5 years), and long-term (20-25 years) forecasts (ACI, 2016; ICAO, 2006). The goal of this article is therefore to produce a medium-term forecast for BOG (passengers, air cargo, and air operations or aircraft movements). To this end, Dynamic Linear Models (DLMs) are used as the calculation methodology, which is novel for this type of air traffic analysis. Compared with the usual forecasting methods, DLMs present the following advantages: they detect stochastic trends hidden in time series (West & Harrison, 2006), as well as structural changes, which allows the time-varying effect of exogenous shocks to be estimated without increasing the number of parameters (Honjo, Shiraki, & Ashina, 2018). Furthermore, the conditional independence structure on which the state dynamics are based allows for an interpretation of forecasts through a recursive algorithm (Petris, Petrone, & Campagnoli, 2009).
The present investigation is organized as follows. The Literature Review section covers the existing literature in two respects. The first concerns the investigations (type and approach) carried out worldwide in a context of liberalization of the aviation or air transport industry, as this is the framework on which the research is based. The second focuses on the research and methods used by academics to produce air traffic forecasts. The Methodology and Data section then describes the methodology used in the research (DLMs) and the type and origin of the data used in the calculations. The following section presents the Application Case (or Case Study), i.e. the general information and data of the airport for which the traffic prognosis is developed. In the penultimate section, all the results are presented and analyzed. The concluding section summarizes the final results of the investigation.
Regarding forecasting methods, the air transport industry has been addressing traffic prognosis for at least six decades. However, academics only began to present formal studies and research about three decades ago. During this time a variety of models have been developed to predict passenger demand. The most widely used prediction methods can be classified into two large groups: economic models and time-series models (Dantas, Oliveira, & Repolho, 2017). Economic methods focus on the correlation between passenger demand and multiple variables considered influential in changes in the economic environment and the traffic system; the forecast models are established through a series of equations. Commonly used models include regression analysis (Abed, Ba-Fail, & Jasimuddin, 2001), causality tests (Fernandes & Pacheco, 2010), logit models (Garrow & Koppelman, 2004), and gravity models (Grosche, Rothlauf, & Heinzl, 2007). Time-series methods rely primarily on historical data, predicting by extracting the intrinsic relationship between current data and past observations. Various time-series models have been used to forecast passenger demand, such as smoothing techniques (Samagaio & Wolters, 2010), the adapted Markov model (Chin & Tay, 2001), ARIMA/SARIMA (Tsui et al., 2014), seasonal adjustment methods (Aston & Koopman, 2006), etc. However, due to the nonlinear nature of passenger demand, economic and time-series approaches have been severely criticized for their limited forecasting effectiveness (Tsui et al., 2014). Therefore, some academics have explored other methods, such as artificial intelligence (for example, neural networks), which is characterized by self-adaptation and non-linearity and can map arbitrary functions (Jin et al., 2020; Xiao et al., 2014). Regarding the methodology used in the present investigation, DLMs, virtually no related publications have been found. Hence the interest in testing this technique, to evaluate its behavior and reliability in predicting typical air traffic variables and to assess its benefits (cited in the Introduction and developed further in the Methodology section).

Methodology and data
In any statistical application, a crucial and challenging step is to carefully specify the model. The first strategy is a static model, where the effect of time does not play a prominent role. For this research, Dynamic Models (DMs) have been chosen. Unlike static models, some elements that participate in the construction of the model do not remain invariable but are considered as functions of time describing temporal trajectories (Glynn et al., 2019;Laine, 2019;McAlinn & West, 2019;Pole, West, & Harrison, 2018).
DMs have the advantage of having "dynamics" in the model's parameters, thereby rendering the parameters not fixed but changing or dependent on time. Their main application is the analysis of time series. They also have the advantage of being useful to perform sequential analyses because the updating of parameters is carried out based on the data that has been obtained sequentially.
The development of forecasts is usually based on autoregressive models, moving averages or their combination. However, such models have a complicated likelihood function, and the posterior distribution of the parameters inherits the same difficulty. For this reason DLMs, which are a particular case of DMs, are used to model time series and produce forecasts through the distributions of the stochastic variables that influence the observations over time. One of their advantages is that they are simpler models, yet powerful enough to fit and forecast data, and they can include explanatory variables in a simple way (Sargan & Bhargava, 1983; Ahn & Schmidt, 1995; Arellano & Bond, 1991; Arellano & Bover, 1995; Gelman et al., 2013; Kenkel, 2018). DLMs are defined under the following structure for each time t (Valencia & Correa, 2013; Bolstad, 2007; Glynn et al., 2019; Asparouhov, Hamaker, & Muthén, 2018). Let $\theta_t = (\theta_{1,t}, \theta_{2,t}, \ldots, \theta_{q,t})^{\top}$ be the true q processes of interest. We express the DLM with the following equations:
$$Y_t = F_t^{\top}\theta_t + v_t, \qquad v_t \sim N(0, V_t) \qquad (1)$$
$$\theta_t = G_t\,\theta_{t-1} + w_t, \qquad w_t \sim N(0, W_t) \qquad (2)$$
where $F_t$ is the dynamic regression matrix; $G_t$ is the state matrix; $V_t$ is the observational variance matrix, with $v_t \sim N(0, V_t)$; $W_t$ is the evolution variance matrix, with $w_t \sim N(0, W_t)$; and $\theta_t$ is the vector of parameters.
At time 0, a prior distribution is postulated for $(\theta_0 \mid D_0)$, where $D_0$ represents the information available up to time zero. Following West and Harrison (2006), $(\theta_0 \mid D_0) \sim N(m_0, C_0)$, where $m_0$ and $C_0$ are the vector of means and the matrix of variances and covariances, respectively.
The observation equation defines the observational model for a response $Y_t$ and its relation with p covariables or explanatory variables $F_t$. The first explanatory variable is generally a constant or intercept that represents the level of the series. As $Y_t$ is univariate, $\theta_t$ is a vector of the form $(\theta_{0,t}, \theta_{1,t}, \ldots)^{\top}$. It is possible to consider $Y_t$ as multivariate, in which case $\theta_t$ is a matrix of dimension m × p, $m_0$ is a vector of zeros and $C_0$ is the diagonal matrix of variances and covariances.
The system equation describes the evolution of the parameters over time. If the model includes p time-varying coefficients, the evolution is defined by a transition matrix $G_t$ of dimension p × p, where p represents the number of covariates in each of the models.
Finally, DLMs present errors $v_t$ and $w_t$ with time-dependent variances $V_t$ and $W_t$, which denote the observational variance matrix and the evolution variance matrix, respectively.
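As an illustration of how the observation and system equations fit together, consider a hypothetical univariate dynamic regression with an intercept and a single covariate $x_t$. The covariate and the random-walk evolution ($G_t = I_2$) are assumptions chosen for this example only; they are not the models fitted in this article.

```latex
\begin{aligned}
Y_t &= \theta_{0,t} + \theta_{1,t}\,x_t + v_t, & v_t &\sim N(0, V_t),\\
\theta_{0,t} &= \theta_{0,t-1} + w_{0,t}, &&\\
\theta_{1,t} &= \theta_{1,t-1} + w_{1,t}, & w_t = (w_{0,t}, w_{1,t})^{\top} &\sim N(0, W_t).
\end{aligned}
```

Here $F_t = (1, x_t)^{\top}$ and $\theta_t = (\theta_{0,t}, \theta_{1,t})^{\top}$. Setting $W_t = 0$ freezes the coefficients and recovers an ordinary static regression, which makes explicit what the "dynamics" add.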
When dealing with time series, it is important to note that DLMs include a source of variability, known as a vector of permanent effects, that represents the errors in the observation equation and in the system equation. Although this formulation may appear restrictive, several classical time-series models arise as particular cases, notably ARMA models; for the theoretical relationships between dynamic linear models and ARIMA models see Durbin and Koopman (2012), Tsui et al. (2014), Box et al. (2016) and Wei (2006). DLMs are handled through the Kalman filter when the error terms in the observation equation are independent and identically normally distributed with mean 0 and known variance.
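The Kalman recursion mentioned above can be sketched for the simplest DLM, a univariate local-level model with $F_t = G_t = 1$ and known, constant variances. This is a minimal Python illustration under those assumptions, not the models of this article (which were fitted by MCMC); the function names and the sample data are hypothetical.

```python
def kalman_local_level(ys, m0, c0, v, w):
    """Filter a univariate local-level DLM (F_t = G_t = 1) with known variances.

    m0, c0: prior mean and variance of the state at time 0.
    v, w:   observational and evolution variances (assumed constant here).
    """
    m, c = m0, c0
    means, variances = [], []
    for y in ys:
        # Prior for the state at time t: theta_t | D_{t-1} ~ N(a, r)
        a, r = m, c + w
        # One-step-ahead forecast: Y_t | D_{t-1} ~ N(f, q)
        f, q = a, r + v
        # Posterior after observing y: theta_t | D_t ~ N(m, c)
        gain = r / q
        m = a + gain * (y - f)
        c = r - gain * gain * q
        means.append(m)
        variances.append(c)
    return means, variances


def forecast_local_level(m, c, v, w, steps):
    """k-step-ahead forecast (mean, variance) pairs from the last posterior N(m, c)."""
    return [(m, c + k * w + v) for k in range(1, steps + 1)]


# Hypothetical annual series (arbitrary units), for illustration only.
ys = [10.0, 10.4, 10.9, 11.1, 11.8]
means, variances = kalman_local_level(ys, m0=10.0, c0=1.0, v=0.25, w=0.05)
forecasts = forecast_local_level(means[-1], variances[-1], v=0.25, w=0.05, steps=5)
print(forecasts)
```

Each pass through the loop uses the previous posterior as the new prior, which is the sequential updating that makes DLMs convenient for forecasting; the forecast variance grows with the horizon because evolution noise accumulates.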
For the proposed models, 5 MCMC (Markov Chain Monte Carlo) chains were run with 100,000 iterations each, discarding the first 10,000 iterations, using JAGS (Just Another Gibbs Sampler) (Plummer, 2003). To determine the convergence of the chains, the Gelman-Rubin R-hat statistic (Gelman et al., 2013) is computed with the "coda" library of the R software (Plummer et al., 2006); it should present values below 1.1.
To determine the strength of the proposed model in numerical terms, the Mean Absolute Percentage Error (MAPE) is used, which measures the size of the (absolute) error in percentage terms. Because it expresses the magnitude of the error as a percentage, it is an indicator frequently used by forecast developers owing to its easy interpretation. A small MAPE value indicates that forecasts have a higher likelihood of being accurate (S. Kim & H. Kim, 2016; Ren & Glasure, 2009).
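To make the convergence criterion concrete, the basic Gelman-Rubin statistic for a single parameter can be sketched in Python as below. This is the textbook between/within-chain variance ratio; the `gelman.diag` function in coda applies additional corrections, so its values may differ slightly. The simulated chains are hypothetical stand-ins for MCMC output.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(ch) for ch in chains]
    # W: average within-chain variance
    w = mean([variance(ch) for ch in chains])
    # B: between-chain variance, scaled by the chain length
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(42)
# Five chains sampling the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(5)]
# Five chains stuck around different values: R-hat should be well above 1.1.
stuck = [[rng.gauss(float(k), 1.0) for _ in range(2000)] for k in range(5)]

print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values below 1.1 indicate that the between-chain spread is small relative to the within-chain spread, which is the threshold adopted in this article.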
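For reference, MAPE is the mean of the absolute errors divided by the actual values, expressed as a percentage. A minimal Python sketch follows; the function name and the sample numbers are hypothetical and are not data from this study.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero (the ratio is undefined otherwise).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical observed and fitted values; each fitted value is off by 1%.
observed = [100.0, 200.0, 400.0]
fitted = [101.0, 198.0, 404.0]
print(mape(observed, fitted))
```

Because every error in this example is 1% of the observed value, the result is approximately 1, the same order of magnitude as the MAPE values reported in the Results section.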
Air traffic data for the airport under study (passengers, air cargo and operations or aircraft movements) are available for the last four decades (1979-2017) (Aerocivil, 2019). Likewise, socioeconomic data are available for the city where the airport is located (GDP, GDP per capita, population, etc.) (DANE, 2019; Banco de la República de Colombia, 2019). Given the variables chosen as covariables, a medium-term forecast is presented because of changing economic conditions and their effects on air traffic. For that purpose, the years 2018 to 2022 are forecast. To obtain this forecast, the chosen covariables must first be forecast using ARIMA models (Brooks, 2008) so that their future values can be included in the selected model, with the aim of obtaining a relatively low MAPE.
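The first stage of this two-step procedure (forecasting each covariable before feeding it to the DLM) can be sketched as follows. The Python example fits an ARIMA(1,1,0) with drift by ordinary least squares on the first differences, a simplification assumed here for brevity; the fitting method actually used in the study is not stated, and the covariate series is synthetic.

```python
def fit_ar1_on_differences(series):
    """Least-squares fit of d_t = c + phi * d_{t-1} on first differences."""
    d = [b - a for a, b in zip(series, series[1:])]
    x, y = d[:-1], d[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx  # undefined if all differences are identical
    c = my - phi * mx
    return c, phi, d[-1]


def forecast_arima110(series, steps):
    """Recursive point forecasts of the level from the fitted difference equation."""
    c, phi, diff = fit_ar1_on_differences(series)
    level, out = series[-1], []
    for _ in range(steps):
        diff = c + phi * diff  # next difference
        level += diff          # integrate back to the level
        out.append(level)
    return out


# Synthetic covariate whose differences follow d_t = 1 + 0.5 * d_{t-1}.
diffs = [4.0]
for _ in range(9):
    diffs.append(1.0 + 0.5 * diffs[-1])
covariate = [100.0]
for d in diffs:
    covariate.append(covariate[-1] + d)

future = forecast_arima110(covariate, steps=5)  # five-year horizon
print(future)
```

The five forecast values would then take the place of the unobserved covariate in the regression vector $F_t$ for 2018 to 2022. Any error in the covariate forecast propagates into the traffic forecast, which is one reason to keep the horizon at the medium term.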

Application case
In Colombia, the aviation industry has been liberalized since the beginning of the 1990s, and airfares have been completely deregulated since 2012. Within the national context, the case of Bogotá-El Dorado International Airport (IATA code: BOG; ICAO code: SKBO) has been selected. This is the main airport and the main hub in the country, situated in the city of Bogotá (capital of Colombia, with more than 8 million inhabitants), about 7.5 miles from the city center. The airport is public property but has been concessioned to the private sector since 2007 (Díaz Olariaga, 2017), the year in which the airport began its first significant expansion of infrastructure and facilities (with an investment of $650 million), which was finished in 2013. In 2015, a second expansion began, which finished at the end of 2018. About 25,000 people work at the airport. Total traffic in 2017 was 30 million passengers, 690,000 tonnes of air cargo and 320,000 operations (aircraft movements) (Aerocivil, 2019).

Results
Bayesian statistics offers an ideal alternative for building models without the data-updating problem of classical statistics. One advantage arises when new information is obtained: the posterior distribution can be used as the prior distribution for the new data, yielding a new, updated posterior distribution. This is a great advantage of Bayesian analysis, because classical analysis requires everything to be recalculated as more data appear (Bolstad, 2007). To compute the forecasts, ARIMA models were first used for the auxiliary variables and the DLMs were subsequently fitted, with their respective MAPE analysis, to choose the best model for each of the variables.
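The updating property described above can be checked numerically in the simplest conjugate setting, a normal mean with known observation variance. The Python sketch below is an illustration under that assumption; the function names and numbers are hypothetical and unrelated to the fitted models.

```python
def update_normal(prior_mean, prior_var, y, obs_var):
    """Posterior of a normal mean after one observation y ~ N(theta, obs_var)."""
    post_prec = 1.0 / prior_var + 1.0 / obs_var
    post_mean = (prior_mean / prior_var + y / obs_var) / post_prec
    return post_mean, 1.0 / post_prec


def batch_posterior(prior_mean, prior_var, ys, obs_var):
    """Posterior of the same mean using all observations at once."""
    post_prec = 1.0 / prior_var + len(ys) / obs_var
    post_mean = (prior_mean / prior_var + sum(ys) / obs_var) / post_prec
    return post_mean, 1.0 / post_prec


ys = [2.1, 1.8, 2.4, 2.0]   # hypothetical observations
mean_seq, var_seq = 0.0, 10.0  # prior N(0, 10)
for y in ys:
    # Yesterday's posterior becomes today's prior.
    mean_seq, var_seq = update_normal(mean_seq, var_seq, y, obs_var=0.5)

mean_all, var_all = batch_posterior(0.0, 10.0, ys, obs_var=0.5)
print((mean_seq, var_seq), (mean_all, var_all))
```

Processing the observations one at a time gives the same posterior as processing them together (up to floating-point rounding), so nothing has to be recomputed from scratch when a new year of data arrives.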
In the case of the variable "national (or domestic) passengers", the Consumer Price Index (CPI) was used as an auxiliary variable to estimate the future forecast. Figure 1 shows the result.
Figure 1 presents Model 1, showing the behavior of the estimated values for the chosen model. These values overlap with the original values. A MAPE of 1.08% is obtained (see Table 1). To produce the forecast, an ARIMA(2,1,0) model was fitted to the CPI to obtain a 5-year forecast, which was then included in the model for national passengers.
The final model has the following structure:
In the case of the variable "international passengers", GDP, Population and Currency Exchange Rate (TRM, by its Spanish acronym) were used as auxiliary variables to estimate the future forecast, giving the results shown in Model 2 (see Figure 2). Figure 2 presents Model 2, showing the behavior of the estimated values for the chosen model. These values overlap with the original values. A MAPE of 0.97% is obtained (see Table 2). To produce the forecast, an ARIMA(3,1,0) model was fitted to GDP, an ARIMA(1,1,0) model to population, and an ARIMA(2,1,0) model to TRM to obtain 5-year forecasts, which were then included in the model for international passengers.
The final model has the following structure:
In the case of the variable "operations" (take-offs and landings, national and international operations included), GDP per capita, Population and Currency Exchange Rate (TRM, by its Spanish acronym) were used as auxiliary variables to estimate the future forecast, giving the results shown in Model 3 (see Figure 3). Figure 3 presents Model 3, showing the behavior of the estimated values for the chosen model. These values overlap with the original values. A MAPE of 0.24% is obtained (see Table 3). To produce the forecast, an ARIMA(3,1,0) model was fitted to GDP per capita, an ARIMA(1,1,0) model to population, and an ARIMA(2,1,0) model to TRM to obtain 5-year forecasts, which were then included in the model for operations.
The final model has the following structure:
In the case of the variable "national (or domestic) air cargo", GDP per capita and population were used as auxiliary variables to estimate the future forecast, giving the results shown in Model 4 (see Figure 4). Figure 4 presents Model 4, showing the behavior of the estimated values for the chosen model. These values overlap with the original values. A MAPE of 0.42% is obtained (see Table 4). To produce the forecast, an ARIMA(3,1,0) model was fitted to GDP per capita and an ARIMA(1,1,0) model to population to obtain 5-year forecasts, which were then included in the model for national air cargo.
The final model has the following structure:
In the case of the variable "international air cargo", GDP and international trade (imports and exports) were used as auxiliary variables to estimate the future forecast, giving the results shown in Model 5 (see Figure 5).
Figure 5 presents Model 5, showing the behavior of the estimated values for the chosen model. These values overlap with the original values. A MAPE of 0.63% is obtained (see Table 5). To produce the forecast, an ARIMA(3,1,0) model was fitted to GDP, an ARIMA(1,1,0) model to imports, and an ARIMA(1,1,0) model to exports to obtain 5-year forecasts, which were then included in the model for international air cargo.

Conclusions
Air traffic forecasts are essential in airport planning for determining future capacity requirements. Since airport infrastructure projects are expensive and involve many resources, a data-based understanding of future demand provides airport planners with the necessary information for effective decision-making in the short, medium and long term. Therefore, regardless of unforeseen circumstances, the airport industry requires forecasts to anticipate future scenarios. Considering the advantages of using DLMs in the forecasting of time series, an initial description of the variables was made, revealing growing behavior as well as strong correlations over time with the covariables. For the covariables included in the models, ARIMA models were used to forecast the future values to be fed into the chosen model. To test the convergence of the chains, the R-hat test was applied, which gave values of R-hat < 1.1 in all the final models chosen. The application of DLMs yields MAPE values of about 1% or less (between 0.24% and 1.08%), which indicates highly accurate forecasts. Furthermore, when the chosen models were contrasted with models that relate the variable to its own value at lag t-1 (equivalent to AR(1) models), DLMs showed acceptable performance as alternative models for developing reliable forecasts in air transport (or air traffic prognosis), at least in the medium term.
As seen in the results obtained, the DLMs showed excellent performance, offering a new and, to some extent, original alternative for developing reliable forecasts of air traffic (no investigations were found that use DLMs to calculate forecasts in the field of air traffic). However, the present investigation focused on a medium-term prognosis (5 years) with a long historical series (39 years). The next phase of research will therefore test the performance and reliability of the method for a long-term forecast (20-25 years, the usual requirement of airport planners) using a similar historical series.