AN ALGORITHM FOR SOLVING THE PROBLEM OF FORECASTING

In this article, the main forecasting methods are reviewed. A new algorithm based on the group method of data handling (GMDH) and artificial neural networks (ANNs) is presented. Tested on real data, the algorithm showed better results than neural networks alone, which indicates its suitability for further use in aviation forecasting tasks.


Introduction
Forecasting has always been, and will remain, one of the most interesting topics for mankind, because knowledge of the future is perhaps one of humanity's greatest desires. However, we should understand the risk of choosing an inappropriate forecasting method, since incorrect predictions can lead to wrong decisions.
Demand prediction is one of the most crucial issues in inventory management. Forecasts, which form the basis for planning inventory levels, are probably the biggest challenge in the repair and overhaul industry, and a problem common to airlines throughout the world is the need to know the short-term demand for aircraft with the highest possible accuracy. The high cost of modern aircraft and of repairable spares such as aircraft engines and avionics constitutes a large part of the total investment of many airline operators. Producing too few aircraft can lead to excessive downtime costs, while overproduction leaves expensive aircraft idle.
Forecasting techniques have grown more sophisticated over the years and are now widely used in aviation. Because demand for aircraft is sporadic, airline operators find it difficult to predict and are still looking for better forecasting methods.
The purpose of this paper is to develop a new forecasting method that exploits the increased power of modern computers. Our algorithm may also be applicable to regional aviation facilities and to other industrial sectors whose demand patterns are similar to those of airlines.

The mathematical formulation of the problem of forecasting
Let us have $n$ discrete samples $\{x_1, x_2, \dots, x_n\}$ at successive time points $t_1, t_2, \dots, t_n$. Then the problem of prediction (Fig. 1) consists of predicting the value $x_{n+k}$ at some future time point $t_{n+k}$, where $k$ is the duration of the forecast (the forecast horizon):
$$x_{n+k} = F(x_1, x_2, \dots, x_n),$$
where $F$ is some unknown function.

Review of existing methods of forecasting
The basic forecasting methods are as follows. In the moving average method (Alesinskaya 2002), a value is represented as the average of a number of previous values plus a random component. The weighted moving average method (Alesinskaya 2002) is the next modification of this model: it assumes that more recent values reflect the current situation more accurately, so each value is assigned a greater weight the more recently it was added.
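Both averaging methods can be sketched in a few lines of Python. This is a minimal illustration rather than the formulation of the cited source; the linearly increasing weights 1, 2, ..., n are an assumption, since the method only requires that later values receive larger weights.

```python
def moving_average_forecast(values, window):
    """Forecast the next value as the plain mean of the last `window` values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    recent = values[-window:]
    return sum(recent) / window


def weighted_moving_average_forecast(values, window):
    """Forecast the next value as a weighted mean of the last `window` values.

    Assumption: weights 1, 2, ..., window, so the most recent value weighs most.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    recent = values[-window:]
    weights = range(1, window + 1)
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


if __name__ == "__main__":
    demand = [10, 12, 11, 15, 18]  # hypothetical demand history
    print(moving_average_forecast(demand, 3))           # (11 + 15 + 18) / 3
    print(weighted_moving_average_forecast(demand, 3))  # (1*11 + 2*15 + 3*18) / 6
```

On a rising series the weighted forecast sits closer to the latest observations than the plain average, which is exactly the effect the weighting is meant to produce.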
Group method of data handling (GMDH) (Ivakhnenko 1968). GMDH is a family of forecasting algorithms based on splitting the original data into two sets (training and testing) and on base functions whose parameters are estimated from the training set. How well each base function models the given process is then verified on the testing set.
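The GMDH principle of estimating parameters on the training set and judging candidates on the testing set can be sketched as follows. As a simplifying assumption, each candidate base function here is a straight line $y \approx a + b x_j$ in a single input column, fitted by ordinary least squares; the function names are hypothetical, and practical GMDH uses richer (for example, polynomial) base functions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # constant column: fall back to a flat line
    return mean_y - b * mean_x, b


def mse(pred, actual):
    """Mean square error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def rank_base_functions(columns, y, train_ratio=0.7):
    """Fit one candidate per input column on the training part, rank by test MSE.

    Returns a list of (test_mse, column_index, (a, b)), best first.
    """
    split = int(len(y) * train_ratio)
    ranking = []
    for j, col in enumerate(columns):
        a, b = fit_line(col[:split], y[:split])  # parameters: training set only
        pred = [a + b * x for x in col[split:]]
        ranking.append((mse(pred, y[split:]), j, (a, b)))  # verification: testing set
    ranking.sort(key=lambda r: r[0])
    return ranking


if __name__ == "__main__":
    xs = list(range(10))
    other = [(7 * i) % 5 for i in range(10)]  # an unrelated candidate input
    y = [2 * x + 1 for x in xs]
    print(rank_base_functions([xs, other], y)[0])
```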
An artificial neural network (ANN) (Mak-Kalloc 1956). The main element of a neural network is the formal (artificial) neuron, a mathematical model of a biological nerve cell.
An ANN is a system of connected and interacting artificial neurons.
An ANN is not programmed in the usual sense of the word; instead, it is trained. During training, the network is able to detect complex relationships between input and output data and to generalise from them.
The ability of neural networks to forecast comes directly from their ability to generalise and find the hidden relationships between input and output data. After training, the network is able to predict the future value of a certain sequence on the basis of several previous values and/or any current factors.
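A single formal neuron trained to forecast the next value of a sequence from its previous values can be sketched as below. It only illustrates the neuron model and learning from lagged samples: the tanh activation, stochastic gradient descent, learning rate, epoch count and the toy sine series are all assumptions, not the multilayer perceptrons and Levenberg-Marquardt training used in the experiments of this paper.

```python
import math
import random


def neuron(inputs, weights, bias):
    """Formal neuron: weighted sum of inputs plus bias, passed through tanh."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(s)


def lagged_samples(series, lags):
    """Pairs (previous `lags` values, next value) for one-step-ahead forecasting."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]


def train_neuron(samples, n_inputs, lr=0.05, epochs=1000, seed=0):
    """Train the neuron by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = neuron(inputs, weights, bias)
            # Derivative of (out - target)^2 / 2 with respect to the weighted sum.
            grad = (out - target) * (1.0 - out * out)
            weights = [w - lr * grad * x for w, x in zip(weights, inputs)]
            bias -= lr * grad
    return weights, bias


def mse(model, samples):
    """Mean square error of a (weights, bias) model over the samples."""
    w, b = model
    return sum((neuron(x, w, b) - y) ** 2 for x, y in samples) / len(samples)


if __name__ == "__main__":
    series = [0.8 * math.sin(0.3 * t) for t in range(60)]  # toy series within tanh's range
    samples = lagged_samples(series, lags=2)
    model = train_neuron(samples, n_inputs=2)
    print("training MSE:", mse(model, samples))
    print("next value forecast:", neuron(series[-2:], *model))
```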
The main advantage of ANNs over other forecasting methods is that a network predicts equally well processes whose regular component follows any distribution law, whereas most other methods are best suited to processes whose regular component belongs to a specific class (for instance, polynomial smoothing is best suited to processes with a polynomial regular component, Fourier-series smoothing to processes with a periodic regular component, etc.). Another important advantage of neural networks is their ability to learn.
The application of artificial neural networks (namely, time-lagged feed-forward neural networks) to the prediction of passenger traffic flows was described by T. O. Blinova, with reasonably good results (Blinova 2007). However, as the application results of the proposed algorithm show, it performs considerably better than neural networks on their own. P. Kozik and J. Sęp described another application of time-lagged feed-forward neural networks, to aircraft engine overhaul demand forecasting; there, too, the networks performed well, which justifies their use for forecasting tasks (Kozik, Sęp 2012).
Against this background, we propose a forecasting algorithm based on two common methods, ANN and GMDH.

An algorithm for solving the problem of forecasting
The proposed algorithm combines the advantages of GMDH and ANNs: the gradual increase in model complexity (GMDH) and the ability to learn (ANN). It consists of the following steps.
1. Pre-processing of the source data, including the removal of outliers, data normalisation, etc. (Amir, Samir 1999; Jerome et al. 1994; Klevecka, Lelis 2008; Mohsen, Yazdan 2007). This stage is often more important than the modelling stage. The specific characteristics of the forecasted process should also be taken into account, such as the seasonality of values when predicting various atmospheric parameters or the trend component present in most financial processes.
A study (Jerome et al. 1994) shows the importance of removing unwanted outliers from the source data when artificial neural networks are used. It is important to determine whether a specific value is accidental, and therefore an unwanted outlier, or an informative outlier. To do this, the forecasted process itself must be considered. For example, when building a model of a region's energy consumption depending on the day of the week, it should be noted that consumption on weekends differs dramatically from consumption on weekdays, so weekend values are not unwanted outliers. In this case, it would be reasonable to add a binary input during network training indicating whether the day falls on a weekend.
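A minimal sketch of the binary weekend input described above. The daily consumption values, the assumption that the series starts on a known calendar date, and the three-value lag window are hypothetical choices made for illustration.

```python
from datetime import date, timedelta


def samples_with_weekend_flag(start, consumption, lags=3):
    """(previous `lags` values + [is_weekend], next value) for each predictable day.

    `start` is the calendar date of consumption[0]; the flag describes the
    day being predicted (1.0 for Saturday or Sunday, else 0.0).
    """
    samples = []
    for i in range(lags, len(consumption)):
        day = start + timedelta(days=i)
        is_weekend = 1.0 if day.weekday() >= 5 else 0.0  # Monday=0 ... Sunday=6
        samples.append((list(consumption[i - lags:i]) + [is_weekend], consumption[i]))
    return samples


if __name__ == "__main__":
    usage = [120, 118, 121, 119, 122, 80, 78, 117, 120, 121]  # hypothetical daily values
    for inputs, target in samples_with_weekend_flag(date(2024, 1, 1), usage):
        print(inputs, target)
```

With the flag as an extra input, the low weekend values are explained by the model instead of being treated as outliers.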
If the process is known not to contain informative outliers, there are several basic algorithms for removing accidental outliers:
- The simplest algorithm is based on the properties of a random variable: a value is considered an outlier if it deviates from the mean by more than $2\sigma$ to $3\sigma$, where $\sigma$ is the standard (root-mean-square) deviation.
- The Tukey 53H algorithm constructs a smoothed sequence using a median filter and a moving-average filter; all original values that deviate from the smoothed sequence by more than a preassigned threshold $k$ are then considered outliers (Klevecka, Lelis 2008).
2. Using data windowing, two matrices are formed from the source data $\{x_1, x_2, \dots, x_N\}$, where $N$ is the number of samples obtained from the previous processing stage:
$$X^{(0)} = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \\ x_2 & x_3 & \cdots & x_{k+1} \\ \vdots & \vdots & & \vdots \\ x_{N-k} & x_{N-k+1} & \cdots & x_{N-1} \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} x_{k+1} \\ x_{k+2} \\ \vdots \\ x_N \end{pmatrix},$$
where $k$ is the size of the window. These matrices are used to train the networks. Each row vector $\mathbf{x}_i = (x_{i1}, \dots, x_{ik})$, $i = 1, \dots, m$, of matrix $X^{(0)}$, where $m = N - k$ is the number of rows, together with the corresponding value $y_i$ forms an independent sample, and each column vector is a separate variable.
3. The resulting samples are divided in some ratio (usually about $0.7m : 0.3m$) into training and testing sets.
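Both outlier rules can be sketched with the Python standard library. Assumptions: the population standard deviation is used for $\sigma$; Tukey 53H is implemented as a running median of 5, then a running median of 3, then Hanning weights 1/4, 1/2, 1/4; and points too close to the ends of the series are left unsmoothed. Published variants differ in these edge details, and the function names are hypothetical.

```python
import statistics


def sigma_outliers(values, n_sigma=3.0):
    """Indices of values deviating from the mean by more than n_sigma * sigma."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sigma * sigma]


def running_median(values, width):
    """Centred running median of odd `width`; edge points are copied unchanged."""
    half = width // 2
    out = list(values)
    for i in range(half, len(values) - half):
        out[i] = statistics.median(values[i - half:i + half + 1])
    return out


def tukey_53h(values, threshold):
    """Tukey 53H smoothing; returns (smoothed, indices deviating by > threshold)."""
    s = running_median(running_median(values, 5), 3)
    smooth = list(s)
    for i in range(1, len(s) - 1):
        smooth[i] = 0.25 * s[i - 1] + 0.5 * s[i] + 0.25 * s[i + 1]  # Hanning weights
    flagged = [i for i, (v, m) in enumerate(zip(values, smooth)) if abs(v - m) > threshold]
    return smooth, flagged


if __name__ == "__main__":
    data = [10, 11, 10, 12, 50, 11, 10, 12, 11, 10]  # hypothetical series with one spike
    print(sigma_outliers(data, 3.0), sigma_outliers(data, 2.0))
    print(tukey_53h(data, threshold=5)[1])
```

On this short series the single spike inflates $\sigma$ so much that the $3\sigma$ rule misses it ($|50 - 14.7| = 35.3$ is just below $3\sigma \approx 35.37$), while the $2\sigma$ rule and Tukey 53H with a threshold of 5 both flag it. This illustrates why a median-based smoother can be preferable for isolated spikes.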
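Steps 2 and 3 can be sketched as follows, using 0-based Python indices (row `i` of `x0` corresponds to row $i + 1$ of $X^{(0)}$). The chronological split, with the earliest samples used for training, is an assumption that matches the experiment in this paper, where the first $0.7m$ samples form the training set; the helper names are hypothetical.

```python
def window_matrices(series, k):
    """Sliding-window matrices: row i of x0 is series[i:i+k], y[i] = series[i+k].

    With N = len(series) this gives m = N - k samples.
    """
    if not 0 < k < len(series):
        raise ValueError("window size k must satisfy 0 < k < len(series)")
    m = len(series) - k
    x0 = [list(series[i:i + k]) for i in range(m)]
    y = [series[i + k] for i in range(m)]
    return x0, y


def split_train_test(x, y, train_ratio=0.7):
    """Chronological split: the first ~train_ratio of the samples form the training set."""
    split = int(len(y) * train_ratio)
    return (x[:split], y[:split]), (x[split:], y[split:])


if __name__ == "__main__":
    x0, y = window_matrices(list(range(1, 13)), k=5)
    (x_train, y_train), (x_test, y_test) = split_train_test(x0, y)
    print(len(x0), len(x_train), len(x_test))
```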
4. The type of base function is determined by the variables it depends on, for example $f(x_i, x_j)$, $i \ne j$.
6. Each multilayer perceptron (MP) is matched with a specific base function; that is, the variables fed to the inputs of the network are selected (for example, for the base function $f(x_i, x_j)$, one MP works with the variables $x_1^2, x_2^2, x_1 x_2, x_1, x_2$ and another with the variables $x_1^2, x_k^2, x_1 x_k, x_1, x_k$). The networks are trained using only samples from the training set.
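The input selection of step 6 for pairwise base functions can be sketched as below. The five inputs per network ($x_i^2, x_j^2, x_i x_j, x_i, x_j$) follow the example in the text; indices are 0-based and the function names are hypothetical. For a window of $k$ columns this yields $C_k^2$ input matrices, one per network.

```python
from itertools import combinations


def pair_inputs(row, i, j):
    """Inputs of the MP tied to the pair (x_i, x_j): x_i^2, x_j^2, x_i*x_j, x_i, x_j."""
    a, b = row[i], row[j]
    return [a * a, b * b, a * b, a, b]


def inputs_per_network(x):
    """Map each column pair (i, j), i < j, to the input matrix of its network."""
    k = len(x[0])
    return {
        (i, j): [pair_inputs(row, i, j) for row in x]
        for i, j in combinations(range(k), 2)
    }


if __name__ == "__main__":
    x0 = [[1, 2, 3], [4, 5, 6]]
    for pair, inputs in inputs_per_network(x0).items():
        print(pair, inputs)
```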
7. Next, the initial data for the next iteration of the algorithm are composed. To do this, determine the mean square error (MSE) of each MP on the test set and select the $k$ best networks (a smaller number may be selected instead, supplemented by the original variables that were inputs to the networks with a small MSE). Then create a new matrix
$$X^{(1)} = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1k} \\ h_{21} & h_{22} & \cdots & h_{2k} \\ \vdots & \vdots & & \vdots \\ h_{m1} & h_{m2} & \cdots & h_{mk} \end{pmatrix},$$
where $h_{ij}$ is the output of the $j$-th selected neural network when it is given the inputs of the $i$-th sample (or the value of the $j$-th selected original variable), $i = 1, \dots, m$, $j = 1, \dots, k$.
8. The next iteration is performed with the matrix $X^{(1)}$ as the source data. Iterations continue as long as the MSE of the networks on the test set keeps decreasing, or until the desired MSE is reached.
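Steps 6 to 8 can be sketched as a loop. To keep the example short and dependency-free, each MP is replaced by a ridge-regularised least-squares model on the same five pairwise inputs; this is a swapped-in technique, not the neural networks the algorithm prescribes. The sketch also carries forward only model outputs (not original variables), and all names, the ridge value and the toy data are assumptions.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve a·w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def features(row, i, j):
    """Bias term plus the pairwise inputs x_i^2, x_j^2, x_i*x_j, x_i, x_j."""
    a, b = row[i], row[j]
    return [1.0, a * a, b * b, a * b, a, b]


def fit(rows, y, ridge=1e-6):
    """Stand-in for MP training: ridge least squares via the normal equations."""
    p = len(rows[0])
    a = [[sum(r[u] * r[v] for r in rows) + (ridge if u == v else 0.0)
          for v in range(p)] for u in range(p)]
    b = [sum(r[u] * t for r, t in zip(rows, y)) for u in range(p)]
    return solve(a, b)


def mse(pred, actual):
    """Mean square error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, actual)) / len(actual)


def iteration(x, y, split, keep):
    """Train one model per column pair, rank by test MSE, build the next matrix."""
    scored = []
    for i, j in combinations(range(len(x[0])), 2):
        rows = [features(row, i, j) for row in x]
        w = fit(rows[:split], y[:split])  # training set only
        out = [sum(wi * f for wi, f in zip(w, r)) for r in rows]
        scored.append((mse(out[split:], y[split:]), out))  # test-set MSE
    scored.sort(key=lambda s: s[0])
    best = scored[:keep]
    next_x = [[out[r] for _, out in best] for r in range(len(y))]  # h_ij
    return best[0][0], next_x


def run(x, y, split, keep=3, max_iter=10):
    """Iterate while the best test MSE keeps decreasing; return the MSE history."""
    history = []
    while len(history) < max_iter:
        err, next_x = iteration(x, y, split, keep)
        if history and err >= history[-1]:
            break  # MSE stopped decreasing: the previous iteration is kept
        history.append(err)
        x = next_x
    return history


if __name__ == "__main__":
    rng = random.Random(1)
    x0 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(30)]
    y = [1 + 2 * r[0] * r[1] + 0.5 * r[1] for r in x0]  # toy target
    print(run(x0, y, split=21))
```

A real implementation would also store each iteration's selected models, since step 9 needs them to replay the cascade.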
9. During each iteration of the algorithm, the weights and structures of the networks (and/or the original variables) selected to build the source matrix for the next iteration must be stored. Once the required MSE is reached (or the iteration after which the MSE starts increasing), the iterations stop and a single network is selected to forecast further values. To forecast a value $x_c$ using the obtained results:
- create the source sample $inp = \{x_{c-k}, x_{c-k+1}, \dots, x_{c-1}\}$;
- feed this sample to the MPs that were used to obtain the source matrix of the second iteration and, from their outputs (and/or the original variables), form a new input sample for the second iteration;
- repeat the previous step until the input sample for the iteration at which the algorithm stopped is obtained, then feed it to the selected MP, whose output is the forecast of $x_c$.
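The forecasting procedure of step 9 amounts to replaying the stored cascade. The representation below is an assumption: each stored iteration is a list of callables (a trained pairwise model, or a pass-through of a variable), and the six-weight pairwise model form and the weights in the example are hypothetical placeholders for trained networks.

```python
def pair_node(i, j, w):
    """Node for a trained pairwise model with weights w over [1, x_i^2, x_j^2, x_i*x_j, x_i, x_j]."""
    def node(v):
        a, b = v[i], v[j]
        feats = [1.0, a * a, b * b, a * b, a, b]
        return sum(wi * f for wi, f in zip(w, feats))
    return node


def pass_through(i):
    """Node that forwards variable i of the current input sample unchanged."""
    return lambda v: v[i]


def forecast(history, k, layers, final):
    """Forecast x_c from the stored cascade.

    history: observed values ending at x_{c-1}; k: window size;
    layers: stored iterations, each a list of nodes producing the next sample;
    final: the node selected at the iteration where the algorithm stopped.
    """
    inp = list(history[-k:])  # {x_{c-k}, ..., x_{c-1}}
    for layer in layers:
        inp = [node(inp) for node in layer]
    return final(inp)


if __name__ == "__main__":
    # Hypothetical weights standing in for trained networks.
    layers = [[pair_node(0, 1, [0, 0, 0, 1, 0, 0]), pass_through(2)]]
    final = pair_node(0, 1, [1, 0, 0, 0, 1, 1])
    print(forecast([1, 2, 3, 4, 5], k=3, layers=layers, final=final))
```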

Advantages and disadvantages of the algorithm
The advantage of this algorithm over the usual GMDH is that the base functions do not need to be specified explicitly; the dependence is found by the neural networks, which are known to perform this task well. The algorithm also avoids some drawbacks of ANNs: first, the optimal complexity of a network is not known beforehand when it is constructed, and second, overly simple networks are prone to underfitting, while overly complex networks are prone to overfitting. The proposed algorithm uses simple networks at each iteration, but owing to the cascaded increase in complexity it is able to forecast very complex processes.
The main drawback of the proposed method is its computational cost: several neural networks have to be trained at each iteration, which can take a long time. However, efficient numerical algorithms for neural network training, together with general optimisation of the algorithm, can make it relatively fast.

Application results of the proposed algorithm
To test the proposed algorithm, a public data set of aircraft sales (the total number of aircraft sold in the USA per year) from 1947 to 2011 was used (Accident Database … 2012). The data were pre-processed using the Tukey 53H algorithm.
To implement the algorithm, the following parameters were used:
- size of the sliding window for obtaining the matrices of initial samples: $k = 5$;
- ratio of training-set size to testing-set size: $0.7m / 0.3m$, where $m$ is the number of initial samples; the first $0.7m$ samples were used to construct the training set;
- base function form: $f_l(x_i, x_j)$, $l = 1, \dots, C_5^2$, $i = 1, \dots, 5$, $j = 1, \dots, 5$, $i \ne j$;
- structure of each neural network: five input neurons, one hidden layer with three neurons, and a single output neuron;
- the MSE used to select networks was calculated on all initial samples, i.e. on both the training and testing sets;
- to construct the source matrix for the next iteration, the outputs of the three networks with the minimal MSE on the training and testing sets were taken, along with the two initial variables that were the inputs of the network with the smallest MSE.
For comparison, a multilayer perceptron with a single output neuron, one hidden layer of eight neurons, and ten inputs was also built. Like the proposed algorithm, the MP was trained only on the training set, using the Levenberg-Marquardt algorithm. Since ANNs are known to get stuck in local minima of the cost function, about ten MPs were trained and the perceptron with the minimal MSE was chosen.
Figure 2 compares the results obtained with the ANN and with the proposed algorithm (legend: actual data, forecast). The MSE values are 0.0613 for the ANN and 0.0255 for the proposed algorithm.
Figure 2 shows that the trained perceptron predicts the values of the testing set (the most recent values) less accurately, whereas the proposed algorithm forecasts the values of both the training and testing sets equally well, indicating a more accurate model of the process. The algorithm also predicted extreme values more accurately, which is very important in demand forecasting.

Conclusions
The proposed algorithm produced considerably better results than a conventional artificial neural network (MSE of 0.0255 versus 0.0613), which indicates its suitability for further use in forecasting tasks.
One possible way to improve the algorithm further is to use different neural network structures (not only different inputs) for each node. This, however, would require determining exactly how the structure should be changed for each node.