TRIPLE PROBABILITY DENSITY DISTRIBUTION MODEL IN THE TASK OF AVIATION RISK ASSESSMENT

The probability of an airplane deviation from pre-planned trajectory is a core of aviation safety analysis. We propose to use a mixture of three probability density distribution functions it the task of aviation risk assessment. Proposed model takes into account the effect of navigation system error, flight technical error, and occurrence of rare events. Univariate Generalized Error Distribution is used as a basic component of distribution functions, that configures the error distribution model from the normal error distribution to double exponential distribution function. Statistical fitting of training sample by proposed Triple Univariate Generalized Error Distribution (TUGED) is supported by Maximum Likelihood Method. Optimal set of parameters is estimated by sequential approximation method with defined level of accuracy. The developed density model has been used in risk assessment of airplane lateral deviation from runway centreline during take-off and landing phases of flight. The efficiency of the developed model is approved by Chi-square, Akaike’s, and Bayes information criteria. The results of TUGED fitting indicate better performance in comparison with double probability density distribution model. The risk of airplane veering off the runway is considered as the probability of a rare event occurrence and is estimated as an area under the TUGED.


Introduction
Operation of the air transport system is performed in accordance with a certain safety level. The safety of air transportation depends on numerous factors whose action is taken into account by assessing the risks of their operation in the form of the frequency of their occurrence over a certain period of time. A special place in aviation safety is given to assessing the risk of mid-air collision (Mitici & Blom, 2018). Results of the collision risk assessment are used for the separation minima estimation in air traffic management. A risk of mid-air collision can be evaluated as a product of the probabilities of airplanes deviations from the preplanned trajectory of flight (Fujita, 2013). The risk of collision with the terrain at low altitudes can be considered as the probability of airplane proximity to a dangerous point of relief. Consideration of collision risks and events that can lead to unsafe flight (Fala & Marais, 2016) is an important step of airspace design and usage.
Performance of safety assessment depends on the numerous factors that are taken into account in the research model. Simple models of risk assessment based on the statistical processing of airplane trajectory data are commonly used. Also, analytical functions of probability distributions are used in the tasks of assessing the risks of rare events. In particular, the assessment of the risk of airplane veer off the boundaries of the runways can be considered as an assessment of an extremely rare event. The risk associated with rare events is practically impossible to assess by the frequency of occurrence due to the limited statistical data sample. Therefore, estimation of the probability of rare events is performed on the basis of statistical data processing taking assumptions concerning the function of random variables distribution. In this case, parameters estimation of the probability distribution function is performed. Probability of random value is evaluated as an area below probability density function (PDF) limited by the confidence band and reference frame. In the common tasks of risk analysis, a set of confidence bands is used to estimate probabilities of parameter deviation occurrence.
A mixture of two probability density functions is commonly used in the task of risk assessment of airplane deviation from a cleared flight level or pre-planned trajectory in both horizontal and vertical planes. For example, the mixed distribution model (MD) consisting of core distribution (CD) and tail distribution (TD) is considered in numerous studies (Mori, 2011;Nagaoka, 2008): where x is an airplane deviation from pre-planned trajectory, γ is the probability of a gross navigational error.
The CD represents errors derived from standard navigation system deviations. These errors depend on the onboard navigation system precision level and human factor. The TD corresponds to non-nominal performance and represents result of rare factors degradation influence. TD corresponds to rare events simulation.
The main problem of model (1) usage is a correct selection of α coefficient, which characterizes the CD and TD mixture level. Incorrect or inaccurate selection of α is the reason for a certain overestimation or underestimation of the risk evaluation by the PDF on the tails.
CD utilizes errors of on-board positioning system and flight technical error which is result of human factor influence. Performance of on-board positioning system has been increasing rapidly during last three decades. An accuracy improving is a result of development of advanced signal processing algorithms, new antenna design, introduction of numerous augmentation systems. Thus, an error of on-board positioning system becomes much smaller than flight technical error in manual mode. Different degrees of these errors make usage of one PDF in CD inaccurate. Thus, model (1) is not adequate in the case of imbalanced errors.
The main purpose of current research is to design a new model of risk assessment connected with airplane deviations which take into account imbalanced error contribution of positioning system and flight technical errors. Separation of PDFs in the risk model leads to improvement of risk assessment performance and makes the model adequate to current equipment level.

Triple probability density distribution model
In the case of estimating the airplane deviation from the planned trajectory by statistic, the CD characterizes the distribution of Total System Error (TSE), that is, the total error of maintaining a given trajectory. The TSE is a sum of Navigation System Error (NSE), Flight Technical Error (FTE), and Path Definition Error (PDE) (ICAO, 2006(ICAO, , 2008a: (2) In a general case, the NSE is determined by an error of airplane positioning and depends on the type of navigation system. FTE characterizes the ability of a pilot or an automatic piloting system to follow a predetermined flight trajectory. In case of manual control, the FTE includes errors of indication by flight instruments and data interpretation by pilot. PDE utilizes the difference between path defined in Flight Management System and path expected to be flown over the ground. Due to use of on-board computing systems with digital route mapping support, the impact of PDE is too small (compared to NSE and FTE values) that its value can be neglected (ICAO, 2008a). According to error model (2), the deviations from the pre-planned trajectory should be represented by decomposition of CD on the main components corresponding to the NSE and FTE in tasks of risk assessment. Thus, a proposed model includes a mixture of three PDFs ( Figure 1): where ρ NSE (х) is the PDF utilizing the errors of navigation system; ρ FTE (х) is the PDF characterizing the FTE; ρ T (х) is the PDF characterizing the appearance of rare events; α and β are weight coefficients of ρ NSE (х) and ρ FTE (х) in the mixture.
Global Navigation Satellite System (GNSS) can be considered as a primary on-board positioning system of airplane due to a high level of accuracy, availability, and continuity of measured data. Influence of some factors can degrade performance of GNSS and may lead to positioning lock. Therefore, inertial navigation system or positioning by navigational aids data can be used as a stand-by positioning systems . Wide-area multilateration system, primary and secondary surveillance radars are used for airplanes coordinates measurement for air traffic control purposes. Also, LIDAR data (Ryu & Young, 2016) or localization by image processing (Naidu & Durgarao, 2012) are used in some risk Figure 1. Components of triple probability density distribution model х Core Tail Tail center line analysis, but these approaches require specific sensors onboard or in the ground infrastructure.
Typical positioning system errors are described by a normal error distribution model with different parameters for lateral, longitudinal and vertical directions. GNSS error distribution model depends on time and user location, while radar error distribution model depends mostly on slant distance to airplane. Risk assessment of lateral deviations can be performed by a statistical analysis of trajectory data measured by on-board sensor. Position measurements by Global Positioning System (GPS) with Satellite-based (SBAS) or ground-based augmentation systems (GBAS) allow getting precise trajectory data in the terminal area at a level that satisfies the conditions for CAT 3 landing system (less than 1m). Performance of positioning by navigational aids does not reach level of GNSS in normal operational mode . However, GNSS error depends on a number of factors such as: geometry of location in space, state of ionosphere, interference of radio waves propagation, interference and unintended jamming of navigation signals (Kutsenko et al., 2018). Some of these factors are difficult to take into account because their actions are related to extremely rare events. For example, the state of the ionosphere depends on the amount of solar radiation and the state of the Earth's magnetic field. Also, simulation of the probability of unintentional jamming of GNSS signals can be considered in terms of fuzzy sets.
The NDF should be considered as FTE model in (2) (Cramer & Rodriguez, 2013). A comparative analysis of NSE and FTE shows that NSE values are smaller than FTE due to wide range of admissible deviations from flight path in lateral plane and high accuracy of trajectory data measured by GNSS.
The simulation of rare events ρ T (х) is usually performed by an EDF, that "moves up" the tails according to input statistics. We propose to use Univariate Generalized Error Distribution (UGED) as ρ NSE (х), ρ FTE (х) and ρ T (х) density functions to make model (3) universal due to high flexibility property. The domain of the UGED is x ϵ [−∞, ∞] and the function is defined by three parameters: defines the dispersion; and, b ϵ [0, ∞] controls the PDF shape. UGED can be represented in different forms (Giller, 2013;Ayebo & Kozubowski, 2003;Czyżycki, 2013). We use the follows: where a is a scale factor; b is a shape coefficient; μ is a mean value; Γ(b) is a Euler-gamma function.
In general form Triple UGED (TUGED) model can be represented by (3) with PDF in the form (4): In certain tasks, the PDF displacement in a horizontal plane can be neglected to make deviations in positive and negative directions of equal probabilities in order to improve computation performance. Then: Parameters in matrixes A and B, obtained after TUGED fitting to a series of data, evaluate each component of PDFs mixture in wide range of curves from NDF to DEDF. This property of TUGED makes its flexible to any input sensor data.

Model parameters estimation
The TUGED model in form (7) can be used in statistical analysis of lateral deviations from the pre-planned airplane trajectory. The TUGED is defined by a set of parameters estimated by statistics. Grouping of statistics by different properties (for example, under certain conditions of flight or requirements of area navigation) creates a series of airplane deviation models and gives a list of different probabilities assessment.
Different methods of mathematical statistics can be used to estimate the parameters of TUGED by experimental set of data. Maximum Likelihood Method (MLM) (Mori, 2011;Klein, 1980) or Least Squares Method (Markovsky & Huffel, 2007) can be applied. The choice of evaluation method depends on two basic factors: the statistical properties of a valid estimator and the computational complexity of the method. Analytical solution is impossible in many cases due to high complexity of the calculation. Moreover, the computation complexity grows with increasing the number of estimated parameters.
The least-squares method is more oriented on a linear model and is undesirable for nonlinear functions. The MLM is based on likelihood function estimation that provides probabilistic relation for parameters evaluation and confidence band data simultaneously. Also, MLM compares the ability of different models to fit the statistics. According to (ICAO, 1988), the main disadvantage of the MLM makes method less sensitive to the data structure within the tail in comparison with structure in the center of distribution. In order to improve MLM performance, tail data uses greater weight than in the central part. The mentioned above advantages make MLM reasonable to use in the task of TUGED fitting to input data.
We assume that X is a discrete random variable represented by n measured values x 1 , x 2 ,…, x n . A variable X is distributed randomly under TUGED (7) with unknown matrix θ = [α, β, A, B, M].
According to MLM, the likelihood function is the following: where f(x i , θ) is probability that x variable takes a value of x i ; i = [1,n]; n is a number of measurements; θ is a matrix of TUGED parameters. The likelihood function takes the maximum value at the point of optimal fitting with values of θ. The maximum value of function can be reached at the point of first order derivative from the likelihood function equal to zero: Since the functions L(x i , θ) and lnL(x i , θ) take a maximum at one value of θ, we localize the maximum function of lnL(x i , θ) to replace product operator by a sum in order to improve computation performance: Finally, we have the system of non-linear equations: The first order derivatives of (7) by parameters α, β, a 1 , a 2 , a 3 , b 1 , b 2   , , , , , exp 2  3  3  2  3  3  3  3 , , , , , 1 exp 2   3   3   3   3  3  3 3  3   1   3  3  3   3  3  3  3  3  3 , , , , , 1 exp 2 Substituting expressions (12 -19) in (11), we have a nonlinear system of eleven equations, that can be represented in the matrix form: where θ is a matrix of arguments, and F is a vector of functions.
The Newton's method of iterated local linearization is applied to find a solution of (20). The solution of nonlinear system of equations at i th iteration is: is an error matrix. Substituting (20) into (21), we have: (22) can be represented in the Taylor series expansion form: where W(θ i ) is a matrix of partial derivatives by estimated parameters: From (23), we have Substituting (25) into (21), we have (26) Matrix θ became closer to maximum of likelihood function in (26) at each iteration. Computation of (26) is iterative until a certain acceptable level of error will be achieved.
Since F(θ) has a multiple critical points, it is important to localize a search intervals for each of the parameter. Maximal and minimal values of search intervals for TUGED parameters are represented in Table 1. Since positioning errors make the greatest contribution into the measurement results, it is advisable to start search of α from the middle of the general set (α = 0.5). We choose starting value for β = 0.1 much less than α, which characterizes the effect of FTE. The initial value for a and b coefficients are chosen from the assumption about the normal error distribution. Also, we use μ = 0, which corresponds to the airplane motion at the preplanned path.

Risk of airplane lateral deviation
Risk in aviation safety can be estimated by different approaches. One of them is grounded on evaluation of expected number of incidents or accidents per airplane flying hour (ICAO, 1998). Another approach considers risk as a probability of undesirable event. In this case, risk of airplane lateral deviation from cleared trajectory can be represented as a probability of unplanned airplane deviation out the defined boundaries in horizontal plane. An area navigation requirements specify admissible level of performance for airplane positioning system (ICAO, 2008a), that can be used as a boundary level in risk assessment . In case of known PDF of airplane deviation, a risk can be estimated as an area under PDF limited to boundaries of area navigation requirements. A probability area is represented in Figure 2.
Risk of airplane deviation can be estimated by the next formula: where x L and x R are left and right boundaries; P{x R <x<x L } is a probability of random value x located out of interval [x L , x R ], R LR is a risk of airplane deviation out of interval [x L , x R ].

Risk of airplane lateral deviation during runway operation
Veer off the runway is one of the dangerous events in aviation that can be analyzed in terms of rare events. Let's make a risk level assessment of airplane deviation out the runway by TUGED model (7). As input data, we use statistics of airplane path on the runways of Purdue University Airport (LAF) during take-off and landing phases of flight. The majority of statistical data contains flight data from non-professional pilots with high values of flight technical error, which is a good example of imbalanced levels of FTE and NSE. Airplane navigation is supported by G1000 integrated flight instrument system with GPS receiver supporting Wide Area Augmentation System (WAAS). Statics includes results of airplane coordinates measurements represented as latitude and longitude data with 1 Hz rate. Table 2 shows the basic description of statistical data for 10/28 (6600×150 ft) and 5/23 (4225×100 ft) runways. A training sample includes 447 unique flights for 5/23 runway with (8229 data length) and only 36 flights for 10/28 runway (only 717 data length). Mean values indicate about left side deviation behavior during runway operations. Figure 3 shows a histogram of airplane deviations from the runway geometric center using statistical data of both 10/28 and 5/23 runways.
In statistical analysis it makes sense to consider a set of models that support some assumptions about the event.
The results of parameters estimation for the whole training sample (10/28 and 5/23) in the various TUGED options including comparison with DUGED are shown in Table 3. Mean values of each component of PDF are about zero, that indicate identical statistics for left and right sides deviations from the centerline of runway. Performance of statistical data fitting is shown in Table 4.
The validity of the estimated models can be compared by a set of coefficients: 1. Akaike's Information Criteria (AIC). The AIC coefficient indicates the quality of the fitting model by the training sample (Akaike, 1973): where N is a size of training sample; ρ(x) is a fitting model; k is a number of estimated parameters. 2. Bayes Information Criteria (BIC) takes into account the volume of training sample in the form of a fine in distribution parameters evaluation (Schwarz, 1978): where x is an observed data; n is a number of data points in x; k is a number of parameters estimated by the model. 3. Criterion c 2 (Pearson Criterion). c 2`c oefficient takes into account the divergence of the empirical and theoretical absolute frequencies over the histogram intervals: where r is a number of histogram intervals; m i is an absolute frequency at i th interval; p i is the theoretical probability of getting into the i th interval; n is the total number of experimental data.
4. The sum of the probability difference between the frequency of occurrence in the interval and the probability of occurrence evaluated by fitting function: where υ is a frequency; p is a probability; n is a number of intervals; N is a size of training sample. The quality of fitting a training sample to the PDF is better when the value of the coefficient is smaller.
Results of risk assessment and performance of approximation by MLM are represented in Table 4 for different models of PDF. Risk of airplane lateral deviation is estimated by (27) for boundaries in half of runway width in 100ft and 150ft according to input data.
Results of using TUGED model gives a better fitting performance in comparison with other models according to AIC, BIC, c 2 , and SPD criteria. However, TUGED requires more computation time due to fitting complexity.
Results of data fitting by TUGED in comparison with DUGED for different training samples are represented in Figures 4-6.  Normalized frequency of airplane deviation approves the assumption of triple component behavior of PDF. Part of core within small deviations from the centerline of runway (±5 ft) can be a result of airplane positioning error. Deviations within interval ± 30 ft utilize FTE. Fitting with TUGED makes possible to recognize FTE and rare events. Estimation of rare events with TUGED, such as airplane veering off the runway, is more appropriate in comparison with DUGED due to more adequate PDF used in part of rare events.

Conclusions
In the paper, we propose to use a triple probability density distribution function as a model of airplane deviations from preplanned trajectory in the tasks of aviation risk assessment. An airplane deviation is a result of navigation equipment error, flight technical error, and rare events. Affect of these factors is included in the triple probability density distribution function to develop a model adequate to acting errors. Usage of Univariate Generalized Error Distribution allows getting a sufficient level of flexibility from Normal to Double exponential distribution function evolution. Introduction of additional probability density distribution component to airplane deviation model is caused by different degrees of NSE and FTE.
As an example, we use TUGED in the statistical analysis of airplane lateral deviation during runway operations. Training statistical data of airplane deviations contains unbalanced errors caused by a significant difference between NSE and FTE in manual mode. Results of TUGED validation in the statistical analysis of airplane deviations from the center of the runway give the risks of airplane rolling out the runway of LAF airport in levels of 3.7×10 -5 (runway 5/23) and 1.2×10 -6 (runway 10/28). Efficiency of TUGED model usage for input data was proved by AIC, BIC, and Pearson criterions TUGED is a flexible and universal model that can be applied in case of different errors affecting a transport system. Thus, TUGED will give the same result with DUGED in case of the same level of FTE and NSE. TUGED model is recommended to use in different tasks of risk assessment at different flight phases with manual flight control mode.
Also, obtained results can be used for the further development of air traffic safety models and improvement of airplane operation within the runway.