The probability of dying in a plane crash or having a safe flight

The article presents a study of the probability of dying in a plane crash and of the probability of completing a flight safely. Both probabilities were calculated for the period 1970–2008. A formula for the average length of a sequence of consecutive safe flights was derived. In conclusion, the probability of dying in a car accident was compared with the probability of dying in a plane crash.


Introduction
Ensuring a high level of safety and reliability is one of the key issues in aviation. Of all accidents, aircraft crashes are one of the major topics of research. Flights are and will be attractive targets for terrorists (Hougham 2009). Also, many articles (Colebunders 2011; Oakes, Bor 2010; Van Gerwen et al. 1997) show that fear of flying is a common phenomenon. In the authors' opinion, the fear of flying does not correspond to the actual probability of being harmed in a plane crash; this hypothesis was verified using statistical analysis.
Previous publications discussed security procedures in the event of an aviation disaster (Hougham 2009), the extraction of knowledge from data in the field of aviation security (Nazeri et al. 2001), and fear of flying (Van Gerwen et al. 1997). In recent years, there have been many publications on improving safety in aviation engineering (Krause 2003; Shyur 2008; Sweet 2009). NASA has launched the "Aviation Safety & Security Program", in which it publishes a number of reports on aviation security (NASA 2013). An extensive study of aviation accidents can be found in (Krause 2003). However, the authors did not come across a study on the likelihood of dying in a plane crash. This publication studies that probability and its variation over 38 years.
Reliability is the ability of an object to fulfill the required function in a specific environment and under specific operating conditions in a given period of time. The reliability of an object is described by a reliability function R(t). This is the probability that the object does not fail in the interval [0; t). Equivalently, it is the probability that the random variable T, the time that elapses from the start of the operation of the object until its failure, does not take a value in the interval [0; t); therefore, R(t) = P(T ≥ t).
Publications in the field of aviation generally examine the reliability of aircraft components and materials (Al-Garni et al. 1997; Huang et al. 2012; Jun, Huibin 2012). These works are important for aviation engineers, pilots, air traffic controllers and other people directly connected with aviation. In order to reduce the number of disasters caused by unreliable aircraft components, special training centers have been created, such as the East Midlands Training Center. Pilots undergo theoretical training on the main elements of an airplane and spend a few days on practical training in a simulator. The simulator can reproduce the failure of any combination of components and flights in any conditions. Phenomena such as turbulence and smoke in the plane can also be simulated. Specialists at such training centers try to minimize the contribution of the human factor to a crash. Procedures are devised that the pilots have to know and repeat many times. The most important element of the training and of later practice is the safety procedures, which the pilots must repeat before starting up any plane. This helps ensure that the pilot reacts automatically in an emergency, which increases the likelihood of a safe flight.
From the passengers' point of view it is not relevant whether or not the plane components work properly. The passengers are interested in a safe, timely and comfortable journey to their destination. On this basis the authors specified two key safety indicators, taking the safety of the passengers as the criterion:
- The probability that a passenger completes at least n flights without any incident.
- The probability that a passenger completes at least n flights without dying.
In this paper an analysis was carried out to determine the second indicator, the probability of completing at least n flights without the death of the passenger. This indicator was chosen because what passengers mostly fear is death in a plane crash (Van Gerwen et al. 1997). In the remainder of the paper a "safe flight" is defined as a flight that does not end in the death of a given passenger.
The main goal of many publications is to raise the level of safety in aviation. The hypothesis examined here is that safety in aviation increases each year. The motivation was the widespread opinion that this thesis is correct, although hardly any statistical evidence for it has been published. The authors used a frequency analysis of fatal aviation accidents to verify the thesis.

Passenger-plane system
In the article, the reliability of the passenger-plane system is examined using a pattern similar to that used for the analysis of a human-machine system. The goal to be met by the passenger-plane system is the transport of the passenger by aircraft without harm. The system is defined as working properly if, within a given flight, a given passenger flies safely. The system is defined as faulty if, within a given flight, a given passenger dies.
In the passenger-plane system time points are defined. Consecutive time points represent subsequent flights taken by a passenger. Time is measured in successive natural numbers.
The system uses the concept of an elementary event. It is defined as the event that a given passenger takes at least n flights without dying. The elementary event is denoted by $A_n$, where n is the number of flights taken. The space of elementary events is, therefore, the sum of all elementary events. $P(A_n)$ denotes the probability that a given passenger takes at least n flights without dying.
On the basis of the data on the number of airline passengers and the number of people killed in aircraft accidents, we studied how the probability function $P(A_n)$ changed over time. For this purpose, the sample was defined as the number of passengers and the number of deaths resulting from air accidents for each year from 1970 to 2008. The time range was set according to the availability of complete source materials for the period (Bureau… 2012; International… 2008; Kilroy 2013; PlaneCrashInfo 2013).

Obtaining the sample data
The data on air crashes and flights was collected from two sources:
- The archives of the ACRO organization (data about air disasters, with the exception of the events of 11 September 2001) (Bureau… 2012).
- The data from the International Civil Aviation Organization (International… 2008).
These sources were selected because of the completeness of their data. ACRO (Aircraft Crashes Record Office) is an organization that archives records of aviation accidents from 1918 onwards. ICAO (International Civil Aviation Organization) is an organization that archives statistical data on civil aviation from 1970 onwards. Other sources (Kilroy 2013; PlaneCrashInfo 2013) cover a shorter period of time or are incomplete.
The data from these sources is not stored in a format that is easy to reuse. It has to be read from the web pages. Because of the large amount of data, we decided to automate its collection and processing with a dedicated application written by the authors.
The sources (International… 2008; Bureau… 2012) cover the periods 1918–2013 (ACRO) and 1970–2008 (ICAO). To analyze the probability that a passenger takes at least n flights without a fatal accident, both the number of accident deaths and the number of passengers in a given year are needed. The period 1970–2008 was chosen for the analysis because the data for it was complete in both sources.
The sources (International… 2008; Bureau… 2012) are websites on which the organizations summarize the results of their work. From the ACRO archives we obtained the total number of deaths in a given year. The ICAO archives contained information about the number of passengers carried, grouped by country. The figures downloaded for the individual countries were added up to obtain the total number of passengers carried during the year.
The data was downloaded using the PHP programming language together with the Simple HTML DOM library (Chen 2012). If the websites were written in standards-compliant HTML, the data could be parsed with an XML parser. Unlike other document formats, however, web pages are often written incorrectly. For example, according to the W3C Validator (W3C 2013), the ACRO website (Bureau… 2012) contains 1210 errors, while the ICAO website (International… 2008) contains 56 errors. It is therefore necessary to use tools that can process the source data despite such errors. The Simple HTML DOM library accepts invalid HTML and, in the known cases, parses invalid documents correctly.
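The idea of error-tolerant parsing can be illustrated with a minimal sketch. It is not the authors' PHP application: it uses Python's standard `html.parser` module instead of Simple HTML DOM, and the class name `TableCellParser`, the helper `parse_rows` and the sample fragment (including its values) are hypothetical and serve only as an illustration. The sketch reads table cells from markup in which closing tags are missing, a case that a strict XML parser would reject.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of table cells row by row, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()    # a new <tr> implicitly ends an unclosed row
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()   # a new cell implicitly ends an unclosed cell
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()        # flush whatever was left open at end of input


def parse_rows(html):
    """Return the table rows found in html as lists of cell strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical, deliberately malformed fragment: no </td>, </tr> or </table>.
# The values are placeholders, not data from ACRO or ICAO.
SAMPLE = "<table><tr><td>year-1<td>100<tr><td>year-2<td>200"
print(parse_rows(SAMPLE))
```

The design choice here is to treat every new `<tr>` or `<td>` as an implicit close of the previous one, which is one simple way of recovering structure from invalid markup.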

The probability function of carrying out n safe flights
We defined $A_i'$ as the event complementary to the event $A_i$ (defined in section 2). $A_i'$ is the event of taking i flights, at least one of which was fatal to a given passenger. In particular, $A_1'$ is the event of taking one flight that was fatal to a given passenger. In order to determine $P(A_i)$ we first estimated $P(A_1')$ for each successive year from 1970. For this purpose, we calculated the fraction of passengers who had a fatal accident among all passengers carried during the year. The data is illustrated in Figure 1.
The x-axis represents individual years, while the y-axis shows the ratio of the number of people who died as a result of an air accident to the total number of passengers, expressed in percent and denoted by R. The value of R for each year is marked by a square. Figure 1 also contains a B-spline curve of the 10th degree (Chen 2010).
The graph in Figure 1 shows that R decreases over time. This supports the hypothesis that the estimated probability of a safe flight is increasing. From the graph we can estimate, for each year, the probability of taking one flight without the death of a given passenger, denoted by $P(A_1)$:

$$P(A_1) = 1 - P(A_1') \tag{1}$$

where $P(A_1')$ is estimated by R converted from percent to a fraction. Using equation (1) and Figure 1, we estimated that in 1972 $P(A_1)$ was equal to approximately 99.9991%, while in 2000 it was equal to nearly 99.9999%.
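The estimation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the yearly totals in the example are invented values chosen to give the same order of magnitude as the 2008 result, not figures taken from ACRO or ICAO.

```python
def fatal_flight_probability(deaths, passengers):
    """Estimate P(A_1') as the share of carried passengers who died in a year."""
    if passengers <= 0:
        raise ValueError("passengers must be positive")
    return deaths / passengers


def safe_flight_probability(deaths, passengers):
    """Estimate P(A_1) = 1 - P(A_1')."""
    return 1.0 - fatal_flight_probability(deaths, passengers)


# Hypothetical yearly totals (assumption, for illustration only)
deaths, passengers = 1_000, 2_000_000_000

p_fatal = fatal_flight_probability(deaths, passengers)
r_percent = 100.0 * p_fatal  # R, the ratio expressed in percent
p_safe = safe_flight_probability(deaths, passengers)
print(f"P(A_1') = {p_fatal:.7f}, R = {r_percent:.5f} %, P(A_1) = {p_safe:.7f}")
```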
We understand a safe flight for a passenger as a flight that the passenger survives. Let $B_i$ denote the event that the $i$-th flight is safe for a given passenger. $\{B_i\}$ is a sequence of independent repetitions of the same trial. The probability of carrying out n safe flights for a given passenger, denoted by $P(A_n)$, is given by the following equation:

$$P(A_n) = P(B_1 \cap B_2 \cap \dots \cap B_n) = P(B_1)\,P(B_2)\cdots P(B_n) \tag{2}$$

It must be noted that $P(B_1) = P(B_2) = \dots = P(B_n)$ and that $A_1 = B_1$. Substituting this into equation (2), we get:

$$P(A_n) = P(A_1)^n \tag{3}$$

Based on equations (1) and (3) and the data illustrated in Figure 1, the estimate of the function $P(A_n)$ for a given period was calculated. Graphs of the functions $P(A_n)$ are shown in Figure 2. The x-axis represents the number of flights, denoted by n, while the y-axis represents the probability of carrying out n safe flights for a given passenger, denoted by $P(A_n)$. The graph was drawn on the basis of the data for the years 1972, 1992 and 2008 and the values of $P(A_n)$ calculated for these years. The range of n was set to the interval [0, 210 000].
From the results shown in Figure 2 it can be concluded that the probability of death as a consequence of an air accident decreased in consecutive years. It is worth mentioning that in 2008 one would have had to take approximately 200 000 flights in order to reach a 10% probability of death in an air accident.
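The relation between the number of flights and the probability of death can be checked numerically. The sketch below assumes independent flights with a constant per-flight probability of death, as in the model above, and uses the 2008 value 0.0000005 quoted in this paper; the function names are hypothetical.

```python
import math


def prob_n_safe_flights(p_fatal, n):
    """P(A_n) = (1 - p_fatal)^n for n independent flights."""
    # log1p keeps precision when p_fatal is very small
    return math.exp(n * math.log1p(-p_fatal))


def flights_for_death_probability(p_fatal, target):
    """Smallest n for which 1 - P(A_n) reaches the target probability."""
    return math.ceil(math.log1p(-target) / math.log1p(-p_fatal))


p_2008 = 0.0000005  # P(A_1') for 2008, as quoted in the text
n = flights_for_death_probability(p_2008, 0.10)
print(n, 1.0 - prob_n_safe_flights(p_2008, n))
```

Under these assumptions the computed number of flights is a little over 210 000, which agrees with the rounded figure given in the text and with the range of n used in Figure 2.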
Let $C_n$ be the event that a given passenger carries out exactly $n-1$ safe flights. Then $C_n$ is the intersection of two events: $A_{n-1}$, that at least $n-1$ flights are carried out safely, and $B_n'$, that the $n$-th flight ends with the passenger's death. Using $P(B_1') = P(B_2') = \dots = P(B_n')$, $A_1' = B_1'$ and the independence of the flights, we obtain:

$$P(C_n) = P(A_{n-1})\,P(B_n') = P(A_1)^{n-1}\,P(A_1') \tag{4}$$

Let X be the random variable that assigns to the event $C_i$, of carrying out exactly $i-1$ safe flights for a given passenger, the number i:

$$X(C_i) = i \tag{5}$$

Using definition (5) and equation (4), a formula for the average length of the sequence of safe flights for a given passenger is obtained. It is the expected value of the random variable X. It is assumed that the probability of the event $A_1$ has been obtained beforehand. The expected value of X is equal to:

$$E(X) = \sum_{i=1}^{\infty} X(C_i)\,P(C_i) \tag{6}$$

Substituting (4) and (5) into equation (6), with $P(A_1') = 1 - P(A_1)$ from equation (1), gives:

$$E(X) = \sum_{i=1}^{\infty} i\,P(A_1)^{i-1}\,\bigl(1 - P(A_1)\bigr) \tag{7}$$

To calculate the sum of this series, the constant factor is first taken out of the sum:

$$E(X) = \bigl(1 - P(A_1)\bigr) \sum_{i=1}^{\infty} i\,P(A_1)^{i-1} \tag{8}$$

In order to simplify the notation, the following symbols are introduced:

$$y = P(A_1), \qquad S = \sum_{i=1}^{\infty} i\,y^{i-1} \tag{9}$$

The sum of the geometric series, for $0 \le y < 1$, is:

$$\sum_{i=1}^{\infty} y^{i-1} = \frac{1}{1-y} \tag{10}$$

Subtracting $yS$ from $S$ term by term, we see that:

$$S - yS = \sum_{i=1}^{\infty} i\,y^{i-1} - \sum_{i=1}^{\infty} (i-1)\,y^{i-1} = \sum_{i=1}^{\infty} y^{i-1} \tag{11}$$

Using equation (10) in equation (11):

$$S = \frac{1}{(1-y)^2} \tag{12}$$

Using equation (8), the definition of y from (9) and the result (12), we get:

$$E(X) = (1-y)\,\frac{1}{(1-y)^2} = \frac{1}{1 - P(A_1)} = \frac{1}{P(A_1')} \tag{13}$$

From equation (13) the average length of the sequence of safe flights for a given passenger can be calculated. For the data from 2008, where $P(A_1') = 0.0000005$, the average length of the sequence of safe flights was equal to 2 000 000, while in 1982, where $P(A_1') = 0.000003$, it was equal to 333 333.
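The closed form for the expected value can be cross-checked against a direct summation of the series. The sketch below is an illustration with hypothetical function names. It evaluates the reciprocal of the per-flight death probability for the two values quoted above, and it compares a truncated series with the closed form for a much larger, purely illustrative probability of 0.01, because for the real values the series converges too slowly to sum term by term.

```python
def expected_length_closed_form(p_fatal):
    """Closed form E(X) = 1 / P(A_1')."""
    return 1.0 / p_fatal


def expected_length_partial_sum(p_fatal, terms):
    """Truncated series: sum over i = 1..terms of i * (1 - p)^(i-1) * p."""
    total = 0.0
    survive = 1.0  # (1 - p)^(i-1), updated incrementally
    for i in range(1, terms + 1):
        total += i * survive * p_fatal
        survive *= 1.0 - p_fatal
    return total


# Values of P(A_1') quoted in the text for 2008 and 1982
print(round(expected_length_closed_form(0.0000005)))
print(int(expected_length_closed_form(0.000003)))

# Illustrative probability (assumption): large enough for the series to converge quickly
print(expected_length_partial_sum(0.01, 5_000), expected_length_closed_form(0.01))
```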

Conclusions
The article presents an analysis of works on reliability in the field of aviation. To the authors' knowledge, there are no papers describing the probability of carrying out n safe flights. We introduced three groups of elementary events. By $A_i$ we denoted the event of carrying out at least i safe flights for a given passenger. By $B_i$ we denoted the event of carrying out the $i$-th flight safely for a given passenger. By $C_i$ we denoted the event of carrying out exactly $i-1$ safe flights for a given passenger.
On the basis of the source materials from ACRO and ICAO we calculated the percentage of people who died in air accidents relative to all passengers carried during the year. This quantity was denoted by R. The data was illustrated in Figure 1. The results presented in Figure 1 showed that R is a decreasing function, from which it was concluded that the safety of air travel increased in the period 1970–2008. It was also found that in 2008 the probability of taking a flight during which a given passenger dies was equal to 0.0000005. The authors compared this result with the data of the National Highway Traffic Safety Administration (NHTSA) for the year 2008 (National… 2008). On the basis of the NHTSA data and the result of this work, the authors found that driving about 40 miles by car carries the same probability of a fatal accident as one flight does.
In the paper the authors also derived formula (13) for calculating the average length of the sequence of safe flights for a given passenger. Using this formula, the authors calculated the average length of the sequence of safe flights on the basis of the data for the year 2008.