Measuring Efficiency in Transport: The State of the Art of Applying Data Envelopment Analysis

Data Envelopment Analysis (DEA) is a non-parametric linear programming method used for determining the efficiency of a set of companies as compared to the best practice frontier. It can be employed to analyze private, public or non-profit organizations. The method is widely applied in the transport sector, especially in the evaluation of airports, ports, railways and urban transport companies. The present paper reviews how DEA is applied in the transport sector, with special emphasis on the inputs and outputs selected in the DEA models employed in different fields. For this purpose, the author has compiled data from 69 DEA applications found or reported in the literature and investigated their characteristics and fields of application along with the inputs and outputs used.


Introduction
Determining the efficiency of transport structures or companies has always been a challenging task due to the multitude of features that characterize these systems. Several approaches to this problem can be found in the literature. Total Factor Productivity (TFP) (Oum, Yu 1995), the endogenous weight TFP method (Yoshida, Fujimoto 2004), the Malmquist Productivity Index (Odeck 2008), stochastic frontier analysis (Good et al. 1993) and even multiple linear regression (Pina, Torres 2001) are all good examples. The theoretical models and initial applications of transport efficiency analyses are available in the fields of cost, performance management (Bokor 2009), environmental pollution and energy consumption (Tánczos, Török 2007). This paper, however, deals with the application of Data Envelopment Analysis (DEA), a long-established linear programming method that has been widely applied both alongside and separately from these methods for the efficiency evaluation of companies in the transport sector.
The reason behind this choice is the strong theoretical background of the method, which has proved its merits over the decades since it was introduced by Farrell (1957) and Charnes et al. (1978). It is also a method that has been used for the evaluation of several transport modes and thus makes a comparison between them feasible. The review of the literature reveals that a comparison of DEA applications across different transport modes has not yet been carried out, although a broader perspective could contribute to raising the quality of individual studies.
The rest of the paper is structured as follows: Chapter 2 gives a brief overview of the DEA method, highlighting its strengths and weaknesses and the main features of its application; Chapter 3 compares 69 transport related DEA studies found in the literature; Chapter 4 discusses the choice of inputs and outputs for different transport modes; Chapter 5 draws conclusions.

The DEA Approach
Data envelopment analysis (DEA) is a tool for evaluating the performance of different companies, organizations or even persons, i.e. decision making units (DMUs) that convert multiple inputs into multiple outputs. It is a method with a background in operational research supported by IT solutions (Tibenszkyne 2007). Its strength, as compared to linear regression, lies in the fact that it relates the efficiency of these units not to the average but to the best practice frontier created from the performance of the most efficient units. A further advantage is that it allows the use of multiple inputs and outputs and does not even require converting them to the same dimension. Nor does it necessitate a priori knowledge of a production function or information on prices, although such knowledge, if present, can be incorporated in the model. However, the drawbacks of the method should not be forgotten either: outliers may influence the results, and efficiency scores are relative to the study sample, so enlarging the sample might alter them. Both of these problems can be mitigated: the first by excluding outliers after a preliminary investigation, the second by conducting sensitivity analysis. The third problem of the method is its sensitivity to measurement errors and noise in the data; however, this can also be addressed by joining statistical regression and DEA in a two-stage process (Odeck 2008).
The basis of data envelopment analysis was laid down by Farrell (1957), who deemed a DMU technically efficient when no waste could be eliminated without worsening any input or output. His model was further developed by Charnes et al. (1978) to yield the CCR DEA model (named after the initials of the authors), which has been the starting point of DEA studies until the present day.
The CCR DEA model can be described as follows (Cooper et al. 2004): let us assume that there are $n$ DMUs to be evaluated. Each DMU consumes $m$ different inputs and produces $s$ different outputs. Thus, e.g., DMU $j$ consumes $x_{ij}$ of input $i$ and produces $y_{rj}$ of output $r$. We also assume that $x_{ij} \ge 0$, $y_{rj} \ge 0$, and that each DMU has at least one positive input and one positive output.
From these, the ratio of outputs to inputs is used to measure the relative efficiency of DMU$_j$ = DMU$_0$, the DMU to be evaluated, relative to the ratios of all DMU$_j$, $j = 1, 2, \ldots, n$.
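Thus, the function to be maximised is:
$$\max_{u,v}\; h_0(u,v) = \frac{\sum_{r=1}^{s} u_r y_{r0}}{\sum_{i=1}^{m} v_i x_{i0}} \tag{1}$$
where $u_r$, $v_i$ are weights and $y_{r0}$, $x_{i0}$ are the observed output/input values of DMU$_0$ (the DMU to be evaluated). We introduce the following constraints so as to limit the values:
$$\frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \qquad j = 1, \ldots, n \tag{2}$$
and $u_r, v_i \ge 0$. Using the Charnes-Cooper transformation, this leads us to the following equivalent linear programming problem:
$$\begin{aligned} \max\; & z = \sum_{r=1}^{s} \mu_r y_{r0} \\ \text{s.t. } & \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} \nu_i x_{ij} \le 0, \qquad j = 1, \ldots, n, \\ & \sum_{i=1}^{m} \nu_i x_{i0} = 1, \\ & \mu_r, \nu_i \ge 0, \end{aligned} \tag{3}$$
whose linear programming dual is:
$$\begin{aligned} \theta^* = \min\; & \theta \\ \text{s.t. } & \sum_{j=1}^{n} x_{ij} \lambda_j \le \theta x_{i0}, \qquad i = 1, \ldots, m, \\ & \sum_{j=1}^{n} y_{rj} \lambda_j \ge y_{r0}, \qquad r = 1, \ldots, s, \\ & \lambda_j \ge 0, \qquad j = 1, \ldots, n. \end{aligned} \tag{4}$$
Formula (4) is also called the 'Farrell model' as it was created by Farrell. However, he did not apply the duality theorem of linear programming (by virtue of which $z^* = \theta^*$, and either problem can be solved) and hence was not able to make the connection between the models introduced above.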
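Formula (4) is also called the 'strong disposal' or 'weak efficiency' model as it ignores non-zero slacks. Should we want to take them into account as well, we have to use the following modified model, which is also called the envelopment model:
$$\begin{aligned} \min\; & \theta - \varepsilon \left( \sum_{i=1}^{m} s_i^- + \sum_{r=1}^{s} s_r^+ \right) \\ \text{s.t. } & \sum_{j=1}^{n} x_{ij} \lambda_j + s_i^- = \theta x_{i0}, \qquad i = 1, \ldots, m, \\ & \sum_{j=1}^{n} y_{rj} \lambda_j - s_r^+ = y_{r0}, \qquad r = 1, \ldots, s, \\ & \lambda_j, s_i^-, s_r^+ \ge 0, \end{aligned} \tag{5}$$
where $\varepsilon > 0$ is a non-Archimedean element defined to be smaller than any positive real number. The dual linear program of this model, also known as the multiplier model, is:
$$\begin{aligned} \max\; & z = \sum_{r=1}^{s} \mu_r y_{r0} \\ \text{s.t. } & \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} \nu_i x_{ij} \le 0, \qquad j = 1, \ldots, n, \\ & \sum_{i=1}^{m} \nu_i x_{i0} = 1, \\ & \mu_r, \nu_i \ge \varepsilon > 0. \end{aligned} \tag{6}$$
Using these formulae, DMU$_0$ is efficient if and only if $\theta^* = 1$ and $s_i^{-*} = s_r^{+*} = 0$ for all $i$, $r$, and it is weakly efficient if $\theta^* = 1$ and $s_i^{-*} \ne 0$ and/or $s_r^{+*} \ne 0$ for some $i$ and $r$ in some alternate optima (Cooper et al. 2004). Formulae (5) and (6) represent the input-oriented DEA CCR models (envelopment and multiplier form). The output-oriented model is very similar and differs in the quantities to be maximized/minimized.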
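As an illustration of the ratio form, consider the special case of a single input and a single output, where the CCR score of each DMU reduces to its output/input ratio divided by the best ratio in the sample. The following is only a minimal sketch under that assumption; the function name and the airport figures are hypothetical and are not taken from any of the reviewed studies.

```python
def ccr_ratio_scores(inputs, outputs):
    """CCR efficiency scores for the single-input, single-output case.

    Assumes strictly positive inputs. Each score is the DMU's
    output/input ratio relative to the best ratio in the sample.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical sample: staff (input) and passengers in millions (output)
# for four airports.
staff = [100, 200, 300, 400]
passengers = [2.0, 5.0, 6.0, 6.0]
scores = ccr_ratio_scores(staff, passengers)
```

With these made-up figures the second airport has the best ratio and defines the frontier, while the others are scored relative to it. With several inputs or outputs the weights are no longer trivial and a linear program has to be solved for each DMU.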
The DEA BCC (Banker et al. 1984) model incorporates an additional constraint on the intensity weights $\lambda_j$ of the envelopment model:
$$\sum_{j=1}^{n} \lambda_j = 1$$
which makes it possible to take non-constant (variable) returns to scale into account. The years since the introduction of DEA CCR have seen the advent of a wide variety of auxiliary methods and modifications to the original model, each altering the method from a different aspect so as to bring about further improvement. As these are not the main topic of this paper and space is limited, they cannot be discussed here in detail; nevertheless, a short description of those especially present in the studies relating to transport can be found at the end of the next chapter.
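The input-oriented envelopment problem and its BCC variant can be sketched with a general-purpose LP solver. The example below is an illustration only: it assumes NumPy and SciPy are available, uses `scipy.optimize.linprog`, and solves just the radial problem (minimising θ without the ε-weighted slack terms), so a score of 1 indicates at least weak efficiency. The function name, the `vrs` flag and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_oriented(X, Y, vrs=False):
    """Radial input-oriented DEA scores (theta*) for every DMU.

    X: (m, n) inputs, Y: (s, n) outputs, one column per DMU.
    vrs=False: CCR model (constant returns to scale).
    vrs=True:  BCC model, adds the constraint sum(lambda) = 1.
    Slacks are not maximised in this sketch.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = X.shape
    s = Y.shape[0]
    scores = []
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
        c = np.zeros(n + 1)
        c[0] = 1.0
        # Input rows: sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[:, [o]], X])
        # Output rows: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
        A_eq = b_eq = None
        if vrs:
            # Convexity constraint of the BCC model.
            A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)
            b_eq = [1.0]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (n + 1), method="highs")
        scores.append(float(res.fun))
    return scores


# Hypothetical sample: one input (staff), one output (passengers, millions).
X = [[100, 200, 300, 400]]
Y = [[2.0, 5.0, 6.0, 6.0]]
ccr = dea_input_oriented(X, Y)
bcc = dea_input_oriented(X, Y, vrs=True)
```

In this made-up sample only the second DMU is CCR-efficient, whereas the convexity constraint makes the first three DMUs BCC-efficient, which illustrates that BCC scores are never lower than CCR scores.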

Comparison of the DEA Studies Related to Transport
When reviewing the literature on the application of DEA, a very broad and colourful picture emerges. In this appraisal, 69 studies on the method have been gathered in order to enable the examination of data, supporting methods as well as the chosen inputs and outputs. These studies were either elaborated directly in the papers listed in the references or reported in the same articles (Azadeh et al. 2008; Barros 2008; Barros, Peypoch 2009; Bazargan, Vasigh 2003; Cullinane, Wang 2005; Hamdan, Rogers 2008; Jitsuzumi, Nakamura 2010; Karlaftis 2004; Martin, Román 2001; Odeck 2006; Pacheco, Fernandes 2003; Sampaio et al. 2008; Tongzon 2001; Wu, Goh 2010, etc.). We have to be aware that not all the data were always available to work with: sometimes the chosen inputs or outputs were not mentioned, or the reference to the place of application was missing. Nevertheless, for most of the studies the data needed were accessible, and only the available information is included in this paper for further investigation. Fig. 1 shows the distribution of the studies among different transport modes and clearly indicates that the majority of the studies deal with airports and ports; these two represent more than 50% of the studies. Public transport and railway companies also have a significant share while airlines are only mentioned in 4 studies.
When talking of public transport companies, we have to be aware that this is not a homogeneous group. Urban as well as rural companies, or a blend of the two, have been investigated applying DEA; in some cases, these were companies operating buses only, whereas in other cases, a mixed fleet (bus, underground) was present. However, this does not prevent us from comparing these studies, as the methodology and the chosen inputs/outputs were remarkably similar. Even the mixed fleet did not pose a problem for the application of DEA, since the 'number of equivalent vehicles' could homogenize the fleet from the point of view of the selected input. Fig. 2 shows the distribution of DEA studies among different continents together with the share of different transport modes. It is evident that most DEA applications can be found in Europe, Asia (the Near East and Japan having an important role) and North America. Although it is not indicated in the figure, 11 out of 12 studies coming from North America originate from the United States of America.
Looking at the share of different transport modes, we find that the majority of the studies in Europe deal with airports and public transport while in Asia port efficiency is investigated the most extensively. Although interest in railway efficiency is nearly evenly distributed between these two continents, it has a smaller share. Fig. 3 presents the frequency of the number of DMUs in the samples of the investigated DEAs. Three outliers in the range above 70 have been excluded from the data so as to make graphical representation simpler. Nonetheless, these outliers also reveal an interesting phenomenon in the choice of the number of DMUs. All three outliers (each with more than 150 DMUs) belong to studies in the public transport area. This shows that the researchers of this mode of transport have a much larger pool of data to choose from and are less limited by a lack of data. Fig. 3 also indicates that the number of DMUs in DEA applications clusters around thirty (the average is 29.22), and the vast majority is between 15 and 40 (see descriptive statistics in Table). On the one hand, this can be explained by the data available (we should remember that DMUs in the transport sector would for instance be airports, ports or railway companies) and on the other hand by the fact that there is a desired relationship between the number of inputs/outputs and the number of DMUs. As a rule of thumb, the number of observations should be at least three times the number of inputs plus outputs, and the number of DMUs should be equal to or larger than the product of the number of inputs and outputs. Before proceeding to the analysis of the inputs and outputs used in the studies, we provide a short description of the modifications and supplementing methods of DEA that seem to be the most significant at present.
The methods applied in DEA studies are first of all the input- or output-oriented DEA CCR and BCC methods, which enable the examination of technical efficiency and, when applying BCC, of variable returns to scale. From the review of the studies it seems that output orientation is mainly preferred for the evaluation of airports and ports. This is rather reasonable as these have facilities (e.g. runways, terminal buildings, terminal area of ports) which are difficult and/or very expensive to extend, and as such, most of the inputs chosen for DEA would be hard to alter. For the evaluation of public transport organizations and railway companies, input orientation can also be a viable choice (and indeed it is observed among DEA studies) as they have more inputs (e.g. the number of vehicles) that can be flexibly changed.
The calculation of allocative or overall efficiency alongside technical efficiency can be observed in several DEA studies independently of transport mode.
A very important development of recent years is the Simar-Wilson method, applied in more and more studies to bootstrap DEA scores with truncated regression and found to be more adequate in describing efficiency scores than Tobit or alternative bootstrap procedures (Barros, Dieke 2008). Barros and Dieke (2008), Von Hirschhausen and Cullmann (2010) and Hung et al. (2010) clearly present the applied method.
The use of the super-efficient DEA model (also known as the A&P DEA model, named after its authors, Andersen and Petersen (1993)) is also gaining ground as it makes it possible to rank all DMUs (we have to bear in mind that traditional DEA assigns an efficiency score of 1 to all of the most efficient units and therefore does not fully rank them). For an application, see for example Adler, Berechman (2001).
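The idea behind super-efficiency, namely that the DMU under evaluation is removed from its own reference set, can be sketched for the single-input, single-output case under constant returns to scale, where the score reduces to the DMU's output/input ratio divided by the best ratio among the other DMUs. This is a simplified illustration under those assumptions, not the general Andersen-Petersen linear program; the function name and data are hypothetical.

```python
def super_efficiency_scores(inputs, outputs):
    """Super-efficiency scores, single input and single output, CRS.

    Each DMU is compared with the best output/input ratio among all
    OTHER DMUs, so a frontier unit can obtain a score above 1.
    Assumes strictly positive inputs and at least two DMUs.
    """
    if len(inputs) != len(outputs) or len(inputs) < 2:
        raise ValueError("need equally long lists with at least two DMUs")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    scores = []
    for j, r in enumerate(ratios):
        best_other = max(r_k for k, r_k in enumerate(ratios) if k != j)
        scores.append(r / best_other)
    return scores


# Hypothetical sample: staff (input) and passengers in millions (output).
staff = [100, 200, 300, 400]
passengers = [2.0, 5.0, 6.0, 6.0]
super_scores = super_efficiency_scores(staff, passengers)
```

Inefficient DMUs keep their ordinary score, while the frontier unit receives a value above 1 that measures how far its input could grow before it left the frontier. With multiple inputs and outputs several units typically score 1 in ordinary DEA, and these values above 1 are what allow them to be ranked.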
Although not yet very widespread, the promising multi-activity network DEA (MNDEA) model is also worth mentioning as a methodology that seems to be tailor-made for the evaluation of transport systems and makes it possible to separate transport efficiency from transport effectiveness. The former means the creation of transport opportunities (e.g. expressed by the number of seats available on a given flight) while the latter takes into account the load factor (e.g. the number of seats sold). MNDEA enables us to investigate deeper connections between different parts of the transport service. Lin (2008) and Yu (2008) apply MNDEA to the railway sector while Yu (2010) shows an example of applying it to airports.

Comparison of the Inputs and Outputs Used
As already mentioned, the number of inputs and outputs chosen for DEA is quite restricted.
First, one has to adhere to the rule of thumb which demands that the number of observations be at least three times the number of inputs plus outputs, and that the number of DMUs be equal to or larger than the product of the number of inputs and outputs. Given a dataset, the authors are sometimes forced to choose fewer inputs and/or outputs than desired. Then, there is also the tendency that the more inputs/outputs are included, the more DMUs prove to be efficient (Bunkoczi, Pitlik 2009). Fig. 4 shows the frequency of the number of the studies opting for a given number of inputs, while Fig. 5 presents the same for outputs. It is clear that the number of inputs clusters around 3 and 4, whereas the number of outputs tends to be 1 or 2. This means that in most cases, 3 or 4 inputs are used (theoretically covering traditional labour, capital and energy inputs as highlighted in (Sharma, Yu 2010)) to produce 1 or 2 outputs. This coverage is only theoretical, as seen in the detailed analysis of inputs and outputs. The same tendency is more or less valid when looking at the number of outputs distributed among transport modes (Fig. 6), although 'airports' show a more even distribution. Regarding the number of inputs, both 'airports' and 'ports' show a more even distribution (Fig. 7).
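The sample-size rule of thumb stated at the beginning of this section can be written as a simple check. The sketch below reads 'three times' as 'at least three times'; the function name and the example figures are illustrative assumptions.

```python
def satisfies_rule_of_thumb(n_dmus, n_inputs, n_outputs):
    """Check the DEA sample-size rule of thumb.

    The number of DMUs should be at least three times the number of
    inputs plus outputs, and at least the product of the two counts.
    """
    minimum = max(n_inputs * n_outputs, 3 * (n_inputs + n_outputs))
    return n_dmus >= minimum


# A sample of 29 DMUs with 4 inputs and 2 outputs needs max(8, 18) = 18 DMUs.
typical_ok = satisfies_rule_of_thumb(29, 4, 2)
# A sample of 15 DMUs with the same model falls short of 18.
small_ok = satisfies_rule_of_thumb(15, 4, 2)
```

For the typical model size observed here (3 or 4 inputs and 1 or 2 outputs), the additive condition is the binding one, which is consistent with sample sizes clustering around thirty.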
Finally, let us examine inputs and outputs themselves. The appendix contains all the inputs and outputs gathered from the studies where they were selected for the evaluation of airports, ports, public transport companies, railways and airlines. It is not unexpected that the more studies are available for a given transport mode, the wider is the variety of the chosen inputs and outputs.
Labour (as the number of employees or the cost of labour) is an omnipresent input in the studies, and some measure of capital is also nearly always included. However, energy consumption as an input is only applied in the evaluation of public transport companies, although it could also be employed in the DEAs of airlines and railways.
A new category, 'facilities', has been introduced in the classification of inputs (even though it can be regarded as a part of 'capital' inputs) because the factors listed here seem to constitute a vital and integral part of inputs, especially for airports and ports. However, it is mildly surprising that only one study employed more technical inputs like a 'dummy z variable for slot coordinated airports' and a 'dummy z variable for time restrictions' (Pels et al. 2003), although technical facilities at an airport (e.g. availability of ILS) or the level of air traffic control can significantly contribute to the results of an airport.
Outputs can be ordered into two main categories: operational and fiscal outputs. Operational outputs are measures derived from the physical movement of vehicles, passengers and cargo, while fiscal outputs are those that can be expressed in some monetary unit. The listings in the appendix also indicate that generally the number of inputs is higher and at most one or two outputs are chosen for a DEA study.

Conclusions
The examination of the studies dealing with DEA has revealed that data envelopment analysis is widely applied for the evaluation of companies in the transport sector. The great majority of DEA studies cover airports, ports, public transport companies and railways, principally using the DEA BCC and/or CCR method. The number of DMUs investigated in the studies clusters around 29 with a deviation of nearly 14. The applied inputs are predominantly chosen from the areas of labour and capital (including 'facilities') and there are 3 or 4 of them. The number of outputs is mostly 1 or 2 and the outputs usually describe operational and/or fiscal characteristics. The extensive number of studies and the huge variety in the nuances of application show that DEA can be successfully employed for the assessment of decision making units in the transport sector.