A FUNCTIONAL ROAD CLASSIFICATION WITH DATA MINING TECHNIQUES

Current international road standards, in order to promote organization and safety, encourage the classification of roads according to their technical and functional characteristics rather than their administrative status, but the procedures still rely heavily on expert judgment. Although this activity matters greatly because of its consequences for responsibility and the allocation of economic resources, it is based solely on the quantification of a few variables, with no specified methods or analytical procedures. In this paper, after an instrumental survey of the road environment, we applied data mining techniques that account for the 'vagueness' of the analysed scenario. The algorithms used make it possible to quantify the degree of membership (between 0 and 1) of a road in each of the groupings provided, and to plan corrective actions that steer the final result towards a specific class with greater precision. In addition, the method is flexible and can easily incorporate new variables or observations at different times. Moreover, the geographical location of the individual observations can be transferred to a GIS, as was done in this research, with a positive impact on maintenance programs.


Introduction
The functional classification of roads consists of grouping them into homogeneous classes according to the type of service they provide within a defined geographical area and within a road network. It is a major concern not only for the creation of new infrastructure, but also for promoting the functional recovery and/or rehabilitation of existing routes instead of expensive new designs.
Given the importance of the issue, international road standards direct the analyst towards specific procedures that are very similar from country to country. Usually it is necessary to identify factors such as the following:
- the type of movement (transit, distribution, penetration, access);
- the extent of travel;
- the function assumed in the environmental context;
- the traffic components allowed.
One of the objectives of the road standards is to identify the various functional levels in the network (primary - connectors, main - manifolds, secondary, local) and to match them to the roads, which are divided into several classes. In the Italian case, these are the following six groups:
- A - highways;
- B - main roads;
- C - secondary roads;
- D - urban roads;
- E - urban district roads;
- F - local roads.
In other countries the division into groups is very similar:
- in the United States and Canada: Freeways, Arterials, Collectors and Local roads;
- in the UK: Motorways, Primary A-roads, Non-primary A-roads, B-roads, C-roads, Unclassified roads;
- in France: Autoroutes, Routes Nationales, Routes Départementales, Routes Communales.
The selection of the class (from A to F, according to the Italian standard) derives from a complex process that comprises surveys of the road context followed by analysis of the collected data. This process, however, can also end with a judgment of un-classification, with the consequent need for adjustments to the road itself and for safety measures to be applied during the transition period. Although there are slight differences among countries (AASHTO 2011; Design Manual for Roads… 2002; FHWA 2004; Lamm et al. 1999; Italiano Strada Standard 2001; Bosurgi et al. 2011), the classes and the functions performed always depend on the drivers' demand for mobility, the safety conditions, the expected territorial development, the availability of different transport modes and the environmental protection of the examined area (Kaptein, Claessens 1998; Jaarsma 1997; Giummarra 2003; Chen et al. 2008). Among the most complex elements to evaluate are the traffic-generating sites (i.e. administrative, cultural, economic, commercial and industrial centers), which are assigned a hierarchy that usually depends only on the experience of the analyst.
The functional classification is, therefore, one of the tasks of the designer and it is essential in the case of existing roads. Indeed, using a road in a way that does not match its characteristics worsens the users' safety (Kockelman 2001; Karlaftis, Golias 2002; Forkenbrock, Foster 1997; Dickerson et al. 2000; Nowakowska 2010; Kashani, Mohaymany 2011) and usually leads to an increase in social costs and to possible inefficiencies in maintenance planning (Sohn, Lee 2003; Cafiso et al. 2011).
This activity is deceptively simple, because a road easily changes its geometric and functional characteristics along its length, which makes it necessary to investigate properly the significance of the collected data. In this regard, modern digital instruments allow measurements to be taken almost continuously, with the sampling frequency managed optimally, at relatively low computational cost and with good reliability of the collected data. However, at the end of the on-site survey, the equipment may deliver a database with redundant or hidden data, so that useful information can be obtained only after appropriate filtering operations (Pellegrino 2011, 2012a, 2012b). It is therefore necessary to set up a procedure capable of acquiring knowledge reasonably quickly from the surveyed database, which presumably comes from different sources (GPS, GPR, speed trap, image analysis, GIS, etc.), while taking into account the high degree of uncertainty of its variables, which are often correlated with one another (Kuehnle, Burghout 1998; Mena 2003; Manduchi et al. 2005; Kang, Scott 2008; Paclík et al. 2000; Praticò, Giunta 2011). Finally, the collected data must be evaluated using techniques that limit the analyst's role, which has so far been predominant in the procedures suggested by the regulations (D'Andrea, Pellegrino 2012).
These issues can be addressed with soft computing or artificial intelligence techniques such as fuzzy logic, neural networks and genetic algorithms (Dorsey, Coovert 2003; Cafiso et al. 2004). The soft computing approach is, of course, not always preferable to other methods, but it produces more realistic results when the number of variables involved is considerable and, especially, when their nonlinear dependence would make other techniques inapplicable (Jang 1993; Shanahan et al. 2000; Dağdeviren et al. 2008). Soft computing techniques have also been applied profitably to the functional classification of roads. Especially in the last 10-15 years, researchers have used them to extract information about the road environment through the analysis of images previously recorded with ordinary camcorders (Zhu et al. 1998). The concomitant development of Intelligent Transportation Systems (ITS) has encouraged the preparation of models, either fuzzy or based on Genetic Algorithms (GA), that are able not only to recognize but also to understand the images, with uncertainties that can be controlled by probabilistic analysis (Shanahan et al. 2000; Lingras 2001). More recently, soft computing techniques have been applied to evaluate details of the functional classification such as the recognition of the road alignment by means of Artificial Neural Networks (ANN) (García Balboa, Ariza López 2008) or the evaluation of traffic flows (Toplak et al. 2010).
With this paper we propose the application of a data mining technique that, based on the knowledge of a data set previously collected, allows us to quantify the membership degree of a road to a given functional class, identifying the contribution of the structural, geometrical and functional elements involved in the analysis.
The proposed model has the further advantage that it can be updated with additional variables without losing accuracy (but, rather, increasing it) in the calculation of the final solutions. Achieving this objective is fundamental because, both during the design of a new road and during the maintenance of an existing one, it would allow action on well-identified characteristics of the scenario that are often not covered by the applied standard.
In order to quantitatively assess the contribution of the proposal, we present a case study of a country road located in Sicily (Italy).

Method
The procedure suggested by the majority of the international standards is very general and is based on the determination of certain parameters that are useful for recognizing the most appropriate class. Among them are, for example, the type of movement, the services offered, the construction standards, the traffic components and the route directions. Furthermore, these properties depend on other variables (flows, operating and design speeds, consistency, visibility, planning tools, etc.), which are often interdependent, and this makes the problem extremely complex to solve. Finally, a further difficulty is that some constructive and functional characteristics can vary along the route, especially if it is sufficiently long.
Therefore, taking into account the conditions briefly introduced here, we proposed a methodology based on the following steps:
- survey of the following 25 variables for the definition of the expected class, recorded at appropriate points of the road axis: function in the network, served movement, transport capacity, design speed, operational speed, difference between design speed and operational speed, stopping sight distance, available sight distance, degree of conditioning between users, roughness of the pavement, radius of the curve, length of the curve, lane width, width of the traffic island, width of right and left shoulders, vertical and horizontal signs, safety barriers, traffic flows (light and heavy vehicles), attractors of traffic, traffic components, admissions, presence of emergency lanes and parking areas, separation of lane directions;
- the survey was performed with electronic equipment and produced a database in which the rows represent the variables and the columns represent the observations;
- a data mining technique was applied to the database so formed, in order to obtain 'knowledge' and discard less useful information.
In particular, we chose a fuzzy algorithm in order to detect 'borderline' situations, in which the group assignment is controversial. Indeed, the possibility that all the variables direct the analyst to a clear-cut decision is remote. Our proposal is therefore based on a fuzzy clustering algorithm, which made it possible to quantify the degree of membership of the road in every class considered.
Table 1 reports all 25 variables used in the subsequent analysis, together with the values that, ideally, should belong to 5 different classes: the first 4 (A, B, C and F) refer to the typology prescribed by Italian legislation for roads, while the fifth class is a condition of un-classification. Even if the designation of classes varies from country to country, the procedure is of very general use.
Below we give some details on the variables that are more complicated to determine.

Visibility Distances and Design Speed
The actual Available Sight Distance (ASD) was obtained using commercial software (Civil Design® by Digicorp, http://www.digicorpingegneria.com), after a preparatory reconstruction of the horizontal and vertical alignment of the road, including the identification of all the three-dimensional obstacles able to affect the driver's view (Bosurgi et al. 2010). As is well known, 3D analysis is more precise than traditional 2D analysis, because it considers the actual trajectory of the line of sight, following the vertical profile of the road and the obstruction caused by obstacles at their actual height. The Stopping Sight Distance (SSD) was calculated by solving the equation required by the Italian Road Standard (Italiano Strada Standard 2001), which is substantially similar to those contained in many international standards.
Knowledge of the road geometry also permitted the evaluation of the design speed $V_d$, following the indications of the Italian Road Standard (Italiano Strada Standard 2001), which are very similar to those contained in other countries' standards. In the case studied, it is not important to report the course of $V_d$ along the whole alignment, but only on the curves. The speed on these elements can be obtained from the following expression, which links the radius $R$ [m], the design speed $V_d$ [km/h], the coefficient of lateral friction $f_t$ (a function of $V_d$) and the cross slope of the road $q$ [%]:

$$\frac{V_d^2}{127\,R} = \frac{q}{100} + f_t(V_d) \qquad (2)$$

The formula is complicated by the fact that the variable $f_t$, in turn, depends on $V_d$.
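Because the lateral friction depends on the speed itself, the curve relation has to be solved numerically. The following is a minimal sketch, assuming the relation V_d² / (127 R) = q/100 + f_t(V_d) and a purely illustrative friction table: the tabulated values, the speed cap `v_max` and the function names are assumptions made for this example, not data from the paper, and they should be replaced with the table of the applicable standard. Bisection is used because the left-hand side grows with speed while the friction coefficient does not.

```python
import numpy as np

# Hypothetical lateral-friction table f_t(V): illustrative values only,
# NOT taken from the paper. Replace with the table of the applicable standard.
SPEEDS_KMH = np.array([40.0, 60.0, 80.0, 100.0, 120.0, 140.0])
FT_VALUES = np.array([0.21, 0.17, 0.13, 0.11, 0.10, 0.09])

def lateral_friction(v_kmh):
    """Linearly interpolated f_t; clamped outside the tabulated range."""
    return float(np.interp(v_kmh, SPEEDS_KMH, FT_VALUES))

def curve_design_speed(radius_m, cross_slope_pct, v_max=140.0, tol=1e-6):
    """Solve V^2 / (127 R) = q/100 + f_t(V) for V [km/h] by bisection."""
    def residual(v):
        demand = v ** 2 / (127.0 * radius_m)
        supply = cross_slope_pct / 100.0 + lateral_friction(v)
        return demand - supply

    lo, hi = 0.0, v_max
    if residual(hi) <= 0.0:
        # The curve does not limit the speed below the assumed cap.
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

With this illustrative table, a tight curve returns a low speed, while a very large radius simply returns the assumed cap, which signals that the curve is not the limiting element.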

V85 Determination
The survey campaign was carried out during daylight hours, with a dry and regular pavement and in good weather conditions. In accordance with the procedure for this type of measurement (Lamm et al. 1999), vehicles conditioned by traffic were not considered: motorcycles, commercial or heavy vehicles, and cars with a time gap of less than 5 s between them were excluded from the analysis.
The speed was surveyed using a laser speed gun. As is known, this instrument emits a very short burst of infrared laser light and measures the round-trip time the light takes to reach the vehicle and reflect back.
At every location, the speeds of over 250 vehicles were recorded, which yielded no fewer than 100 isolated vehicles for the analysis. In total, more than 9000 passages were acquired, and all the collected data were processed in order to obtain the 85th percentile of the speeds (V85). The analysis, as already stated, is based on the study of 45 cross sections.
For every analysed section, the main parameters of interest were assessed in order to reconstruct the distribution of the speeds and, therefore, determine the V85.
The relative frequency density $f_i$ of each speed class was calculated with the well-known equation:

$$f_i = \frac{n_i}{n\,A} \qquad (3)$$

From the relative frequency density, the cumulative frequency follows from the equation:

$$F_i = \sum_{j=1}^{i} f_j\,A \qquad (4)$$

where $F_i$ is the fraction of vehicles travelling at a speed lower than or equal to the one considered. In Eqs (3) and (4):
- $n_i$ is the absolute frequency, that is, the number of elements in the class;
- $n$ is the total number of elements in the data series;
- $A$ is the amplitude (width) of the class.
The information collected in this way allowed the reconstruction of the frequency distributions for each section and the calculation of the representative value of the 85th percentile.
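The frequency-based calculation described above can be sketched as follows. The class width of 5 km/h, the linear interpolation inside the class where the cumulative frequency crosses 0.85, and the function name are assumptions of this example; the paper does not state them.

```python
import numpy as np

def v85_from_classes(speeds_kmh, class_width=5.0, percentile=0.85):
    """Estimate a speed percentile from grouped data.

    f_i = n_i / (n * A) is the relative frequency density of class i and
    F_i = sum_j f_j * A the cumulative frequency at its upper limit.
    """
    speeds = np.asarray(speeds_kmh, dtype=float)
    n = speeds.size
    # Class limits aligned to multiples of the class width (an assumption).
    lo = np.floor(speeds.min() / class_width) * class_width
    n_classes = max(1, int(np.ceil((speeds.max() - lo) / class_width)))
    edges = lo + class_width * np.arange(n_classes + 1)

    n_i, _ = np.histogram(speeds, bins=edges)   # absolute frequencies
    f_i = n_i / (n * class_width)               # relative frequency density
    F_i = np.cumsum(f_i * class_width)          # cumulative frequency

    # Cumulative frequency is 0 at the first class limit; interpolate linearly.
    F_at_edges = np.concatenate(([0.0], F_i))
    return float(np.interp(percentile, F_at_edges, edges))
```

For ungrouped data, `np.percentile(speeds, 85)` gives a closely related estimate; the grouped version is shown because it mirrors the class-based equations of the text.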

Classification of Other Characteristics of the Road Scenario
In order to complete the survey of the road environment, further variables (admissions, intersections, horizontal and vertical signs, barriers, roughness of the pavement surface, etc.) were identified and classified into four categories on the basis of the analyst's judgment. Although this is a subjective determination, it simply records the presence, or the degree of effectiveness, of the element under consideration; we therefore think that the difference from what is perceived by the user is negligible.

Brief Notes about Clustering
Generally, cluster analysis can be used to deal with a very wide range of problems, such as classification, optimization, pattern recognition, prediction and decision support, especially when:
- the system is nonlinear, time-variant or ill defined;
- the variables are continuous;
- a mathematical model is either too difficult or too expensive to build;
- there are too many inputs, or the inputs are noisy.
The objective of data mining or, more specifically, of cluster analysis in this paper is the classification of objects according to the similarities among them, in order to organize data into groups. The main characteristic of these techniques is that they detect the underlying structure in the data, not only for classification and pattern recognition, but also for model reduction and optimization. In the recent past, some researchers (Jang 1993) have ascertained the convenience of pairing the methods mentioned above in order to maximize their benefits. For example, one of the most effective procedures, the fuzzy clustering approach, has a structure that permits objects to belong to several clusters simultaneously, with different degrees of membership. In this way, the analyst avoids classifying too simplistically, within a single category, phenomena that instead share characteristics with several classes, albeit with different degrees of membership.
Clustering techniques are generally used to classify similar objects and, especially, to organize data into predetermined groups by identifying hidden structures in the source data (Abonyi, Feil 2007). The data submitted for analysis originate from physical observations or surveys, and each observation consists of n measured features, grouped into an n-dimensional vector:

$$\mathbf{x}_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]^T, \quad \mathbf{x}_k \in \mathbb{R}^n$$

Therefore, a set of N observations can be represented as an N × n matrix:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix}$$

A cluster groups elements that are more similar to one another than to the elements of other clusters. This similarity is measured analytically as the distance norm between the center of the cluster and the data that belong to it. In the traditional view of hard clustering, an element of the data set belongs to one cluster only, with no possibility of belonging to another. In recent years, the growth of soft computing techniques such as fuzzy logic has made it possible to propose methods in which an object can belong to a number of clusters c simultaneously, with different membership degrees between 0 and 1. Naturally, the membership degrees of an object across the clusters must sum to one. The structure of the partition matrix U = [c × N] is then the following:

$$U = [\mu_{ik}] = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1N} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cN} \end{bmatrix}$$

where c is the number of fuzzy subsets, or clusters, and $\mu_{ik}$ is the degree of membership of observation $\mathbf{x}_k$ in cluster i.
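The matrix $U = [\mu_{ik}]$ is subject to the following conditions:

$$\mu_{ik} \in [0, 1], \quad 1 \le i \le c, \; 1 \le k \le N$$

$$\sum_{i=1}^{c} \mu_{ik} = 1, \quad 1 \le k \le N \qquad (5)$$

$$0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad 1 \le i \le c$$

With these premises, the fuzzy partitioning space for X is represented by the set:

$$M_{fc} = \left\{ U \in \mathbb{R}^{c \times N} \;\middle|\; \mu_{ik} \in [0, 1], \forall i, k; \;\; \sum_{i=1}^{c} \mu_{ik} = 1, \forall k; \;\; 0 < \sum_{k=1}^{N} \mu_{ik} < N, \forall i \right\}$$

The ith row of U contains the values of the membership function of the ith fuzzy subset of X. By Eq. (5), each column sums to 1, and thus the total membership of each $\mathbf{x}_k$ in X equals one.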
Because it reflects more realistically the uncertain nature of the variables collected in this paper, we used the algorithm called Fuzzy C-means instead of hard clustering methods. This procedure minimizes an objective function, termed the C-means functional, defined as:

$$J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \, \| \mathbf{x}_k - \mathbf{v}_i \|_A^2$$

where $V = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_c]$, with $\mathbf{v}_i \in \mathbb{R}^n$, is the vector containing the centers of the clusters and m is a weighting exponent greater than 1 that influences the fuzziness of the results. The minimization of the C-means functional J is a nonlinear optimization problem, usually solved by a simple Picard iteration through the first-order conditions for the stationary points of J.
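The stationary points are obtained by adjoining the constraint of Eq. (5) to J by means of Lagrange multipliers:

$$\bar{J}(X; U, V, \lambda) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikA}^2 + \sum_{k=1}^{N} \lambda_k \left( \sum_{i=1}^{c} \mu_{ik} - 1 \right)$$

and by setting the gradients of $\bar{J}$ with respect to U, V and λ to zero, where the following expression is the squared inner-product distance norm:

$$D_{ikA}^2 = \| \mathbf{x}_k - \mathbf{v}_i \|_A^2 = (\mathbf{x}_k - \mathbf{v}_i)^T A \, (\mathbf{x}_k - \mathbf{v}_i)$$

If $D_{ikA}^2 > 0, \forall i, k$ and $m > 1$, then a pair (U, V), with U in the fuzzy partitioning space of X, can minimize J only if:

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}}, \quad 1 \le i \le c, \; 1 \le k \le N$$

$$\mathbf{v}_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m \, \mathbf{x}_k}{\sum_{k=1}^{N} (\mu_{ik})^m}, \quad 1 \le i \le c$$

The above equations can be easily automated within computing environments such as Matlab (http://www.mathworks.com) or Mathematica (http://www.wolfram.com/mathematica), greatly reducing the computational cost.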
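As an alternative to the environments mentioned above, the alternating update of centers and memberships can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes the Euclidean norm (A = I), a weighting exponent m = 2 (the paper does not state the value used), observations stored as rows of `X`, and a hypothetical function name.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy C-means with the Euclidean norm (A = I).

    X is (N, n) with one observation per row. Returns the partition
    matrix U (c, N), the centers V (c, n) and the squared distances D2 (c, N).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # Random initial partition matrix whose columns sum to one.
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the observations.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances between every center and every point.
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)  # guard against division by zero
        # Membership update: mu_ik = 1 / sum_j (D_ik / D_jk)^(2/(m-1)).
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V, D2
```

The stopping rule compares successive partition matrices with the maximum norm; other norms would serve equally well for this purpose.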

Feature Extraction
Subsequently, we applied a feature extraction technique to summarize the classified data in a lower-dimensional space, to remove second-order dependencies and to display the results more conveniently. In order to map the high-dimensional data points into a lower-dimensional space, we used the projection technique called Principal Component Analysis (PCA).
In particular, the PCA technique transforms a number of potentially correlated variables into a predetermined number of uncorrelated variables (the principal components). The first component accounts for as much of the variability in the data as possible, while the other components (in general only one) account for as much of the remaining variability as possible. The aim of this analysis is not only to find new significant variables but also to reduce the dimensionality of the data set.
The algebraic solution is based on an important property of eigenvector decomposition.
In particular, the first principal component has the same direction as the eigenvector associated with the largest eigenvalue. The direction of the second principal component is given by the eigenvector associated with the second largest eigenvalue.
In order to reduce the size of the data set, its covariance matrix is defined in the following way:

$$F = \frac{1}{N} \sum_{k=1}^{N} (\mathbf{x}_k - \bar{\mathbf{x}})(\mathbf{x}_k - \bar{\mathbf{x}})^T$$

where $\bar{\mathbf{x}}$ is the mean of the data:

$$\bar{\mathbf{x}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{x}_k$$

and N is equal to the number of objects in the data set. The projection of the data onto a hyperplane is based on the first q nonzero eigenvalues and the corresponding eigenvectors of the following expression:

$$F = U \Lambda U^T$$

where U is, in this section, an n × n matrix with the unit-length eigenvectors in its columns and Λ is the diagonal matrix with the corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ along the diagonal. The variance of the data set is:

$$\sum_{i=1}^{n} \lambda_i$$

If only the 2 largest eigenvalues are used to visualize the original high-dimensional data, the variance given by the sum of the remaining eigenvalues is lost. The eigenvectors are the principal components and the eigenvalues are the corresponding variances. Therefore, denoting by $U_q$ the n × q matrix of the first q eigenvectors:

$$\mathbf{y}_k = U_q^T (\mathbf{x}_k - \bar{\mathbf{x}})$$

is the representation of the kth sample in the new basis and its approximation in the original space is:

$$\hat{\mathbf{x}}_k = \bar{\mathbf{x}} + U_q \, \mathbf{y}_k$$

This procedure is useful for decreasing the size of the data set, but it causes the loss of the connection between the output and the input variables.
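The eigen-decomposition route described above can be sketched as follows. The sketch assumes that observations are stored as rows, that the data are centered on their mean before projection, and that the covariance is divided by N; the function name and the returned "retained variance" ratio are additions made for illustration.

```python
import numpy as np

def pca_project(X, q=2):
    """Project the rows of X (N, n) onto the first q principal components.

    Returns the scores Y (N, q), the approximation X_hat (N, n) in the
    original space, the eigenvalues in decreasing order, and the share of
    variance retained by the first q components.
    """
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    F = (Xc.T @ Xc) / X.shape[0]           # covariance matrix (n, n)

    eigvals, eigvecs = np.linalg.eigh(F)   # eigh: F is symmetric
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    U_q = eigvecs[:, :q]                   # first q eigenvectors as columns
    Y = Xc @ U_q                           # representation in the new basis
    X_hat = x_bar + Y @ U_q.T              # approximation in original space
    retained = eigvals[:q].sum() / eigvals.sum()
    return Y, X_hat, eigvals, retained
```

The complement of the retained share is the variance lost when only q components are kept, which is the quantity to check before trusting a two-dimensional plot.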

Results
In order to evaluate the proposed procedure we studied a 22 km long rural road, called 'SS 113', that connects Messina with Trapani (Italy). The infrastructure is old and predates modern road standards; for this reason, it is composed of a succession of straight stretches and circular curves, without transition elements.
The alignment is very winding, with short straight stretches interposed between the circular curves. The radii of the horizontal bends vary widely, from a minimum of 24 m to a maximum of 3300 m. The longitudinal slope is modest along the entire length (0 ≤ i ≤ 5%). Moreover, the road is characterized by low traffic volumes, the absence of intersections and sufficient geometric consistency, that is, it does not induce abrupt manoeuvres by drivers. The main geometric characteristics were determined by the authors through the examination of 3D digital cartography and with the aid of a differential GPS. In the next phase, after the reconstruction of the road geometry, we computed the design speed and the V85 speed, the sight distances and the other variables, all at the bisector of the circular curves. The final results are summarized in Table 2.
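In detail, the application of the fuzzy clustering procedure can be summarized in the following phases:
- collection of two data sets, with N = 5 observations (first phase) and N' = 24 observations (second phase). Because the units in the data sets are heterogeneous, a normalization procedure was necessary; it is trivial and is not reported here;
- setting of the number of clusters to c = 5;
- choice of the termination tolerance ε > 0 (in this case ε = 1 × 10^-6);
- random initialization of the partition matrix, such that $U^{(0)}$ belongs to the fuzzy partitioning space;
- computation of the cluster centers:

$$\mathbf{v}_i^{(l)} = \frac{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m \mathbf{x}_k}{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m}, \quad 1 \le i \le c$$

- computation of the distances:

$$D_{ikA}^2 = \left( \mathbf{x}_k - \mathbf{v}_i^{(l)} \right)^T A \left( \mathbf{x}_k - \mathbf{v}_i^{(l)} \right), \quad 1 \le i \le c, \; 1 \le k \le N$$

- updating of the partition matrix:

$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}}$$

The last three steps are repeated for l = 1, 2, … until $\| U^{(l)} - U^{(l-1)} \| < \varepsilon$. The shape of the clusters is determined by the choice of the matrix A in the distance measure; here A = I, which induces the standard Euclidean norm:

$$D_{ik}^2 = (\mathbf{x}_k - \mathbf{v}_i)^T (\mathbf{x}_k - \mathbf{v}_i)$$

The application of the Fuzzy C-Means algorithm returns the partition matrix U (c × N) (Table 3), the distance matrix $D_{ik}^2$ (c × N) containing the squared distances between data points and cluster centers, and the cluster centers $\mathbf{v}_i$ (c × n) (Table 4), where N = 5 is the number of initial observations, c = 5 is the number of clusters and n = 25 is the number of input variables.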
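The normalization step is not detailed in the paper. One common choice, assumed here purely for illustration, is min-max scaling of each variable to the interval [0, 1], with the limits learned on the first data set and reused for the later observations so that both phases share the same scale. The function name and its parameters are hypothetical.

```python
import numpy as np

def min_max_normalize(X, lo=None, hi=None):
    """Scale each column (variable) of X to [0, 1].

    X is (N, n) with one observation per row. Pass the lo and hi returned
    for the first data set when normalizing later observations.
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0) if lo is None else np.asarray(lo, dtype=float)
    hi = X.max(axis=0) if hi is None else np.asarray(hi, dtype=float)
    # Constant columns would give a zero span; use 1 to avoid dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span, lo, hi
```

Without some such scaling, variables with large numerical ranges (for example curve radii in metres) would dominate the Euclidean distances used by the clustering.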
Finally, the insertion of new observations (Table 2) made it possible to evaluate their membership in the clusters without knowing the value of the density, and to determine the partition matrix of the evaluated data set, $U^*$ (c × N'), and the distance matrix $D_{ik}^{*2}$ (c × N') containing the distances between the evaluated data points and the cluster centers, with N' = 24 the number of new observations (Table 5).
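Evaluating new observations against cluster centers that are already known requires only the distance and membership formulas, with the centers held fixed. The sketch below assumes the Euclidean norm and a weighting exponent m = 2 (not stated in the paper); the function name is hypothetical, and the new observations are assumed to be normalized in the same way as the data used to find the centers.

```python
import numpy as np

def memberships_for_new_points(X_new, V, m=2.0):
    """Membership degrees of new observations with respect to fixed centers.

    X_new is (N', n), V is (c, n). Returns U_star (c, N') and the squared
    Euclidean distances D2_star (c, N').
    """
    X_new = np.asarray(X_new, dtype=float)
    V = np.asarray(V, dtype=float)
    D2_star = ((X_new[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    D2_star = np.fmax(D2_star, 1e-12)   # guard against division by zero
    inv = D2_star ** (-1.0 / (m - 1.0))
    U_star = inv / inv.sum(axis=0, keepdims=True)
    return U_star, D2_star
```

A point that is equidistant from two centers receives equal membership in both, which is exactly the kind of borderline situation the fuzzy approach is meant to expose.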
Table 5 therefore makes it possible to identify the predominant class to which all the observations and, consequently, the entire road can be referred. Fig. 1 shows that the infrastructure as a whole belongs to class F, but it also reveals an attraction towards the cluster representing situations of un-classification (red dashed line).
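Reading the predominant class off a partition matrix can be sketched as follows. The paper does not state a rule for aggregating the sections into a single road-level class, so the use of the mean membership per cluster is an assumption of this example, as are the function name and the made-up membership values in the usage lines. The class order follows the cluster-to-class assignment given with Table 3 (cluster 1 = C, 2 = A, 3 = B, 4 = F, 5 = NO).

```python
import numpy as np

def dominant_classes(U_star, class_names):
    """Summarize a partition matrix U_star (c, N').

    Returns the dominant class of each section, the mean membership per
    class, and the class with the highest mean membership (an assumed
    road-level aggregation rule).
    """
    U_star = np.asarray(U_star, dtype=float)
    per_section = [class_names[i] for i in U_star.argmax(axis=0)]
    mean_membership = U_star.mean(axis=1)
    road_class = class_names[int(mean_membership.argmax())]
    return per_section, dict(zip(class_names, mean_membership)), road_class

# Hypothetical memberships for two sections (columns), for illustration only.
names = ["C", "A", "B", "F", "NO"]
example = [[0.10, 0.30],
           [0.02, 0.02],
           [0.05, 0.05],
           [0.53, 0.40],
           [0.30, 0.23]]
sections, means, road = dominant_classes(example, names)
```

Keeping the full vector of mean memberships, rather than only the winning class, preserves the information about secondary attractions such as the one towards the un-classified cluster.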
Finally, the reduction of the size of the matrix via the PCA procedure made it possible to represent the data in a more compact form and to highlight their closeness to the 5 clusters (Fig. 2).

Table 3. Partition matrix U (it allows each cluster number to be assigned to a specific class; in this case A - cluster 2; B - cluster 3; C - cluster 1; F - cluster 4; NO - cluster 5)

            Obs. 1     Obs. 2     Obs. 3     Obs. 4     Obs. 5
Cluster 1   1.1E-11    1.8E-11    1.0E+00    1.8E-10    1.6E-11
Cluster 2   1.0E+00    6.9E-11    1.1E-11    8.2E-12    4.5E-12
Cluster 3   6.9E-11    1.0E+00    1.8E-11    1.4E-11    6.5E-12
Cluster 4   8.2E-12    1.4E-11    1.8E-10    1.0E+00    1.9E-11
Cluster 5   4.5E-12    6.5E-12    1.6E-11    1.9E-11    1.0E+00

The full membership in an expected class would make it possible to predict and organize the most appropriate maintenance operations in order to improve the functional characteristics of the infrastructure up to the desired level. The uncertainty and the extreme subjectivity of the traditional methodologies regarding these issues are overcome by the procedure proposed here. In particular, we have quantified the degree of membership of each road section in a predetermined class (cluster). This information is already sufficient for the stakeholder of the infrastructure: for example, if the degree of membership in group C is limited by an insufficient available sight distance, the stakeholder can act on the obstacles (barriers, walls, trees, etc.), or on the other parameters involved, to bring the section closer to the desired cluster.

Discussion
An examination of the tables and figures in the Results section highlights some advantages of the proposed procedure. As is well known, the classification of a road is hardly ever an easy operation: the case in which every parameter points to the same class is rare and desirable, and in practice the analyst has to manage a multiplicity of parameters whose values often contradict one another. This is the great weakness of all manual procedures: no legislation, in fact, specifies how the analytical process should be carried out. For example, a road may have geometrical and constructive characteristics adequate for class C while, at the same time, its traffic flows would direct it towards group F. Situations like this are highly conflicting and can compromise the quality of the analysis. The analyst therefore needs to know in quantitative terms how well the road conforms to every group, in order to take the subsequent decision with due prudence and knowledge. Furthermore, the examination of Fig. 1 provides a synthetic view of the content of Table 5. In particular, the graph shows that sections 2, 6 and 23 are strongly inconsistent with the adjacent portions of the road. While all the other sections have a high prevalence of group F (black solid line around the value 0.53), these three have a relatively low degree of membership in it (black solid line around the value 0.40), although it still prevails over the other classes. The same graph shows that these sections also tend to acquire the characteristics of group C and, therefore, better functional characteristics. To find the reasons for these results it is sufficient to examine Table 2, which shows that the three sections have abnormal values of the following variables:
- very high design speed (100 km/h) and, therefore, very large radii;
- fairly high V85 speed (80, 86 and 94 km/h for sections 2, 6 and 23 respectively);
- remarkable stopping sight distance (but only in section 2).
Since the outcome of the analysis cannot be seen as positive (it reveals, in fact, an obvious lack of homogeneity), the stakeholder should introduce speed-mitigation measures in order to make the route more uniform.
The response of the model to the fuzzy clustering analysis also highlights a certain affinity (between 0.25 and 0.30) towards the NO cluster. In this case, the examination of Table 2 shows that some variables do not respect the nominal limits of the four standard classes (A, B, C and F): the most evident are the V85, the stopping sight distance, the available sight distance, and the presence of emergency lanes and parking areas. It is clear, therefore, that this procedure does not diminish the role of the analyst, but rather enables the analyst to take more rational decisions.
Finally, Fig. 2 summarizes the result of the procedure (PCA) used to decrease the size of the sample. In this case, it was used to reduce the data to a two-dimensional matrix and to appreciate the arrangement of the observations relative to the 5 clusters. It would be interesting to perform the analysis with only the more important variables and to discard the others, in order to make subsequent surveys less expensive. However, in this phase of the research this additional step was set aside in order to focus on the actual validity of the presented procedure.
We solved our problem with the clustering technique called Fuzzy C-Means. Only afterwards were the results reduced to two dimensions through the PCA procedure, solely to arrange them more conveniently. Although many other techniques are applicable to this type of problem, we believe that FCM is the most suitable one because of its simplicity and because its analytical structure returns the final result together with a degree of uncertainty.

Conclusions
The functional classification of a road is an obligatory step for the designer or for the stakeholder of a new or an existing road. This operation establishes the geometrical standard, guides future maintenance operations (usually characterized by limited budgets), identifies the right constructive elements to ensure or improve drivers' safety, and follows the development of the territory as required by the planning instruments. In general, however, the analyst accomplishes this task by determining only a few synthetic indices, often of a qualitative nature, without any analytical detail, and so provides a result that is strongly based on personal experience.
This approach causes not only an approximation of the final outcome, but also a real inability to identify the contribution of each variable involved in the analysed scenario.
In this paper, we proposed a procedure based on a fuzzy clustering technique in order to improve the analysis required by the road rules.
First, the choice of a clustering technique of this type takes into account the vagueness of the survey, even in the final results. If, instead, we had applied a hard technique (K-means clustering, K-medoids, etc.), we would have obtained a very sharp classification, which would have given the analyst a false sense of accuracy of the final result.
Secondly, the vagueness of the outcome of the analysis, when it occurs, allows the analyst to proceed in two different ways, both desirable:
- acceptance of a fairly critical condition, in full awareness of the distance from the theoretical class of the road (in essence, the difference between 1 and the calculated degree of membership);
- improvement of the scenario so determined, with appropriate measures on the variables identified as the most vulnerable.
In both cases, the analyst always has a very significant role, but can base the choice on the support of a sufficiently advanced analysis. This procedure can be applied profitably by the manager of the road and can be further developed to address some problems of little scientific interest but of considerable practical importance. It is possible, for example, to increase significantly both the number of variables and the number of observations, which improves the reliability of the final results. It is also desirable to implement this technique in a Geographic Information System, in order to use the data typically found in such systems and to offer an additional instrument for the knowledge of the road context.
From the scientific point of view, a sufficiently simple development is to study, with feature selection techniques, the variables that most affect the studied phenomenon. In this way, it is possible to limit the cost of the survey and, above all, to turn the analyst's attention only to the parameters that really count for the quantification of the final result. As mentioned, this step has not been pursued at this stage, also because it requires a database with a very large number of observations relative to the number of input variables.