PREDICTION FOR TRAFFIC ACCIDENT SEVERITY: COMPARING THE ARTIFICIAL NEURAL NETWORK, GENETIC ALGORITHM, COMBINED GENETIC ALGORITHM AND PATTERN SEARCH METHODS

This paper focuses on predicting the severity of freeway traffic accidents by employing twelve accident-related parameters in genetic algorithm (GA), pattern search (PS) and artificial neural network (ANN) modelling methods. The models were developed using the input parameters of driver’s age and gender, the use of a seat belt, the type and safety of a vehicle, weather conditions, road surface, speed ratio, crash time, crash type, collision type and traffic flow. The models were constructed based on a total of 1000 crashes that occurred during 2007 on the Tehran–Ghom Freeway; the remaining records were not suitable for this study. The GA evaluated eleven equations to obtain the best one. Then, the GA and PS methods were combined using the best GA equation. The neural network used a multi-layer perceptron (MLP) architecture that consisted of a multi-layer feed-forward network with sigmoid hidden neurons and linear output neurons, which can fit multi-dimensional mapping problems arbitrarily well. The ANN was applied during training, testing and validation and had 12 inputs, 25 neurons in the hidden layer and 3 neurons in the output layer. The best-fit model was selected according to the R-value, root mean square error (RMSE), mean absolute error (MAE) and the sum of squares due to error (SSE). The highest R-value, around 0.87, was obtained for the ANN, demonstrating that the ANN provided the best prediction. The combination of GA and PS methods allowed for various prediction rankings ranging from linear relationships to complex equations. An advantage of these models is that they can be updated as new data are added.


Introduction
As the world population grows and cars become increasingly common, the number of traffic crashes worldwide is increasing. Traditional measures to reduce crashes include improved geometric design, congestion management strategies and better driver education and enforcement. While these measures are generally effective, they are often infeasible or prohibitively expensive to implement. Many factors are involved in traffic crashes, and some of them strongly influence one another, which prevents transportation safety designers from using a single parameter to fully explain traffic accident severity. Studying the parameters involved in traffic crashes with combined modern models that include the interactions of input and output variables can help decrease the number of traffic crashes. The crash prediction model (also called the safety performance function) is one of the most important techniques for investigating the relationship between crash occurrence and the risk factors associated with various traffic entities. More than 28000 people are killed per year on Iranian roads, with economic and social consequences. Factors with a profound impact on traffic accident severity include the demographic or behavioural characteristics of the driver (vehicle speed, driver's age and gender, seat belt use), environmental factors and roadway conditions at the time of the crash (crash time, weather conditions, road surface, crash type, collision type, traffic flow) and technical characteristics of the vehicle itself (vehicle type and safety). The primary goal of this study is to compare various models and select the most accurate one for predicting traffic accident severity based on the selected parameters; in addition, the models developed in this research can be updated as new data are added for the twelve parameters and three injury severity levels selected as input and output variables.
This paper investigates three modelling techniques for achieving high predictive accuracy. Artificial neural networks are capable of capturing highly nonlinear relationships between predictor variables (crash factors) and the target variable (severity level of injuries). This aspect of neural networks is particularly useful when the relationship between the variables is unknown or complex and therefore difficult to handle statistically.
The second model is a genetic algorithm, a method for solving both constrained and unconstrained optimization problems based on natural selection, the process that drives biological evolution. The third model combines the genetic algorithm (GA) and pattern search (PS) models. The use of GA and PS models in transportation safety studies is relatively new; therefore, this study combines these models in order to improve prediction accuracy.
Past research analyzing accident frequencies has mainly relied on statistical models such as linear regression models, Poisson regression and/or negative binomial regression models because the occurrence of accidents on a highway section can be regarded as a random event.

Background
The main focus of prior studies has been to identify a defensible statistical relationship between crash counts and exposure. The negative binomial (NB) model arises mathematically (and conveniently) by assuming that unobserved crash heterogeneity (variation) across sites (intersections, road segments, etc.) is Gamma distributed while crashes within sites are Poisson distributed (Washington et al. 2010). Bayesian empirical methods have also been developed (Mahalel et al. 1982; Ng, Sayed 2004; Wright et al. 1988). Poisson, Poisson-Gamma (NB) and other related models are called generalized linear models. Hosseinlou and Aghayan (2009) used fuzzy logic to predict traffic accident severity on the Tehran-Ghom freeway in Iran.
Artificial neural networks (ANN) have been verified to be efficient in many fields. Neural networks are commonly used for non-linear modelling and forecasting. In traffic safety, some studies have applied ANNs to predicting crash rates and analyzing crashes, but none have used twelve parameters covering the important factors in detail. Thus, this study attempted to incorporate all relevant parameters into the models to achieve a high percentage of correct crash forecasts. Mussone et al. (1999) applied artificial neural networks to analyze vehicular crashes that occurred at an intersection in Milan, Italy. A number of studies have attempted to identify groups of drivers at a greater risk of being injured or killed in traffic crashes (Zhang et al. 2000; Valent et al. 2002). Bédard et al. (2002) applied multivariate logistic regression analysis to investigate the effects of driver, crash and vehicle characteristics on fatal crashes. Ivan et al. (2000) investigated single and multi-vehicle highway crash rates and their relationships with traffic density while controlling for land use, the time of the day and light conditions. Temporal effects were also considered for single-vehicle crashes. Lord et al. (2005) analyzed the relationship among crashes, density (vehicles per km per lane) and v/c ratio. They found that as the v/c ratio increased, fatal and single-vehicle crashes decreased after some point, and crash rates followed a U-shaped relationship. Artificial neural networks have scarcely been used as a modelling approach in the analysis of crash-related injury severity.
More recent applications of the ANN in the transportation field have included traffic prediction (Yin et al. 2002; Zhong et al. 2004), the estimation of traffic parameters (Tong, Hung 2002), traffic signal control (Zhang et al. 2001), incident detection (Jin et al. 2002; Yuan, Cheu 2003), travel behaviour analysis (Subba Rao et al. 1998; Hensher, Ton 2000; Vythoulkas, Koutsopoulos 2003) and traffic accident analysis (Mussone et al. 1996, 1999; Sohn, Lee 2003; Abdel-Aty, Pande 2005). For example, Abdelwahab and Abdel-Aty (2001) used artificial neural networks for modelling the relationship between driver injury severity and crash factors related to the driver, vehicle, roadway, and environmental characteristics. Their study focused on classifying accidents into one of three injury severity levels using the readily available crash factors. These authors limited their domain of study to two-vehicle accidents that occurred at signalized intersections. The predictive performance of a multi-layer perceptron (MLP) neural network was compared to the performance of the ordered logit model. The obtained results showed that the MLP achieved better classification (correctly classifying 65.6 and 60.4% of cases for training and testing phases respectively) than the ordered logit model (correctly classifying 58.9 and 57.1% of cases for training and testing phases respectively). Abdel-Aty and Pande (2005) applied a probabilistic neural network (PNN) model for predicting crash occurrence on the Interstate-4 corridor in Orlando, Florida. The average and standard deviation of speed around crash sites were extracted from loop data as input variables. The results of this analysis showed that at least 70% of the crashes could be correctly identified by the proposed PNN model.
Genetic algorithms are powerful stochastic search techniques based on the principle of natural evolution. These algorithms were first introduced and investigated by Holland (1992). According to Chang and Chen (2000), regression models generated by genetic programming (GP) are also independent of any model structure. According to Deschaine and Francone (2004), the GP is observed to perform better than classification trees with lower error rates and also outperforms neural networks in regression analysis. Several studies (Park et al. 2000;Ceylan, Bell 2004;Teklu et al. 2007) have used GP methods in the traffic signal system and network optimization.

Artificial Neural Network
Neural networks are composed of simple elements operating in parallel inspired by biological nervous systems. As in nature, connections between elements largely determine the network function. A neural network can be trained to perform a particular function by adjusting the values of connections (weights) between elements.
We used the architecture of a multi-layer perceptron (MLP) neural network that consisted of a multi-layer feed-forward network with sigmoid hidden neurons and linear output neurons. Multiple layers of neurons with non-linear transfer functions allow the network to learn both non-linear and linear relationships between input and output vectors. The linear output layer allows the network to produce values outside the range from -1 to +1, so that a network with biases, a sigmoid layer and a linear output layer is capable of approximating any function with a finite number of discontinuities. This network can fit multi-dimensional mapping problems arbitrarily well given consistent data and enough neurons in its hidden layer. The network was trained using the Levenberg-Marquardt backpropagation algorithm. This structure essentially consists of a collection of non-linear neurons organized and connected to each other in a feed-forward multi-layer structure, with the directed connections carrying coefficients (commonly called weights and biases in neural network terminology). The structure usually consists of input nodes, a hidden layer containing a number of neurons and output nodes. A hidden layer is a network layer that is not connected to the network output (for instance, the first layer of a two-layer feed-forward network). This pattern is known to be well-suited to prediction and classification problems.
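The forward pass of such a network can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB implementation: the layer sizes (12 inputs, 25 hidden neurons, 3 outputs) follow the paper, while the function names, the random initial weights and the use of `tanh` as the sigmoid transfer function are assumptions made for the example. Training (for instance with Levenberg-Marquardt) is not shown.

```python
import numpy as np

def init_mlp(n_in=12, n_hidden=25, n_out=3, seed=0):
    """Create small random weights and zero biases (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),   # input -> hidden weights
        "b1": np.zeros(n_hidden),                            # hidden biases
        "W2": rng.normal(0.0, 0.1, size=(n_out, n_hidden)),  # hidden -> output weights
        "b2": np.zeros(n_out),                               # output biases
    }

def forward(params, x):
    """Sigmoid (tanh) hidden layer followed by a linear output layer."""
    hidden = np.tanh(params["W1"] @ x + params["b1"])  # values bounded in (-1, 1)
    return params["W2"] @ hidden + params["b2"]        # linear output, unbounded

params = init_mlp()
x = np.ones(12)          # one hypothetical input record with 12 coded variables
y = forward(params, x)   # three outputs, one per severity level
```

Because the output layer is linear, the three outputs are not confined to the range of the hidden activations, which is the property the paragraph above relies on.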

Genetic Algorithm
A genetic algorithm is a method for solving both constrained and unconstrained optimization problems and is based on natural selection, the process that drives biological evolution. Genetic algorithms repeatedly modify a population of individual solutions. At each step, the genetic algorithm selects individuals at random from the current population to be parents and uses them to produce children for the next generation. Over successive generations, the population 'evolves' toward an optimal solution. Genetic algorithms can be applied to solve a variety of optimization problems that are not well-suited to standard optimization algorithms, including problems in which the objective function is discontinuous, nondifferentiable, stochastic or highly nonlinear. This method was developed by Holland (1992) over the course of the 1960s and 1970s and was finally popularized by one of his students, Goldberg, who was able to solve a difficult problem for his dissertation involving the control of gas-pipeline transmission (Goldberg 1989). Holland was the first to try to develop a theoretical basis for GAs through his schema theorem. The work of De Jong (1975) demonstrated the usefulness of GAs for function optimization and was the first concerted effort to optimize GA parameters.
GA operators are mutation (changes in a randomly chosen bit of a chromosome) and crossover (exchanging randomly chosen slices of a chromosome). Fig. 1 shows a genetic cycle of the GA where the best individuals are continuously selected and operated on by crossover and mutation.
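The two operators can be illustrated on binary chromosomes as follows. This is a simplified sketch: single-point crossover is used as one possible way of exchanging randomly chosen slices, and the function names and the 8-bit example chromosomes are hypothetical.

```python
import random

def mutate(chromosome, rng):
    """Flip one randomly chosen bit of the chromosome."""
    i = rng.randrange(len(chromosome))
    child = list(chromosome)
    child[i] = 1 - child[i]
    return child

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two parents after a random cut."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

rng = random.Random(0)
a = [0] * 8
b = [1] * 8
child_a, child_b = crossover(a, b, rng)  # each child mixes material from both parents
mutant = mutate(a, rng)                  # differs from `a` in exactly one bit
```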

Pattern Search
Direct search is a method of solving optimization problems and does not require any information about the gradient of the objective function. Unlike more traditional optimization methods that use information about the gradient or higher derivatives to search for an optimal point, a direct search algorithm searches a set of points around the current point, looking for one point where the value of the objective function is lower than the value at the current point. Direct search can be used for solving problems when the objective function is not differentiable or even not continuous. Pattern search algorithms are direct search methods well-suited for the global optimization of highly nonlinear, multi-parameter and multimodal objective functions (Lewis, Torczon 1999). The current paper tests a pattern search algorithm based on GPS Positive Basis 2N (Lewis, Torczon 1999;Audet, Dennis 2003).
Pattern search functions include two main algorithms called the generalized pattern search (GPS) algorithm and the mesh adaptive direct search (MADS) algorithm. Both are pattern search (PS) algorithms that compute a sequence of points that approach an optimal point. Pattern search algorithms are direct search methods that are capable of solving global optimization problems with irregular, multimodal objective functions without the need to calculate any gradient or curvature information, and they are especially suited to problems in which the objective function is not differentiable, is stochastic or is even discontinuous (Torczon 1997).
At each step, the algorithm searches a set of points, called a mesh, around the current point, which was computed in the previous step of the algorithm. The mesh is formed by adding the current point to a scalar multiple of a set of vectors called a pattern. If the pattern search algorithm finds a point in the mesh that improves the objective function at the current point, the new point becomes the current point in the next step of the algorithm. The MADS algorithm is a modification of the GPS algorithm. The algorithms differ in how the mesh is computed. The GPS algorithm uses fixed direction vectors, whereas the MADS algorithm uses a random selection of vectors to define the mesh. The MADS algorithm uses the relationship between the mesh size, $\Delta_m$, and an additional parameter called the poll parameter, $\Delta_p$, to determine stopping criteria.
For positive basis N+1, the poll parameter is $\Delta_p = N\sqrt{\Delta_m}$, and for positive basis 2N, the poll parameter is $\Delta_p = \sqrt{\Delta_m}$. The MADS stopping criterion is $\Delta_m \le$ mesh tolerance, where $\Delta_m$ is the mesh size. At each iteration, the pattern search polls the points in the current mesh by computing the objective function at the mesh points to see whether any of them has a function value lower than the current value. The pattern that defines the mesh is specified by the poll method option. GPS positive basis 2N consists of 2N directions, namely the positive and negative unit coordinate vectors, where N is the number of independent variables of the objective function. Pattern searches sometimes run faster using GPS positive basis Np1 as the poll method rather than GPS positive basis 2N because the algorithm searches fewer points at each iteration. MADS positive basis Np1 is also faster than MADS positive basis 2N (Lewis, Torczon 2002).
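The polling loop described above can be sketched as a basic generalized pattern search with the 2N coordinate directions. This is a simplified illustration, not the MATLAB routine used in the study: the function name, the default mesh size, the expansion factor of 2, the contraction factor of 0.5 and the quadratic test function are all assumptions made for the example.

```python
import numpy as np

def pattern_search(f, x0, mesh=1.0, expand=2.0, contract=0.5,
                   tol=1e-6, max_iter=10_000):
    """Minimize f by polling the 2N points x +/- mesh * e_i around the current point."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    directions = np.vstack([np.eye(n), -np.eye(n)])  # the 2N poll directions
    for _ in range(max_iter):
        if mesh <= tol:                 # stop once the mesh is fine enough
            break
        improved = False
        for d in directions:
            trial = x + mesh * d
            f_trial = f(trial)
            if f_trial < fx:            # successful poll: accept the first improvement
                x, fx = trial, f_trial
                improved = True
                break
        # Expand the mesh after a successful poll, contract it after a failed one.
        mesh = mesh * expand if improved else mesh * contract
    return x, fx, mesh

def quadratic(x):
    """Smooth test function with its minimum at (1, -2)."""
    return float(np.sum((x - np.array([1.0, -2.0])) ** 2))

x_best, f_best, final_mesh = pattern_search(quadratic, [0.0, 0.0])
```

No gradient is evaluated anywhere in the loop; only objective function values at the mesh points are compared.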

Measures for Goodness-of-Fit Regression Model
Goodness-of-fit (GOF) statistics are useful for comparing results across multiple studies, for examining competing models within a single study and for providing feedback on the extent of knowledge about the uncertainty involved in the phenomenon of interest. Four GOF measures are discussed: the sum of squares due to error (SSE), root mean square error (RMSE), correlation coefficient (R) and mean absolute error (MAE) (Draper, Smith 1998).

Sum of Squares Due to Error
This statistic measures the total deviation of the fitted values from the response values. It is also called the summed square of residuals and is usually labelled as SSE, Eq. (1), in which $y_i$ is the response value (target output), $\hat{y}_i$ is the predicted response value and $n$ is the number of observations:
$$\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{1}$$
An SSE value closer to 0 indicates that the model has a smaller random error component and that the fit will be more useful for prediction.

Root Mean Squared Error
This statistic is also known as the fit standard error and the standard error of regression. RMSE is an estimate of the standard deviation of the random component in the data and is defined as Eq. (2):
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{2}$$
where MSE is the mean square error or the residual mean square, Eq. (3):
$$\mathrm{MSE} = \frac{\mathrm{SSE}}{\nu} \tag{3}$$
in which $\nu$ is the number of residual degrees of freedom. Just as with SSE, an MSE value closer to 0 indicates a fit more useful for prediction. The root mean square error (RMSE) is a frequently used measure of the differences between the values predicted by a model or an estimator and the observed values.

Mean Absolute Error (MAE)
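The average error of estimator $f(x_k)$ with respect to the estimated parameter $y_k$ is defined as the mean of the absolute difference between the estimator and the real value, Eq. (4), where $n$ is the number of observations:
$$\mathrm{MAE} = \frac{1}{n} \sum_{k=1}^{n} \left| f(x_k) - y_k \right| \tag{4}$$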

Correlation Coefficient (R)
The correlation coefficient matrix represents the normalized measure of the strength of the linear relationship between variables. Matrix R of correlation coefficients was calculated from input matrix X, the rows of which are observations and the columns of which are variables. Matrix R is related to covariance matrix C = cov(X) by Eq. (5):
$$R(i,j) = \frac{C(i,j)}{\sqrt{C(i,i)\,C(j,j)}} \tag{5}$$
The correlation coefficients range from -1 to 1, where values close to 1 suggest that there is a positive linear relationship between data columns. Values close to -1 suggest that one column of data has a negative linear relationship to another column of data (anticorrelation), and values close to or equal to 0 suggest that no linear relationship exists between data columns (Bevington, Robinson 2002).
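The four measures can be computed together for a vector of targets and a vector of predictions, as in the following sketch. The function name and the small data set are hypothetical, and MSE is computed here by dividing SSE by the number of observations, which is an assumption; a degrees-of-freedom correction would change the denominator.

```python
import numpy as np

def goodness_of_fit(y_true, y_pred):
    """Return SSE, RMSE, MAE and the correlation coefficient R as a dictionary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    sse = float(np.sum(residuals ** 2))
    mse = sse / y_true.size                       # assumption: divide by n
    rmse = float(np.sqrt(mse))
    mae = float(np.mean(np.abs(residuals)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])  # off-diagonal entry of the 2x2 matrix
    return {"SSE": sse, "RMSE": rmse, "MAE": mae, "R": r}

# Hypothetical targets and predictions, for illustration only.
stats = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

For this toy input, a single residual of 2 gives SSE = 4, RMSE = 1 and MAE = 0.5, which shows how SSE and RMSE weight a large error more heavily than MAE does.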

Typical Steps in Designing a Model
Fig. 2 describes the principles of the employed models. Initially, 1000 records collected from police reports were used for constructing the objective functions of these models. The models could then update the objective function as new records were added to the preliminary data. In addition, when new records were added, the optimum coefficients obtained for the previous records served as the initial vector in the combined GA and PS model. To achieve optimal results from the ANN model, new weights and biases were calculated starting from the preliminary weight matrix and bias vector. Therefore, the ANN and GA as well as the combined GA and PS models were able to find the minimum even with a less than optimal choice of the initial range. Finally, the errors of the objective functions were calculated for these models, and the most appropriate error with respect to its type in each model was selected to determine the final objective function. The advantage of this structure is the ability of the model to improve itself as new data are added.

Data Description
The dataset used in this study was derived from a total of 1063 reported traffic crashes in Tehran, the capital of Iran. We selected these crashes from the total number of crashes that occurred on the Tehran-Ghom freeway in 2007 because these were the only complete crash records. These data were used as training and testing data for the artificial neural network, genetic algorithm and combined GA and PS methods. The predictions of these three models were compared. The majority of crashes (74.8%) involved two vehicles. The distribution of driver injuries was 14% fatal injuries, 38.4% evident injuries and 47.6% no injuries.
Three injury levels were considered for this study (i.e. no injury, evident injury or disabling injury/fatality), and twelve variables were selected from the obtained data. The vehicle speed in police reports was estimated from camera records or braking distance. Speed ratio, defined as the ratio of the estimated speed at the time of a crash to the posted speed limit at the crash location, was used as one of the input variables. Road geometry parameters were not taken into consideration because the selected road had a desirable geometry common to all crashes in the dataset. The input variables have either numerical or dummy values to be used in the program. Table 1 shows the coding of input and output variables. MATLAB software was used for comparing the performance of the three modelling approaches (ANN, GA, and combined GA and PS) discussed earlier.

Multilayer Perceptron Neural Networks
The MLP model consisted of two layers, each having weight matrix $W$, bias vector $b$ and output vector $p^{i}$ with $i > 1$. Fig. 3 shows the selected final model for each of these layers in the MLP model. The number of the layer was appended as a superscript to the variable of interest.
Superscripts were used for identifying the source (second index) and destination (first index) of various weights and other elements of the network.
The weight matrix connected to input vector $p^{1}$ was labelled as the input weight matrix ($IW^{1,1}$), having source 1 (second index) and destination 1 (first index). The elements of layer 1, such as its bias, net input and output, have superscript 1 to indicate that they are associated with the first layer.
The matrices of layer weight (LW) and input weight (IW) were used in the MLP model. Data were randomly divided into three parts: training, testing and validation. The MLP model had 12 inputs, 25 neurons in the first layer and 3 neurons in the second layer. The output layer of the MLP model consisted of three neurons representing the three levels of injury severity. 70% of the original data were used in the training phase. The validation and testing data sets each contained 15% of the original data. A constant input of 1 was fed to the bias of each neuron. Note that the outputs of each intermediate layer were the inputs to the following layer. Thus, layer 2 can be analyzed as a one-layer network having 25 inputs, 3 neurons and a 3×25 weight matrix $W^{2}$; in this case, the input to layer 2 is $p^{2}$. With all the vectors and matrices of layer 2 identified, the layer can be treated as a single-layer network on its own. The layers of a multi-layer network play different roles in the prediction process. This kind of two-layer network is used extensively with backpropagation. In this study, the output of the second layer, $p^{3}$, was the network output of interest and was labelled as $y$ (Rumelhart et al. 1986).
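A random 70/15/15 division of the records can be sketched as follows. This is an illustration of the splitting step only; the function name, the fixed seed and the use of a random permutation are assumptions, not a description of the MATLAB routine used in the study.

```python
import numpy as np

def split_indices(n_records, train=0.70, val=0.15, seed=0):
    """Randomly split record indices into training, validation and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)        # shuffle the record indices
    n_train = int(round(train * n_records))
    n_val = int(round(val * n_records))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]        # the remainder is the testing set
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
```

With 1000 records this yields 700 training, 150 validation and 150 testing records, with no record appearing in more than one set.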
The objective of this network is to reduce error $e$, which is the difference between $t$ and $p^{i}$, in which $i > 1$ and $t$ is the target vector. The perceptron learning rule calculates the desired changes to the weights and biases of the perceptron, given input vector $p^{1}$ and the associated error $e$. Thus, the goal is to minimize the average of the sum of these squared errors. The Least Mean Square Error (LMS) algorithm adjusts the weights and biases of the linear network so as to minimize this mean square error.
The error at output neuron $j$ at iteration $t$ can be calculated as the difference between the desired output (target output) and the corresponding real output, $e_j(t) = d_j(t) - y_j(t)$. Accordingly, Eq. (6) is the total error energy of all output neurons.
The steepest descent of MSE can be used to update weights by Eq. (9) (Yeung et al. 2010). The mean square error performance index for the linear network is a quadratic function as shown in Eq. (8). Thus, the performance index will have one global minimum, a weak minimum or no minimum, depending on the characteristics of the input vectors. Specifically, the characteristics of the input vectors determine whether or not a unique solution exists (Hagan et al. 1996).
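For orientation, the quantities discussed here are commonly written in the following textbook forms. They are given as a sketch under stated assumptions and are not reproduced from the paper: the symbols $\eta$ (learning rate), $w$ (a generic weight), $x$ (the vector of weights and biases of a linear network), $z$ (its input vector), $c$, $h$ and $R_z$ are notation introduced for this example, and the expressions are deliberately left unnumbered.

```latex
% Total instantaneous error energy summed over the output neurons j
E(t) = \frac{1}{2} \sum_{j} e_j^{2}(t), \qquad e_j(t) = d_j(t) - y_j(t)

% Steepest-descent weight update with learning rate \eta
w(t+1) = w(t) - \eta \, \frac{\partial E(t)}{\partial w(t)}

% Mean square error of a linear network, quadratic in the parameter vector x,
% with c = E[t^2], h = E[t z] and R_z = E[z z^T]
F(x) = c - 2\, x^{\mathsf{T}} h + x^{\mathsf{T}} R_z \, x
```

In the quadratic form, a unique global minimum exists when $R_z$ is positive definite, which is one way of stating how the characteristics of the input vectors decide whether a unique solution exists.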
The results of the MLP model are presented in Table 2 in the form of a prediction table. Table 2 depicts the prediction of injury severity levels in the training, testing and validation phases. Fig. 4 shows regression plots for the output with respect to training, validation and testing data. The value of the correlation coefficient (R) for each phase was calculated. The R-value was around 0.87 for the total response in the MLP model. Fig. 5 plots training, validation and testing errors against the training iteration in order to locate the minimum validation error. The best validation performance occurred at iteration 7, and the network at this iteration was returned. The plot in Fig. 5 shows the mean squared error of the network starting at a large value and decreasing to a smaller value, which means that the network is learning. The plot has three lines because the 1000 input and target vectors were randomly divided into three sets. 70% of the vectors were used for training the network. 15% were used for validating how well the network generalized. Training continues as long as it reduces the network error on the validation vectors. Once the network starts to memorize the training set (at the expense of generalizing more poorly), training is stopped. This technique helps avoid the problem of overfitting, which plagues many optimization and learning algorithms. Finally, the last 15% of the vectors provide an independent test of network generalization on data that the network has never seen.
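The validation-based stopping rule can be sketched independently of the network itself. The function below scans a sequence of validation errors and returns the iteration with the lowest error, stopping once the error has failed to improve for a given number of consecutive iterations. The function name, the `max_fail` parameter and the error curve are hypothetical and chosen for illustration; they are not the values from Fig. 5.

```python
def best_iteration(val_errors, max_fail=6):
    """Return (iteration, error) of the best validation performance seen before stopping."""
    best_iter, best_err, fails = 0, float("inf"), 0
    for i, err in enumerate(val_errors):
        if err < best_err:
            best_iter, best_err, fails = i, err, 0   # new best: reset the failure count
        else:
            fails += 1
            if fails >= max_fail:                    # validation error keeps rising: stop
                break
    return best_iter, best_err

# Hypothetical validation-error curve: improves, then degrades as overfitting sets in.
val_curve = [0.9, 0.6, 0.4, 0.35, 0.36, 0.38, 0.41, 0.45, 0.5, 0.56, 0.1]
stop_iter, stop_err = best_iteration(val_curve)
```

In this example the search stops after six consecutive non-improving iterations, so the final entry of the curve is never examined and the network state at iteration 3 would be the one returned.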

Genetic Algorithm
The genetic algorithm (GA) is an optimization and search technique based on the principles of genetics and natural selection. The genetic algorithm starts with a population of solutions (chromosomes) represented by coded strings (typically 0 and 1 binary bits) as the underlying parameter set of the optimization problem. GAs generate successively improved populations of solutions (better generations) by applying three main genetic operators: selection, crossover and mutation. The selection function chooses parents for the next generation based on their scaled values from the fitness scaling function where the stochastic uniform selection function was used. Crossover is achieved by exchanging coding bits between two mated strings. The chromosomal material of different parents can be combined to produce an individual that could benefit from the strength of both parents. In this case, the applied crossover function was scattered.
Mutation occasionally provides and recovers useful material for chromosomes through the random alteration of the value of a string bit (in the binary case, from 0 to 1 and vice versa). In our case, the Gaussian mutation function was used. The following formula was obtained from 1000 police records, and the system was able to modify the formula based on added records. The goal is to find the solution in the set with the highest (optimum) performance according to our measure of 'goodness'. An objective function can be defined to represent the severity of a traffic crash, which is the prediction target that we seek to optimize. The objective functions were selected by checking the values of R, MAE, RMSE and SSE, as shown in Table 3.
Thus, we conclude that the objective function given in Eq. (6) gave the best results for the GA model, with an R-value of around 0.78. This value varies between runs because the GA starts by creating a random initial population of individual vectors. The GA process stops when a stopping criterion is met, such as the maximum number of generations, stall time, stall generations, fitness limit or function tolerance ($1.0 \times 10^{-6}$). In Table 3, the objective function with the highest R-value is in the first row, and it was therefore selected for further adjustment. By optimizing this objective function with different initial populations, vectors and stopping criteria, better coefficients for the model can be obtained. After checking several such configurations, we obtained an R-value of 0.79. In the objective function, x denotes the coefficients of the optimized objective function, and the b and out parameters are related to the input and output variables respectively. Table 4 presents the modified coefficients of the objective function. Fig. 6 displays the best and mean values of the fitness function at each generation. In addition, the best and mean values in the current generation are shown at the top of Fig. 6.

Combination of the Genetic Algorithm and Pattern Search
We combined the GA and PS models to determine whether this combined method would achieve better results than the genetic algorithm alone. This paper uses the GPS Positive Basis 2N poll method, which enhances the performance of pattern search algorithms.
The initial point of this method was obtained from the optimum point of the GA shown in Table 4. Table 5 presents the modified coefficients of the combined model. The combined GA and PS model has an R-value of around 0.79. Fig. 7 shows the value of the objective function at the best point at each iteration. Typically, the value of the objective function improves rapidly in early iterations and then levels off as it approaches the optimal value. The initial point of this graph is the final optimum result of the GA.
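The hand-off from the GA to the pattern search can be sketched as follows. This is a toy illustration on synthetic data, not the study's model: the linear objective, the data, the population settings and all function names are assumptions, and the GA is a heavily simplified real-coded variant that borrows only the ideas of scattered crossover and Gaussian mutation named in the text.

```python
import numpy as np

def sse(coeffs, X, y):
    """Sum of squared errors of a linear model (illustrative objective)."""
    return float(np.sum((X @ coeffs - y) ** 2))

def simple_ga(objective, n_vars, rng, pop_size=30, generations=40, sigma=0.3):
    """Very small real-coded GA: keep the better half, recombine and mutate the rest."""
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, n_vars))
    for _ in range(generations):
        scores = np.array([objective(ind) for ind in pop])
        elite = pop[np.argsort(scores)[: pop_size // 2]]
        n_children = pop_size - len(elite)
        parents_a = elite[rng.integers(len(elite), size=n_children)]
        parents_b = elite[rng.integers(len(elite), size=n_children)]
        mask = rng.random(parents_a.shape) < 0.5            # scattered crossover
        children = np.where(mask, parents_a, parents_b)
        children = children + rng.normal(0.0, sigma, children.shape)  # Gaussian mutation
        pop = np.vstack([elite, children])
    scores = np.array([objective(ind) for ind in pop])
    return pop[np.argmin(scores)]

def compass_search(objective, x0, mesh=1.0, tol=1e-6, max_iter=10_000):
    """Pattern search over the 2N coordinate directions, started from x0."""
    x = np.array(x0, dtype=float)
    fx = objective(x)
    directions = np.vstack([np.eye(x.size), -np.eye(x.size)])
    for _ in range(max_iter):
        if mesh <= tol:
            break
        for d in directions:
            trial = x + mesh * d
            f_trial = objective(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                mesh *= 2.0      # expand after a successful poll
                break
        else:
            mesh *= 0.5          # contract after an unsuccessful poll
    return x, fx

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # synthetic inputs
y = X @ np.array([0.5, -1.0, 2.0])           # synthetic targets from known coefficients
objective = lambda c: sse(c, X, y)

ga_point = simple_ga(objective, 3, rng)      # stage 1: global exploration
ga_value = objective(ga_point)
ps_point, ps_value = compass_search(objective, ga_point)  # stage 2: local refinement
```

Because the pattern search only accepts points that lower the objective, the refined value can never be worse than the GA value it started from, which is the rationale for using the GA optimum as the initial point.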
The convergence curve in Fig. 7 is typical of pattern search algorithms. The initial convergence occurred after the first 800 iterations, followed by progressively slower improvements as the optimal solution was approached. Fig. 8 displays the mesh size at each iteration, which increased after each successful poll and decreased after each unsuccessful one. The best point did not change following an unsuccessful poll.
As a result, the algorithm halved the mesh size, with the contraction factor set to 0.5. The computed objective function value at iteration 2 was less than the value at iteration 1 in Fig. 1, which indicates that the poll at iteration 2 was successful. Thus, the algorithm doubled the mesh size, with the expansion factor set to 2, as shown in Fig. 8. Clearly, the poll at iteration 4 was unsuccessful. As a result, the function value remained unchanged from iteration 3, and the mesh size was halved. As shown in Fig. 9, by the time 1297 iterations were completed, the pattern search algorithm had performed approximately 98000 function evaluations to locate the most promising region in the solution space containing the global minimum.

Discussion
This study used an artificial neural network, a genetic algorithm and a combined genetic algorithm and pattern search method to predict the severity of traffic accidents. The final results showed that the ANN performed better than the GA and the combined GA and PS models. Table 6 presents the correlation coefficient (R), mean absolute error (MAE), RMSE and SSE values. These results demonstrate that the constructed ANN is promising for modelling traffic injury severity. Fig. 10 compares the real output values of crash severity with the values predicted by the three models tested in our case. This graphical presentation depicts a considerable overlap between the real and predicted graphs, showing that the models predict traffic accident severity with high accuracy. Fig. 11 shows regression plots for the output with regard to fatality, evident injury and no injury; in addition, the value of the correlation coefficient (R) for each level of crash severity was estimated. The R-value for no injury was higher than the others, which is consistent with this level having the largest number of records.

Conclusions
1. This study used the GA, the combined GA and PS, and the ANN with MLP architecture to predict traffic injury severity using twelve input parameters and three levels of injury severity. The performance of these methods was compared to find the most suitable method for predicting crash severity at three levels: fatality, evident injury, and no injury.
2. The ANN was applied for training, testing and validation and had 12 inputs, 25 neurons in the hidden layer and 3 neurons in the output layer. The training, validation and testing data for the ANN represented 70%, 15% and 15% of all crash data, respectively. The R-value of the ANN was around 0.87.
3. The GA alone as well as combined with the PS model was used for predicting accident severity. The ANN provided the highest prediction accuracy with an R-value of around 0.87, followed by the combination of the GA and PS with an R-value of around 0.79 and the GA with 0.79. Therefore, for this dataset, the ANN constructs a better relationship between the twelve input parameters of the model and crash severity. On the other hand, the advantage of using the GA or the combined GA and PS model is that the functions and coefficients of the relationships are known. Thus, each model has its own advantage, and using more than one method may provide a better understanding of the relationship between input and output variables.
4. The constructed models were able to incorporate additional data. Moreover, the optimum coefficients of the objective function are the initial optimum vector in the combined GA and PS model. In order to reach optimum results using the ANN model, new weights and biases are calculated from the preliminary weight matrix and bias vector.
5. The use of more than one model, as suggested in this research, provided a fuller understanding of the relationship between input and output variables (combination of the GA and PS) and allowed for high prediction accuracy (ANN).