NEURO-FUZZY APPROACH TO PREDICTIVE MODELING OF EMISSIONS FROM BIODIESEL POWERED TRANSIT BUSES

The rise in freight and passenger transportation is responsible for air pollution, greenhouse gas emissions (especially CO2) and high fuel demand. New engine technologies and fuels are being developed and tested throughout the world. Biodiesel, an alternative to diesel, has been seen as a solution. However, the amount of emissions generated by a biodiesel-fueled vehicle is not well understood, since most research studies of this kind reported in the literature were conducted in the laboratory. In the present study, emissions (NOx, HC, CO, CO2 and PM) were measured from biodiesel-fueled transit buses using an on-road emissions measuring device known as the Portable Emissions Measurement System (PEMS). On-road study is important for understanding the amount of emissions generated under real traffic and environmental conditions. Emissions were measured on buses fueled with regular diesel (B0), B10 blend (10% biodiesel + 90% diesel) and B20 blend (20% biodiesel + 80% diesel). This paper demonstrates the use of hybrid soft-computing techniques, such as the neuro-fuzzy technique, for developing emissions prediction models from real-world data. Hybrid soft-computing techniques have been shown to work well in handling data prone to noise and uncertainty, which is characteristic of real-world scenarios. Two neuro-fuzzy methodologies were considered in this study: the Adaptive Neuro-Fuzzy Inference System (ANFIS) and the Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS). Model development, recommended parametric settings, and statistical evaluation of the prediction performance of both techniques are briefly discussed. In general, ANFIS showed better prediction accuracy for the individual emissions than DENFIS, although the prediction accuracies are comparable.


Introduction
Vehicular emission is the most prominent contributor to air pollution. Oxides of Nitrogen (NOx), Hydrocarbons (HC), Carbon monoxide (CO), Carbon dioxide (CO2) and Particulate Matter (PM) emissions are identified as criteria pollutants. CO2 is the major greenhouse gas from automobiles that is responsible for global warming and climate change. NOx and HC assist in the generation of ozone and smog, and CO forms a stable compound, carboxyhemoglobin, when combined with hemoglobin, inhibiting the oxygen-carrying capacity of blood. PM contributes to respiratory problems such as bronchitis.
Biodiesel is seen as a solution to air pollution and fuel supply problems. Research on biodiesel has shown that it is a biodegradable, non-toxic (in small quantity), non-hazardous fuel with a high cetane number (which indicates the ignition quality of fuel oil), high lubricity and high flash point (combustible but not flammable). Some research findings confirmed that using biodiesel in place of diesel decreases HC, CO, CO2 and PM, but increases NOx emissions (A Comprehensive Analysis… 2002).
One of the major reasons why biodiesel is becoming popular is that it increases the lubricity of the fuel (NBB, 2000). However, certain compounds in biodiesel can crystallize in cold weather, leading to plugging of fuel filters and inhibiting the smooth flow of the fuel. The physical-chemical properties of biodiesel blends strongly influence the combustion process and pollutant formation (Raslavičius, Bazaras 2010a, 2010b). In the present work, emissions (NOx, HC, CO, CO2 and PM) from biodiesel-run transit buses (Ames Transit Agency) were measured using a portable emissions measurement system (PEMS). With the availability of good statistical models, emissions can be predicted without conducting emission tests, which are expensive and time consuming. However, the emissions data collected in this study could not be studied using traditional statistical models (Mudgal 2009). Further, the emissions process involves a stochastic chemical reaction and therefore it is not possible to have deterministic models.
The use of soft-computing techniques has emerged as a feasible alternative in many situations when the problem is highly complex, non-linear and stochastic in nature and cannot be handled by traditional methods (Gopalakrishnan et al. 2009). This is attributed mainly to the ability of these techniques to admit approximate reasoning, imprecision, uncertainty and partial truth. The term 'soft computing' applies to variants of and combinations under the four broad categories of evolutionary computing, Artificial Neural Networks (ANNs), fuzzy logic, and Bayesian statistics. Although each one has its separate strengths, the complementary nature of these techniques when used in combination (hybrid) makes them a powerful alternative for solving complex problems where conventional mathematical methods fail. Therefore, in this study, a hybrid neuro-fuzzy approach was employed primarily to demonstrate the ability of such techniques in modeling on-road emissions data.
In the recent past, quite a few studies have been conducted to model diesel engine exhaust emissions data, mainly using ANNs (De Lucas et al. 2001; Clark et al. 2001; Canacki et al. 2006; Hashemi, Clark 2007; Ghobadian et al. 2009). However, not many of them focused on on-road real-time emissions data, which is necessary for evaluating the impact of real-time driving conditions/modes. Also, such studies have not considered important engine parameters such as rpm, temperature and manifold absolute pressure, which play a vital role in engine kinetics. In the present research, real-time emissions from transit buses powered by various blends of biodiesel were measured and used in developing hybrid neuro-fuzzy emissions prediction models.
Two neuro-fuzzy methodologies, the Adaptive Neuro-Fuzzy Inference System (ANFIS) (Jang 1993; Jang et al. 1997) and the Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) (Kasabov 1998), were employed. A brief review of both methodologies is presented first, followed by a description of the collected emissions data, model development, evaluation, and finally the study conclusions.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
One of the most important and promising research fields in recent years has been Nature-Inspired Heuristics, an area utilizing some analogies with natural or social systems for deriving non-deterministic heuristic methods to obtain better results in combinatorial optimization problems (Colorni et al. 1996). Fuzzy logic approach (FLA) is one such heuristic method (Zadeh 1965).
In contrast to classical set theory, where the membership of elements is assessed in binary terms (an element either belongs to or does not belong to the set), fuzzy sets are sets whose elements have degrees of membership. Fuzzy set theory permits the gradual assessment of the membership of elements in a set with the aid of a membership function valued in the real unit interval [0, 1].
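The contrast between binary and graded membership can be illustrated with a minimal sketch. The function names, the Gaussian shape parameters, and the "moderate speed" example values below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import numpy as np

def crisp_membership(x, low, high):
    """Classical set: an element is either in [low, high] (1.0) or not (0.0)."""
    x = np.asarray(x, dtype=float)
    return ((x >= low) & (x <= high)).astype(float)

def gaussian_membership(x, center, sigma):
    """Fuzzy set: degree of membership in [0, 1], equal to 1 at the center."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

# Hypothetical fuzzy set "moderate speed", centered at 30 with spread 10.
speeds = np.array([10.0, 25.0, 30.0, 45.0])
print(crisp_membership(speeds, 20.0, 40.0))     # binary membership
print(gaussian_membership(speeds, 30.0, 10.0))  # graded membership
```

A speed just outside the crisp interval drops abruptly to zero membership, whereas the fuzzy set assigns it a small but non-zero degree.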
Fuzzy inference systems (FIS) are powerful tools for the simulation of nonlinear behaviors utilizing fuzzy logic and linguistic fuzzy rules. In the literature, there are several inference techniques developed for fuzzy rule-based systems, such as Mamdani and Sugeno (Mamdani, Assilian 1975; Takagi, Sugeno 1985). In the Mamdani fuzzy inference methodology, inputs and outputs are represented by fuzzy relational equations in canonical rule-based form. In the Sugeno FIS, the output of the fuzzy rule is characterized by a crisp function; the method was developed to generate fuzzy rules from a given input-output data set. Neuro-fuzzy systems are multi-layer feed-forward adaptive networks that realize the basic elements and functions of traditional fuzzy logic systems (Jang et al. 1997). Since it has been shown that fuzzy logic systems are universal approximators, neuro-fuzzy control systems, which are isomorphic to traditional fuzzy logic control systems in terms of their functions, are also universal approximators. ANFIS is an extension of the Sugeno fuzzy model. The 'learning' process in the ANFIS methodology, namely the adaptation of membership functions to emulate the training data, is commonly performed by two techniques: backpropagation and hybrid learning algorithms. The hybrid optimization method is a combination of Least Squares Error (LSE) estimation and the backpropagation gradient descent method. In the hybrid learning algorithm, consequent parameters are identified in the forward computation by the LSE algorithm, and premise parameters are adjusted in the backward computation using the backpropagation algorithm.
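The forward half of the hybrid learning scheme can be sketched as follows, assuming a first-order Sugeno system with Gaussian membership functions, a product T-norm, and premise parameters held fixed. This is an illustrative sketch, not the MATLAB toolbox implementation; all function names and the toy data are hypothetical, and the backward (backpropagation) update of the premise parameters is omitted.

```python
import numpy as np

def firing_strengths(X, centers, sigmas):
    """Normalized rule firing strengths (Gaussian MFs, product T-norm).

    X: (n, d) inputs; centers, sigmas: (r, d) premise parameters of r rules.
    """
    diff = X[:, None, :] - centers[None, :, :]
    mu = np.exp(-0.5 * (diff / sigmas[None, :, :]) ** 2)  # (n, r, d)
    w = mu.prod(axis=2)                                    # (n, r)
    return w / w.sum(axis=1, keepdims=True)

def design_matrix(X, wn):
    """Stack wn_k * [x, 1] for every rule k, so the output is linear in theta."""
    n = X.shape[0]
    Xa = np.hstack([X, np.ones((n, 1))])                   # (n, d + 1)
    return (wn[:, :, None] * Xa[:, None, :]).reshape(n, -1)

def fit_consequents(X, y, centers, sigmas):
    """Least-squares estimate of the linear consequent parameters."""
    A = design_matrix(X, firing_strengths(X, centers, sigmas))
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def predict(X, centers, sigmas, theta):
    """Overall output: weighted average of the linear rule outputs."""
    return design_matrix(X, firing_strengths(X, centers, sigmas)) @ theta

# Toy usage with two hypothetical rules on a single input.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
centers = np.array([[0.25], [0.75]])
sigmas = np.array([[0.2], [0.2]])
theta = fit_consequents(X, y, centers, sigmas)
print(np.sqrt(np.mean((predict(X, centers, sigmas, theta) - y) ** 2)))
```

Because the normalized firing strengths are fixed during this step, the model output is linear in the consequent parameters, which is why an ordinary least-squares solve suffices.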

Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS)
DENFIS is a type of Evolving Connectionist System (ECOS) developed by Kasabov (1998). ECOS can be considered as open-architecture Artificial Neural Networks (ANNs) in which neurons are added to the structure and the connection weights are modified as the system evolves based on a continuous input data stream in an adaptive, life-long, modular way (Watts 2004, 2009; Kasabov, Song 2002). ECOS networks are resistant to catastrophic forgetting, have the ability to adapt to and learn new data as soon as they become available, do not have a limit to the amount of knowledge they can store, and learn the examples very quickly compared to traditional Multi-Layered Perceptron BackPropagation Neural Networks (MLP-BP NN). The overall ECOS learning algorithm is based on accommodating new training examples within the evolving layer, either through modification of the evolving neuron connection weights or by adding a new neuron to that layer.
DENFIS is a Takagi-Sugeno type of Fuzzy Inference System (FIS) with a Backpropagation (BP) algorithm (Kasabov, Song 2002) developed for both on-line and off-line learning. The DENFIS model forms a FIS dynamically for calculating the output depending on the position of the input vector in the input space. The dynamically formed FIS is based on fuzzy rules created during the past learning process. The DENFIS model for offline learning in batch mode was used in this paper. Two DENFIS models for offline learning were developed by Kasabov and Song (2002): (1) a linear model, model I, and (2) a Multi-Layer Perceptron (MLP) based model, model II. A first-order Takagi-Sugeno type fuzzy inference engine is employed in model I, while model II employs an extended high-order Takagi-Sugeno fuzzy inference engine. In model II, several small-size, two-layer MLPs (the hidden layer consists of two or three neurons) are used to realize the function in the consequent part of each fuzzy rule instead of using a predefined function. The implementation of the DENFIS offline learning process is described by Kasabov and Song (2002).
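The "modify an existing neuron or add a new one" idea can be sketched with a simplified one-pass procedure. This is not the ECOS or DENFIS clustering algorithm itself: the update rule, the function name, and the `learning_rate` parameter are assumptions made for illustration, and `distance_threshold` only loosely mirrors the role of a distance threshold in such systems.

```python
import numpy as np

def evolve_layer(stream, distance_threshold, learning_rate=0.5):
    """One pass over a data stream, growing a layer of node centers.

    For each example, the nearest node is moved toward it if it lies within
    the threshold; otherwise a new node is created at the example.
    """
    nodes = []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if not nodes:
            nodes.append(x.copy())
            continue
        dists = [np.linalg.norm(x - c) for c in nodes]
        j = int(np.argmin(dists))
        if dists[j] <= distance_threshold:
            nodes[j] += learning_rate * (x - nodes[j])  # modify existing node
        else:
            nodes.append(x.copy())                      # add a new node
    return np.array(nodes)

# Hypothetical stream with two well-separated groups of points.
stream = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [1.02, 1.0]]
print(evolve_layer(stream, distance_threshold=0.1))
```

Each example is seen once and the layer grows only when needed, which is the property that lets such networks learn quickly without a fixed limit on stored knowledge.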

Description of Data
In this study, engine exhaust emissions data were collected from transit buses fueled with biodiesel at three blends: B0 (regular diesel), B10 (10% biodiesel + 90% regular diesel) and B20 (20% biodiesel + 80% regular diesel). Data were collected from April 2008 through July 2008 in Ames, Iowa between 7:30 AM and 5:00 PM on weekdays. The bus route consisted of corridors with frequent stops, arterial sections with high operating speeds, curved sections, and signalized corridors. This helped in collecting data under various traffic conditions. At a frequency of 1 Hz, emissions (NOx, HC, CO, CO2 and PM), speed, intake air temperature (T), engine rpm, and manifold absolute pressure (MAP) at the air intake were measured. In addition, the passengers in the bus between consecutive bus stops were also counted. A more detailed description of the data collection process is provided by Mudgal (2009). Frey et al. (2001, 2002) found that in general 2.5÷15% of the on-road emissions data would be invalid. This is attributed to equipment failure, wrongly placed sample and reference lines, and improper calibration. After removing erroneous data, about 120000 rows (roughly 33 hours) of data were left for modeling. Out of this, some 11950 datasets were randomly sampled for use in developing the emissions prediction models. A new variable, vehicle specific power (VSP), derived from vehicle dynamics (Frey 1997; Frey et al. 2007), was used as another independent variable. Table 1 summarizes the variables used as inputs for emissions predictive modeling.

Table 1. Variables used as inputs for emissions predictive modeling
Passenger count (PC): This represents the number of passengers in the bus. This imparts weight to the whole moving system, which is responsible for higher power demand (Frey et al. 2007).
Intake air temperature (T): This is the temperature of the air entering the engine chamber. The temperature has an influence on emissions (Vijayan et al. 2008).
Manifold absolute pressure (MAP): This is the pressure in the incoming air. This controls the emissions reactions and has a significant effect on emissions (Frey et al. 2007).

Histograms and quantile box plots for both the input and output variables are presented in Figs 1 and 2, which highlight the variability in the measured data. The spacing between the different parts of the box (markers) in the box plots helps indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. Fig. 3 displays the correlation plots between inputs and outputs in a matrix arrangement. The diameter and intensity of the circle in each cell of the matrix indicate the magnitude of correlation between the corresponding pair of variables. The strength of the linear relationship between each pair of the response variables was calculated using the Restricted (or Residual) Maximum Likelihood (REML) method. It is observed that both NOx and HC are more correlated to RPM and MAP than to any other variables. In fact, RPM and MAP are among the variables most highly correlated to all five emissions, followed by VSP. The relationships between RPM, MAP, and NOx and between RPM, MAP, and HC are captured in the form of contour plots displayed in Fig. 4.
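The data volumes quoted above can be checked with a short sketch: at 1 Hz, 120000 rows correspond to about 33 hours, and 11950 randomly sampled rows can be divided into the 10 000 training and 1950 testing datasets used for the ANFIS models. The function name and the random seed are hypothetical, and the sampling procedure actually used in the study is not known; uniform sampling without replacement is assumed here.

```python
import numpy as np

def sample_and_split(n_rows, n_sample, n_train, seed=0):
    """Draw n_sample distinct row indices and split them into train/test."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_rows, size=n_sample, replace=False)
    return idx[:n_train], idx[n_train:]

hours_of_data = 120_000 / 3600  # one row per second at 1 Hz
train_idx, test_idx = sample_and_split(120_000, 11_950, 10_000)
print(round(hours_of_data, 1), len(train_idx), len(test_idx))
```

Sampling without replacement keeps the training and testing subsets disjoint, so the testing datasets remain independent of those used for training.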

ANFIS-Based Emission Prediction Models
The development and testing of the ANFIS models were carried out using the ANFIS toolbox in the MATLAB® (Version 7.10.0 R2010a) environment. Since ANFIS allows only one output, separate ANFIS models were developed for each of the 5 emission outputs (NOx, HC, CO, CO2 and PM). The inputs to all 5 ANFIS-based emission prediction models consisted of Percentage biodiesel (0, 10 or 20), Speed, Acceleration, RPM, VSP, Passenger count, T and MAP. From the randomly sampled data, 10 000 datasets were used for training the ANFIS models and 1950 independent datasets were used for testing. The input parameters were partitioned using the subtractive clustering technique. Based on parametric sensitivity analysis, the optimal values of the range of influence and the squash factor for this problem were 0.1 and 1.25, respectively. Gaussian input membership functions were used. A first-order Sugeno FIS with a linear output function was selected as the inference system. The ANFIS structure was completed by the selection of the hybrid learning algorithm, and a batch learning scheme was used. In this learning algorithm, the BP algorithm is applied to the learning of premise parameters, while the LSE algorithm is applied to the learning of consequent parameters. In the rule base, fuzzy variables were connected with T-norm (fuzzy AND) operators and rules were associated using the max-min decomposition technique. The output part of each rule uses a linear defuzzifier formula; the total output of ANFIS is the weighted average of the output of each rule. Fig. 5 displays the final FIS structures of the ANFIS prediction models along with the number of inputs, outputs, and the number of fuzzy rules. Table 2 summarizes the training and testing results of the ANFIS prediction models.
The results include the number of fuzzy rules, Root Mean Square Error (RMSE) values between the actual and predicted values for testing datasets, the standard error of predicted values divided by the standard deviation of measured values (Se/Sy), the coefficient of correlation (R), and the coefficient of determination (R2) with reference to the line of equality. The formulas for these model performance indicators are shown below:

[Equations for RMSE, Se/Sy, R and R2]

R and R2 measure the correlation between the predicted and the measured values and therefore indicate the accuracy of the prediction model (higher R and R2 equate to higher accuracy). The RMSE and Se/Sy indicate the relative improvement in accuracy, and thus a smaller value is indicative of better accuracy.

[Fig. 5. Final FIS structures of the five ANFIS prediction models (Sugeno type, inputs in1 to in8, one output); the panels show 36, 38, 37, 41 and 34 membership functions per input]

Based on the prediction performance results for the ANFIS-based prediction models, it can be concluded that the prediction accuracies for both NOx and CO2 are good and that, for all other emissions, the prediction accuracy is fair based on the R2 values. It is expected that higher predictive accuracy can be achieved by transformation of input variables and by developing separate sub-prediction models for different ranges of emission predictions.
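The performance indicators named above can be computed along the following lines. The exact formulas used in the study are not reproduced here, so this sketch rests on common definitions that are assumptions: RMSE as the root of the mean squared residual, Se with a hypothetical degrees-of-freedom correction `n_params`, Sy as the sample standard deviation of the measured values, R as the Pearson correlation, and R2 computed against the line of equality as 1 - SSE/SST.

```python
import numpy as np

def prediction_metrics(y_true, y_pred, n_params=1):
    """Return RMSE, Se/Sy, R and R2 (assumed definitions, see lead-in)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    n = y.size
    sse = np.sum((y - p) ** 2)          # squared error about the line of equality
    sst = np.sum((y - y.mean()) ** 2)   # total squared deviation of measurements
    rmse = np.sqrt(sse / n)
    se = np.sqrt(sse / (n - n_params))  # standard error of prediction (assumed form)
    sy = np.sqrt(sst / (n - 1))         # sample standard deviation of measurements
    r = np.corrcoef(y, p)[0, 1]
    r2 = 1.0 - sse / sst
    return {"RMSE": rmse, "Se/Sy": se / sy, "R": r, "R2": r2}

# Hypothetical measured and predicted values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(prediction_metrics(measured, predicted))
```

Under these definitions R2 can be negative for a model that predicts worse than the mean of the measurements, which distinguishes it from the square of R.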

DENFIS-based Emission Prediction Models
The DENFIS-based emissions prediction models were developed in the NeuCom © v0.919 software environment. The NeuCom © v0.919 software was developed at the Knowledge Engineering and Discovery Research Institute (KEDRI), Auckland University of Technology, New Zealand. It is a self-programmable, learning and reasoning computer environment based on connectionist modules.
Several runs were conducted to optimize the parameter settings for the DENFIS-based prediction models. The parameters to be optimized in the DENFIS model are: 1. Dthr, the Distance Threshold, which determines the maximum radius of the rule nodes in the network; 2. M-of-N, which determines the number of nodes that are referenced to estimate the output of the current sample; 3. Epochs, the number of epochs used to train or retrain the network. To estimate the accuracy of predictions, the DENFIS model outputs three result parameters: 1. NumRn, the number of Rule Nodes (RNs) in the network; 2. NDEI, the Non-Dimensional Error Index; 3. RMSE, the Root Mean Squared Error. In addition, the system also outputs the CPU time (seconds) taken for training the network. Based on parametric sensitivity analysis, it was found that the optimal DENFIS parametric values for this problem are: Dthr = 0.1; M-of-N = 3; and Epochs = 2. Table 3 summarizes the DENFIS-based emission prediction models' statistics. Fig. 6 displays a sample rule, which could be easily extracted from the DENFIS model in the form of if-then rules, where X1÷X8 correspond to the eight inputs and Y represents the predicted emission output in the form of a simple linear equation. Similar to the performance results achieved for the ANFIS prediction models, the prediction accuracies of the DENFIS models in forecasting NOx and CO2 are good, while the prediction accuracies for HC and CO are fair based on R2 values. The DENFIS model's prediction accuracy for PM is poor. In Fig. 7, the prediction performance of the ANFIS and DENFIS models is compared for all five emissions for a small sampling of data. Both neuro-fuzzy models tend to capture the trend in measured emissions, although the similarity of magnitudes between the predicted and measured trends varies depending on the individual model's prediction accuracy.
In general, the ANFIS predictions are more consistent and relatively more accurate compared to the DENFIS predictions. In Fig. 8, the ANFIS-predicted NOx and DENFIS-predicted CO2 are displayed in the form of response surface plots.
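The Non-Dimensional Error Index reported alongside RMSE can be sketched as below, assuming the common definition of NDEI as the RMSE divided by the standard deviation of the measured target values. Whether the software uses the population or the sample standard deviation is not stated; the population form (`ddof=0`) is assumed here, and the function name and example values are hypothetical.

```python
import numpy as np

def ndei(y_true, y_pred):
    """RMSE normalized by the (population) standard deviation of the targets."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y - p) ** 2))
    return rmse / np.std(y)  # np.std defaults to ddof=0

# Hypothetical measured and predicted values, for illustration only.
print(ndei([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Because it is scaled by the spread of the measurements, NDEI allows errors to be compared across emissions whose magnitudes and units differ, which RMSE alone does not.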

Summary and Conclusions
In this study, diesel engine exhaust emissions (NOx, HC, CO, CO2 and PM) from biodiesel-powered transit buses, measured using a Portable Emissions Measurement System (PEMS), were modeled using the neuro-fuzzy approach. Although quite a few studies have recently been conducted to model diesel engine exhaust emissions data using ANNs, not many of them focused on on-road real-time emissions data. Also, such studies have not considered important engine parameters such as rpm, temperature and manifold absolute pressure, which play a vital role in engine kinetics and were considered in the present study. It was demonstrated that the on-road emissions data could be modeled using the neuro-fuzzy methodology, which has the ability to admit approximate reasoning, imprecision, uncertainty and partial truth. The highest prediction accuracies were achieved for NOx and CO2 using both ANFIS and DENFIS models. In general, both the ANFIS and DENFIS modeling methodologies showed similar prediction accuracies.