Statistical classification based on observations of random Gaussian fields

Abstract The problem of classifying objects located in a domain D ⊂ R² based on observations of random Gaussian fields with a factorized covariance function is considered. The first-order asymptotic expansion for the expected error regret is presented. The obtained numerical results allow us to compare the suggested expansion for several widely applicable models of the spatial covariance function.


INTRODUCTION
The notion that data close together in space are likely to be correlated is natural. One of the most important (sometimes even the only) statistical characteristics of a random field describing the statistical spatial relationship between observations is the spatial covariance function σ(r, s) = E{(X(r) − E(X(r)))(X(s) − E(X(s)))}, where {X(t), t ∈ D} is an observed random field. We restrict our attention to covariance functions that depend only on the difference h = r − s between points, i.e. we consider only second-order stationary random fields. When σ(r, s) = σ(h) is a function of both the magnitude and the direction of h, the covariance function is said to be anisotropic; otherwise, it is said to be isotropic.
In the geostatistics literature, the concept of a variogram is used for the analysis of spatially correlated data; see, e.g., Matheron (1962), Cressie (1994) and others. This function is similar to the covariance function. By definition, the variogram is var(X(r) − X(s)) = 2γ(r − s), r, s ∈ D. (The quantity 2γ(·) was called the variogram, and γ(·) the semivariogram, by Matheron (1962).) There is a simple relationship between the semivariogram and the covariance function: γ(h) = σ(0) − σ(h). So we can use either of these concepts interchangeably. In general, using variograms is preferable to using covariances because the variogram estimator obtained by the method of moments (Matheron, 1962) is unbiased.
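The method-of-moments estimator mentioned above can be sketched in Python; the function name `empirical_semivariogram` and the lag tolerance `tol` are illustrative choices, not notation from the paper:

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, tol=0.5):
    """Matheron's method-of-moments estimator:
    gamma_hat(h) = (1 / (2 |N(h)|)) * sum over pairs (r, s) in N(h)
    of (X(r) - X(s))^2, where N(h) collects point pairs whose
    distance is within tol of the lag h."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distances
    sq_inc = (values[:, None] - values[None, :]) ** 2  # squared increments
    gamma = []
    for h in lags:
        pairs = np.triu(np.abs(dist - h) <= tol, k=1)  # count each pair once
        gamma.append(sq_inc[pairs].mean() / 2.0 if pairs.any() else np.nan)
    return np.array(gamma)

# two points at distance 1 with values 0 and 2: gamma_hat(1) = (0 - 2)^2 / 2 = 2
coords = np.array([[0.0, 0.0], [1.0, 0.0]])
gamma = empirical_semivariogram(coords, np.array([0.0, 2.0]), lags=[1.0])
```

The estimator is unbiased because E(X(r) − X(s))² = 2γ(r − s) exactly under second-order stationarity, with no mean estimation involved.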
Christensen (1989) and Cressie (1994) present several covariance models that are most often used in geostatistics. We consider three of them.
The isotropic spherical covariance function is given by the expression

  σ_s(h) = θ0 + θ1, h = 0,
  σ_s(h) = θ1 [1 − (3/2)(h/θ2) + (1/2)(h/θ2)³], 0 < h ≤ θ2,
  σ_s(h) = 0, h > θ2,

for θ0, θ1, θ2 nonnegative, where θ0 is the nugget effect and θ2 is the range. The second model we consider, the exponential covariance function, depends on an anisotropy parameter t; it is an anisotropic covariance function when t ≠ 1.
In the case of t = 1 it becomes a well-known isotropic covariance function, often called the Gaussian covariance function. The behavior of the Ornstein–Uhlenbeck model is similar to that of the exponential model. However, the covariances at distances greater than one approach zero much more rapidly than in the exponential model. Also, for small distances, the covariance approaches the value 1 much more rapidly than does the exponential. In our paper we use correlation functions, which are easily defined from the covariance function by the relation ρ(h) = σ(h)/σ(0).
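For concreteness, the correlation models can be written down in their standard textbook forms (the paper's exact anisotropic parameterization via t is not reproduced here; the isotropic forms below, with range parameter `a`, should be read as assumptions):

```python
import numpy as np

def rho_spherical(h, a=3.0):
    """Isotropic spherical correlation with range a:
    1 - 1.5 (h/a) + 0.5 (h/a)^3 for h <= a, and 0 beyond the range."""
    h = np.abs(np.asarray(h, dtype=float))
    return np.where(h <= a, 1.0 - 1.5 * h / a + 0.5 * (h / a) ** 3, 0.0)

def rho_exponential(h, a=1.0):
    """Exponential correlation exp(-h/a); the Ornstein-Uhlenbeck process
    on the line has a covariance of exactly this form."""
    return np.exp(-np.abs(np.asarray(h, dtype=float)) / a)

def rho_gaussian(h, a=1.0):
    """Gaussian correlation exp(-(h/a)^2); near h = 0 it behaves like
    1 - (h/a)^2, so it approaches 1 faster than exp(-h/a) does."""
    return np.exp(-(np.asarray(h, dtype=float) / a) ** 2)
```

Note that the spherical model reaches exactly zero at the range a, while the exponential and Gaussian models only decay toward zero asymptotically.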

CLASSIFICATION PROBLEM
Suppose Ω1, Ω2 are two mutually exclusive and exhaustive classes of objects. Let X be a p-dimensional feature vector measured on each object. For objects randomly chosen from Ω_l, X follows a multivariate distribution with density function p_l(x; θ_l) = p_l(x), which belongs to the parametric family of regular densities F_l = {p_l(x; θ_l): θ_l ∈ Θ_l ⊂ R^m}, l = 1, 2.
Discriminant analysis deals with the problem of identifying the class of the object for which X is measured. For a zero-one loss function, the Bayes classification rule (BCR) d_B(x) minimizing the probability of misclassification is equivalent to assigning X = x to Ω_l if π_l p_l(x) = max_{k=1,2} π_k p_k(x), where π_l is the prior probability of Ω_l. Then the BCR d_B(x) can be defined as d_B(x) = arg max_{k=1,2} π_k p_k(x). Let P_B denote the probability of misclassification for the BCR d_B(x), or Bayes error rate (see, e.g., [1]).
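For two Gaussian classes with a common covariance matrix, the BCR can be sketched as follows (the helper names and the example parameters are hypothetical):

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Multivariate normal density, computed directly from its formula."""
    d = np.asarray(x, dtype=float) - mean
    p = len(mean)
    quad = d @ np.linalg.solve(cov, d)            # d' cov^{-1} d
    norm = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def bayes_rule(x, priors, means, cov):
    """Zero-one-loss BCR: assign x to the class maximising pi_k p_k(x)."""
    scores = [pi * gauss_pdf(x, m, cov) for pi, m in zip(priors, means)]
    return int(np.argmax(scores)) + 1             # classes labelled 1 and 2

# example: equal priors, common identity covariance
means = [np.zeros(2), np.array([3.0, 3.0])]
label = bayes_rule([0.2, -0.1], priors=[0.5, 0.5], means=means, cov=np.eye(2))
```

With equal priors and a common covariance, maximising π_k p_k(x) reduces to assigning x to the class whose mean is nearest in Mahalanobis distance.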
In practical applications, the density functions {p_l(x)} are seldom completely known. Often they are known only up to the parameters {θ_l}, i.e. we may only assert that p_l(x) is one element of a parametric family of density functions F_l. Under such conditions, it is customary to estimate θ_l from the training sample T_l = {X_{l1}, …, X_{lN_l}} from Ω_l, for l = 1, 2. Put T = T_1 ∪ T_2, N = N_1 + N_2.
Let θ̂_l be the maximum likelihood estimator (MLE) of θ_l from T_l (l = 1, 2). The estimator of the rule d_B(x) is called a plug-in rule d_B(x; θ̂_1, θ̂_2) and is defined by d_B(x; θ̂_1, θ̂_2) = arg max_{k=1,2} π_k p_k(x; θ̂_k). The actual error rate P_A of d_B(x; θ̂_1, θ̂_2) is the probability of misclassifying a randomly selected object with feature X independent of T.
Definition 2.1. The expected error regret (EER) for d_B(·; θ̂_1, θ̂_2) is the expectation of the difference between P_A and P_B with respect to the distribution of T: EER = E(P_A) − P_B.
The purpose of this article is to find an asymptotic expansion for the EER. The case of independent normally distributed observations in the training sample from one of two classes with Σ_l = Σ, l = 1, 2, was considered in [2]. In [3] a generalization was made to the case of an arbitrary number of classes (l ≥ 2) and regular class-conditional densities.
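Definition 2.1 can be illustrated by a small Monte Carlo sketch in the simplest setting of independent training observations (the sample sizes, means, and the nearest-Mahalanobis-mean rule, which coincides with the BCR for equal priors and a common covariance, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rate(rule_means, true_means, cov, n_test=20000):
    """Empirical misclassification probability (equal priors) of the rule
    assigning x to the nearest rule mean in Mahalanobis distance."""
    cov_inv = np.linalg.inv(cov)
    errs, total = 0, 0
    for k, mu in enumerate(true_means):
        x = rng.multivariate_normal(mu, cov, size=n_test // 2)
        d = [np.einsum('ij,jk,ik->i', x - m, cov_inv, x - m)
             for m in rule_means]
        errs += np.sum(np.argmin(d, axis=0) != k)
        total += len(x)
    return errs / total

mu = [np.zeros(2), np.array([2.0, 0.0])]
cov, N_l = np.eye(2), 10

p_b = error_rate(mu, mu, cov)            # plugging in true means: approx P_B
p_a = np.mean([                          # averaging over training samples: E(P_A)
    error_rate([rng.multivariate_normal(m, cov, N_l).mean(axis=0)
                for m in mu], mu, cov)
    for _ in range(50)])
eer = p_a - p_b                          # Monte Carlo estimate of the EER
```

Since P_A is the error of a rule built from random estimates, E(P_A) ≥ P_B, so the estimated EER should be small but nonnegative up to simulation noise.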

MAIN RESULTS
Suppose that any point r = (r1, r2) ∈ D ⊂ R² can be assigned to one of the two classes prescribed above (Ω1, Ω2).
Suppose that X_r denotes the observation of X at the point r ∈ D. A decision is to be made as to which class a randomly chosen point r ∈ D is assigned on the basis of the observed value of X_r. Let X_r = μ_l + ε_r for r ∈ Ω_l, where μ1, μ2 ∈ R^p, μ1 ≠ μ2, and the noise ε_r = (ε¹_r, …, ε^p_r)′ is the observation of a second-order stationary multivariate random field at the location r ∈ D with zero mean vector.
The essential assumption is that {ε_r} is a Gaussian field with a spatially factorized covariance. Hence, the common class-conditional covariance between any two observations X_r and X_s at points r, s ∈ D belonging to Ω_l can be factorized as cov(X_r, X_s | r, s ∈ Ω_l) = ρ_l(h) Σ (r ≠ s), where ρ_l(·) is the spatial correlation function (l = 1, 2), h = r − s, and Σ = cov(ε_r, ε_r).
We also assume that the effect of cross-correlation between samples from different classes is negligible. In this paper we suppose that it is equal to zero, i.e., cov(X_r, X_s | r ∈ Ω1, s ∈ Ω2) = 0.
Let D_l = {s^l_1, …, s^l_{N_l}} ⊂ D be the set of points belonging to the class Ω_l, l = 1, 2. Then X_{lj} denotes the observation of X at the point s^l_j, i.e. X_{lj} = X(s^l_j), j = 1, …, N_l, l = 1, 2.
Then the expectation of the N_l·p × 1 stacked vector T_{V_l} = (X′_{l1}, …, X′_{lN_l})′ is μ⁺_l = 1_{N_l} ⊗ μ_l (l = 1, 2), where 1_{N_l} is the N_l-dimensional vector of ones, and ⊗ is the Kronecker product. The covariance matrix of T_{V_l} is C_l ⊗ Σ, where C_l is the spatial correlation matrix of order N_l × N_l, whose (i, j)th element is ρ_l(s^l_i − s^l_j) (i, j = 1, …, N_l).
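The stacked-vector moments can be formed with `numpy.kron`; the lattice points, the correlation model, and all parameter values below are hypothetical:

```python
import numpy as np

# moments of the stacked vector for one class (index l suppressed):
# E(T_V) = 1_N kron mu  and  cov(T_V) = C kron Sigma
N, p = 4, 2
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = np.exp(-h)                          # rho(h) = exp(-h), an assumed model
mean_stacked = np.kron(np.ones(N), mu)  # length N*p vector, 1_N kron mu
cov_stacked = np.kron(C, Sigma)         # (N*p) x (N*p) matrix, C kron Sigma
```

The (i, j)th p × p block of `cov_stacked` is C[i, j]·Σ, which is exactly the factorized covariance cov(X_i, X_j) = ρ(s_i − s_j) Σ of the model.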
Suppose that Σ and C_l are known and the μ_l are unknown (l = 1, 2). In this case the MLE of μ_l from T_l is given by the Lemma; solving the equation ∂ ln L_l/∂μ_l = 0 completes the proof of the Lemma. MLE under spatial sampling of Gaussian random fields was studied in [4], where the regularity conditions ensuring consistency and asymptotic normality of the parameter estimators were given. We assume that these conditions hold. The asymptotic expansion of the EER contains terms of the form tr(P^(2)_{lk} E((θ̂_l − θ_l)(θ̂_k − θ_k)′)) (3.8), where P^(1)_l is the vector of the first-order derivatives of P_A with respect to θ̂_l evaluated at θ_l (l = 1, 2). Similarly, P^(2)_{lk} denotes the matrix of the second-order derivatives of P_A with respect to θ̂_l and θ̂_k evaluated at θ_l and θ_k, respectively (l, k = 1, 2).
Corollary 3.1. If T_l consists of statistically independent X_{lj}, j = 1, …, N_l, then c_l = N_l in formula (3.7).
The corollary holds since C_l^{-1} = I for statistically independent X_{lj}, j = 1, …, N_l.
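A form of the mean estimator consistent with the Lemma and with Corollary 3.1 is the generalized least squares mean, for which c_l = 1′ C_l^{-1} 1 reduces to N_l when C_l = I; since the Lemma's displayed formula is only implicit in the text, the explicit form below is an assumption:

```python
import numpy as np

def gls_mean(X, C):
    """GLS estimator of the mean vector from N correlated observations
    (rows of X): mu_hat = (1' C^{-1} 1)^{-1} (1' C^{-1}) X.
    Returns mu_hat together with c = 1' C^{-1} 1."""
    ones = np.ones(X.shape[0])
    w = np.linalg.solve(C, ones)        # C^{-1} 1
    c = ones @ w                        # c = 1' C^{-1} 1
    return (w @ X) / c, c

# with C = I the estimator is the sample mean and c = N (Corollary 3.1)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mu_hat, c = gls_mean(X, np.eye(3))
```

Correlated observations carry less information about the mean than independent ones, which is reflected in c_l < N_l for positively correlated samples.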
The result of the proved theorem can be used to obtain the optimal sampling design that ensures the minimum of the asymptotic EER for a fixed training sample size N.

EXAMPLE
As an example we consider the regular two-dimensional integer lattice and use the second-order neighborhood scheme for the training sample.
We also assume that there are two differently taken training samples: 1) 4 spatially symmetric observations in the training sample for each class; 2) 5 observations in the training sample for the first class and 3 for the second one. In Table 1 the values of AEER with π1 = π2 = 0.5 are presented. Here AEER_ind is the AEER in the case of independent observations (considered for comparison), and AEER_s, AEER_e, AEER_ou are the AEERs for the spherical (ρ_s), exponential (ρ_e) and Ornstein–Uhlenbeck (ρ_ou) correlation functions, respectively. As was already mentioned, the spherical correlation function is isotropic. We chose the range value θ2 = 3 for this function. In general, ρ_e and ρ_ou are anisotropic functions, but by choosing the value t = 1 we obtain the isotropic functions used in Table 1. The third and fourth rows contain the AEER for θ0 = 0 and θ0 = 3/4, respectively, but with the training sample of 5 observations for the first class and 3 for the second one.

Table 1. Values of AEER for different correlation functions (columns: P_B, AEER_ind, AEER_s, AEER_e^is, AEER_e^anis, AEER_ou^is, AEER_ou^anis).
For all described cases, the AEER approaches zero as the distances increase. As expected, the AEER in the case of independent observations is the smallest.
The comparison of the AEER in the case of independent observations with the case of dependent observations (the three considered schemes of correlation functions) is presented in Table 2. In the first row of this table the ratios for θ0 = 0 (no nugget effect) and different training sample schemes are presented (the upper quantity in each cell is for Scheme 1 and the lower one for Scheme 2), while in the second row the same ratios are presented for the nugget effect θ0 = 3/4. It can be seen from Table 2 that the bigger the nugget effect, the closer the ratio is to one (see the second row of Table 2), because with an increasing nugget effect the situation approaches the independent case.
Comparing columns (ratios for the same nuggets), we can determine which of the correlation functions gives the smaller AEER. For instance, AEER_ind/AEER_s = 0.3519 and AEER_ind/AEER_e^is = 0.2688 (for θ0 = 0); the ratio of these two ratios equals 1.31, so the spherical correlation function is better (gives a smaller AEER) than the isotropic exponential correlation function for Scheme 1. Other functions can be compared in a similar way. It is easy to see that the spherical function gives the smallest AEER in all considered cases. Also it can be shown that isotropic correlation functions give a smaller AEER than anisotropic ones do.
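The ratio-of-ratios comparison quoted above can be checked directly:

```python
# reported ratios from Table 2 (theta_0 = 0, Scheme 1)
r_s = 0.3519              # AEER_ind / AEER_s
r_e = 0.2688              # AEER_ind / AEER_e^is
ratio = r_s / r_e         # cancels AEER_ind, equals AEER_e^is / AEER_s
# ratio is about 1.31 > 1, so the spherical model yields the smaller AEER
```

Since AEER_ind cancels, a ratio of ratios above one directly states that the denominator model of the first ratio (spherical) has the smaller AEER.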