The Barzilai and Borwein Gradient Method with Nonmonotone Line Search for Nonsmooth Convex Optimization Problems

Abstract. The Barzilai and Borwein gradient algorithm has received a great deal of attention in recent decades since it is simple and effective for smooth optimization problems. Can it be extended to solve nonsmooth problems? In this paper, we answer this question positively. A Barzilai and Borwein gradient algorithm combined with a nonmonotone line search technique is proposed for nonsmooth convex minimization. The global convergence of the given algorithm is established under suitable conditions. Numerical results show that the method is efficient.


Introduction
Consider the problem
$$\min_{x \in \mathbb{R}^n} f(x), \eqno(1.1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a possibly nonsmooth convex function. The problem
$$\min_{x \in \mathbb{R}^n} F(x) \eqno(1.2)$$
is the so-called Moreau-Yosida regularization of $f$, where
$$F(x) = \min_{z \in \mathbb{R}^n} \Big\{ f(z) + \frac{1}{2\lambda}\|z - x\|^2 \Big\},$$
$\lambda$ is a positive parameter, and $\|\cdot\|$ denotes the Euclidean norm. It is well known that problems (1.1) and (1.2) are equivalent in the sense that their solution sets coincide.

Now we review some methods for nonsmooth optimization problems. The classical proximal point algorithm (see [37]) can be regarded as a gradient method for solving problem (1.2). The gradient of $F$ can be shown to be semismooth under reasonable conditions [22,41], and many algorithms have been built on this property (see [3,22,41]). Proximal methods have proved effective in dealing with the difficulty of evaluating the function value $F(x)$ and the gradient $\nabla F(x)$ at a given point $x$ (see [2,6,9,31,48,49]). Lemaréchal [32] and Wolfe [50] initiated a giant stride forward in nonsmooth optimization with the bundle concept, which can handle both convex and nonconvex $f$. All bundle methods carry two distinctive features (see Lemaréchal [33] and Zowe [60] for details). Kiwiel [30] proposed a bundle variant that is close to the bundle trust region method (see [45]); similar results can be found in [29,30,36]. In the past decades, many trust region algorithms for minimizing nonsmooth objective functions have also been presented (see [7,8,18,19,23,28,39,40,57]).
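The Moreau-Yosida regularization defined in (1.2) can be evaluated numerically for simple choices of $f$. The sketch below is illustrative only: the function $f(x) = \|x\|_1$, the parameter value, the test point, and the use of SciPy's Nelder-Mead solver are our assumptions, not taken from the paper. It computes $F(x)$ by solving the inner minimization and checks the minimizer against the closed-form proximal point of the $\ell_1$ norm (soft-thresholding).

```python
import numpy as np
from scipy.optimize import minimize

lam = 1.0  # regularization parameter lambda (assumed value)

def f(z):
    # illustrative nonsmooth convex function: the l1 norm
    return np.sum(np.abs(z))

def moreau_envelope(x):
    # F(x) = min_z { f(z) + ||z - x||^2 / (2*lam) }; the inner problem is
    # strongly convex, so Nelder-Mead with tight tolerances locates p(x) well.
    theta = lambda z: f(z) + np.sum((z - x) ** 2) / (2.0 * lam)
    res = minimize(theta, x0=np.array(x, dtype=float), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-10, "maxiter": 10000})
    return res.fun, res.x  # F(x) and the proximal point p(x)

# For the l1 norm, p(x) is soft-thresholding, which provides a sanity check.
def soft_threshold(x):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.0, -0.5, 3.0])
F_num, p_num = moreau_envelope(x)
print(np.allclose(p_num, soft_threshold(x), atol=1e-4))
```

For the $\ell_1$ norm the envelope $F$ is the Huber function, which is smooth even though $f$ is not, illustrating the regularizing effect that the paper exploits.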
The spectral gradient method (also called the two-point stepsize method) originated in [1] for unconstrained optimization problems. It is essentially a steepest descent method in which the stepsize along the antigradient direction is derived from a two-point approximation to the secant equation underlying quasi-Newton methods [25]. Raydan [42] proved that the two-point stepsize gradient method is globally convergent when the objective function is a strictly convex quadratic. For the nonquadratic case, Raydan [43] globalized the method by means of a nonmonotone line search. Dai et al. extended the method to box-constrained quadratic programming [11] and to unsymmetric linear equations [16]. Other authors have applied it to constrained optimization problems (see [38,55]) and to nonlinear equations (see [54,58]). The effectiveness of the classical spectral gradient method has been significantly improved by combining it with new and fast nonmonotone line search techniques (e.g. [15]). The spectral gradient method does not guarantee a descent in the objective function at each iteration, yet it performs better than the classical steepest descent (SD) method in practice. Interestingly, an alternating strategy that uses SD steps and spectral gradient steps in turn can accelerate the convergence of the spectral gradient method; an important work in this direction is the cyclic Barzilai-Borwein (CBB) method [10], see also [12]. An implementation of the CBB method combined with a nonmonotone line search performs better than the existing spectral gradient method and is even competitive with some well-known standard codes (see [12]). Due to its simplicity and numerical efficiency, the spectral gradient method has received a great deal of attention in recent decades (see [4,13,14,21,27,51,52]).
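As a concrete illustration of the two-point stepsize, the following sketch applies the first Barzilai-Borwein formula to a small strictly convex quadratic, the case for which Raydan [42] proved global convergence. The matrix, right-hand side, and iteration count are made-up illustrative choices, not data from the paper.

```python
import numpy as np

# Strictly convex quadratic f(x) = 0.5 x^T A x - b^T x (illustrative data).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
g = grad(x)
alpha = 1.0  # initial stepsize, used before two iterates are available
for _ in range(50):
    x_new = x - alpha * g
    g_new = grad(x_new)
    s, y = x_new - x, g_new - g
    if abs(s @ y) > 1e-16:
        # first BB formula: a two-point approximation to the secant equation
        alpha = (s @ s) / (s @ y)
    x, g = x_new, g_new

print(np.allclose(x, np.linalg.solve(A, b)))  # converged to the minimizer
```

Note that no line search is used here; for quadratics $s^T y = s^T A s > 0$, so the stepsize stays positive, but for general functions a globalization scheme such as the nonmonotone line search discussed below is needed.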
However, spectral gradient algorithms have so far been used only for smooth optimization problems.
The first nonmonotone line search framework was developed by Grippo, Lampariello, and Lucidi in [24] for Newton's method. Many subsequent papers have exploited nonmonotone line search techniques of this kind (see [5,26,34,59]), and there are spectral gradient methods with nonmonotone line searches for optimization problems (see [47,53]). Although these nonmonotone techniques work well in many cases, they have some drawbacks. First, a good function value generated at some iteration may essentially be discarded because of the max in the nonmonotone line search rule. Second, in some cases the numerical performance depends strongly on the choice of the memory length M, where M > 0 is an integer (see [24,46]). To overcome these two drawbacks, Dai and Zhang [17] proposed an adaptive nonmonotone line search combined with the two-point gradient method. Moreover, Zhang and Hager [56] presented a new nonmonotone line search technique; numerical results show that it outperforms both the usual nonmonotone technique and the monotone technique.
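The "max" drawback mentioned above is easy to see in code. The sketch below is our illustration, not from the paper: the memory length and the sample function values are made up. It computes the GLL-type reference value, the max over a sliding window of the last M function values, which ignores the best value found so far.

```python
from collections import deque

# GLL-type nonmonotone line search [24] accepts a trial step t when
#   f(x_k + t d_k) <= max_{0 <= j < min(k, M)} f(x_{k-j}) + c1 * t * g_k^T d_k,
# i.e. the reference value is the max over a window of M function values.
M = 5  # memory length; the text notes performance can depend strongly on it
window = deque(maxlen=M)
for f_val in [10.0, 4.0, 7.0, 3.0, 8.0]:  # made-up function values
    window.append(f_val)

reference = max(window)
print(reference)  # the good value 3.0 is effectively discarded by the max
```

The Zhang-Hager scheme used later in the paper replaces this max by a convex combination of all past function values, so good values keep influencing the reference.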
It is well known that trust region methods, Newton and quasi-Newton methods, and proximal gradient methods, which were first designed for smooth optimization problems, are now widely used in the nonsmooth setting. A natural question is whether the spectral gradient method can also be extended to nonsmooth problems. In this paper, we answer this question positively: motivated by the above observations, we present a spectral gradient method combined with a nonmonotone line search technique for nonsmooth optimization problems. The main attributes of the presented method are as follows.
• All search directions satisfy a sufficient descent condition, so the objective decreases along them, and they all belong to a trust region, which supports a strong convergence analysis.
• The method is globally convergent.
• Numerical results show that this method is more effective than the standard method.
This paper is organized as follows. In the next section, we briefly review some basic results about the objective function of (1.2). In Section 3, the new algorithm is stated. In Section 4, we prove the global convergence of the proposed method. Numerical results are reported in Section 5. Throughout this paper, unless otherwise specified, $\|\cdot\|$ denotes the Euclidean norm of vectors or matrices.

Results of Convex Analysis and Nonsmooth Analysis
Some basic results in convex analysis and nonsmooth analysis, which will be used later, are reviewed in this section. For a given $x \in \mathbb{R}^n$, let
$$\theta(z) = f(z) + \frac{1}{2\lambda}\|z - x\|^2,$$
and denote
$$p(x) = \operatorname*{argmin}_{z \in \mathbb{R}^n} \theta(z).$$
Then $p(x)$ is well defined and unique since $\theta(z)$ is strongly convex. By the definition of $F(x)$, we have
$$F(x) = \theta(p(x)) = f(p(x)) + \frac{1}{2\lambda}\|p(x) - x\|^2.$$
In what follows, we denote the gradient of $F$ by $g$. Some features of $F(x)$ can be found in [6,9].
(i) The function $F$ is finite-valued, convex, and everywhere differentiable, with
$$g(x) = \nabla F(x) = \frac{x - p(x)}{\lambda}.$$
(ii) The gradient mapping $g : \mathbb{R}^n \to \mathbb{R}^n$ is globally Lipschitz continuous with modulus $1/\lambda$, i.e.,
$$\|g(x) - g(y)\| \le \frac{1}{\lambda}\|x - y\|, \quad \forall\, x, y \in \mathbb{R}^n.$$
(iii) By the Rademacher theorem and the Lipschitz continuity of $\nabla F$, for each $x \in \mathbb{R}^n$ the set of generalized Jacobian matrices $\partial_B g(x)$ is well defined.
It is obvious that $F(x)$ and $g(x)$ can be obtained from the optimal solution of $\operatorname*{argmin}_{z \in \mathbb{R}^n} \theta(z)$. However, $p(x)$ is difficult or even impossible to compute exactly, so the exact value of $p(x)$ cannot be used to evaluate $F(x)$ and $g(x)$. Fortunately, for each $x \in \mathbb{R}^n$ and any $\varepsilon > 0$, there exists a vector $p_\alpha(x, \varepsilon) \in \mathbb{R}^n$ satisfying
$$f(p_\alpha(x, \varepsilon)) + \frac{1}{2\lambda}\|p_\alpha(x, \varepsilon) - x\|^2 \le F(x) + \varepsilon.$$
Thus, we can use $p_\alpha(x, \varepsilon)$ to define approximations of $F(x)$ and $g(x)$ by
$$F_\alpha(x, \varepsilon) = f(p_\alpha(x, \varepsilon)) + \frac{1}{2\lambda}\|p_\alpha(x, \varepsilon) - x\|^2
\quad\text{and}\quad
g_\alpha(x, \varepsilon) = \frac{x - p_\alpha(x, \varepsilon)}{\lambda},$$
respectively. Some implementable algorithms for finding $p_\alpha(x, \varepsilon)$ for a nondifferentiable convex function are introduced in [8]. A remarkable feature of $F_\alpha(x, \varepsilon)$ and $g_\alpha(x, \varepsilon)$ is the following [22]:
$$F(x) \le F_\alpha(x, \varepsilon) \le F(x) + \varepsilon
\quad\text{and}\quad
\|g_\alpha(x, \varepsilon) - g(x)\| \le \sqrt{2\varepsilon/\lambda}.$$
The above property says that, by choosing the parameter $\varepsilon$ small enough, we can compute $F_\alpha(x, \varepsilon)$ and $g_\alpha(x, \varepsilon)$ that are arbitrarily close to $F(x)$ and $g(x)$, respectively.
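The gradient approximation error can be checked numerically: strong convexity of $\theta$ (with modulus $1/\lambda$) implies that any $\varepsilon$-minimizer $p_\alpha(x, \varepsilon)$ satisfies $\|p_\alpha(x, \varepsilon) - p(x)\| \le \sqrt{2\lambda\varepsilon}$, hence $\|g_\alpha(x, \varepsilon) - g(x)\| \le \sqrt{2\varepsilon/\lambda}$. The sketch below verifies this for the $\ell_1$ norm, whose exact proximal point is soft-thresholding; the choice of $f$, the test point, and the halving search for an $\varepsilon$-optimal point are our illustrative assumptions.

```python
import numpy as np

lam, eps = 1.0, 1e-4
x = np.array([2.0, -0.5, 3.0])

# Exact proximal point of f(z) = ||z||_1 (soft-thresholding), exact gradient.
p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
g = (x - p) / lam
theta = lambda z: np.sum(np.abs(z)) + np.sum((z - x) ** 2) / (2.0 * lam)

# Build an eps-minimizer: perturb p along a fixed unit direction and halve
# the step until theta(p_eps) <= min theta + eps.
u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
t = 1.0
while theta(p + t * u) > theta(p) + eps:
    t *= 0.5
p_eps = p + t * u
g_eps = (x - p_eps) / lam  # the approximation g_alpha(x, eps)

# Strong convexity of theta gives ||g_eps - g|| <= sqrt(2 * eps / lam).
print(np.linalg.norm(g_eps - g) <= np.sqrt(2.0 * eps / lam))
```

This is exactly why driving $\varepsilon_k \to 0$ in the algorithm forces the approximate gradients toward the true gradient of $F$.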

Algorithm
The spectral gradient method uses the iterative formula
$$x_{k+1} = x_k + \alpha_k d_k,$$
where $d_k = -g_\alpha(x_k, \varepsilon_k)$ is the search direction, and two choices of the scalar $\alpha_k$ are
$$\alpha_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}
\quad\text{and}\quad
\alpha_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}},$$
with $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_\alpha(x_k, \varepsilon_k) - g_\alpha(x_{k-1}, \varepsilon_{k-1})$. These two formulas are motivated by [1].
Step 1. Termination criterion. Stop if $x_k$ satisfies the termination condition $\|g_\alpha(x_k, \varepsilon_k)\| < \epsilon$ for a given tolerance $\epsilon > 0$. Otherwise, go to the next step.
Remark. It is not difficult to see that $J_{k+1}$ is a convex combination of $J_k$ and $F_\alpha(x_{k+1}, \varepsilon_{k+1})$. Since $J_0 = F_\alpha(x_0, \varepsilon_0)$, it follows that $J_k$ is a convex combination of the function values $F_\alpha(x_0, \varepsilon_0), F_\alpha(x_1, \varepsilon_1), \ldots, F_\alpha(x_k, \varepsilon_k)$. The choice of $\rho$ controls the degree of nonmonotonicity: if $\rho = 0$, the line search reduces to the usual monotone Armijo line search; if $\rho = 1$, $J_k$ is the average of all function values computed so far (these cases have been analyzed in [56]).
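The convex-combination structure of $J_k$ can be sketched directly. In the code below, the update rule $E_{k+1} = \rho E_k + 1$, $J_{k+1} = (\rho E_k J_k + F_{k+1})/E_{k+1}$ follows the scheme of [56]; the sample function values are made up. With $\rho = 1$ it reproduces the arithmetic mean noted in the remark.

```python
def update_reference(J, E, f_new, rho):
    # J_{k+1} = (rho * E_k * J_k + f_{k+1}) / E_{k+1} with E_{k+1} = rho*E_k + 1,
    # so J_{k+1} is a convex combination of J_k and the new function value.
    E_new = rho * E + 1.0
    J_new = (rho * E * J + f_new) / E_new
    return J_new, E_new

values = [5.0, 3.0, 4.0, 2.0]  # made-up function values F_alpha(x_k, eps_k)
J, E = values[0], 1.0          # J_0 = F_alpha(x_0, eps_0), E_0 = 1
for f_new in values[1:]:
    J, E = update_reference(J, E, f_new, rho=1.0)

# With rho = 1, J equals the average of all function values so far;
# with rho = 0, the update reduces to J = f_new (monotone Armijo reference).
print(abs(J - sum(values) / len(values)) < 1e-12)
```

Unlike a max-based reference, every past value retains some weight here, so a good intermediate value is never fully discarded.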

Properties and Global Convergence
In this section, we turn to the behavior of Algorithm 1 when it is applied to problem (1.1). In order to establish the global convergence result, the following assumptions are needed.

Assumption A. (i) $V_k$ is uniformly bounded, where the matrix $V_k \in \partial_B g(x_k)$.
(ii) F is bounded from below.
(iii) The sequence $\{\varepsilon_k\}$ converges to zero.
From the definition $d_k = -g_\alpha(x_k, \varepsilon_k)$, we immediately get
$$g_\alpha(x_k, \varepsilon_k)^T d_k = -\|g_\alpha(x_k, \varepsilon_k)\|^2 \eqno(4.2)$$
and
$$\|d_k\| = \|g_\alpha(x_k, \varepsilon_k)\|. \eqno(4.3)$$
Relations (4.2) and (4.3) show that the search direction possesses the sufficient descent property and belongs to a trust region. Based on (4.2) and (4.3), and similarly to Lemma 1.1 in [56], it is not difficult to obtain the following lemma, which we state without proof.
Proof. In order to obtain the results of this theorem, we first show that
$$\lim_{k \to \infty} \|g_\alpha(x_k, \varepsilon_k)\| = 0 \eqno(4.8)$$
holds. Suppose that (4.8) is not true. Then there exist a constant $\epsilon_1 > 0$ and an infinite index set on which $\|g_\alpha(x_k, \varepsilon_k)\| \ge \epsilon_1$ (4.9). By (3.1), (4.2), (4.4), and (4.9), the line search yields a sufficient decrease at each such step, and by the definition of $J_{k+1}$ we obtain the corresponding recursion for the reference values (4.10). Since $F_\alpha(x, \varepsilon)$ is bounded from below and $F_\alpha(x_k, \varepsilon_k) \le J_k$ for all $k$, we conclude that $J_k$ is bounded from below. From (4.10), the accumulated decrease grows without bound (4.11). On the other hand, by the definition of $E_{k+1}$ we have $E_{k+1} \le k + 2$, which contradicts (4.11). Hence (4.8) holds. Using (2.6), together with Assumption A(iii), we obtain $\lim_{k \to \infty} \|g(x_k)\| = 0$ (4.12). From the properties of $F(x)$, we have $g(x_k) = (x_k - p(x_k))/\lambda$ (4.13). By (4.12) and (4.13), any accumulation point $x^*$ of $\{x_k\}$ satisfies $x^* = p(x^*)$. Therefore $x^*$ is an optimal solution of (1.1).

Numerical Results
In this section, we test the numerical behavior of Algorithm 1. All the nonsmooth test problems and their initial points in Table 1 can be found in [35]. Table 1 lists the names of the test problems and the global minimum values of the functions, where $f_{opt}(x)$ denotes the global minimum value of a function. The algorithm was implemented in Matlab 7.6, and all experiments were run on a PC with an Intel Pentium Dual E7500 2.93 GHz CPU, 2 GB of SDRAM, and the Windows XP operating system. The parameters were chosen as $s = 0.5$, $\lambda = 1$, $\rho = 0.75$, $\sigma = 0.9$, and $\varepsilon_k = 1/(NI + 2)^2$, where $NI$ is the iteration number. The program was stopped when the condition $\|g_\alpha(x, \varepsilon)\| \le 10^{-10}$ was satisfied. In order to show the performance of the given algorithm, we also list the recent results of the new trust region method BT(S-F) of [44], whose parameters were chosen as $\rho = 0.45$ and $\Delta = 0.5$. Dolan and Moré [20] proposed performance profiles as a tool to analyze the efficiency of algorithms; this technique is used in our comparison. From Table 2, it is easy to see that Algorithm 1 performs better than the BT(S-F) method on most of the test problems, and the final function values of both methods are acceptable compared with the optimal values. Figures 1, 2, and 3 show the performance profiles for the iteration number, the number of function evaluations, and the CPU time, respectively. It is not difficult to see that the given algorithm is more competitive than the new trust region method. Overall, the preliminary numerical results indicate that the proposed method is competitive with the other method.
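Dolan-Moré performance profiles like those in Figures 1-3 are straightforward to compute. The sketch below uses a made-up 3-by-2 timing matrix (not the data of Table 2) to show the construction: performance ratios relative to the best solver on each problem, then the cumulative fraction of problems solved within a factor tau.

```python
import numpy as np

# Rows are problems, columns are solvers; entries are a performance measure
# such as CPU time (made-up illustrative numbers, not Table 2).
T = np.array([[1.0, 2.0],
              [3.0, 1.5],
              [2.0, 2.0]])

# r_{p,s} = t_{p,s} / min_s t_{p,s}: ratio to the best solver on problem p.
ratios = T / T.min(axis=1, keepdims=True)

def rho_profile(s, tau):
    # fraction of problems that solver s solves within factor tau of the best
    return np.mean(ratios[:, s] <= tau)

print(rho_profile(0, 1.0), rho_profile(1, 1.0))
```

Plotting `rho_profile(s, tau)` against tau for each solver gives curves like Figures 1-3; at tau = 1 the curve shows the fraction of problems on which a solver is the (possibly tied) winner, and the solver whose curve lies higher is the more robust one.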

Conclusions
In this paper, we propose a spectral gradient method for nonsmooth convex minimization and establish its global convergence under suitable conditions. Considering the simplicity and numerical efficiency of the spectral gradient method for smooth problems, the main work of this paper is to extend it to nonsmooth problems. Algorithm 1 is easy to implement, and the numerical results show that it is efficient; it may become one of the simplest and most efficient methods for nonsmooth problems. The parameters $\lambda > 0$ and $s > 0$ may influence the performance of the method, so the choice of these positive constants is left as future work.