A new robust line search technique based on Chebyshev polynomials

K.T. Elgindy (Department of Mathematics, Faculty of Science, Assiut University, Assiut, Egypt)
Abdel-Rahman Hedar (Department of Computer Science, Faculty of Computer and Information Sciences, Assiut University, Assiut, Egypt)
Corresponding author: K.T. Elgindy. E-mail addresses: kelgindy@aun.edu.eg (K.T. Elgindy), hedar@aun.edu.eg (A.-R. Hedar).
Applied Mathematics and Computation 206 (2008) 853–866

Keywords: Unconstrained optimization; Univariate optimization; Newton's method; Test functions; Initial point; Spectral methods; Differentiation matrix; Chebyshev polynomials; Chebyshev points

Abstract. Newton's method is an important and basic method for solving nonlinear, univariate and unconstrained optimization problems. In this study, a new line search technique based on Chebyshev polynomials is presented. The proposed method is adaptive: it determines a descent direction at each iteration and avoids convergence to a maximum point. Approximations to the first and the second derivatives of a function using high-order pseudospectral differentiation matrices are derived. The efficiency of the new method is analyzed, in terms of the most popular and widely used criterion, in comparison with Newton's method using seven test functions.

1. Introduction

Optimization has been expanding in many directions at an astonishing rate during the last few decades. New algorithmic and theoretical techniques have been developed, the diffusion into other disciplines has proceeded at a rapid pace, and our knowledge of all aspects of the field has grown even more profound. At the same time, one of the most striking trends in optimization is the constantly increasing emphasis on the interdisciplinary nature of the field. Optimization has been a basic tool in all areas of applied mathematics, engineering, medicine, economics and other sciences [30].

The last 30 years have seen the development of a powerful collection of algorithms for unconstrained optimization of smooth functions. All algorithms for unconstrained minimization require the user to supply a starting point, which is usually denoted by $x_0$. Beginning at $x_0$, optimization algorithms generate a sequence of iterates $\{x_k\}_{k=0}^{\infty}$ that terminates when either no more progress can be made or when it seems that a solution point has been approximated with sufficient accuracy. There are two fundamental strategies [23] for moving from the current point $x_k$ to a new iterate $x_{k+1}$, namely, line search and trust region strategies.

The method described in this paper belongs to the class of line search procedures. Line search methods are fundamental algorithms in nonlinear programming. Their theory started with Cauchy [8] and they were implemented on the first electronic computers in the late 1940s and early 1950s. They have been intensively studied since then, and today they are widely used by scientists and engineers. Their convergence theory is well developed and is described at length in many good surveys, such as [22], and in textbooks such as [5] and [23].

The line search strategy (also called one-dimensional search; it refers to an optimization procedure for univariate functions and is the basis of multivariable optimization [30]) plays an important role in multidimensional optimization problems. Line search methods form the backbone of nonlinear programming algorithms, since higher-dimensional problems are ultimately solved by executing a sequence of successive line searches [19].
In particular, iterative algorithms for solving such optimization problems typically involve a "line search" at every iteration [9]. In the line search strategy, the algorithm chooses a direction $p_k$, called a search direction, and searches along this direction from the current iterate $x_k$ for a new iterate with a lower function value. The distance to move along $p_k$ can be found by approximately solving the following one-dimensional minimization problem [23] to find a step length $\alpha$:

  $\phi(\alpha) = \min_{\alpha > 0} f(x_k + \alpha p_k)$.    (1.1)

Hence, one of the reasons for discussing one-dimensional optimization is that some of the iterative methods for higher-dimensional problems involve steps of searching for extrema along certain directions in $\mathbb{R}^n$ [32]. Finding the step size $\alpha_k$ along the direction vector $p_k$ involves solving the subproblem of minimizing (1.1), which is a one-dimensional search problem in $\alpha_k$ [15]. Hence, the one-dimensional search methods are indispensable, and the efficiency of any algorithm partly depends on them [25].

The search methods minimize $\phi(\alpha)$ subject to $a \le \alpha \le b$. Since the exact location of the minimum of $\phi$ over $[a, b]$ is not known, this interval is called the interval of uncertainty [4]. In the derivative methods, such as bisection, cubic interpolation [19], and regula falsi interpolation [27], finding only one search point in the interval of uncertainty is sufficient: evaluating the derivative $\phi'(\alpha)$ at this point, a new interval of uncertainty can be defined [17]. It is known that the cubic interpolation method, which is essentially a curve-fitting technique, is more efficient than the others, but in all interpolation methods the speed and reliability of convergence depend on the trial function [15].

The non-derivative methods use at least two function evaluations in each iteration [17]. This is the minimum number of function evaluations needed for reducing the length of the interval of uncertainty in each iteration. A portion of this interval is discarded by comparing the function values at the interior points [16].

In most problems, functions possess a certain degree of smoothness. To exploit this smoothness, techniques based on polynomial approximation are devised. A variety of such techniques can be devised depending on whether or not derivatives of the functions, as well as values, are calculated and how many previous points are used to establish the approximate model. Methods of this class have orders of convergence greater than unity [17].
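To make the reduction in (1.1) concrete, the short sketch below restricts a multivariate objective to a ray and searches the resulting univariate function. The objective $f$, the iterate $x_k$ and the direction $p_k$ are illustrative choices (they are not taken from the paper), and the naive grid search merely stands in for any of the univariate methods discussed above.

```python
import numpy as np

# Illustrative objective (not from the paper): a strictly convex 2-D quadratic.
def f(x):
    return 0.5 * x[0]**2 + 2.0 * x[1]**2 + x[0] - x[1]

x_k = np.array([1.0, 1.0])                              # current iterate
p_k = -np.array([x_k[0] + 1.0, 4.0 * x_k[1] - 1.0])     # a descent direction (here the negative gradient)

# The one-dimensional restriction appearing in (1.1): phi(alpha) = f(x_k + alpha * p_k).
phi = lambda alpha: f(x_k + alpha * p_k)

# A deliberately naive grid search over trial step lengths, standing in for a real
# univariate minimizer, just to show that the multivariate step reduces to minimizing phi.
alphas = np.linspace(1e-4, 1.0, 1001)
alpha_k = alphas[np.argmin([phi(a) for a in alphas])]
print(alpha_k, phi(alpha_k), f(x_k))                    # phi(alpha_k) is well below f(x_k)
```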
Our main purpose in this paper is to present a new robust line search technique based on Chebyshev polynomials. The useful properties of Chebyshev polynomials enable the proposed method to perform in an efficient way regarding the number of function evaluations, convergence rate and accuracy. The new technique is adaptive, where the movement at each iteration is determined via a descent direction chosen effectively so as to avoid convergence to a maximum point. The derivatives of the functions are approximated via high-order pseudospectral differentiation matrices. The efficiency of the new method is analyzed in terms of the most popular and widely used criterion in comparison with the classical Newton's method using seven test functions. The results show the superiority and efficiency of our proposed method.

The rest of this paper is organized as follows. In the following section, we devise an efficient algorithm for solving nonlinear, univariate and unconstrained optimization problems. In Section 3, we highlight the two standard methods for calculating derivatives of a function: the finite difference method and the pseudospectral differentiation matrices method. In Section 4, we provide a detailed presentation for computing higher derivatives of Chebyshev polynomials. In Section 5, we show how to construct the entries of the differentiation matrices. In Section 6, a comparison between pseudospectral approximation and finite difference methods is introduced, showing the advantages of the first approach over the latter. In Section 7, we present our new method based on Chebyshev polynomials. Finally, in Section 8, we present numerical results demonstrating the efficiency and accuracy of our method, followed by some concluding remarks and future work.

2. The method

Line search procedures can be classified according to the type of derivative information they use. Algorithms that use only function values can be inefficient, since to be theoretically sound, they need to continue iterating until the search for the minimizer is narrowed down to a small interval. In contrast, knowledge of derivative information allows us to determine whether a suitable step length has been located [23].

Before we introduce our new method, we highlight two important methods that require knowledge of derivative information, namely, Newton's method and the secant method. In these two methods, we assume that $\phi(\alpha)$ is a unimodal smooth function in the interval of search and that it has a minimum at an interior point of the interval. The problem of finding the minimum then becomes equivalent to solving the equation $\phi'(\alpha) = 0$ [21].

2.1. Newton–Raphson method

One could say that Newton's method for unconstrained univariate optimization is simply the method for nonlinear equations applied to $\phi'(\alpha) = 0$. While this is technically correct if $\alpha_0$ is near a minimizer, it is utterly wrong if $\alpha_0$ is near a maximum. A more precise way of expressing the idea is to say that $\alpha_1$ is a minimizer of the local quadratic model of $\phi$ about $\alpha_0$,

  $m_0(\alpha) = \phi(\alpha_0) + \phi'(\alpha_0)(\alpha - \alpha_0) + \tfrac{1}{2}\phi''(\alpha_0)(\alpha - \alpha_0)^2$.

If $\phi'' > 0$, then the minimizer $\alpha_1$ of $m_0$ is the unique solution of $m_0'(\alpha) = 0$. Hence,

  $0 = m_0'(\alpha_1) = \phi'(\alpha_0) + (\alpha_1 - \alpha_0)\,\phi''(\alpha_0)$.

Therefore,

  $\alpha_1 = \alpha_0 - \phi'(\alpha_0)/\phi''(\alpha_0)$.    (2.1)

Then (2.1) simply says that $\alpha_1 = \alpha_0 + s$, where $s = -\phi'(\alpha_0)/\phi''(\alpha_0)$ is the Newton step. However, if $\alpha_0$ is far from a minimizer, $\phi''(\alpha_0)$ could be negative and the quadratic model will not have local minimizers. Moreover, $m_0'$, the local linear model of $\phi'$ about $\alpha_0$, could have roots which correspond to local maxima or inflection points of $m_0$. Hence, we must take care when far from a minimizer in making a correspondence between Newton's method for minimization and Newton's method for nonlinear equations [18]. The following theorem states that Newton's iteration converges q-quadratically to $\alpha^*$ under certain assumptions.

Theorem 2.1. Let $B(\cdot)$ denote an open ball containing a local minimum $\alpha^*$ and assume that
(i) $\phi$ is twice differentiable and $|\phi''(x) - \phi''(y)| \le \gamma\,|x - y|$,
(ii) $\phi'(\alpha^*) = 0$,
(iii) $\phi''(\alpha^*) > 0$.
Then there is $\delta > 0$ such that if $\alpha_0 \in B(\delta)$, the Newton iteration

  $\alpha_{n+1} = \alpha_n - \phi'(\alpha_n)/\phi''(\alpha_n)$

converges q-quadratically to $\alpha^*$.

Proof. See [18, p. 16].

Newton's method possesses a very attractive local convergence rate, by comparison with other existing methods.
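A minimal sketch of the unguarded Newton iteration (2.1) is given below. The test function and the starting points are illustrative choices, not taken from the paper; the point of the second call is that the very same iteration, started near a maximum, converges to that maximum, which is exactly the behaviour described by the drawbacks that follow.

```python
def newton_1d_min(dphi, d2phi, alpha0, tol=1e-10, max_iter=50):
    """Plain Newton iteration (2.1): alpha_{n+1} = alpha_n - phi'(alpha_n)/phi''(alpha_n).
    Deliberately unguarded, so it can head to a maximum or an inflection point."""
    alpha = alpha0
    for _ in range(max_iter):
        g, h = dphi(alpha), d2phi(alpha)
        if abs(g) < tol:
            break
        alpha -= g / h
    return alpha

# Illustrative test function: phi(a) = a^4 - 3a^2 + a.
dphi  = lambda a: 4*a**3 - 6*a + 1      # phi'
d2phi = lambda a: 12*a**2 - 6           # phi''

print(newton_1d_min(dphi, d2phi, alpha0=1.0))   # ~1.13: a local minimizer (phi'' > 0 there)
print(newton_1d_min(dphi, d2phi, alpha0=0.0))   # ~0.17: a local MAXIMIZER, since phi' vanishes there too
```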
However, the method may suffer from implementation difficulties, and its convergence is sometimes questionable due to the following drawbacks:

Drawbacks of Newton's method:
– Convergence to a maximum or an inflection point is possible. Moreover, further progress to the minimum is then not possible because the derivative is zero.
– When $\phi'' = 0$, Newton's search direction $s = -\phi'/\phi''$ is undefined.
– Newton's method works well if $\phi''(\alpha) > 0$ everywhere. However, if $\phi''(\alpha) \le 0$ for some $\alpha$, Newton's method may fail to converge to a minimizer. Hence, if $\phi'' \le 0$, then $s$ may not be a descent (downhill) direction.
– Expressions for the second derivative must be derived and coded, a process which is always susceptible to human error, and the derivative may not be analytically obtainable.
– There is a requirement to start close to the minimum in order to guarantee convergence. This leaves the possibility of divergence open.

2.2. The secant method

The secant method is a minimization algorithm that uses a succession of roots of secant lines to better approximate a local minimum of a function $\phi$. It is a simplification of Newton's method, which is the most popular method because of its simplicity, flexibility, and speed. Newton's method converges to $\alpha^*$ with an order of convergence of at least two, but a potential problem in implementing it is the evaluation of the first and second derivatives at each step; moreover, the condition $\phi''(\alpha) \ne 0$ in a neighborhood of $\alpha^*$ is required. To avoid these problems, $\phi''(\alpha_k)$ is replaced by

  $\phi''(\alpha_k) \approx \dfrac{\phi'(\alpha_k) - \phi'(\alpha_{k-1})}{\alpha_k - \alpha_{k-1}}$,    (2.2)

which is a linear approximation of $\phi''(\alpha_k)$. The secant method, as a simplification of Newton's method, is thus obtained as follows:

  $\alpha_{k+1} = \alpha_k - \dfrac{\alpha_k - \alpha_{k-1}}{\phi'(\alpha_k) - \phi'(\alpha_{k-1})}\,\phi'(\alpha_k)$,  $k = 1, 2, \ldots$    (2.3)

As can be seen from the recurrence relation, the secant method requires two initial values, $\alpha_0$ and $\alpha_1$, which should ideally be chosen to lie close to the local minimum. The main limitation of this method with respect to Newton's method is the order: it is q-superlinearly convergent with order 1.618, the golden mean.
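A short sketch of the secant recurrence (2.3), reusing the illustrative test function from the Newton sketch above; only first derivatives are evaluated, since $\phi''$ is replaced by the difference quotient (2.2).

```python
def secant_1d_min(dphi, alpha0, alpha1, tol=1e-10, max_iter=100):
    """Secant iteration (2.3): phi'' in Newton's step is replaced by the
    finite-difference quotient (2.2), so only phi' is needed."""
    a_prev, a = alpha0, alpha1
    for _ in range(max_iter):
        g_prev, g = dphi(a_prev), dphi(a)
        if abs(g) < tol or g == g_prev:
            break
        a_prev, a = a, a - (a - a_prev) / (g - g_prev) * g
    return a

# Same illustrative function as before: phi(a) = a^4 - 3a^2 + a.
dphi = lambda a: 4*a**3 - 6*a + 1
print(secant_1d_min(dphi, 1.0, 1.2))   # ~1.13, obtained without any second derivatives
```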
We propose a new line search technique based on Chebyshev polynomials. In particular, we describe a line search that uses second-order information in an efficient manner. This information is introduced through the computation of a negative curvature direction in each iteration. The approach proposed in this paper is applied to one-dimensional problems, although the underlying ideas can also be adapted to large-dimensional problems with limited modifications.

As we have shown before, Newton's method, although it possesses the luxury of quadratic local convergence, suffers from some problems. First, we focus on the first drawback mentioned before. We refer to the following definition.

Definition 2.1 (Descent direction). Let the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be given and let $x \in \mathbb{R}^n$ be a vector such that $f(x)$ is finite. We say that the vector $d \in \mathbb{R}^n$ is a descent direction with respect to $f$ at $x$ if there exists $\delta > 0$ such that $f(x + t d) < f(x)$ for every $t \in (0, \delta]$ [1].

Proposition 2.2 (Sufficient condition for a descent direction). Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x \in \mathbb{R}^n$. If there exists a vector $d \in \mathbb{R}^n$ such that

  $\langle d, \nabla f(x) \rangle = d^{\mathrm{T}} \nabla f(x) < 0$,    (2.4)

then $d$ is called a descent direction of $f$ at $x$ [30].

The reason for Definition 2.1 is quite simple. Just notice that if we define $h(t) = f(x + t d)$, then by the chain rule, $h'(0) = \nabla f(x)^{\mathrm{T}} d$. Therefore, if $d$ is a descent direction, this derivative is negative, and hence the values of $f$ decrease as we move along $d$ from $x$, at least locally [24]. In order to guarantee global convergence, we sometimes require that $d$ satisfies the sufficient descent condition $d^{\mathrm{T}} \nabla f(x) \le -c\,\|\nabla f(x)\|^2$, where $c > 0$ is a constant [28]. Hence, in the one-dimensional case, this definition means that the product of the search direction with the first derivative must be less than zero. In Newton's method, the search direction is given by

  $s = -\dfrac{\phi'(\alpha)}{\phi''(\alpha)}$,

so, in order to satisfy condition (2.4), we must have

  $s\,\phi'(\alpha) = -\dfrac{\phi'^{\,2}(\alpha)}{\phi''(\alpha)} < 0$,

which implies that $\phi''(\alpha) > 0$. This motivates us to enforce $\phi''$ to be positive at each iteration in order to ensure that we are heading towards the desired minimum point. We propose to flip the sign of $\phi''$ whenever $\phi'' < 0$, that is, we set $s_k := -s_k$; then we compare the function values at the last two approximations $\alpha_{k-1}$ and $\alpha_k$. If $\phi(\alpha_k) > \phi(\alpha_{k-1})$, we set $s_k := b\,s_k$, $b \in I \subset \mathbb{R}^+$, and repeat the process again; $I = (0, 1)$ may be an appropriate choice. This procedure guarantees that the method will not converge to a maximum point. However, the method may still converge to an inflection point. This can be handled by setting $s = \beta$ whenever $\phi'' = 0$. If $\phi(\alpha_{k-1} + s) < \phi(\alpha_{k-1} - s)$, we set $\alpha_k = \alpha_{k-1} + s$; otherwise, we set $\alpha_k = \alpha_{k-1} - s$. The following algorithm summarizes the preceding discussion.

Algorithm 2.3.
Step 1. Choose $\alpha_0, \beta, \gamma, \epsilon$ and set the iteration counter $k = 0$.
Step 2. Calculate $\phi'(\alpha_k)$ and $\phi''(\alpha_k)$.
Step 3. If $|\phi'(\alpha_k)| < \epsilon$ then do Steps 4–6.
Step 4. If $\phi''(\alpha_k) > 0$ then output $\alpha_k$; stop.
Step 5. If $\phi(\alpha_k + \beta) < \phi(\alpha_k - \beta)$ then set $\alpha_{k+1} = \alpha_k + \beta$, else set $\alpha_{k+1} = \alpha_k - \beta$.
Step 6. Set $\beta = \gamma\beta$, $k = k + 1$, and go to Step 2.
Step 7. Set $s = -\phi'(\alpha_k)/\phi''(\alpha_k)$.
Step 8. If $\phi''(\alpha_k) > 0$ then set $\alpha_{k+1} = \alpha_k + s$, else set $\alpha_{k+1} = \alpha_k - s$.
Step 9. Set $k = k + 1$, and go to Step 2.

Remarks 2.1. Algorithm 2.3 avoids converging to a maximum point. Even if we start at a local maximum point, Algorithm 2.3 can successfully progress to a local minimum. We prefer to choose the parameter $\gamma = 2$.
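The following is a direct, minimal transcription of Algorithm 2.3 into code. The derivatives are passed in as ordinary callables and the test problem is the same illustrative quartic used above; in the paper itself, $\phi'$ and $\phi''$ are instead obtained from the pseudospectral differentiation matrices developed in the following sections.

```python
def adaptive_line_search(phi, dphi, d2phi, alpha0, beta=0.5, gamma=2.0,
                         eps=1e-8, max_iter=100):
    """Transcription of Algorithm 2.3 (Steps 1-9), with an iteration cap added for safety.
    If phi'' happens to vanish exactly, the paper's prose suggests taking the step
    s = beta; that special case is omitted here."""
    alpha = alpha0                                     # Step 1
    for _ in range(max_iter):
        g, h = dphi(alpha), d2phi(alpha)               # Step 2
        if abs(g) < eps:                               # Step 3: near a stationary point
            if h > 0:                                  # Step 4: positive curvature -> minimizer found
                return alpha
            # Step 5: near a maximum or inflection point -> step towards the lower side
            alpha = alpha + beta if phi(alpha + beta) < phi(alpha - beta) else alpha - beta
            beta *= gamma                              # Step 6 (the paper prefers gamma = 2)
            continue
        s = -g / h                                     # Step 7: Newton direction
        alpha = alpha + s if h > 0 else alpha - s      # Step 8: reverse the step under negative curvature
    return alpha                                       # Step 9 is the loop itself

# Same illustrative quartic as before, started where plain Newton heads to the maximum near 0.17:
phi   = lambda a: a**4 - 3*a**2 + a
dphi  = lambda a: 4*a**3 - 6*a + 1
d2phi = lambda a: 12*a**2 - 6
print(adaptive_line_search(phi, dphi, d2phi, alpha0=0.0))   # ends at a local minimizer (about -1.30)
```

From the same starting point $\alpha_0 = 0$, the unguarded Newton sketch above converges to the nearby maximum, whereas the sign-reversal in Step 8 drives this iteration downhill to a minimizer, illustrating Remarks 2.1.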
3. Calculating derivatives

Most algorithms for nonlinear optimization and nonlinear equations require knowledge of derivatives. Sometimes the derivatives are easy to calculate by hand, and it is reasonable to expect the user to provide code to compute them. In other cases, the functions are too complicated, so we look for ways to calculate or approximate the derivatives automatically [23].

3.1. Finite difference method

Finite differences play an important role: they are one of the simplest ways of approximating a differential operator, and they are extensively used in solving differential equations. A popular formula for approximating the first derivative $\phi'(x)$ at a given point $x$ is the forward-difference, or one-sided-difference, approximation, defined as

  $\phi'(x) \approx \dfrac{\phi(x + h) - \phi(x)}{h}$.    (3.1)

This process requires evaluation of $\phi$ at the point $x$ as well as at the perturbed point $x + h$. The forward difference derivative can be turned into a backward difference derivative by using a negative value for $h$. Notice two things about this approach. First, we have approximated a limit by an evaluation which we hope is close to the limit. Second, we perform a difference in the numerator of the quotient. This gives us two sources of error: large values of $h$ lead to error due to the approximation itself, and small values of $h$ lead to round-off error in the calculation of the difference. The error due to the finite difference approximation is proportional to the width $h$; we refer to the error as being of order $O(h)$.

A more accurate approximation to the derivative can be obtained by using the central difference formula, defined as

  $\phi'(x) \approx \dfrac{\phi(x + h) - \phi(x - h)}{2h}$,    (3.2)

with local truncation error $E(h) = \frac{h^2}{6}\,\phi'''(\xi)$, where $\xi$ lies between $x - h$ and $x + h$. The optimal step size can be shown to be $h = \sqrt[3]{3\delta/M}$, where $\delta$ is the round-off error and $M = \max_x |\phi'''(x)|$ [31]. The second derivative can also be given by a central difference formula,

  $\phi''(\alpha) \approx \dfrac{\phi(\alpha - h) - 2\phi(\alpha) + \phi(\alpha + h)}{h^2}$,    (3.3)

with local truncation error $E(h) = \frac{h^2}{12}\,\phi^{(4)}(\xi)$, where $\xi$ lies between $x - h$ and $x + h$. The optimal step size can be shown to be $h = \sqrt[4]{48\delta/L}$, where $L = \max_x |\phi^{(4)}(x)|$ [31].

In the following section we discuss a more powerful way of calculating derivatives.
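A brief sketch of formulas (3.1)–(3.3) and of the truncation/round-off trade-off described above; the test function $\phi(x) = \sin x$ is an illustrative stand-in, not one of the paper's test functions.

```python
import numpy as np

def fwd_diff(phi, x, h):       # forward difference (3.1), truncation error O(h)
    return (phi(x + h) - phi(x)) / h

def central_diff(phi, x, h):   # central difference (3.2), truncation error O(h^2)
    return (phi(x + h) - phi(x - h)) / (2.0 * h)

def central_diff2(phi, x, h):  # second-derivative central difference (3.3), O(h^2)
    return (phi(x - h) - 2.0 * phi(x) + phi(x + h)) / h**2

phi, x0 = np.sin, 1.0          # exact derivatives at x0: cos(1) and -sin(1)
for h in (1e-1, 1e-4, 1e-8, 1e-12):
    print(f"h = {h:.0e}  "
          f"forward error = {abs(fwd_diff(phi, x0, h) - np.cos(x0)):.1e}  "
          f"central error = {abs(central_diff(phi, x0, h) - np.cos(x0)):.1e}  "
          f"second-deriv error = {abs(central_diff2(phi, x0, h) + np.sin(x0)):.1e}")
# The errors first shrink with h (truncation error) and then grow again (round-off),
# which is why the optimal step sizes quoted above balance the two effects.
```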
3.2. Pseudospectral differentiation matrices

Spectral methods are extremely effective and efficient techniques for the solution of differential equations. They can give truly phenomenal performance when applied to appropriate problems. The main advantage of spectral methods is their superior accuracy for problems whose solutions are sufficiently smooth functions: they converge exponentially fast, compared with the algebraic convergence rates of finite difference and finite element methods [3]. In practice this means that good accuracy can be achieved with fairly coarse discretizations. The family of techniques known as the method of weighted residuals has been used extensively to obtain approximate solutions of a wide variety of problems; the so-called spectral method is a specialization of this set of general techniques.

Chebyshev polynomials $T_n(x)$ are usually taken with the associated Chebyshev–Gauss–Lobatto nodes in the interval $[-1, 1]$, given by $x_k = \cos(k\pi/n)$, $k = 0, 1, \ldots, n$. Chebyshev polynomials are used widely in numerical computations. One of the advantages of using Chebyshev polynomials as expansion functions is the good representation of smooth functions by finite Chebyshev expansions, provided that the function $f(x)$ is infinitely differentiable [11]. Chebyshev polynomials as basis functions have a number of useful properties: they are easily computed, they converge rapidly, and they are complete, which means that any solution can be represented to arbitrarily high accuracy by retaining a finite number of terms. In practice, one of the main reasons for the use of a Chebyshev polynomial basis is the good conditioning that frequently results. A number of comparisons have been made of the conditioning of calculations involving various polynomial bases, including $x^k$ and $T_n(x)$. A paper by Gautschi [14] gives a particularly effective approach to this topic. If a Chebyshev basis is adopted, there are usually three gains [20]:

– the coefficients generally decrease rapidly with the degree $n$ of the polynomial;
– the coefficients converge individually with $n$;
– the basis is well conditioned, so that methods such as collocation are well behaved numerically.

Since interpolation, differentiation and evaluation are all linear operations, the process of obtaining approximations to the values of the derivative of a function at the collocation points can be expressed as a matrix–vector multiplication; the matrices involved are called spectral differentiation matrices. The concept of a differentiation matrix has been developed over the last two decades and has proven to be a very useful tool in the numerical solution of differential equations [7,13]. For the explicit expressions of the entries of the differentiation matrices and further details of collocation methods, we refer to Refs. [6,7,13,33].

4. Higher derivatives of Chebyshev polynomials

The Chebyshev polynomials are of leading importance among orthogonal polynomials, second perhaps only to the Legendre polynomials (which have a unit weight function), but they have the advantage over the Legendre polynomials that the locations of their zeros are known analytically. Moreover, along with the Legendre polynomials, the Chebyshev polynomials belong to an exclusive band of orthogonal polynomials, known as Jacobi polynomials, which correspond to weight functions of the form $(1 - x)^a (1 + x)^b$ and which are solutions of Sturm–Liouville equations. Before we introduce our new method, we will require several results from approximation theory. The Chebyshev polynomial of degree $n$, $T_n(x)$, is defined as [20]

  $T_n(x) = \cos(n \arccos x)$,  $x \in [-1, 1]$.
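To make the "differentiation as a matrix–vector multiplication" idea of Section 3.2 concrete, the sketch below assembles the classical first-order Chebyshev collocation differentiation matrix on the Chebyshev–Gauss–Lobatto nodes $x_k = \cos(k\pi/n)$ (the construction popularized, e.g., in Trefethen's Spectral Methods in MATLAB) and applies it to a smooth test function chosen for illustration. These are not the specific entries derived in Section 5 of the paper; they merely illustrate the mechanism.

```python
import numpy as np

def cheb_diff_matrix(n):
    """First-order Chebyshev collocation differentiation matrix on the
    Chebyshev-Gauss-Lobatto nodes x_k = cos(k*pi/n), k = 0, ..., n (assumes n >= 2)."""
    k = np.arange(n + 1)
    x = np.cos(np.pi * k / n)                          # CGL nodes, ordered from +1 down to -1
    c = np.hstack([2.0, np.ones(n - 1), 2.0]) * (-1.0) ** k
    dX = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dX + np.eye(n + 1))    # off-diagonal entries
    D -= np.diag(D.sum(axis=1))                        # diagonal from the negative row sums
    return D, x

n = 24
D, x = cheb_diff_matrix(n)
f = np.exp(x) * np.sin(5 * x)                                 # smooth test function (illustrative)
df_exact  = np.exp(x) * (np.sin(5 * x) + 5 * np.cos(5 * x))
d2f_exact = np.exp(x) * (10 * np.cos(5 * x) - 24 * np.sin(5 * x))

print(np.max(np.abs(D @ f - df_exact)))          # tiny error for this smooth f: spectral accuracy
print(np.max(np.abs(D @ (D @ f) - d2f_exact)))   # second derivative obtained here by applying D twice
```

Here the second derivative is obtained simply by applying $D$ twice, whereas the paper constructs high-order differentiation matrices directly from the higher derivatives of the Chebyshev polynomials developed in this section and in Section 5.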