Soft Computing manuscript No. (will be inserted by the editor)

EM323: A Line Search based algorithm for solving high-dimensional continuous non-linear optimization problems

Vincent Gardeux · Rachid Chelouah · Patrick Siarry · Fred Glover

Received: date / Accepted: date

Abstract  This paper presents a performance study of a one-dimensional search algorithm for solving general high-dimensional optimization problems. The proposed approach is a hybrid between a line search algorithm of Glover (2010) and an improved variant of a global method of Gardeux et al (2009) that uses line search algorithms as sub-routines. The resulting algorithm, called EM323, was tested on 19 scalable benchmark functions, with a view to observing how optimization techniques for continuous optimization problems respond with increasing dimension. To this end, we report the algorithm's performance on the 50, 100, 200, 500 and 1000-dimension versions of each function. Computational results are given comparing our method with three evolutionary algorithms. We find our approach obtains better results on almost all of the tested functions.

Keywords  metaheuristic · line search · optimization · continuous · high-dimension

Vincent Gardeux and Rachid Chelouah
EISTI, L@ris, Cergy-Pontoise, France
(corresponding author E-mail: vincent.gardeux@eisti.fr)

Patrick Siarry
Université de Paris 12, LiSSi, Créteil, France

Fred Glover
OptTek Systems, Inc., 2241 17th Street, Boulder, CO 80302, USA

1 Introduction

We address the continuous nonlinear function optimization problem in the form:

Minimize f(x) : L ≤ x ≤ U   (1)

where x = (x_1, ..., x_n) is a vector of real-valued variables, and the vectors L and U are assumed finite and to satisfy L < U. The function f(x) is characteristically assumed to be multi-modal, so that local optima do not in general correspond to global optima. We denote the index set for the components of x, L and U by N = {1, ..., n}.

There has recently been considerable interest in solving instances of (1) of large dimensions (i.e., large values of n), motivated by the emergence of applications in bio-computing, web and data mining (Lee (2007), Nasiri et al (2009)).

Many current optimization techniques can efficiently solve instances of (1) of small to moderate dimension, involving up to 50 variables. However, most of them are not well designed to efficiently solve high-dimensional problems, particularly those containing more than 100 variables. Moreover, the performance of these algorithms decreases with increasing dimension. We address the challenge of solving higher dimensional problems by proposing an algorithm that can be applied to any black-box function without requiring Hessian or gradient information. Our goal is to produce a method that is both dimensionally robust and easy to implement. As a foundation for doing this, we have adopted a decomposition theme, which we carry to the farthest extreme by decomposing the n-dimensional problem into n one-dimensional problems, optimizing the objective function on each dimension separately.

One-dimensional optimization is a classical problem. One of the first algorithms for solving it is Newton's method (1669). The family of one-dimensional algorithms to which Newton's method belongs, called Line Search, is ideally suited to provide subroutines for solving multidimensional problems.
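To make the notation of (1) concrete, the following is a minimal Python sketch of the black-box setting assumed throughout; the Rastrigin-style objective is purely illustrative and is not meant to reproduce the paper's benchmark suite.

import numpy as np

def f(x):
    # Illustrative multi-modal black-box objective (Rastrigin-style).
    # Only function values are available: no gradient or Hessian information.
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

n = 1000                                  # problem dimension
L = np.full(n, -5.0)                      # lower bound vector
U = np.full(n, 5.0)                       # upper bound vector
x0 = L + np.random.rand(n) * (U - L)      # a random solution satisfying L <= x <= U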
This type of optimization is notably used in the steepest descent method, which computes a direction according to the derivative of the objective function, as exemplified by the approach of Hestenes and Stiefel (1952). Contrary to this type of approach, however, our proposed method does not make use of an objective function derivative and does not try to approximate one. On high-dimensional problems, the calculation of a derivative or its approximation can be highly time-consuming. Other algorithms that have made explicit use of a one-dimensional optimization algorithm as a component strategy include those of Grosan and Abraham (2007) and Tseng and Chen (2008).

In order to adapt a line search algorithm to a multidimensional problem, we employ a simple relaxation method that takes the unit vectors (e_1; e_2; ...; e_n) to provide a set of directions. Within this setting, line search operates by moving along the first direction until reaching a minimum (maximum), then along the second direction to its minimum (maximum) point, and so on, cycling through the set of directions as many times as necessary (a minimal sketch of this cycling scheme is given at the end of this section).

Our approach incorporates a fast global optimization procedure based on the EUS (Enhanced Unidirectional Search) algorithm of Gardeux et al (2009). EUS is parameter-free and designed to converge quickly to a local optimum. We combine EUS with the 3-2-3 line search procedure, recently developed by Glover (2010). In order to adapt this algorithm to non-separable problems, we enhanced the global relaxation component of EUS to incorporate several features that will be described later. In order to handle multi-modal objective functions, which pose the danger of entrapping the search in a local optimum, we use a restart procedure based on a dispersion algorithm inspired by the scatter search (SS) algorithm of Marti et al (2006). The SS algorithm component we employ computes the dispersion over all local optima found, in order to avoid re-initializing the solution near local optima already obtained.

Our proposed approach is tested on a suite of 19 scalable functions and compared to three evolutionary algorithms that constitute leading methods for this class of problems. Analysis using the Wilcoxon statistical test discloses that our method outperforms the other methods by a statistically significant margin.

The remainder of the paper is organized as follows. Section 2 describes the two main algorithms that we have combined to create EM323, whose detailed form is then presented in Section 3. Section 4 describes the experimental setup, followed by computational comparisons of our method with the three evolutionary algorithms in Section 5. Finally, our conclusions and directions for future work are given in Section 6.
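As an illustration only, the following is a minimal sketch of the cycling relaxation scheme described above, reusing the illustrative f, L and U defined earlier. The helper line_search_1d is a hypothetical placeholder (a coarse grid scan), not the 3-2-3 procedure introduced in Section 2.

import numpy as np

def line_search_1d(f, x, i, lo, hi):
    # Placeholder 1-D minimizer along the unit direction e_i within [lo, hi].
    # A coarse grid scan stands in for a real line search sub-routine;
    # only objective evaluations are used, no derivatives.
    best, f_best = x, f(x)
    for t in np.linspace(lo, hi, 11):
        trial = x.copy()
        trial[i] = t
        f_trial = f(trial)
        if f_trial < f_best:
            best, f_best = trial, f_trial
    return best

def relaxation(f, x, L, U, n_cycles=10):
    # Cyclic relaxation: optimize each coordinate in turn, then repeat the cycle.
    for _ in range(n_cycles):
        for i in range(x.size):
            x = line_search_1d(f, x, i, L[i], U[i])
    return x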
2 Two algorithms based on line search: EUS and 3-2-3

2.1 Line Search

Line search problems may be expressed as that of minimizing f(y) on a line segment denoted by LS(x′, x′′), which passes through points x′ and x′′, where

LS(x′, x′′) = { x = x(θ) : x(θ) = x′ + (x′′ − x′)θ for θ_min ≤ θ ≤ θ_max }   (2)

The values θ_min and θ_max are computed so that LS(x′, x′′) lies within a bounded convex region defining a set of feasible solutions, either for an original problem of interest or for a mathematical relaxation of such a problem. Hence the points x′ and x′′ are not necessarily endpoints of LS(x′, x′′).

2.2 EUS: Enhanced Unidirectional Search

The EUS algorithm developed by Gardeux et al (2009) is designed to be easy to implement and robust in handling high-dimensional problems. It is a global method inspired by a classical relaxation method, using a simple line search procedure on each dimension successively. The algorithm starts from a randomly generated initial solution x. A difference ("delta") vector δ is created and initialized by setting δ = U − L, followed by shrinking the size of δ on subsequent iterations and continuing until a stopping criterion is reached.

At each iteration, the algorithm focuses on optimizing the current solution x on only one dimension i, by allowing only the component x_i to be changed. The method then selects the best neighbor from the two alternatives "x^u" and "x^d" given by

x^u = x + δ_i e_i   (3a)
x^d = x − δ_i e_i   (3b)

where e_i is the unit vector with a 1 in position i and 0's elsewhere. (Hence x^u and x^d are the same as x except for the i-th component, where x^u_i = x_i + δ_i and x^d_i = x_i − δ_i.) The value x^u_i is reset to U_i if x_i + δ_i > U_i, and x^d_i is reset to L_i if x_i − δ_i < L_i.

The search compares x with its 2 neighbor solutions x^u and x^d and updates x to be the best of these, hence setting x = argmin(f(x), f(x^u), f(x^d)). Successive dimensions i are treated in the same manner.

Each iteration consists of examining every dimension i, and at its conclusion the algorithm finds a restricted local optimum relative to the precision given by the vector δ. (We call this local optimum restricted, since if any change is produced during the iteration it is possible that a better solution could be produced by a new pass over the dimensions.) We then test to determine whether the solution has been improved on at least one dimension, and if not, δ is multiplied by a ratio value fixed to 0.5, therefore shrinking the size of its components. The δ vector continues to shrink in this fashion until it satisfies δ < δ_min, whose components are all fixed to 1e−15 in order to obtain a suitable precision. The parameter ratio can be tuned, but experiments show that the fixed value of 0.5 is suitable.

The pseudo-code in Figure 1 details an iteration of the algorithm.

Procedure EUS
  δ = U − L
  begin
    for i = 1 to n
      x^u = x + δ_i e_i
      x^d = x − δ_i e_i
      (update x to be the best of the 3 solutions)
      x = argmin(f(x), f(x^u), f(x^d))
    end
    if no improvement has been found
      δ = δ * 0.5
    until δ < δ_min
  end
end

Fig. 1  An iteration of the EUS algorithm

The EUS algorithm obtains good results not only on separable functions but on non-separable functions too. It does not perform an intensive optimization on each dimension, but only a quick approximation at each pass; a minimal implementation sketch of this iteration is given below.
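The following is a minimal Python sketch of the procedure of Figure 1 (our own rendering, with the restart mechanism of Section 2.3 omitted); the illustrative f, L and U from the Introduction can be passed directly, e.g. x_best, f_best = eus(f, L, U).

import numpy as np

def eus(f, L, U, delta_min=1e-15, ratio=0.5):
    # Random initial solution; delta starts as the full variable range (Fig. 1).
    x = L + np.random.rand(L.size) * (U - L)
    delta = (U - L).astype(float)
    while np.any(delta >= delta_min):
        improved = False
        for i in range(x.size):
            x_up, x_down = x.copy(), x.copy()
            x_up[i] = min(x[i] + delta[i], U[i])     # reset to U_i if the bound is exceeded
            x_down[i] = max(x[i] - delta[i], L[i])   # reset to L_i if the bound is exceeded
            # update x to be the best of the 3 solutions
            best = min((x, x_up, x_down), key=f)
            if best is not x:
                x, improved = best, True
        if not improved:
            delta *= ratio                           # shrink the precision vector
    return x, f(x)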
Thus, on the next pass, the function to optimize relative to a given dimension can change, due to changes that have occurred on other dimensions. But complex non-separable functions, which have many interrelations between the variables, remain difficult to optimize in reasonable time. For such problems the convergence of the EUS method is too slow.

2.3 The Restart Procedure

The algorithm is not a population-based method, so to avoid becoming trapped in a local minimum, we use a restart procedure that keeps the best solution found so far and re-initializes δ to a new starting value after reaching the termination point given by δ < δ_min. Increasing the vector δ_min may increase the potential number of restarts of the algorithm, but tends to decrease the accuracy. In order to better explore the search space, when the restart procedure is activated, a new solution is generated that lies far from a reference set. This technique uses the diversification generation method from the Scatter Search algorithm of Marti et al (2006). Each time the termination point is achieved, the solution found (s_i) is added to a reference set S of previously visited restricted local optima. The restart procedure then generates a collection of diverse trial solutions in the search space as in the Marti et al approach and selects the one farthest from S. The distance dist used is a classic distance between a point and a set:

dist(x, S) = inf { euclideanDist(x, s) : s ∈ S }   (4)

2.4 A Line Search algorithm: 3-2-3

We decided to replace the simple examination of the two neighbors x^u and x^d used in EUS with a more effective determination of candidates to replace the current solution x, by making use of a line search procedure. For this, we selected the 3-2-3 procedure, which has a structure that is highly suited for exploitation by our general design. The procedure begins by adopting an approach commonly used in nonlinear line search procedures, which consists in identifying a succession of points x(θ_0), x(θ_1), ..., x(θ_s) on LS(x′, x′′), for s ≥ 2, such that

θ_0 < θ_1 < ... < θ_s,  where θ_0 = θ_min, θ_s = θ_max   (5)

The θ_h values are parameters that identify the location of x (or more particularly x(θ_h)) on the line segment LS(x′, x′′) joining x′ and x′′. We may suppose, for example, that x′ = x(0) and x′′ = x(1), with θ_min and θ_max selected to satisfy 0 ≤ θ_min < θ_max ≤ 1.

A straightforward way to generate the θ_h values is to subdivide the interval [θ_min, θ_max] into s equal subintervals, so that

θ_h = θ_(h−1) + Δ (= θ_0 + hΔ) for h = 1, ..., s, where Δ = (θ_max − θ_min) / s   (6)

The 3-2-3 method focuses on sequences of points on the subdivided line, defined by reference to the θ values, that have either the form of a pair (x(θ_(h−1)), x(θ_h)) for 1 ≤ h ≤ s or a triple (x(θ_(h−1)), x(θ_h), x(θ_(h+1))) for 1 ≤ h ≤ s − 1. (Hence the points x(θ_0) and x(θ_s) can be endpoints of intervals defined by such pairs and triples.) A short sketch of this subdivision step is given below.
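The following minimal helper (our own, not part of the original algorithms) makes equations (2), (5) and (6) concrete by generating the subdivision points x(θ_h) from which the pairs and triples above are drawn.

import numpy as np

def subdivide_segment(x1, x2, s, theta_min=0.0, theta_max=1.0):
    # Equally spaced theta_h values on [theta_min, theta_max], eq. (6),
    # and the corresponding points x(theta_h) on LS(x', x''), eq. (2).
    delta = (theta_max - theta_min) / s
    thetas = [theta_min + h * delta for h in range(s + 1)]
    points = [x1 + (x2 - x1) * t for t in thetas]
    return thetas, points

# Candidate triples for the 3-2-3 procedure are the consecutive
# (points[h-1], points[h], points[h+1]) for h = 1, ..., s-1.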
The 3-2-3 algorithm starts from an initial construction similar to that used by Golden Section methods for unimodal function optimization on a line, consisting of triples (x(θ_(q−1)), x(θ_q), x(θ_(q+1))), but allowing more than one starting interval and more complex ways of operating on the intervals considered. The approach can be particularly appropriate when the distances between successive points x(θ_h) are relatively small, or when the goal is to refine a coarse search of the line by focusing more thoroughly on the region around the point x(θ_q).

We select a current instance of such a triple (x(θ_(q−1)), x(θ_q), x(θ_(q+1))) by requiring that it satisfies

f(x(θ_q)) ≤ f(x(θ_(q−1))) and f(x(θ_q)) ≤ f(x(θ_(q+1)))   (7)

Such a triple always exists unless the local optima from among the points x(θ_0), ..., x(θ_s) occur only at one or both of the points x(θ_0) and x(θ_s). Barring this exceptional case, the 3-2-3 algorithm is applied to each triple satisfying (7) for which f(x(θ_q)) does not exceed the globally minimum f(x(θ)) value over the points x(θ_1), ..., x(θ_s) by more than a relatively small amount. To avoid duplicated effort, each point denoted by x(θ_q) is only permitted to lie in one of the chosen sequences. If the exceptional case exists that prevents (7) from holding, however, the 3-2-3 algorithm is applied by first launching a preliminary algorithm called the 2-1-2 algorithm.

In both the preliminary 2-1-2 algorithm and the main 3-2-3 algorithm, we initialize x* = x(θ_q), where x* denotes a candidate for a "best solution" obtained by the search. The three starting cases are summarized as follows, where Case 1 and Case 2 are only considered in the exceptional situation where no triple exists satisfying (7):

– Case 1: f(x(θ_0)) is a global minimum over the points x(θ_0), ..., x(θ_s) (and f(x(θ_1)) is not). Let x^a = x(θ_0), x^c = x(θ_1) (hence f(x^a) < f(x^c)). Set x* = x^a and execute the preliminary 2-1-2 algorithm.
– Case 2: f(x(θ_s)) is a global minimum over the points x(θ_0), ..., x(θ_s) (and f(x(θ_(s−1))) is not). Let x^a = x(θ_s), x^c = x(θ_(s−1)) (hence f(x^a) < f(x^c)). Set x* = x^a and execute the preliminary 2-1-2 algorithm.
– Case 3: (7) is satisfied for a selected q such that 1 ≤ q ≤ s − 1. Let x^a = x(θ_(q−1)), x^b = x(θ_q), x^c = x(θ_(q+1)) (hence f(x^b) ≤ f(x^a), f(x^c)). Set x* = x^b and execute the main 3-2-3 algorithm.

The preliminary algorithm is called the "2-1-2" algorithm because each iteration starts with 2 points, generates 1 new point, and then discards one of the starting points to end with 2 points (one being the new point). The pseudo-code in Figure 2 details this preliminary algorithm, where maxIter is a parameter representing the stopping criterion.

Procedure 2-1-2
  x* = x^a
  begin
    (Start with and maintain f(x^a) < f(x^c))
    while iter < maxIter
      iter = iter + 1
      x^b = 0.5 * (x^a + x^c)
      if f(x^b) ≤ f(x^a) then
        ((x^a, x^b, x^c) has the proper form for the 3-2-3 algorithm)
        x* = x^b
        Terminate and execute the 3-2-3 procedure
      else (f(x^a) < f(x^b))
        Designate (x^a, x^b) to be the new (x^a, x^c)
      end if
    end while
  end

Fig. 2  Preliminary 2-1-2 algorithm
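A minimal Python rendering of the Figure 2 procedure is sketched below. One assumption of ours: instead of calling the 3-2-3 procedure directly, it returns the qualifying triple (or None on reaching maxIter) for hand-off to a 3-2-3 implementation; the points may be scalars or numpy vectors on LS(x′, x′′).

def procedure_212(f, x_a, x_c, max_iter=30):
    # Preliminary 2-1-2 algorithm (Figure 2).
    # Precondition, maintained throughout: f(x_a) < f(x_c).
    for _ in range(max_iter):
        x_b = 0.5 * (x_a + x_c)          # generate 1 new midpoint
        if f(x_b) <= f(x_a):
            # (x_a, x_b, x_c) now has the proper form for the 3-2-3 algorithm
            return x_a, x_b, x_c
        # otherwise f(x_a) < f(x_b): designate (x_a, x_b) as the new (x_a, x_c)
        x_c = x_b
    return None                          # maxIter reached without a qualifying triple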
Similarly, the main algorithm is called the "3-2-3" algorithm because each iteration starts with 3 points, generates 2 new points, and then discards 2 points to end with 3 points (including at least one of the new points). Viewed from the perspective of intervals, the algorithm might be called the "2-4-2" algorithm, because each iteration starts with the 2 adjacent intervals [x^a, x^b], [x^b, x^c], expands them to produce the 4 adjacent subintervals [x^a, x^a1], [x^a1, x^b], [x^b, x^b1], [x^b1, x^c], and then finally shrinks the latter collection to again obtain 2 adjacent intervals. If the distances d(x^a, x^b) and d(x^b, x^c) that define the lengths of the two initial intervals [x^a, x^b] and [x^b, x^c] are the same, then the lengths of the two intervals at the end of an iteration will be half the lengths of the two intervals at the start of the iteration. The pseudo-code in Figure 3 details an iteration of the 3-2-3 line search algorithm.

Procedure 3-2-3
  x* = x^b
  begin
    (Start with and maintain f(x^b) ≤ f(x^a), f(x^c))
    x^a1 = 0.5 * (x^a + x^b)
    x^b1 = 0.5 * (x^b + x^c)
    if f(x^b) ≤ f(x^a1) and f(x^b) ≤ f(x^b1) then
      Designate (x^a1, x^b, x^b1) to be the new (x^a, x^b, x^c)
    else if f(x^a1) ≤ f(x^b1) then
      x* = x^a1
      Designate (x^a, x^a1, x^b) to be the new (x^a, x^b, x^c)
    else
      x* = x^b1
      Designate (x^b, x^b1, x^c) to be the new (x^a, x^b, x^c)
    end if
  end

Fig. 3  One iteration of the 3-2-3 algorithm

We illustrate the 2-1-2 and the 3-2-3 procedures in Figures 4 and 5, respectively, using a 2-dimensional representation.

Fig. 4  The different cases of the 2-1-2 procedure: (a) starting case; (b) case f(x^b) ≤ f(x^a); (c), (d) case f(x^a) < f(x^b)

Fig. 5  The different cases of the 3-2-3 procedure: (a) starting case; (b) case f(x^b) ≤ f(x^a); (c), (d) case f(x^a) < f(x^b)

Figure 4(a) shows the starting configuration for the 2-1-2 procedure, where f(x^a) < f(x^c). A line has been drawn connecting the points f(x^a) and f(x^c) to clarify their relationship, but the line has no role in the procedure itself. Remaining components of Figure 4 include reference not only to x^a and x^c, but also to the point x^b and its function value f(x^b).

Figure 4(b) illustrates the case where f(x^b) ≤ f(x^a). The lines successively joining f(x^a), f(x^b), and f(x^c) are accentuated to indicate that this configuration qualifies as a 3-2-3 configuration, and hence the 2-1-2 procedure terminates at this point and the 3-2-3 procedure begins.

Figures 4(c) and 4(d) illustrate two versions of the same case, where f(x^a) < f(x^b). In both instances, the line joining f(x^a) and f(x^b) is accentuated, to indicate that the resulting configuration qualifies as a 2-1-2 configuration. Hence the 2-1-2 procedure continues by reference to the accentuated portion of the diagram, and x^b becomes the new x^c. (A third instance of the case for f(x^a) < f(x^b) is also possible, where in addition f(x^b) > f(x^c). The treatment is the same as illustrated in Figures 4(c) and 4(d).)
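Before turning to the Figure 5 illustration, the following is a minimal Python sketch of the Figure 3 iteration; the outer loop and the stopping tolerance are our own illustrative additions, not part of the published pseudo-code, and the points are assumed to be numpy vectors on LS(x′, x′′).

import numpy as np

def procedure_323(f, x_a, x_b, x_c, tol=1e-15, max_iter=60):
    # Main 3-2-3 algorithm: start with and maintain f(x_b) <= f(x_a), f(x_c).
    x_star = x_b
    for _ in range(max_iter):
        x_a1 = 0.5 * (x_a + x_b)             # generate 2 new midpoints
        x_b1 = 0.5 * (x_b + x_c)
        if f(x_b) <= f(x_a1) and f(x_b) <= f(x_b1):
            x_a, x_c = x_a1, x_b1            # new triple (x_a1, x_b, x_b1); x_b stays best
        elif f(x_a1) <= f(x_b1):
            x_star = x_a1
            x_a, x_b, x_c = x_a, x_a1, x_b   # new triple (x_a, x_a1, x_b)
        else:
            x_star = x_b1
            x_a, x_b, x_c = x_b, x_b1, x_c   # new triple (x_b, x_b1, x_c)
        if np.linalg.norm(x_c - x_a) < tol:  # interval small enough: stop (our criterion)
            break
    return x_star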
Figure 5(a) shows the starting configuration for the 3-2-3 procedure, where f(x^b) ≤ f(x^a) and f(x^b) ≤ f(x^c). Again the points identifying the function values are connected by a broken line for purposes of illustration. Remaining components of Figure 5 additionally include the points x^a1 and x^b1, together with their function values f(x^a1) and f(x^b1).

Figure 5(b) illustrates the situation in which f(x^b) ≤ f(x^a1) and f(x^b) ≤ f(x^b1). In this case x^b retains its identity as the point having a smallest f(x) value, and the sequence x^a1, x^b, x^b1 qualifies as a 3-2-3 configuration, as indicated by accentuating the lines successively joining f(x^a1), f(x^b), and f(x^b1). The next iteration of the 3-2-3 procedure therefore resumes with the current x^a1 becoming the new x^a and the current x^b1 becoming the new x^c.

Figures 5(c) and 5(d) illustrate two versions of the case where the condition of Figure 5(b) is not satisfied (hence f(x^a1) < f(x^b) or f(x^b1) < f(x^b)), and in addition f(x^a1) ≤ f(x^b1). Now x^a1 qualifies to become the new x^b, and in both of Figures 5(c) and 5(d) we have accentuated the broken line joining f(x^a), f(x^a1) and f(x^b), thus identifying the configuration that qualifies as a 3-2-3 configuration for the next iteration.

There remains the case where the conditions of both Figure 5(b) and of Figures 5(c) and 5(d) are all not satisfied, and hence we have f(x^b1) < f(x^b) and f(x^b1) < f(x^a1). This situation is the same as the one illustrated in Figures 5(c) and 5(d), with the roles of x^b1 and x^a1 interchanged, and hence we have not included an additional illustration.