Benchmarking the Differential Evolution with Adaptive Encoding on Noiseless Functions

Petr Pošík
Czech Technical University in Prague, FEE, Dept. of Cybernetics
Technická 2, 16627 Prague 6, Czech Republic
petr.posik@fel.cvut.cz

Václav Klemš
Czech Technical University in Prague, FEE, Dept. of Cybernetics
Technická 2, 16627 Prague 6, Czech Republic
vaclav.klems@gmail.com

ABSTRACT

The differential evolution (DE) algorithm is equipped with the recently proposed adaptive encoding (AE), which makes the algorithm rotationally invariant. The resulting algorithm, DEAE, should exhibit better performance on non-separable functions. The aim of this article is to assess what benefits AE brings and what effect it has on the other function groups. DEAE is compared against pure DE, an adaptive version of DE (JADE), and an evolution strategy with covariance matrix adaptation (CMA-ES). The results suggest that AE indeed improves the performance of DE, particularly on the group of unimodal non-separable functions, but the parameter adaptation used in JADE is more profitable on average. The use of AE inside JADE is envisioned.

Categories and Subject Descriptors

G.1.6 [Numerical Analysis]: Optimization—global optimization, unconstrained optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

General Terms

Algorithms

Keywords

Benchmarking, Black-box optimization, Differential evolution, Evolution strategy, Covariance matrix adaptation, Adaptive encoding

1. INTRODUCTION

Differential evolution (DE) [9] is a population-based optimization algorithm, popular thanks to its simplicity and good results on many practical problems. To create an offspring individual, it uses a mutation operator followed by a crossover. The mutation operators are usually rotationally invariant; the crossover, however, is not. On separable functions, the crossover helps to properly mix the good values of solution components in the population. On non-separable functions, however, it mostly only destroys the potentially good combinations of values generated by the mutation.

There are several possibilities for overcoming the crossover issue on non-separable functions. (1) Turn off the crossover operator completely. The DE then relies on the mutation operator only and may show worse performance on (partially) separable functions. (2) Choose the suitable operators adaptively. Several algorithms [8, 1, 10] are able to choose suitable DE operators and their parameters during the optimization run. For non-separable functions, they may actually find that the use of crossover is not profitable at all and may effectively switch it off. (3) Use adaptive encoding. If we are able to perform the crossover in a suitable coordinate system, we may enjoy the benefits of crossover even on non-separable functions.

In this article, the last of these possibilities is explored.
We chose the recently proposed adaptive encoding (AE) procedure [2], which adapts the coordinate system in a step-wise manner during the search. The goal of this paper is to assess how AE affects the DE algorithm, what benefits and downsides it has, and also to compare the potential of parameter adaptation as used in JADE on the one hand, and of encoding adaptation brought by AE on the other hand.

The rest of this article is organized as follows. Section 2 reviews the DE algorithm and describes the use of AE inside DE, i.e. the proposed DEAE algorithm. Section 3 describes the experiment carried out, together with the COCO benchmarking framework. The results are presented in Sec. 4 and discussed in Sec. 5. Sec. 6 concludes the paper and points out some directions for future work.

2. ALGORITHMS

The following paragraphs review the DE algorithm and the AE procedure, introduce the DEAE algorithm, and shortly describe the reference algorithms used in this paper.

Differential evolution (DE) [9] is a simple and easy-to-implement optimization algorithm (see the lines of Alg. 1 not marked (AE)). DE mutation operators create the donor individuals v_i as a linear combination of several individuals randomly chosen from the current population:

    v_i = x_{\text{best}} + F \cdot (x_{r_1} - x_{r_2}),    (1)

Eq. (1) describes the so-called "best/1" mutation operator, a highly exploitative mutation variant, where F is the mutation factor (a positive number typically chosen from [0.5, 1]).

The crossover creates the offspring u_i by taking some solution components from the parent x_i and the other components from the donor v_i. Eq. (2) describes the binomial crossover. It creates the offspring individual u_i = (u_{i,1}, ..., u_{i,D}) as follows:

    u_{i,j} = \begin{cases} v_{i,j} & \text{if } r_j \le CR_i \text{ or } j = j_{i,\text{rand}}, \\ x_{i,j} & \text{otherwise}, \end{cases}    (2)

where r_j is a random number uniformly distributed in [0, 1], CR_i ∈ [0, 1] is the crossover probability representing the average proportion of components the offspring gets from its donor, and j_{i,rand} is the randomly chosen index of the solution component surely donated from the donor.

Due to the crossover, DE is biased towards separable functions and is not rotationally invariant. This bias, however, can be controlled with the parameter CR. The tuning of CR is part of many adaptive DE variants which try to find the right operators and/or parameter values [8, 1, 10] to make the resulting algorithm more robust.
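As an illustration, Eqs. (1) and (2) can be written in Python as follows. This is a minimal sketch, not the benchmarked implementation: the function names and the representation of the population as an NP × D NumPy array are ours.

    import numpy as np

    def mutate_best1(pop, f_vals, F, i):
        """'Best/1' donor vector, Eq. (1): v_i = x_best + F * (x_r1 - x_r2)."""
        best = pop[np.argmin(f_vals)]                  # minimization assumed
        candidates = [k for k in range(len(pop)) if k != i]
        r1, r2 = np.random.choice(candidates, 2, replace=False)
        return best + F * (pop[r1] - pop[r2])

    def crossover_binomial(x, v, CR):
        """Binomial crossover, Eq. (2): each component is taken from the donor
        with probability CR; one randomly chosen component (j_rand) always is."""
        D = len(x)
        take_donor = np.random.rand(D) <= CR
        take_donor[np.random.randint(D)] = True        # j_rand: surely from donor
        return np.where(take_donor, v, x)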
DE and adaptive encoding. The adaptive encoding (AE) framework [2] is a general method that makes an optimization algorithm rotationally invariant. It maintains a linear transformation of the coordinate system: the candidate solutions are evaluated in the original space, but the offspring creation takes place in a different space given by the linear transformation. Alg. 1 shows a simple combination of the basic DE algorithm with AE, i.e. the DEAE algorithm, first proposed in [6]. The lines marked (AE) are the modifications needed for AE.

Algorithm 1: DE with Adaptive Encoding
 1  Initialize the population P ← {x_i}_{i=1}^{NP}.
 2  (AE) Initialize the transformation matrix B ∈ R^{D×D}.
 3  while stopping criteria not met do
 4      (AE) Transform P: P' ← {x'_i | x'_i ← B^{-1} x_i}.
 5      for i ← 1 to NP do
 6          v'_i ← mutate(i, P')    (Eq. 1)
 7          u'_i ← crossover(x'_i, v'_i)    (Eq. 2)
 8          (AE) Transform the offspring back: u_i ← B u'_i.
 9          if f(u_i) < f(x_i) then
10              x_i ← u_i
11          end
12      end
13      (AE) B ← update(B, x_(1), ..., x_(µ))
14  end

The forward and backward linear transformations are implemented by matrix multiplication (using the transformation matrix B). The procedure for updating B is crucial for the success of the algorithm. We adopted the method derived from the CMA-ES algorithm (we refer the reader to [2] for more details).
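One generation of Alg. 1 can be sketched in Python as follows, reusing the mutation and crossover helpers sketched above. This is only an illustration of the pseudocode: update_B is a placeholder for the CMA-ES-derived update of [2], which we do not reproduce here, and the choice µ = NP/2 is our assumption, since µ is not specified in this excerpt.

    import numpy as np

    def update_B(B, best_points):
        # Placeholder: the actual CMA-ES-derived update of [2] adapts B
        # from the mu best individuals; it is not reproduced here.
        return B

    def deae_generation(pop, f_vals, B, f, F, CR):
        """One generation of DEAE; pop is an (NP, D) array (minimization)."""
        B_inv = np.linalg.inv(B)
        pop_enc = pop @ B_inv.T                       # line 4: x'_i = B^{-1} x_i
        for i in range(len(pop)):
            v_enc = mutate_best1(pop_enc, f_vals, F, i)         # line 6, Eq. (1)
            u_enc = crossover_binomial(pop_enc[i], v_enc, CR)   # line 7, Eq. (2)
            u = B @ u_enc                             # line 8: back to original space
            fu = f(u)                                 # evaluate in original space
            if fu < f_vals[i]:                        # lines 9-10: greedy selection
                pop[i], f_vals[i] = u, fu
        mu = len(pop) // 2                            # assumption: mu = NP/2
        best = pop[np.argsort(f_vals)[:mu]]
        return pop, f_vals, update_B(B, best)         # line 13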
Reference algorithms. JADE [10] serves as a reference adaptive DE algorithm. It was chosen because it was reported [10] to have better performance than other adaptive DE variants. JADE uses a special mutation strategy called "current-to-p-best", but most importantly it adapts the crossover probability CR and the mutation factor F to values which turned out to be beneficial in recent generations. This algorithm thus does not adapt the coordinate system and does not adaptively select the operators it uses, but thanks to the adaptation of CR, it can effectively turn off the crossover.

CMA-ES, the evolution strategy with covariance matrix adaptation [5], was chosen for the comparison because the AE procedure is largely based on this algorithm. The algorithm samples new candidate solutions from a multivariate Gaussian distribution and adapts its mean and covariance matrix (i.e. it actually uses an adaptation of the coordinate system). The CMA-ES used in this paper is a conventional multistart version.

3. EXPERIMENT DESIGN

In the experiments, we compare DE, DEAE, JADE, and CMA-ES. By comparing DE to DEAE, we can assess the performance boost the DE algorithm can gain by using AE. By comparing DEAE with CMA-ES, we can get some insight into whether the sampling process of CMA-ES (drawing points from a normal distribution) is more suitable than the sampling process of DE (using mutation and crossover). The comparison of DEAE with JADE shall reveal which of the two different types of adaptation is more suitable for which kinds of functions.

Each of the algorithms was run on 15 instances of all the 24 functions in dimensions 2, 3, 5, 10, 20, and 40. The evaluation budget was set to 5 · 10^4 · D for each run. All algorithms were restarted when they stagnated for more than 30 generations and the population diversity measure satisfied (1/D) \sum_{i=1}^{D} \mathrm{Var}(X_i) < 10^{-10} (see the sketch at the end of this section).

The multistart CMA-ES algorithm was benchmarked anew with its default settings using the BBOB 2012 procedure. For most parameters of DE and JADE, default values from the literature were used. For DE: the binomial crossover with CR = 0.5, and the "best" mutation strategy with F ~ U(0.5, 1) (sampled anew each generation). For JADE: initial µ_CR = 0.5, initial µ_F = 0.5, the parameter of the "current-to-p-best" mutation p = 0.1, and the archive size |A| = 0.1 · NP.

The population size was set to NP = 5D for both algorithms after a small systematic study performed on JADE and DE using the values (3, 4, 5, 6, 8, 10, 15, 20) · D. Values of NP lower than 5D gave erratic behavior even on unimodal functions; values larger than 5D wasted evaluations on unimodal functions and did not bring significant advantages on multimodal functions.

The DEAE algorithm inherited the parameters of DE. The AE part of DEAE uses a learning rate parameter α_c = 8, chosen after testing the values 1, 4, 8, 10, 15, and 20 (increasing the learning rate from 1 to 8 brought significant speedups; a further increase provided only a questionable advantage).
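The restart condition above can be expressed as the following check. This is a sketch under our interpretation: pop is the population as an NP × D array, and stagnation_gens is a counter of generations without improvement maintained by the caller; both names are ours.

    import numpy as np

    def should_restart(pop, stagnation_gens):
        """Restart when stagnating for more than 30 generations AND the
        mean per-coordinate variance (1/D) * sum_i Var(X_i) drops below 1e-10."""
        diversity = np.mean(np.var(pop, axis=0))
        return stagnation_gens > 30 and diversity < 1e-10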
4. RESULTS

Results from experiments according to [3] on the benchmark functions given in [4] are presented in Figures 1, 2 and 3 and in Tables 1 and 2. The expected running time (ERT), used in the figures and tables, depends on a given target function value, f_t = f_opt + Δf, and is computed over all relevant trials as the number of function evaluations executed during each trial while the best function value did not reach f_t, summed over all trials and divided by the number of trials that actually reached f_t [3, 7]. Statistical significance is tested with the rank-sum test for a given target Δf_t (10^{-8} as in Figure 1) using, for each trial, either the number of function evaluations needed to reach Δf_t (inverted and multiplied by −1), or, if the target was not reached, the best Δf-value achieved, measured only up to the smallest number of overall function evaluations for any unsuccessful trial under consideration.
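The ERT definition above translates directly into code; the following sketch and its variable names are ours.

    def expected_running_time(evals_before_target, reached_target):
        """ERT: evaluations spent while the target was not yet reached,
        summed over all trials, divided by the number of successful trials.
        evals_before_target[k] counts the evaluations of trial k executed
        while the best f-value was still above f_t (the whole trial when
        the target was never reached); reached_target[k] flags success."""
        n_success = sum(reached_target)
        if n_success == 0:
            return float('inf')               # no trial reached the target
        return sum(evals_before_target) / n_success

For example, with three trials, two of which reached the target after 2000 and 5000 evaluations while the third spent its full budget of 5000 evaluations unsuccessfully, ERT = (2000 + 5000 + 5000) / 2 = 6000 evaluations.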
4.1 CPU Timing Experiments

The timing experiments were carried out with f_8 on a machine with an Intel Core 2 Duo processor at 2.4 GHz, with 4 GB RAM, on Windows 7 64-bit, in MATLAB R2009b 64-bit. The average time per function evaluation in 2, 3, 5, 10, 20, and 40 dimensions was about 52, 35, 21, 12, 8, and 7 × 10^{-6} s for DE, about 70, 45, 28, 16, 9, and 10 × 10^{-6} s for JADE, and about 68, 45, 27, 15, 9, and 10 × 10^{-6} s for DEAE, i.e. the cost of the AE updates is negligible.

5. DISCUSSION

Considering the comparison of DEAE and DE, it can be stated that the application of AE generally helps the DE algorithm to solve a higher percentage of problems, i.e. to find more precise optima of the functions, and to solve them faster, especially in the group of non-separable unimodal functions (for the ill-conditioned functions, speedup factors of 10 are observed in 5-D, and the percentage of solved problems rose from about 20% to 100% in 20-D), which is the expected result. In the case of multimodal functions, the difference is not that large, but DEAE is only seldom worse than the pure DE. The only exception in this comparison is the group of separable functions (namely f_3 and f_4), where the application of AE actually destroys the initially ideal coordinate system and prevents the DEAE algorithm from solving these functions.

The comparison of DEAE to CMA-ES reveals that on the group of unimodal functions, the multistart CMA-ES is usually faster than DEAE (about 2 to 5 times, depending on the dimensionality), probably thanks to its much smaller population. The exceptions are the functions f_3 and f_4 (where neither of the two algorithms is competitive), and f_7 and f_13 (where DEAE profits from its larger population size). On the group of multimodal functions with adequate structure, DEAE performs better (larger population), while on the group of weakly structured functions, CMA-ES is comparable or better (thanks to a larger number of restarts).

Comparing DEAE to JADE, the first observation is that JADE has an advantage in the case of separable functions. For non-separable unimodal functions, DEAE is (up to 5 times) faster. For multimodal functions, the results are quite mixed. In general (and especially in higher dimensions), JADE is expected to solve a larger proportion of functions than DEAE.

6. SUMMARY AND CONCLUSIONS

The search space representation is a key issue when designing a well-performing optimization algorithm. In this work, the AE procedure was applied to the DE algorithm. The resulting DEAE algorithm was compared with a conventional DE algorithm, with JADE (an adaptive version of DE), and with CMA-ES.

The application of AE significantly improved the performance of the DE algorithm on moderate and ill-conditioned unimodal functions, as expected, but it also had a positive (although less pronounced) effect on multimodal functions.

JADE (with a different kind of adaptation than DEAE) also showed quite competitive results. The two forms of adaptation are based on different principles and are in fact complementary. Implementing the AE procedure inside JADE may be very profitable: JADE may adapt the probability of applying AE in a similar way as it adapts the CR and F parameters. The evaluation of such an approach remains a topic for future work.

Acknowledgements

This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic with grant No. MSM6840770012, entitled "Transdisciplinary Research in Biomedical Engineering II".

7. REFERENCES

[1] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer. Self-adapting control parameters in differential evolution: A comparative study on numerical benchmark problems. IEEE Transactions on Evolutionary Computation, 10(6):646–657, Dec. 2006.
[2] N. Hansen. Adaptive encoding: How to render search coordinate system invariant. In G. Rudolph, editor, Parallel Problem Solving from Nature – PPSN X, volume 5199 of LNCS, pages 205–214. Springer, 2008.
[3] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.
[4] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009. Updated February 2010.
[5] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
[6] V. Klemš. Differential evolution with adaptive encoding. Master's thesis, Czech Technical University in Prague, 2011. Available online, http://cyber.felk.cvut.cz/research/theses/papers/177.pdf.
[7] K. Price. Differential evolution vs. the functions of the second ICEO. In Proceedings of the IEEE International Congress on Evolutionary Computation, pages 153–157, 1997.
[8] A. K. Qin and P. N. Suganthan. Self-adaptive differential evolution algorithm for numerical optimization. In The 2005 IEEE Congress on Evolutionary Computation, volume 2, pages 1785–1791. IEEE, 2005.
[9] R. Storn and K. Price. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, Dec. 1997.
[10] J. Zhang and A. C. Sanderson. JADE: Adaptive differential evolution with optional external archive. IEEE Transactions on Evolutionary Computation, 13(5):945–958, Oct. 2009.

Figure 1: Expected running time (ERT in number of f-evaluations) divided by dimension for target function value 10^{-8}, shown as log10 values versus dimension. Different symbols correspond to the different algorithms given in the legends of f_1 and f_24. Light symbols give the maximum number of function evaluations from the longest trial, divided by dimension. Horizontal lines indicate linear scaling; slanted dotted lines indicate quadratic scaling. Black stars indicate a statistically better result compared to all other algorithms, with p < 0.01 and Bonferroni correction by the number of dimensions (six). Legend: ◦: CMAES, : DE, : JADE, : DEAE.