Benchmarking the Differential Evolution with Adaptive Encoding on Noiseless Functions
Petr Pošík
Czech Technical University in Prague, FEE, Dept. of Cybernetics, Technická 2, 166 27 Prague 6, Czech Republic
petr.posik@fel.cvut.cz

Václav Klemš
Czech Technical University in Prague, FEE, Dept. of Cybernetics, Technická 2, 166 27 Prague 6, Czech Republic
vaclav.klems@gmail.com
ABSTRACT
The differential evolution (DE) algorithm is equipped with the recently proposed adaptive encoding (AE), which makes the algorithm rotationally invariant. The resulting algorithm, DE-AE, should exhibit better performance on non-separable functions. The aim of this article is to assess what benefits the AE has, and what effect it has on other function groups. DE-AE is compared against pure DE, an adaptive version of DE (JADE), and an evolution strategy with covariance matrix adaptation (CMA-ES). The results suggest that AE indeed improves the performance of DE, particularly on the group of unimodal non-separable functions, but the adaptation of parameters used in JADE is more profitable on average. The use of AE inside JADE is envisioned.
Categories and Subject Descriptors
G.1.6 [Numerical Analysis]: Optimization – global optimization, unconstrained optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems
General Terms
Algorithms
Keywords
Benchmarking, Black-box optimization, Differential evolution, Evolution strategy, Covariance matrix adaptation, Adaptive encoding
1. INTRODUCTION
Differential evolution (DE) [9] is a population-based optimization algorithm, popular thanks to its simplicity and good results on many practical problems. To create an offspring individual, it uses a mutation operator followed by a crossover. The mutation operators are usually rotationally invariant; the crossover, however, is not. On separable functions, the crossover helps to properly mix the good values of solution components in the population. On non-separable functions, however, it mostly only destroys the potentially good combinations of values generated by the mutation.

There are several possibilities for overcoming the crossover issue on non-separable functions. (1) Turn off the crossover operator completely. The DE then relies on the mutation operator only and may have worse performance on (partially) separable functions. (2) Choose suitable operators adaptively. Several algorithms [8, 1, 10] are able to choose suitable DE operators and their parameters during the optimization run. For non-separable functions, they may actually find that the use of crossover is not profitable at all and may effectively switch it off. (3) Use adaptive encoding. If we were able to perform the crossover in a suitable coordinate system, we might enjoy the benefits of crossover even for non-separable functions.

In this article, the last listed possibility is explored. We chose the recently proposed adaptive encoding (AE) procedure [2], which adapts the coordinate system in a stepwise manner during the search. The goal of this paper is to assess how AE affects the DE algorithm, what benefits and what downsides it has, and also to compare the potential of parameter adaptation as used in JADE on the one hand, and encoding adaptation brought by AE on the other hand.

The rest of this article is organized as follows. Section 2 reviews the DE algorithm and describes the use of AE inside DE, i.e. the proposed DE-AE algorithm. Section 3 describes the experiment carried out, together with the COCO benchmarking framework. The results are presented in Sec. 4 and discussed in Sec. 5. Sec. 6 concludes the paper and points out some directions for future work.

GECCO'12 Companion, July 7–11, 2012, Philadelphia, PA, USA. Copyright 2012 ACM 978-1-4503-1178-6/12/07.
2. ALGORITHMS
The following paragraphs review the DE algorithm and the AE procedure, introduce the DE-AE algorithm, and shortly describe the reference algorithms used in this paper.
Differential evolution (DE) [9] is a simple and easy-to-implement optimization algorithm (see the unshaded lines in Alg. 1). DE mutation operators create the donor individuals v_i as a linear combination of several individuals randomly chosen from the current population:

    v_i = x_best + F · (x_r1 − x_r2),    (1)

Eq. 1 describes the so-called "best/1" mutation operator, a highly exploitative mutation variant, where F is the mutation factor (a positive number typically chosen from [0.5, 1]).
The crossover creates the offspring u_i by taking some solution components from the parent x_i and other components from the donor v_i. Eq. (2) describes the binomial crossover. It creates the offspring individual u_i = (u_{i,1}, ..., u_{i,D}) as follows:

    u_{i,j} = v_{i,j}  if r_j ≤ CR_i or j = j_{i,rand},
              x_{i,j}  otherwise,    (2)

where r_j is a random number uniformly distributed in [0, 1], CR_i ∈ [0, 1] is the crossover probability representing the average proportion of components the offspring gets from its donor, and j_{i,rand} is the randomly chosen index of the solution component surely donated from the donor.

Due to the crossover, DE is biased towards separable functions and is not rotationally invariant. This bias, however, can be controlled with the parameter CR. The tuning of CR is part of many adaptive DE variants, which try to find the right operators and/or parameter values [8, 1, 10] to make the resulting algorithm more robust.
DE and adaptive encoding.
The adaptive encoding (AE) framework [2] is a general method that makes an optimization algorithm rotationally invariant. It maintains a linear transformation of the coordinate system: the candidate solutions are evaluated in the original space, but the offspring creation takes place in a different space given by the linear transformation. Alg. 1 shows a simple combination of the basic DE algorithm with AE, i.e. the DE-AE algorithm, first proposed in [6]. The shaded lines are the modifications needed for AE.
Algorithm 1: DE with Adaptive Encoding

 1  Initialize the population P ← {x_i}_{i=1}^{NP}.
 2  Initialize the transformation matrix B ∈ R^{D×D}.
 3  while stopping criteria not met do
 4      Transform P: P ← {x_i | x_i ← B^{−1} x_i}.
 5      for i ← 1 to NP do
 6          v_i ← mutate(i, P)            (Eq. 1)
 7          u_i ← crossover(x_i, v_i)     (Eq. 2)
 8          Transform offspring back: u_i ← B u_i.
 9          if f(u_i) < f(x_i) then
10              x_i ← u_i
11          end
12      end
13      B ← update(B, x_(1), ..., x_(µ))
14  end
The forward and backward linear transformations are implemented by matrix multiplication (using the transformation matrix B). The procedure for updating B is crucial for the algorithm's success. We adopted the method derived from the CMA-ES algorithm (we refer the reader to [2] for more details).
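The two transformation steps of Alg. 1 (lines 4 and 8) reduce to plain matrix products. The sketch below shows only this encode/decode pair; the CMA-ES-derived update of B is deliberately omitted, and the function names and row-per-individual layout are assumptions of this sketch, with B assumed invertible.

```python
import numpy as np

def encode(B, X):
    """Map solutions into the encoding space (line 4 of Alg. 1):
    x <- B^{-1} x for every individual (rows of X)."""
    Binv = np.linalg.inv(B)
    return X @ Binv.T

def decode(B, u):
    """Map one offspring back to the original space (line 8): u <- B u."""
    return B @ u
```

Because evaluation always happens on decoded points, the objective function never sees the encoding; only the variation operators do, which is what makes the scheme applicable to algorithms other than DE.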
Reference algorithms. JADE [10] serves as a reference adaptive DE algorithm. It was chosen because it was reported [10] to have better performance than other adaptive DE variants. JADE uses a special mutation strategy called "current-to-p-best", but most importantly it adapts the crossover probability CR and the mutation factor F to values which turned out to be beneficial in recent generations. This algorithm thus does not adapt the coordinate system and does not adaptively select the operators it uses, but thanks to the adaptation of CR, it can effectively turn off the crossover.

CMA-ES, the evolution strategy with covariance matrix adaptation [5], was chosen for the comparison because the AE procedure is largely based on this algorithm. The algorithm samples new candidate solutions from a multivariate Gaussian distribution and adapts its mean and covariance matrix (i.e. it actually uses the adaptation of the coordinate system). The CMA-ES used in this paper is a conventional multistart version.
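The sampling-and-recombination core that the paragraph above describes can be sketched as follows. This is a minimal illustration of the sampling step only, assuming equal recombination weights; real CMA-ES [5] additionally adapts the step size and covariance matrix, which is omitted here, and the function names are illustrative.

```python
import numpy as np

def sample_population(mean, C, sigma, lam, rng):
    """Draw lam candidates from N(mean, sigma^2 * C): the core CMA-ES
    sampling step (step-size and covariance adaptation omitted)."""
    return rng.multivariate_normal(mean, (sigma ** 2) * C, size=lam)

def recombine_mean(samples, fitness, mu):
    """Move the distribution mean to the average of the mu best samples
    (equal weights for simplicity; CMA-ES uses weighted recombination)."""
    best = np.argsort(fitness)[:mu]
    return samples[best].mean(axis=0)
```

Contrasting this with the DE operators makes the comparison in Sec. 4 concrete: CMA-ES varies a single distribution, while DE varies each population member individually.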
3. EXPERIMENT DESIGN
In the experiments, we compare DE, DE-AE, JADE, and CMA-ES. By comparing DE to DE-AE, we can assess the performance boost the DE algorithm can gain using AE. By comparing DE-AE with CMA-ES, we can get some insight into whether the sampling process of CMA-ES (drawing points from a normal distribution) is more suitable than the sampling process of DE (using mutation and crossover). The comparison of DE-AE with JADE shall reveal which of the two different types of adaptation is more suitable for which kinds of functions.

Each of the algorithms was run on 15 instances of all the 24 functions in dimensions 2, 3, 5, 10, 20, and 40. The evaluations budget was set to 5·10^4 D for each run. All algorithms were restarted when they stagnated for more than 30 generations and the population diversity measure (1/D) Σ_{i=1}^D Var(X_i) < 10^{−10}.

The multistart CMA-ES algorithm was benchmarked anew with its default settings using the BBOB 2012 procedure. For most parameters of DE and JADE, default values from the literature were used. For DE: the binomial crossover with CR = 0.5, the "best" mutation strategy with F ~ U(0.5, 1) (sampled anew each generation). For JADE: initial µ_CR = 0.5, initial µ_F = 0.5, the parameter of the "current-to-p-best" mutation is p = 0.1, and the archive size is |A| = 0.1 NP. The population size was set to NP = 5D for both algorithms after a small systematic study performed on JADE and DE using the values (3, 4, 5, 6, 8, 10, 15, 20)·D. Values of NP lower than 5D gave erratic behavior even on unimodal functions; values larger than 5D wasted evaluations on unimodal functions and did not bring significant advantages on multimodal functions.

The DE-AE algorithm inherited the parameters of DE. The AE part of DE-AE uses a learning rate parameter α_c = 8, chosen after testing the values 1, 4, 8, 10, 15, and 20 (increasing the learning rate from 1 to 8 brought significant speedups; further increase provided questionable advantage only).
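The restart criterion described above combines a stagnation counter with the diversity measure (1/D) Σ Var(X_i). A minimal sketch of that test, with illustrative function names and the thresholds taken from the text:

```python
import numpy as np

def diversity(pop):
    """Population diversity from Sec. 3: (1/D) * sum_i Var(X_i),
    i.e. the mean per-coordinate variance of an NP x D population."""
    return pop.var(axis=0).mean()

def should_restart(pop, stagnation_gens, max_stagnation=30, tol=1e-10):
    """Restart when the search stagnated for more than max_stagnation
    generations AND the population has nearly collapsed to a point."""
    return stagnation_gens > max_stagnation and diversity(pop) < tol
```

Requiring both conditions avoids restarting a population that is still spread out but temporarily not improving.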
4. RESULTS
Results from experiments according to [3] on the benchmark functions given in [4] are presented in Figures 1, 2 and 3 and in Tables 1 and 2. The expected running time (ERT), used in the figures and tables, depends on a given target function value, f_t = f_opt + Δf, and is computed over all relevant trials as the number of function evaluations executed during each trial while the best function value did not reach f_t, summed over all trials and divided by the number of trials that actually reached f_t [3, 7].
Statistical significance is tested with the rank-sum test for a given target Δf_t (10^{−8} as in Figure 1) using, for each trial, either the number of needed function evaluations to reach Δf_t (inverted and multiplied by −1), or, if the target was not reached, the best Δf value achieved, measured only up to the smallest number of overall function evaluations for any unsuccessful trial under consideration.
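The ERT definition above amounts to a simple ratio: all evaluations spent while the target was unreached, divided by the number of successful trials. A sketch under the assumption that each trial reports either its evaluation count at success or None on failure (the per-trial record format is an assumption of this sketch):

```python
def expected_running_time(evals_to_target, budgets_used):
    """ERT for one target f_t: total evaluations over all trials divided
    by the number of successful trials. evals_to_target[k] is the count
    at which trial k first reached f_t, or None if it never did; for
    unsuccessful trials the whole spent budget budgets_used[k] counts."""
    total = sum(e if e is not None else b
                for e, b in zip(evals_to_target, budgets_used))
    successes = sum(e is not None for e in evals_to_target)
    if successes == 0:
        return float("inf")   # target never reached: ERT is undefined/infinite
    return total / successes
```

For example, three trials with success at 100 and 300 evaluations plus one failed trial of budget 1000 give ERT = (100 + 1000 + 300) / 2 = 700.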
4.1 CPU Timing Experiments
The timing experiments were carried out with f_8 on a machine with an Intel Core 2 Duo processor, 2.4 GHz, with 4 GB RAM, on Windows 7 64-bit, in MATLAB R2009b 64-bit. The average time per function evaluation in 2, 3, 5, 10, 20, and 40 dimensions was about 52, 35, 21, 12, 8, and 7 ×10^{−6} s for DE; about 70, 45, 28, 16, 9, and 10 ×10^{−6} s for JADE; and 68, 45, 27, 15, 9, and 10 ×10^{−6} s for DE-AE, i.e. the cost of the AE updates is negligible.
5. DISCUSSION
Considering the comparison of DE-AE and DE, it can be stated that the application of AE generally helps the DE algorithm to solve a higher percentage of problems, i.e. to find more precise optima of the functions, and to solve them faster, especially in the group of non-separable unimodal functions (for the ill-conditioned functions, speedup factors of 10 are observed in 5-D, and the percentage of solved problems rose from about 20% to 100% in 20-D), which is an expected result. In the case of multimodal functions, the difference is not that large, but DE-AE is only seldom worse than the pure DE. The only exception in this comparison is the group of separable functions (namely f_3 and f_4), where the application of AE actually destroys the initially ideal coordinate system and prevents the DE-AE algorithm from solving these functions.

The comparison of DE-AE to CMA-ES reveals that on the group of unimodal functions, the multistart CMA-ES is usually faster than DE-AE (about 2 to 5 times faster, depending on dimensionality), probably thanks to its much smaller population. The exceptions are the functions f_3 and f_4 (where neither of the two algorithms is competitive), and f_7 and f_13 (where DE-AE profits from its larger population size). On the group of multimodal functions with adequate structure, DE-AE performs better (larger population), while on the group of weakly structured functions, CMA-ES is comparable or better (thanks to a larger number of restarts).

Comparing DE-AE to JADE, the first observation is that JADE has an advantage in the case of separable functions. For non-separable unimodal functions, DE-AE is (up to 5 times) faster. For multimodal functions, the results are quite mixed. In general (and especially in higher dimensions), JADE is expected to solve a larger proportion of functions than DE-AE.
6. SUMMARY AND CONCLUSIONS
The search space representation is a key issue when designing a well-performing optimization algorithm. In this work, the AE procedure was applied to the DE algorithm. The resulting DE-AE algorithm was compared with a conventional DE algorithm, with JADE, an adaptive version of DE, and with CMA-ES.

The application of AE significantly improved the performance of the DE algorithm on moderate and ill-conditioned unimodal functions, as expected, but also had a positive (although less pronounced) effect on multimodal functions. JADE (with a different kind of adaptation than DE-AE) also showed quite competitive results. The two forms of adaptation are based on different principles and are in fact complementary. Implementing the AE procedure inside JADE may be very profitable: JADE may adapt the probability of applying AE in a similar way as it adapts the CR and F parameters. The evaluation of such an approach remains a topic for future work.
Acknowledgements
This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic with the grant No. MSM6840770012 entitled "Transdisciplinary Research in Biomedical Engineering II".
7. REFERENCES
[1] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer. Self-adapting control parameters in differential evolution: A comparative study on numerical benchmark problems. IEEE Transactions on Evolutionary Computation, 10(6):646–657, Dec. 2006.
[2] N. Hansen. Adaptive encoding: How to render search coordinate system invariant. In G. Rudolph, editor, Parallel Problem Solving from Nature – PPSN X, volume 5199 of LNCS, pages 205–214. Springer, 2008.
[3] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization benchmarking 2012: Experimental setup. Technical report, INRIA, 2012.
[4] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829, INRIA, 2009. Updated February 2010.
[5] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
[6] V. Klemš. Differential evolution with adaptive encoding. Master's thesis, Czech Technical University in Prague, 2011. Available online, http://cyber.felk.cvut.cz/research/theses/papers/177.pdf.
[7] K. Price. Differential evolution vs. the functions of the second ICEO. In Proceedings of the IEEE International Congress on Evolutionary Computation, pages 153–157, 1997.
[8] A. K. Qin and P. N. Suganthan. Self-adaptive differential evolution algorithm for numerical optimization. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, volume 2, pages 1785–1791. IEEE, 2005.
[9] R. Storn and K. Price. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, Dec. 1997.
[10] J. Zhang and A. C. Sanderson. JADE: Adaptive differential evolution with optional external archive. IEEE Transactions on Evolutionary Computation, 13(5):945–958, Oct. 2009.
Figure 1: Expected running time (ERT in number of f-evaluations) divided by dimension for target function value 10^{−8}, as log_10 values versus dimension. Different symbols correspond to different algorithms given in the legend of f_1 and f_24. Light symbols give the maximum number of function evaluations from the longest trial divided by dimension. Horizontal lines give linear scaling, slanted dotted lines give quadratic scaling. Black stars indicate a statistically better result compared to all other algorithms with p < 0.01 and Bonferroni correction by the number of dimensions (six). Legend: ◦: CMA-ES; DE; JADE; DE-AE.