A SemiDefinite programming-based Underestimation method for global optimization in molecular docking*

Ioannis Ch. Paschalidis†, Member, IEEE, Yang Shen‡, Sandor Vajda§, Pirooz Vakili¶, Member, IEEE
Abstract—The paper introduces a new global optimization method targeted at molecular docking problems, an important class of problems in computational biology. The search method is based on finding general convex quadratic underestimators to the binding energy function, which is funnel-like. Finding the optimal underestimator requires solving a semidefinite programming problem, hence the name SemiDefinite programming-based Underestimation (SDU). The optimal underestimator is used to bias sampling in the search region. We provide a detailed comparison of SDU with the related Convex Global Underestimator (CGU) method, a discussion of the convergence properties of SDU, and computational results from applying SDU to a number of rigid protein-protein docking problems.

Index Terms—Computational biology, Global optimization, Semidefinite programming, Molecular docking.
* Research partially supported by the NSF under a CAREER award ANI-9983221 and grants DMI-0330171, ECS-0426453, CNS-0435312, DMI-0300359, and EEC-0088073, and by the ARO under the ODDR&E MURI2001 Program Grant DAAD19-01-1-0465 to the Center for Networked Communicating Control Systems.
† Center for Information & Systems Eng., and Dept. of Manufacturing Eng., Boston University, 15 St. Mary's St., Brookline, MA 02446, e-mail: yannisp@bu.edu, url: http://ionia.bu.edu/.
‡ Center for Information & Systems Engineering, and Dept. of Manufacturing Eng., Boston University, e-mail: yangshen@bu.edu.
§ Department of Biomedical Engineering, Boston University, e-mail: vajda@bu.edu.
¶ Center for Information & Systems Engineering, and Dept. of Manufacturing Eng., Boston University, e-mail: vakili@bu.edu.

I. INTRODUCTION

The solution of a number of important problems in computational biology rests on finding global minima of energy functions that are funnel-like. These are functions with multiple nonconvex funnels and a huge number of shallower local minima spread over the domain of the function. For example, protein folding is the problem of predicting the 3-dimensional native structure (or "conformation") of proteins from their 1-dimensional amino acid sequences. It is known that proteins, when they fold, can follow multiple paths on the energy landscape [1], which is funnel-like in shape. Similar energy funnels are also found in other problems such as protein-protein docking [2].

Global optimization methods such as simulated annealing and genetic algorithms have been applied in some of these areas but they are very slow and easily trapped in kinetic moves. A number of recent approaches have attempted, with some success, to use the funnel-like shape to guide the global search to the vicinity of the global minimum. For example, the Semi-Global Simplex (SGS) algorithm uses simplex moves on surfaces spanned by the local minima rather than on the free energy itself [3]. Similarly, the SmoothDock approach [4] descends on the "smooth" components of the energy function, to which higher frequency components are slowly added. Of most relevance to this paper is the Convex Global Underestimator (CGU) method, where convex quadratic underestimators are used to approximate the envelope spanned by the local minima of the energy function [5]. The vicinity of the minimizer of the underestimator is viewed as the potential location of the global minimum of the energy function. The problem of finding the optimal underestimator is formulated and solved as a Linear Programming (LP) problem.

It has been shown that CGU does not perform well in some cases and that its performance deteriorates as the dimension of the search space increases [3]. We contend that a critical reason for this poor performance is the restricted class of underestimators used in CGU. This restriction amounts to a lack of flexibility in capturing the overall shape of the energy funnels and hence an inability to locate promising regions to search for the global minimum.

We use the same strategy of using convex quadratic functions to underestimate the envelope spanned by the local minima of the energy function. However, we consider the class of general convex quadratic functions for underestimation. In this case, given a finite set of local minima, finding the optimal underestimator amounts to solving a SemiDefinite Programming (SDP) problem, hence the term SemiDefinite programming-based Underestimation (SDU). We show, theoretically as well as experimentally, that SDU outperforms CGU, often significantly. Using some preliminary experimental results, we show that SDU is a promising method for solving molecular docking problems.

The rest of the paper is organized as follows. Sec. II presents some background material on molecular docking. Sec. III presents our SDU method. Comparisons with CGU are in Sec. IV. SDU's convergence properties are discussed in Sec. V. Some results on docking proteins are discussed in Sec. VI. We end with conclusions in Sec. VII.

II. PRELIMINARIES

Next we review key properties of the free energy functions that are to be minimized in molecular docking problems. We start with their biophysical properties and then abstract characteristic mathematical properties that are important in the development of appropriate optimization strategies.
Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 2005, Seville, Spain, December 12-15, 2005
TuC07.1
0-7803-9568-9/05/$20.00 ©2005 IEEE
A. Biophysical Origin

Free energy evaluation models: At fixed temperature and pressure, a complex of two molecules adopts the conformation that corresponds to the lowest Gibbs free energy of the system that includes the component molecules and the solvent (usually water) surrounding them. Thus, in docking calculations the natural target function to minimize is an approximation of the Gibbs free energy, $G_{RL}$, of the receptor-ligand complex, or that of the binding free energy, $\Delta G$ [6]. In particular, $\Delta G = G_{RL} - G_R - G_L$, where $G_R$ and $G_L$ are the free energies of the (free) receptor and ligand, respectively, and both $G_R$ and $G_L$ are independent of the conformation of the complex; hence, minimizing $G_{RL}$ is equivalent to minimizing $\Delta G$.

We use free energy evaluation models that combine molecular mechanics with continuum electrostatics and empirical solvation terms. In the most general case the binding free energy is decomposed according to

$$\Delta G = \Delta E_{\mathrm{elec}} + \Delta E_{\mathrm{vdw}} + \Delta E_{\mathrm{int}} + \Delta G^{*}_{\mathrm{des}} - T\Delta S_{\mathrm{sc}} + \Delta G_{o}, \qquad (1)$$

where $\Delta E_{\mathrm{elec}}$ is the change in electrostatic energy upon binding, $\Delta G^{*}_{\mathrm{des}}$ is the desolvation free energy, $\Delta E_{\mathrm{vdw}}$ is the change in van der Waals energy, and $\Delta E_{\mathrm{int}}$ is the change in internal energy due to any flexing/straining of the backbone and side chains. The entropy term, $-T\Delta S_{\mathrm{sc}}$, accounts for the decrease in entropy experienced by the interface side chains upon binding. The term $\Delta G_{o}$ accounts for all other changes in the binding free energy that occur upon binding; it is considered to depend weakly on the conformation and will be treated as a constant (15 kcal/mol in this work). The internal (bonded) energy, $\Delta E_{\mathrm{int}}$, is the sum of bond stretching, angle bending, torsional, and improper terms.
B. Mathematical Properties

Multi-frequency behavior: The free energy function can be regarded as the sum of three components with different frequencies. First, the sum of electrostatic, desolvation, and entropic terms changes relatively slowly along any reaction path, and hence we define the "smooth" free energy, or the smooth component of the free energy, by

$$\Delta G_{s} = \Delta E_{\mathrm{elec}} + \Delta G_{\mathrm{des}} - T\Delta S_{\mathrm{sc}} + \Delta G_{o}, \qquad (2)$$

where the desolvation free energy $\Delta G_{\mathrm{des}}$ does not include the solvent-solute van der Waals term. $\Delta G_{s}$ is much less sensitive to structural perturbations than the terms $\Delta E_{\mathrm{vdw}}$ and $\Delta E_{\mathrm{int}}$. The internal energy $\Delta E_{\mathrm{int}}$ changes with an intermediate frequency, and the frequency of change is very high for $\Delta E_{\mathrm{vdw}}$.

In local minima in which the internal and van der Waals terms are close to zero, the free energy surface is essentially determined by the "smooth" free energy component $\Delta G_{s}$. However, an arbitrary pathway in the conformational space goes through non-native states at which $\Delta E_{\mathrm{vdw}}$ and $\Delta E_{\mathrm{int}}$ are high, resulting in the funnel-like shape shown in Fig. 1.

III. SDU: THE SEMIDEFINITE UNDERESTIMATOR METHOD
In this section we introduce the SDU method. We first introduce some notational conventions.

Notational Conventions: All vectors are assumed to be column vectors. We use lower case boldface letters to denote vectors and for economy of space we write $\mathbf{x} = (x_1,\ldots,x_n)$ for the column vector $\mathbf{x}$. $\mathbf{x}'$ denotes the transpose of $\mathbf{x}$, $\mathbf{0}$ the vector of all zeroes, $\mathbf{e}$ the vector of all ones, and $\mathbf{e}_i$ the $i$th unit vector. For any vector $\mathbf{x}$ we write $\|\mathbf{x}\|_1$ for the L1 norm, i.e., $\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|$, and $\|\mathbf{x}\|$ for the Euclidean norm. We use upper case boldface letters to denote matrices. Specifically, we write $\mathbf{A} = (A_{i,j})_{i,j=1}^{n}$ for the matrix with $(i,j)$th element equal to $A_{i,j}$. We denote by $\mathrm{diag}(\mathbf{x})$ the diagonal matrix with elements $x_1,\ldots,x_n$ in the main diagonal and zeroes elsewhere. We also denote by $\mathrm{diag}(\mathbf{A},\mathbf{B})$ the block diagonal matrix with matrices $\mathbf{A}$ and $\mathbf{B}$ in the main diagonal and zeroes elsewhere. We define

$$\mathbf{F} \bullet \mathbf{Y} \triangleq \sum_{i=1}^{n} \sum_{j=1}^{n} F_{i,j} Y_{i,j}. \qquad (3)$$

Finally, for any event $S$ we use $1\{S\}$ to denote the indicator function of this event, that is, $1\{S\}$ equals one when the event occurs and zero otherwise.

We are now prepared to describe the two key components of the SDU algorithm.
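A. Constructing an Underestimator

Let us denote by $f: \mathbb{R}^n \to \mathbb{R}$ the function we seek to minimize and assume we have obtained a set of $K$ local minima $\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K$ of $f(\cdot)$. Let the underestimator be defined by

$$U(\boldsymbol{\phi}) \triangleq \boldsymbol{\phi}'\mathbf{Q}\boldsymbol{\phi} + \mathbf{b}'\boldsymbol{\phi} + c, \qquad (4)$$

where $\mathbf{Q} \in \mathbb{R}^{n \times n}$ is a positive semidefinite matrix, $\mathbf{b} \in \mathbb{R}^n$, and $c$ is a scalar. The positive semidefiniteness of $\mathbf{Q}$ guarantees the convexity of $U(\cdot)$.

Using an L1 norm as a distance metric, the problem of finding the tightest possible such underestimator $U(\cdot)$ can be formulated as follows:

$$\begin{array}{ll} \min & \sum_{j=1}^{K} \left( f(\boldsymbol{\phi}_j) - c - \boldsymbol{\phi}_j'\mathbf{Q}\boldsymbol{\phi}_j - \mathbf{b}'\boldsymbol{\phi}_j \right) \\ \text{s.t.} & f(\boldsymbol{\phi}_j) \ge c + \boldsymbol{\phi}_j'\mathbf{Q}\boldsymbol{\phi}_j + \mathbf{b}'\boldsymbol{\phi}_j, \quad j = 1,\ldots,K, \\ & \mathbf{Q} \succeq 0, \end{array} \qquad (5)$$

where the decision variables are $\mathbf{Q}$, $\mathbf{b}$, and $c$, and "$\succeq 0$" denotes positive semidefiniteness.

Let vectors $\mathbf{b}^{+}, \mathbf{b}^{-} \ge \mathbf{0}$ and scalars $c^{+}, c^{-} \ge 0$ satisfy $\mathbf{b} = \mathbf{b}^{+} - \mathbf{b}^{-}$ and $c = c^{+} - c^{-}$. Let $\mathbf{s} = (s_1,\ldots,s_K)$ and let $\mathbf{Y}$ be the block diagonal matrix given by

$$\mathbf{Y} \triangleq \mathrm{diag}\left(\mathbf{Q}, \mathrm{diag}(\mathbf{b}^{+}), \mathrm{diag}(\mathbf{b}^{-}), c^{+}, c^{-}, \mathrm{diag}(\mathbf{s})\right). \qquad (6)$$

Note that $\mathbf{Y} \in \mathbb{R}^{(3n+K+2) \times (3n+K+2)}$. Let $\mathbf{F}_0 \triangleq \mathrm{diag}(\mathrm{diag}(\mathbf{0}), -\mathrm{diag}(\mathbf{e}))$, where $\mathbf{0}$ is the $(3n+2)$-dimensional zero vector and $\mathbf{e}$ is the $K$-dimensional vector of ones. Also, for $j = 1,\ldots,K$ we define

$$\mathbf{F}_j \triangleq \mathrm{diag}\left(\boldsymbol{\phi}_j\boldsymbol{\phi}_j', \mathrm{diag}(\boldsymbol{\phi}_j), -\mathrm{diag}(\boldsymbol{\phi}_j), 1, -1, \mathrm{diag}(\mathbf{e}_j)\right).$$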
In addition, let $\mathbf{E}_{i,j}$ denote the $(3n+K+2) \times (3n+K+2)$ matrix with all elements equal to zero except the $(i,j)$th element, which equals 1. Then, (5) can be written as follows:

$$\text{(SDP-P)} \qquad \begin{array}{ll} \max & \mathbf{F}_0 \bullet \mathbf{Y} \\ \text{s.t.} & \mathbf{F}_j \bullet \mathbf{Y} = f(\boldsymbol{\phi}_j), \quad j = 1,\ldots,K, \\ & \mathbf{E}_{i,j} \bullet \mathbf{Y} = 0, \quad j = 1,\ldots,i-1, \; i = n+1,\ldots,3n+K+2, \\ & \mathbf{Y} \succeq 0, \end{array} \qquad (7)$$

where the decision variable is the matrix $\mathbf{Y}$. Problem (SDP-P) in (7) is a SemiDefinite Programming (SDP) problem [7]. SDP problems can be solved efficiently (in polynomial time) using interior-point methods [7].

The dual to (SDP-P) in (7) is the problem

$$\text{(LMI-D)} \qquad \begin{array}{ll} \min & \sum_{j=1}^{K} x_j f(\boldsymbol{\phi}_j) \\ \text{s.t.} & \mathbf{Z} = \sum_{j=1}^{K} x_j \mathbf{F}_j + \sum_{i=n+1}^{3n+K+2} \sum_{j=1}^{i-1} w_{i,j} \mathbf{E}_{i,j} - \mathbf{F}_0, \\ & \mathbf{Z} \succeq 0, \end{array} \qquad (8)$$

where the decision variables are the $x_j$'s and $w_{i,j}$'s. Problem (LMI-D) can be seen as the problem of minimizing a linear function subject to the Linear Matrix Inequality (LMI)

$$\sum_{j=1}^{K} x_j \mathbf{F}_j + \sum_{i=n+1}^{3n+K+2} \sum_{j=1}^{i-1} w_{i,j} \mathbf{E}_{i,j} - \mathbf{F}_0 \succeq 0.$$
Our main result on underestimating a set of local minima is summarized in the following theorem.

Theorem III.1 Consider a function $f: \mathbb{R}^n \to \mathbb{R}$ and a set of local minima $\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K$ of $f(\cdot)$. Let $(\mathbf{Q}, \mathbf{b}^{+}, \mathbf{b}^{-}, c^{+}, c^{-}, \mathbf{s})$ form an optimal solution $\mathbf{Y}$ of Problem (SDP-P) in (7), where $\mathbf{Y}$ is defined as in (6). Let $U(\boldsymbol{\phi}) \triangleq \boldsymbol{\phi}'\mathbf{Q}\boldsymbol{\phi} + (\mathbf{b}^{+} - \mathbf{b}^{-})'\boldsymbol{\phi} + (c^{+} - c^{-})$. Then $U(\cdot)$ satisfies $f(\boldsymbol{\phi}_j) \ge U(\boldsymbol{\phi}_j)$ for all $j = 1,\ldots,K$ while minimizing $\|(f(\boldsymbol{\phi}_1),\ldots,f(\boldsymbol{\phi}_K)) - (U(\boldsymbol{\phi}_1),\ldots,U(\boldsymbol{\phi}_K))\|_1$. Moreover, the dual to Problem (SDP-P) is the LMI problem (LMI-D) in (8).

Hereafter, we will say that a function $U(\cdot)$ satisfying the statement of Theorem III.1 underestimates $f(\cdot)$ at points $\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K$. Figure 1 illustrates such an underestimator.

Fig. 1. A funnel-like shaped function and a quadratic function underestimating the surface spanned by the local minima.
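To make the quantities in Theorem III.1 concrete, the following sketch evaluates a candidate quadratic $U(\cdot)$ at a set of points, checks the underestimation constraints of (5), and returns the L1 gap that the SDP minimizes. It does not solve the SDP itself. The function names and the numerical data are hypothetical and chosen only for illustration.

```python
def quad_form(Q, b, c, phi):
    """Evaluate U(phi) = phi' Q phi + b' phi + c for Q given as a list of rows."""
    n = len(phi)
    quad = sum(phi[i] * Q[i][j] * phi[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * phi[i] for i in range(n))
    return quad + lin + c


def underestimation_gap(Q, b, c, minima, f_values, tol=1e-9):
    """Return sum_j (f(phi_j) - U(phi_j)), the objective of problem (5).

    Raises ValueError if U exceeds f at any of the given points, i.e. if the
    constraint f(phi_j) >= U(phi_j) is violated by more than `tol`.
    """
    gap = 0.0
    for phi, f_val in zip(minima, f_values):
        slack = f_val - quad_form(Q, b, c, phi)  # plays the role of s_j in (6)
        if slack < -tol:
            raise ValueError(f"U exceeds f at {phi} by {-slack}")
        gap += slack
    return gap


# Made-up data: U(phi) = phi_1^2 + phi_2^2 and three "local minima" with their f values.
Q = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
c = 0.0
minima = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
f_values = [1.5, 4.0, 3.0]
gap = underestimation_gap(Q, b, c, minima, f_values)
```

Among all feasible $(\mathbf{Q}, \mathbf{b}, c)$ with $\mathbf{Q} \succeq 0$, the SDU underestimator is the one for which this gap is smallest.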
B. Sampling

Suppose we are seeking the native conformation in some region $B$. Using a set of local minima $\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K \in B$ of $f(\cdot)$ we construct an underestimator $U(\cdot)$ as described in Section III-A. Let $\boldsymbol{\phi}_P$ be the minimizer of $U(\cdot)$. Notice that the underestimator contains information on the location of the near-native energy valley. We are interested in sampling conformations such that conformations close to $\boldsymbol{\phi}_P$ are more likely to be selected. In addition, conformations with high enough energy can be ignored. This can be achieved by using the following probability density function (pdf) in $B$:

$$g(\boldsymbol{\phi}) = \frac{U(\boldsymbol{\phi}) - U_{\max}}{\int_{B} \left(U(\boldsymbol{\phi}) - U_{\max}\right) d\boldsymbol{\phi}} \triangleq \frac{U(\boldsymbol{\phi}) - U_{\max}}{A}. \qquad (9)$$

In the expression above $U_{\max} = \max_{B} U(\boldsymbol{\phi})$, and we introduced the normalizing constant $A$ to denote the integral in the denominator.

To generate random samples in $B$ using the above pdf we will use the so-called rejection method. In particular, let $h(\boldsymbol{\phi}) = 1/V$ be the uniform pdf in $B$, where $V = \int_{B} d\boldsymbol{\phi}$ is the volume of $B$. Notice that

$$g(\boldsymbol{\phi}) \le \frac{V \left(U(\boldsymbol{\phi}_P) - U_{\max}\right)}{A}\, h(\boldsymbol{\phi}), \quad \forall \boldsymbol{\phi} \in B,$$

and set $R(\boldsymbol{\phi})$ equal to the ratio of the left hand side over the right hand side of the above, that is,

$$R(\boldsymbol{\phi}) \triangleq \frac{U(\boldsymbol{\phi}) - U_{\max}}{U(\boldsymbol{\phi}_P) - U_{\max}}. \qquad (10)$$

In order to discard high energy conformations we are interested in sampling points in $B$ with associated probability density in some interval $[\zeta, 1]$, where $\zeta \in [0, 1)$. The algorithm in Fig. 2 outputs such a sample point. To see this, notice that in Step 1 we generate uniformly distributed samples in the set $\{(\mathbf{x}, y) \mid \mathbf{x} \in B, y \in [\zeta, 1]\}$. The rejection rule of Step 2 accepts samples that are uniformly distributed in $\{(\mathbf{x}, y) \mid \mathbf{x} \in B, \zeta \le y \le R(\mathbf{x})\}$. Thus, the output $\boldsymbol{\phi}$ of the algorithm is distributed in $B$ according to $g(\cdot)$.

1) Generate uniformly distributed random variables $\mathbf{x}_1 \in B$ and $x_2 \in [\zeta, 1]$.
2) If $x_2 \le R(\mathbf{x}_1)$, stop and output $\boldsymbol{\phi} = \mathbf{x}_1$; otherwise, return to Step 1.

Fig. 2. An algorithm generating a sample in $B$ drawn from $g(\cdot)$ with associated density in $[\zeta, 1]$.
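The two steps of Fig. 2 can be sketched in a few lines. The sketch below assumes that $B$ is an axis-aligned box and that $U$, $\boldsymbol{\phi}_P$ and $U_{\max}$ are already known; the function names, the seed handling and the `max_tries` safeguard are illustrative assumptions, not part of the algorithm as stated.

```python
import random


def make_rejection_sampler(U, phi_p, u_max, box, zeta, seed=None):
    """Build a sampler following Fig. 2 for a box B = [lo_1, hi_1] x ... x [lo_n, hi_n].

    U     : callable underestimator, U(phi) -> float
    phi_p : minimizer of U
    u_max : maximum (or an estimate of the maximum) of U over the box
    zeta  : lower cut-off in [0, 1)
    """
    rng = random.Random(seed)
    denom = U(phi_p) - u_max  # negative whenever U is not constant on the box

    def ratio(phi):
        """R(phi) from Eq. (10)."""
        return (U(phi) - u_max) / denom

    def draw(max_tries=100_000):
        for _ in range(max_tries):
            x1 = [rng.uniform(lo, hi) for lo, hi in box]  # Step 1: uniform point in B
            x2 = rng.uniform(zeta, 1.0)                   # Step 1: uniform level in [zeta, 1]
            if x2 <= ratio(x1):                           # Step 2: accept or retry
                return x1
        raise RuntimeError("no sample accepted; check u_max, zeta and the box")

    return draw, ratio


# Illustrative use with a made-up underestimator U(phi) = phi_1^2 + phi_2^2 on [-1, 1]^2,
# whose minimizer is the origin and whose maximum over the box is 2 (at the corners).
def U(phi):
    return phi[0] ** 2 + phi[1] ** 2


draw, ratio = make_rejection_sampler(U, [0.0, 0.0], 2.0, [(-1.0, 1.0), (-1.0, 1.0)],
                                     zeta=0.5, seed=0)
samples = [draw() for _ in range(200)]
```

Every accepted point satisfies $R(\boldsymbol{\phi}) \ge \zeta$, so raising $\zeta$ concentrates the samples around $\boldsymbol{\phi}_P$ at the cost of more rejected draws.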
We finally note that the algorithm in Fig. 2 requires knowing $U_{\max}$. In many cases this is straightforward to obtain. Consider for instance the case where $B$ is a polyhedron. Then, since $U(\cdot)$ is convex, it achieves its maximum at an extreme point of the polyhedron $B$. Hence, it suffices to search over all extreme points, which in low-dimensional problems (e.g., rigid docking) are not that many. Alternatively, one can use an estimate of $U_{\max}$, e.g., $\max_i U(\boldsymbol{\phi}_i)$.
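For the special case where $B$ is an axis-aligned box (an assumption made here for simplicity; the text allows any polyhedron), the extreme points are the $2^n$ corners and the search for $U_{\max}$ can be sketched as follows. The function name and the example quadratic are hypothetical.

```python
from itertools import product


def u_max_over_box(U, box):
    """Maximum of a convex function U over the box [lo_1, hi_1] x ... x [lo_n, hi_n].

    A convex function attains its maximum over a polytope at an extreme point,
    so it is enough to evaluate U at the 2^n corners of the box.
    """
    return max(U(list(corner)) for corner in product(*box))


# Made-up convex quadratic with a cross term:
# U(phi) = phi_1^2 - phi_1*phi_2 + phi_2^2, i.e. Q = [[1, -0.5], [-0.5, 1]].
def U(phi):
    return phi[0] * phi[0] - phi[0] * phi[1] + phi[1] * phi[1]


u_max = u_max_over_box(U, [(-1, 1), (-1, 2)])
```

The number of corners grows as $2^n$, which is why this enumeration is practical only when the dimension is small.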
C. The SDU Algorithm

We now have all the ingredients to present our SDU algorithm. The algorithm seeks a global minimum of the free energy function $f(\cdot)$ in some region $B$ of the conformational space; it is presented in Fig. 3. Throughout the algorithm we maintain a set $L$ of interesting local minima obtained so far as well as the best such local minimum, denoted by $\boldsymbol{\phi}_G$.

1) Initialization: Starting from $K$ ($K \ge 2n+1$) random points in $B$ perform local minimization to obtain $K$ local minima $\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K$ of $f(\cdot)$. Set $L = \{\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K\}$ and $\boldsymbol{\phi}_G = \arg\min_{i=1,\ldots,K} f(\boldsymbol{\phi}_i)$.
2) Underestimation: Solve Problem (SDP-P) in (7) to obtain the underestimator $U(\boldsymbol{\phi})$. Set the predictive conformation equal to a minimizer of $U(\cdot)$, that is, when $\mathbf{Q}$ is invertible, $\boldsymbol{\phi}_P = -\frac{1}{2}\mathbf{Q}^{-1}\mathbf{b}$.
3) Elimination: Discard unfavorable local minima from $L$; more specifically, set $L := L \setminus \{\boldsymbol{\phi} \in L \mid R(\boldsymbol{\phi}) < \zeta \text{ and } \boldsymbol{\phi} \ne \boldsymbol{\phi}_G\}$.
4) Focalization: Define a neighborhood $N(\boldsymbol{\phi}_P) \subseteq B$ of $\boldsymbol{\phi}_P$. Set $B := N(\boldsymbol{\phi}_P)$.
5) Exploration:
   a) Start from $\boldsymbol{\phi}_P$ and use local minimization to obtain a local minimum $\hat{\boldsymbol{\phi}}_P$ of $f(\cdot)$. If $\hat{\boldsymbol{\phi}}_P \in B$ set $L := L \cup \{\hat{\boldsymbol{\phi}}_P\}$ and $\boldsymbol{\phi}_G := \arg\min_{\boldsymbol{\phi} \in \{\boldsymbol{\phi}_G, \hat{\boldsymbol{\phi}}_P\}} f(\boldsymbol{\phi})$.
   b) Obtain $m$ samples from the sampling algorithm of Fig. 2. Using these samples as starting points perform local minimization to obtain $m$ local minima $\mathbf{x}_1,\ldots,\mathbf{x}_m$ of $f(\cdot)$. Set $L := L \cup \{\mathbf{x} \mid \mathbf{x} = \mathbf{x}_1,\ldots,\mathbf{x}_m, \; \mathbf{x} \in B\}$ and $\boldsymbol{\phi}_G := \arg\min_{\boldsymbol{\phi} = \boldsymbol{\phi}_G, \mathbf{x}_1,\ldots,\mathbf{x}_m; \; \boldsymbol{\phi} \in B} f(\boldsymbol{\phi})$.
6) Termination: If $\|\boldsymbol{\phi}_G - \boldsymbol{\phi}_P\| < \epsilon$ then stop; otherwise go to Step 2.

Fig. 3. The SDU algorithm.

The evolution of the algorithm in Fig. 3 depends on the parameters $K$, $\zeta \in [0, 1]$, $m$, and $\epsilon$, as well as the way we define the neighborhood $N(\boldsymbol{\phi}_P)$ in Step 4. These will be appropriately tuned in every problem instance.

A couple of remarks on the proposed SDU algorithm are in order. The algorithm combines exploration with focalization in energy-favorable regions of the conformational space (energy funnels). The exploration is in fact biased towards these energy-favorable funnels. This is motivated by the desire to avoid computationally expensive exploration in areas of the conformational space that are not likely to contain the native structure.

We should point out that we make no claims that the SDU algorithm will converge to the global minimum of $f(\cdot)$ in general. In fact, it is straightforward to see that it will not find the global minimum if we do not use enough local minima to determine the underestimator or when $f(\cdot)$ is arbitrary and does not have a funnel-like shape. However, later in the paper we will provide arguments that guarantee convergence for funnel-like shaped functions under a suitable set of conditions.

IV. CGU AND ITS LIMITATIONS
The CGU algorithm [5] can be viewed as a special case of the SDU algorithm under the following key modifications:

1) Underestimation. In deriving the underestimator $U(\cdot)$ impose the constraint that the matrix $\mathbf{Q}$ is a diagonal positive semidefinite matrix. Then the semidefinite constraint can be replaced by a nonnegativity constraint for all diagonal entries. It follows that Problem (SDP-P) can be reformulated as a linear programming problem (LP) which can be easily solved.
2) Sampling. Replace our biased sampling method with random (uniform) sampling in the neighborhood $N(\boldsymbol{\phi}_P) \subseteq B$ of $\boldsymbol{\phi}_P$.

We will argue that these two differences between CGU and SDU drastically affect the performance of the CGU algorithm. In particular, limiting the underestimator search to the class of canonical parabolas (diagonal $\mathbf{Q}$) substantially reduces the efficiency and accuracy of CGU for general problems, where the surface spanned by the local minima is not typically aligned with the canonical coordinates defining the underestimating parabola. Reference [3] reports many such cases where CGU performs poorly. Some attempts at addressing this limitation have been made in [8] but they are only able to handle very special cases.

We start our study of CGU limitations by providing a simple example where CGU fails. Consider the function $f(\boldsymbol{\phi}) = 100\phi_1^2 - 10\phi_1\phi_2 + \phi_2^2$, whose global minimum is at the origin. We use CGU to underestimate this function. In Fig. 4 we plot contours of the function $f(\cdot)$ and its resulting CGU underestimator $U_{\mathrm{CGU}}(\cdot)$.

[Figure 4: two panels, (a) and (b), each showing $f(\mathbf{x})$ and $U_{\mathrm{CGU}}(\mathbf{x})$ over axes $x_1$ and $x_2$.]
Fig. 4. CGU yields different results depending on the sample region.

More specifically, in Fig. 4(a) we randomly (and uniformly) sampled the region $[-1, 10] \times [-1, 10]$ to obtain a large set of points which we used to construct the CGU underestimator. The underestimator $U_{\mathrm{CGU}}(\cdot)$ has a global minimum (to be referred to as the prediction) at $(0, 10)$. Notice that CGU constrains its prediction within the sampling region. In Fig. 4(b) we performed the same experiment but used $[-1, 20] \times [-1, 20]$ as the sampling region, and CGU's prediction was $(0, 20)$. In both cases, CGU's prediction is at the boundary because the minimization of $U_{\mathrm{CGU}}(\cdot)$ is constrained within the sampling region; unconstrained minimization produces an even worse result. It is evident that the prediction heavily depends on the initial sampling region, which in most cases is set arbitrarily. In the next subsection we analyze the CGU underestimating approach and compare it to the one we employ in SDU.
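The example function can be written as $f(\boldsymbol{\phi}) = \boldsymbol{\phi}'\bar{\mathbf{Q}}\boldsymbol{\phi}$ with $\bar{\mathbf{Q}}$ having diagonal entries 100 and 1 and off-diagonal entries $-5$. The short check below, which uses only this formula, confirms that $\bar{\mathbf{Q}}$ is positive definite (so the origin is the unique minimizer) and that its principal axes are tilted away from the coordinate axes by the cross term, which a diagonal $\mathbf{Q}$ cannot represent. The helper name is hypothetical.

```python
import math


def symmetric_2x2_eigenvalues(q11, q12, q22):
    """Eigenvalues (ascending) of the symmetric matrix [[q11, q12], [q12, q22]]."""
    mean = 0.5 * (q11 + q22)
    radius = math.hypot(0.5 * (q11 - q22), q12)
    return mean - radius, mean + radius


# f(phi) = 100*phi_1^2 - 10*phi_1*phi_2 + phi_2^2  <=>  Q_bar = [[100, -5], [-5, 1]]
q11, q12, q22 = 100.0, -5.0, 1.0
lam_min, lam_max = symmetric_2x2_eigenvalues(q11, q12, q22)

# Rotation angle (radians) of the principal axes relative to the coordinate axes;
# it is zero exactly when the cross term q12 vanishes.
tilt = 0.5 * math.atan2(2.0 * q12, q11 - q22)

is_positive_definite = lam_min > 0.0
axes_are_tilted = tilt != 0.0
```

Both flags evaluate to true here: $f$ is strictly convex, yet its valley is not aligned with the coordinates in which CGU's diagonal parabola is defined.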
A. Comparing the CGU and SDU underestimators

As we discussed in Section III, a quadratic underestimator will not be informative if either (i) $f(\cdot)$ is not funnel-like and the envelope of local minima cannot be well approximated by a convex quadratic, or (ii) we do not use a rich enough set of local minima in constructing $U(\cdot)$. In the following we wish to remove these two potential sources of poor performance in order to better assess the underestimating power of CGU and SDU. More specifically, we consider the "ideal" case of underestimating a convex quadratic given by $f(\mathbf{t}) = \mathbf{t}'\bar{\mathbf{Q}}\mathbf{t} + \bar{\mathbf{b}}'\mathbf{t} + \bar{c}$, where $\bar{\mathbf{Q}} \succeq 0$. Further, we assume that an infinite number of sample points of $f(\cdot)$ in some compact sampling region $B$ is at our disposal when we construct the underestimator. The construction of the underestimator based on utilizing all sample points in $B$ can be formulated as the following (infinite dimensional) optimization problem:

$$\begin{array}{ll} \text{minimize} & \int_{\mathbf{t} \in B} \left( f(\mathbf{t}) - U(\mathbf{t}) \right) d\mathbf{t} \\ \text{subject to} & f(\mathbf{t}) \ge U(\mathbf{t}), \quad \mathbf{t} \in B, \end{array} \qquad (11)$$

where the decision variables are the (yet unspecified) parameters defining $U(\mathbf{t})$.

Suppose first that we use the SDU underestimating approach and seek to construct a function $U(\mathbf{t}) = \mathbf{t}'\mathbf{Q}\mathbf{t} + \mathbf{b}'\mathbf{t} + c$, where $\mathbf{Q} \succeq 0$. Consider the problem in (11) for such a $U(\mathbf{t})$. The next proposition is immediate.

Proposition IV.1 SDU can underestimate $f(\cdot)$ exactly; in particular, $(\mathbf{Q}, \mathbf{b}, c) = (\bar{\mathbf{Q}}, \bar{\mathbf{b}}, \bar{c})$ is an optimal solution of (11).
We next consider the CGU underestimation approach. Specifically, we seek to construct a function $U(\mathbf{t}) = \mathbf{t}'\mathbf{D}\mathbf{t} + \mathbf{b}'\mathbf{t} + c$, where $\mathbf{D}$ is a diagonal positive semidefinite matrix. Namely, $\mathbf{D} = \mathrm{diag}(d_1,\ldots,d_n)$ where $d_i \ge 0$ for $i = 1,\ldots,n$. For simplicity of the exposition we take $B = B_1 \times \cdots \times B_n$, where $B_i = [l_i, u_i]$ and $u_i - l_i = T$ for all $i = 1,\ldots,n$. We denote

$$\mathbf{a}(\mathbf{t}) = (t_1^2,\ldots,t_n^2, t_1,\ldots,t_n, 1),$$

$$\mathbf{h} = \left( \int_{t_1 \in B_1} t_1^2\, dt_1 / T, \ldots, \int_{t_n \in B_n} t_n^2\, dt_n / T, \int_{t_1 \in B_1} t_1\, dt_1 / T, \ldots, \int_{t_n \in B_n} t_n\, dt_n / T, 1 \right),$$

and $\mathbf{z} = (d_1,\ldots,d_n, b_1,\ldots,b_n, c)$. In this case, the optimization problem in (11) is equivalent to the following problem:

$$\text{(LSIP-P)} \qquad \begin{array}{ll} \max & \mathbf{h}'\mathbf{z} \\ \text{s.t.} & \mathbf{a}'(\mathbf{t})\mathbf{z} \le f(\mathbf{t}), \quad \mathbf{t} \in B, \end{array} \qquad (12)$$

where $\mathbf{z}$ is the decision vector. Note that it involves an infinite number of constraints. A problem with such a structure is known as a Linear Semi-Infinite Programming (LSIP) problem [9]. Its dual can be formulated in measure space as follows:

$$\text{(LSIP-D)} \qquad \begin{array}{ll} \min & \int_{B} f(\mathbf{t})\, d\mu(\mathbf{t}) \\ \text{s.t.} & \int_{B} \mathbf{a}(\mathbf{t})\, d\mu(\mathbf{t}) = \mathbf{h}, \quad \mu \in M^{+}(B), \end{array} \qquad (13)$$

where $M^{+}(B)$ denotes the set of nonnegative regular Borel measures on $B$.

It can be shown (we omit the details due to space limitations) that the underestimator obtained by solving (LSIP-P) in (12) is the limit (as $K \to \infty$) of the CGU underestimators derived based on function values $f(\mathbf{t}_1),\ldots,f(\mathbf{t}_K)$ at a set of samples $\mathbf{t}_1,\ldots,\mathbf{t}_K$ from $B$. This is insightful because it suggests that when we use enough samples the quality of the CGU underestimator does not depend on sample selection but rather on the fundamental structure of the underestimator function. Our main result in this section is the following theorem; the proof is omitted in the interest of space.
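The entries of $\mathbf{h}$ have simple closed forms, since $\int_{l}^{u} t^2\,dt = (u^3 - l^3)/3$ and $\int_{l}^{u} t\,dt = (u^2 - l^2)/2$. The sketch below computes $\mathbf{h}$ for a box with equal side lengths $T$, as assumed in the text. With this normalization $\mathbf{h}'\mathbf{z}$ is the average of $U(\mathbf{t})$ over $B$, so maximizing it corresponds to minimizing the integrated gap in (11). The function name and the example bounds are hypothetical.

```python
def lsip_objective_vector(bounds, tol=1e-12):
    """Return h for the box B = [l_1, u_1] x ... x [l_n, u_n] with u_i - l_i = T.

    Layout matches z = (d_1..d_n, b_1..b_n, c):
    n averaged second moments, then n averaged first moments, then 1.
    """
    T = bounds[0][1] - bounds[0][0]
    if T <= 0 or any(abs((u - l) - T) > tol for l, u in bounds):
        raise ValueError("all intervals must have the same positive length T")
    second = [(u ** 3 - l ** 3) / (3.0 * T) for l, u in bounds]  # (1/T) * integral of t^2
    first = [(u ** 2 - l ** 2) / (2.0 * T) for l, u in bounds]   # (1/T) * integral of t
    return second + first + [1.0]


# Made-up box [-1, 1] x [0, 2], so T = 2.
h = lsip_objective_vector([(-1.0, 1.0), (0.0, 2.0)])
```

For this box the vector works out to $(1/3, 4/3, 0, 1, 1)$, up to floating-point rounding.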
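Theorem IV.2 Let $f(\mathbf{t}) = \mathbf{t}'\bar{\mathbf{Q}}\mathbf{t}$, where $\bar{\mathbf{Q}} \succeq 0$. Further, let $U^{*}(\mathbf{t}) = \mathbf{t}'\mathbf{D}^{*}\mathbf{t} + \mathbf{b}^{*\prime}\mathbf{t} + c^{*}$ be the optimal solution to (LSIP-P), i.e., the optimal CGU underestimator to $f(\mathbf{t})$. Then, in general, $\mathbf{b}^{*} \ne \mathbf{0}$. In other words, the minimizer of the underestimator is different from the minimizer of $f(\mathbf{t})$.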
V. ON SDU'S CONVERGENCE

In this section we give the result that under appropriate conditions the SDU algorithm converges to the global minimum of the function $f(\cdot)$. Such (free energy) functions arising in molecular docking applications, as we have explained, possess key structural properties. Therefore, we will impose a set of structural assumptions on $f(\cdot)$ and the search region $B$ that reflect the properties of the free energy functions we seek to minimize. We denote by $\mathrm{epi}(f)$ the epigraph of $f(\cdot)$, which is defined as $\mathrm{epi}(f) = \{(\boldsymbol{\phi}, w) \mid \boldsymbol{\phi} \in B, f(\boldsymbol{\phi}) \le w\}$. We also denote by $\mathrm{conv}(S)$ the convex hull of any set $S$.
Assumption A Assume that $f(\boldsymbol{\phi})$ satisfies the following set of conditions: (i) it is continuously differentiable; (ii) $f(\cdot)$ has a unique global minimum in $B$; (iii) $B$ is compact; (iv) for all local minima $\boldsymbol{\phi}$ of $f(\cdot)$ there exists an open set such that $\boldsymbol{\phi}$ is the unique minimizer of $f(\cdot)$ within this set; (v) the extreme points of $\mathrm{conv}(\mathrm{epi}(f))$ lie on a quadratic function $\tilde{U}(\boldsymbol{\phi}) = \boldsymbol{\phi}'\tilde{\mathbf{Q}}\boldsymbol{\phi} + \tilde{\mathbf{b}}'\boldsymbol{\phi} + \tilde{c}$; (vi) $\tilde{U}(\boldsymbol{\phi})$ has a unique global minimum which is identical with the global minimum of $f(\cdot)$ in $B$.
For functions that satisfy Assumption A we will say that they have a funnel-like shape (see Fig. 1 for an illustration). As we argued in Section II, Assumption A is not overly restrictive for the free energy functions we are interested in minimizing.

The following theorem establishes that, given sufficient sampling of the search region $B$, the SDU underestimation procedure can locate the global minimum of $f(\cdot)$, which we denote by $\boldsymbol{\phi}^{*}$. The proof is again omitted due to space limitations.
Theorem V.1 Let Assumption A prevail. Consider the SDU algorithm provided in Fig. 3 and assume that $B$ contains at least $\frac{(n+1)(n+2)}{2}$ local minima of $f(\cdot)$ which are extreme points of $\mathrm{conv}(\mathrm{epi}(f))$. Suppose that in Step 1 of the algorithm we obtain $K$ uniformly distributed samples in $B$ and for each one of those we perform local minimization to obtain $K$ local minima $\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_K$ of $f(\cdot)$. Then, the global minimum $\boldsymbol{\phi}_P$ of the underestimator $U(\cdot)$ obtained in Step 2 of the algorithm converges in probability to the global minimum $\boldsymbol{\phi}^{*}$ of $f(\cdot)$ as $K \to \infty$, namely,

$$\lim_{K \to \infty} P\left[\boldsymbol{\phi}_P = \boldsymbol{\phi}^{*}\right] = 1.$$
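One way to read the count $(n+1)(n+2)/2$ in Theorem V.1 is that it coincides with the number of free coefficients of a general quadratic in $n$ variables: $n(n+1)/2$ for a symmetric $\mathbf{Q}$, $n$ for $\mathbf{b}$, and one for $c$. By the same count, a diagonal $\mathbf{Q}$ leaves $2n+1$ coefficients, which matches the bound $K \ge 2n+1$ in Step 1 of Fig. 3. This interpretation is not stated in the theorem and is offered only as a plausible reading; the function names below are hypothetical.

```python
def general_quadratic_parameters(n):
    """Free coefficients of phi' Q phi + b' phi + c with symmetric n x n Q."""
    return n * (n + 1) // 2 + n + 1


def diagonal_quadratic_parameters(n):
    """Free coefficients when Q is restricted to be diagonal (the CGU class)."""
    return n + n + 1


# The general count equals (n + 1)(n + 2) / 2 for every dimension checked here.
counts_match = all(
    general_quadratic_parameters(n) == (n + 1) * (n + 2) // 2 for n in range(1, 50)
)
```

The gap between the two counts, $n(n-1)/2$, is the number of cross terms that SDU can fit and CGU cannot.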