Universal Quadratic Lower Bounds on Source Coding Error Exponents
Cheng Chang and Anant Sahai

Abstract—We consider the problem of blocksize selection to achieve a desired probability of error for universal source coding. While Baron et al. in [1], [9] studied this question for rates in the vicinity of entropy for known distributions using central-limit-theorem techniques, we are interested in all rates for unknown distributions and use error-exponent techniques. By adapting a technique of Gallager from the exercises of [7], we derive a universal lower bound to the source-coding error exponent that depends only on the alphabet size and is quadratic in the gap to entropy.
I. INTRODUCTION
In [10], the lossless source coding with decoder side-information problem, as shown in Figure 1, is introduced. The source and decoder side-information sequences $(x_1^n, y_1^n)$ are drawn iid from a joint distribution $p_{xy}$ on a finite alphabet $\mathcal{X} \times \mathcal{Y}$. If the decoder knows $y_1^n$, the error probability $\Pr(\hat{x}_1^n \neq x_1^n)$ goes to $0$ as the code length $n$ goes to infinity, for any rate $R > H(p_{x|y})$, where $H(p_{x|y})$ is the conditional entropy of $x$ given $y$.
Fig. 1. Lossless source coding with decoder sideinformation
The performance of the coding system, i.e. how fast the error probability converges to $0$ with block length $n$ when the coding rate is above the minimum required rate, is studied in [5], [6], [7]. We summarize the relevant error exponent results from the literature in the following.
Theorem 1: [6] Assume a decoder with access to the side information, where the memoryless source and side information are generated from a distribution $p_{xy}$. A random binning encoder and jointly ML decoding system, shown in Figure 1, has error probability $\Pr(\hat{x}_1^n \neq x_1^n) \le e^{-nE_r(R)}$, where

$$E_r(R) = \max_{0 \le \rho \le 1} \rho R - \bar{E}_0(\rho) \qquad (1)$$

where

$$\bar{E}_0(\rho) = \ln\Big(\sum_{y} \Big(\sum_{x} p_{xy}(x,y)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big)$$

C. Chang and A. Sahai are with the Department of Electrical Engineering and Computer Science, University of California at Berkeley. Email: cchang@eecs.berkeley.edu, sahai@eecs.berkeley.edu
Without decoder side information, the Gallager function $\bar{E}_0$ simplifies¹ to:

$$E_0(\rho) = (1+\rho)\ln\Big(\sum_{x\in\mathcal{X}} p_x(x)^{\frac{1}{1+\rho}}\Big)$$
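Since (1) is a one-dimensional maximization over $\rho \in [0,1]$, the Gallager function and the exponent are easy to evaluate numerically. The following sketch (our own illustration, not code from the paper; the grid resolution is an arbitrary choice) computes $E_0(\rho)$ and $E_r(R)$ for a known $p_x$ in the no-side-information case:

```python
import math

def E0(p, rho):
    # E0(rho) = (1 + rho) * ln( sum_x p(x)^(1/(1+rho)) ), the Gallager
    # source-coding function without side-information (natural log, nats)
    return (1.0 + rho) * math.log(sum(px ** (1.0 / (1.0 + rho)) for px in p))

def Er(p, R, steps=2000):
    # Er(R) = max_{0 <= rho <= 1} rho*R - E0(rho), by brute-force grid search
    return max((i / steps) * R - E0(p, i / steps) for i in range(steps + 1))

p = [0.5, 0.25, 0.25]
h = -sum(px * math.log(px) for px in p)   # H(p_x) in nats
print(Er(p, h + 0.2))                     # strictly positive above entropy
```

For rates at or below $H(p_x)$ the maximum is attained at $\rho = 0$ and the exponent is zero, consistent with the converse.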
In Theorem 1, the random binning scheme at the encoder is uniform, and thus universal in nature [6]. However, for the ML decoding rule, the decoder needs to know the statistics of the source. In [5], a universal system based on minimum-entropy decoding is shown to achieve the same error exponent asymptotically. For the universal decoder,

$$\Pr(\hat{x}_1^n \neq x_1^n) \le e^{-n(E_r(R) - \phi(n))} \qquad (2)$$

where $\phi(n)$ is the vanishing term $|\mathcal{X}|\frac{\ln n}{n}$ for the case without side-information and $\phi(n) = |\mathcal{X}||\mathcal{Y}|\frac{\ln n}{n}$ for the case with decoder side-information.
A. Motivation and related work
For fixed block source coding systems, block length is an important parameter as it is related to both system delay and complexity. Suppose there is a system-level requirement that the block error probability $\Pr(x_1^n \neq \hat{x}_1^n)$ be below some constant $P_e > 0$. If the distributions are known, the minimum block length can be calculated from Theorem 1. However, the exact distribution need not be available to the encoder since that knowledge is not needed to do uniform binning. Thus, a universal estimate of the error exponent is desirable.

A related problem is studied in [1], [9]. They turn the question around and ask: for non-asymptotic length source coding with side-information, what is the minimum rate required to achieve block error $P_e$, assuming that the distribution is in fact known? A more quantitative discussion of the relation between the problem in [1] and our work here is deferred to Section II-A.
¹This is the source coding counterpart of the channel coding result in Theorem 5.6.4 [7].
B. A universal bound on channel coding exponents
In Exercise 5.23 [7], Gallager gives a quadratic lower bound on the random channel coding error exponent for a discrete memoryless channel $P(\cdot|\cdot)$ with output alphabet size $J$. If $Q$ is the distribution that achieves the channel capacity $C$, then the random coding error exponent $E_{cr}(R,Q)$, defined in Theorem 5.6.4 of [7], is lower bounded by the following quadratic function of the gap to capacity $(C-R)$ for all $R < C$:

$$E_{cr}(R,Q) \ge \frac{(C-R)^2}{8/e^2 + 4[\ln J]^2} \qquad (3)$$

This bound can be further tightened, and we give the new result as a corollary to Lemma 1. The bound is universal in the sense that it depends only on the size of the output alphabet and the gap to capacity, and not on the detailed channel statistics.

Following Gallager's techniques, we derive universal quadratic bounds on the random source coding error exponent with and without decoder side-information. For both cases, the quadratic bounds are determined only by the gap to entropy and the size of the source alphabet $\mathcal{X}$. The results are summarized in Theorem 2. The proof details are in Section III.

II. MAIN RESULTS AND DISCUSSION
Theorem 2: For a memoryless source $x$ and decoder side information $y$, jointly generated iid from $p_{xy}$ with conditional entropy $H(p_{x|y}) = h$ on finite alphabet $\mathcal{X}\times\mathcal{Y}$, the random coding error exponent $E_r(R)$ defined in (1) is lower bounded by a quadratic function, $\forall R \in [H(p_{x|y}), \ln|\mathcal{X}|)$:

$$E_r(R) \ge G_h(R)$$

where

$$G_h(R) = \begin{cases} \dfrac{(R-h)^2}{2(\ln|\mathcal{X}|)^2} & \text{if } |\mathcal{X}| \ge 3 \\[2mm] \dfrac{(R-h)^2}{2\ln 2} & \text{if } |\mathcal{X}| = 2 \end{cases} \qquad (4)$$

Because the bound depends only on the gap to entropy, we can also write it as $G(R-h)$. Furthermore, if there is no side-information and the source is drawn from $p_x$ s.t. $H(p_x) = h$, the same bound applies.

It is interesting to note that this quadratic bound on the error exponent $E_r(R)$ has no dependence on the size of the side-information alphabet $\mathcal{Y}$.
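The bound (4) is simple enough to use directly for blocksize selection. As a sketch (our own illustration, not code from the paper), the following computes $G_h(R)$ and the minimum block length $n \ge -\ln P_e / G(R-h)$ needed to guarantee block error probability $P_e$ under ML decoding:

```python
import math

def G(gap, X):
    # universal quadratic lower bound G(R - h) from (4); X = |alphabet|,
    # gap = R - h in nats (assumed strictly positive)
    if X >= 3:
        return gap ** 2 / (2.0 * math.log(X) ** 2)
    return gap ** 2 / (2.0 * math.log(2.0))

def min_block_length(gap, X, Pe):
    # smallest n with exp(-n * G(gap)) <= Pe, i.e. n >= -ln(Pe) / G(gap)
    return math.ceil(-math.log(Pe) / G(gap, X))

# e.g. a three-letter source, 0.1 nats above entropy, target error 1e-6
n = min_block_length(0.1, 3, 1e-6)
print(n)
```

Note that the side-information alphabet size $|\mathcal{Y}|$ never enters this calculation, matching the remark above.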
A. Discussion and Examples
For sources with $H(p_x) = h$, it is easy to see that $R - h$ is always an upper bound to $E_r(R)$, and hence Theorem 2 implies:

$$G_h(R) \le \min_{p_x: H(p_x)=h} E_r(R) \le \max_{p_x: H(p_x)=h} E_r(R) \le R - h$$

To illustrate the looseness of our bounds, consider $|\mathcal{X}| = 3$ and distributions $p_x$ s.t. $H(p_x) = h = 0.394$. Since the alphabet size is so small, we can use brute-force optimization to obtain the upper and lower contours of possible $E_r(R)$. These are plotted along with the universal quadratic lower bound $G(R-h) = \frac{(R-h)^2}{2(\ln 3)^2}$ and the linear upper bound $R - h$ in Figure 2.
Fig. 2. Plot of the error exponent bounds for a three-letter alphabet, for rates $R$ from $H(p_x)$ to $\ln|\mathcal{X}|$. In order from top to bottom: $R - H(p_x)$, $\max_{p_x: H(p_x)=h} E_r(R)$, $\min_{p_x: H(p_x)=h} E_r(R)$, and $G_h(R)$.
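The sandwich relation above is easy to spot-check numerically. The following sketch (ours; the random sampling and grid search are arbitrary choices, not the brute-force optimization used for Figure 2) draws three-letter distributions and verifies $G_h(R) \le E_r(R) \le R - h$:

```python
import math
import random

def E0(p, rho):
    # Gallager source-coding function without side-information
    return (1.0 + rho) * math.log(sum(px ** (1.0 / (1.0 + rho)) for px in p))

def Er(p, R, steps=2000):
    # grid-search evaluation of (1) without side-information
    return max((i / steps) * R - E0(p, i / steps) for i in range(steps + 1))

def H(p):
    return -sum(px * math.log(px) for px in p if px > 0)

random.seed(0)
for _ in range(100):
    w = [random.random() for _ in range(3)]
    p = [x / sum(w) for x in w]
    h = H(p)
    R = h + (math.log(3) - h) * random.uniform(0.05, 0.95)  # a rate in (h, ln 3)
    lower = (R - h) ** 2 / (2 * math.log(3) ** 2)           # G_h(R) for |X| = 3
    assert lower - 1e-6 <= Er(p, R) <= (R - h) + 1e-9
print("sandwich bound verified on 100 random three-letter sources")
```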
For ML decoding where the decoder knows the distribution, the encoder can pick a block length sufficient to achieve block error probability $\Pr(x_1^n \neq \hat{x}_1^n) \le P_e$, knowing only the gap to entropy $R - h$ and the source alphabet size $|\mathcal{X}|$, by choosing

$$n \ge -\ln P_e / G(R-h)$$

The size of the side-information alphabet $\mathcal{Y}$ is not needed at all! If the decoder is also ignorant of the joint distribution, then $\phi(n)$ in (2) must be taken into consideration, and $n$ can be chosen by solving:

$$nG(R-h) - n\phi(n) \ge -\ln P_e$$

Since $n\phi(n) = |\mathcal{X}||\mathcal{Y}|\ln n$, this implies that determining $n$ from our bound requires the encoder to know the side-information alphabet size. At low probabilities of error, the dependence is relatively weak, however, since the $\ln n$ term is dominated by the $nG(R-h)$ term.

In [1], it is shown that, for source coding with side-information, the required rate is $H(p_{x|y}) + K(P_e)\sqrt{-\ln P_e/n} + o(\sqrt{-\ln P_e/n})$ to achieve block error probability $P_e$ with fixed block length $n$ — where the $O(\sqrt{-\ln P_e/n})$ term is called the redundancy rate. The exact constant $K(P_e)$ is also computed and clearly depends on the probability distribution of the source. The
converse, proved in [1] for binary symmetric sources, is not universal and moreover cannot be made so. A simple counterexample: given $y = y$, suppose $x$ is uniform on some subset $S_y \subset \mathcal{X}$, where $|S_y| = K < |\mathcal{X}|$ for all $y \in \mathcal{Y}$. In this case, the random coding error exponent $E_r(R)$ is a straight line $E_r(R) = R - \ln K$. For this example, the redundancy rate needs to be $-\ln P_e/n$ when using random coding, and could potentially be zero with some other scheme.

Theorem 2 tells us that for block length $n$ and rate $R = H(p_{x|y}) + K(|\mathcal{X}|)\sqrt{-\ln P_e/n}$, the block error is smaller than $P_e$, no matter what distribution is encountered. While not tight, it does show that a redundancy of $O(\sqrt{-\ln P_e/n})$ suffices for universality.

III. PROOF OF THEOREM 2

In this section we prove Theorem 2. First, we need a technical lemma and definitions of tilted distributions [6].
A. Lemmas and Deﬁnitions
In this paper we use the following lemma to upper bound a non-concave function $f_E(\cdot)$.
Lemma 1: For constant $E \ge 0$, write

$$f_E(\omega) = \sum_{j=1}^{J} \omega_j (\ln\omega_j - E)^2 \qquad (5)$$

for $\omega \in \mathcal{S}^J$, where $\mathcal{S}^J = \{\omega\in\mathbb{R}^J : \sum_k \omega_k = 1 \text{ and } \omega_j \ge 0,\ \forall j\}$ is the probability simplex of dimension $J$. Then for any distribution $\omega\in\mathcal{S}^J$,

$$f_E(\omega) \le \begin{cases} E^2 + 2E\ln J + (\ln J)^2 & \text{if } J \ge 3 \\ E^2 + 2E\ln 2 + T & \text{if } J = 2 \end{cases} \qquad (6)$$

where

$$T = t_1(\ln t_1)^2 + t_2(\ln t_2)^2 \qquad (7)$$

and

$$t_1 = \frac{1 - \sqrt{1 - 4e^{-2}}}{2}, \qquad t_2 = \frac{1 + \sqrt{1 - 4e^{-2}}}{2}$$

with $T \approx 0.563$, so that $(\ln 2)^2 < T < \ln 2$.

The proof is in the appendix. The challenge in the proof lies in the non-concavity of $f_E(\omega)$. In Figure 3, for $J = 2$ (thus $\omega = (x, 1-x)$) and $E = 0$, we plot $f_0((x, 1-x))$. The maximum occurs at $x = t_1$ or $t_2$, which are defined above.
Definition 1: Tilted distributions: For a distribution $p_x$ on a finite alphabet $\mathcal{X}$ and $\rho \in (-1,\infty)$, we denote the $\rho$-tilted distribution by $p_x^\rho$, where

$$p_x^\rho(x) = \frac{p_x(x)^{\frac{1}{1+\rho}}}{\sum_{s\in\mathcal{X}} p_x(s)^{\frac{1}{1+\rho}}}$$
Fig. 3. Non-concavity of $f_E(\omega)$ defined in (5), for $E = 0$, $J = 2$: the plot of $f_0((x, 1-x)) = x(\ln x)^2 + (1-x)(\ln(1-x))^2$ over $x \in [0,1]$, with maxima at $x = t_1$ and $x = t_2$.
For a distribution $p_{xy}$ on a finite alphabet $\mathcal{X}\times\mathcal{Y}$, we denote the $x$-$y$ tilted distribution of $p_{xy}$ by $\bar{p}^{\,\rho}_{xy}$:

$$\bar{p}^{\,\rho}_{xy}(x,y) = \frac{\Big[\sum_{s\in\mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}\Big]^{1+\rho}}{\sum_{t\in\mathcal{Y}}\Big[\sum_{s\in\mathcal{X}} p_{xy}(s,t)^{\frac{1}{1+\rho}}\Big]^{1+\rho}} \times \frac{p_{xy}(x,y)^{\frac{1}{1+\rho}}}{\sum_{s\in\mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}}$$
Obviously $p^0_x = p_x$ and $\bar{p}^{\,0}_{xy} = p_{xy}$. Write the marginal distribution of $y$ under distribution $\bar{p}^{\,\rho}_{xy}$ as $\bar{p}^{\,\rho}_y$ and the conditional distribution of $x$ given $y$ under distribution $\bar{p}^{\,\rho}_{xy}$ as $\bar{p}^{\,\rho}_{x|y}$; then from the definition:

$$\bar{p}^{\,\rho}_{xy}(x,y) = \bar{p}^{\,\rho}_y(y)\,\bar{p}^{\,\rho}_{x|y}(x|y)$$

$$\bar{p}^{\,\rho}_y(y) = \frac{\Big[\sum_{s\in\mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}\Big]^{1+\rho}}{\sum_{t\in\mathcal{Y}}\Big[\sum_{s\in\mathcal{X}} p_{xy}(s,t)^{\frac{1}{1+\rho}}\Big]^{1+\rho}}, \qquad \bar{p}^{\,\rho}_{x|y}(x|y) = \frac{p_{xy}(x,y)^{\frac{1}{1+\rho}}}{\sum_{s\in\mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}}$$
Denote the entropy of $p^\rho_x$ by $H(p^\rho_x)$ and the conditional entropy of $x$ given $y$ under distribution $\bar{p}^{\,\rho}_{xy}$ by $H(\bar{p}^{\,\rho}_{x|y})$; then $H(\bar{p}^{\,0}_{x|y}) = H(p_{x|y})$. Write $H(\bar{p}^{\,\rho}_{x|y=y})$ for the conditional entropy of $x$ given $y = y$; then:

$$H(\bar{p}^{\,\rho}_{x|y=y}) = -\sum_x \bar{p}^{\,\rho}_{x|y}(x|y) \ln \bar{p}^{\,\rho}_{x|y}(x|y).$$
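These definitions are concrete enough to check numerically. The following sketch (ours; the test distribution and step sizes are arbitrary choices) builds the $\rho$-tilted distribution of Definition 1 and confirms, by finite differences, the derivative identity $E_0'(\rho) = H(p^\rho_x)$ stated as (8) in the proof that follows, along with the bound $E_0''(\rho) \le (\ln|\mathcal{X}|)^2$ used there:

```python
import math

def tilt(p, rho):
    # rho-tilted distribution of Definition 1: p^rho(x) proportional to p(x)^(1/(1+rho))
    w = [px ** (1.0 / (1.0 + rho)) for px in p]
    s = sum(w)
    return [x / s for x in w]

def E0(p, rho):
    return (1.0 + rho) * math.log(sum(px ** (1.0 / (1.0 + rho)) for px in p))

def H(p):
    return -sum(px * math.log(px) for px in p if px > 0)

p = [0.6, 0.3, 0.1]
for rho in (0.0, 0.25, 0.5, 1.0):
    d = 1e-5
    deriv = (E0(p, rho + d) - E0(p, rho - d)) / (2 * d)   # central difference
    assert abs(deriv - H(tilt(p, rho))) < 1e-6            # identity (8)
    d2 = 1e-4                                             # second difference:
    second = (E0(p, rho + d2) - 2 * E0(p, rho) + E0(p, rho - d2)) / d2 ** 2
    assert second <= math.log(3) ** 2 + 1e-6              # E0'' <= alpha, cf. (12)
```

Tilting with $\rho > 0$ flattens the distribution, so $H(p^\rho_x)$ increases with $\rho$; this is why $E_0'$ grows from its value $H(p_x)$ at $\rho = 0$.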
B. Proof of the case without side-information
As in the solution to Gallager's Problem 5.23 in [7], we use the Taylor expansion of $E_0(\rho)$ to find a quadratic bound on $E_r(R)$.
Proof: From the mean value theorem, expanding $E_0(\rho)$ at $0$, for $\rho \le 1$, $\exists \rho_1 \in [0,\rho]$ s.t.

$$E_0(\rho) = E_0(0) + \rho E_0'(0) + \frac{\rho^2}{2}E_0''(\rho_1)$$

From basic calculus, as shown in the appendix of [3], it can be shown that

$$E_0'(\rho) = \frac{dE_0(\rho)}{d\rho} = H(p^\rho_x) \qquad (8)$$

and hence $E_0'(0) = H(p_x)$. Note that $E_0(0) = 0$, and so

$$E_0(\rho) \le \rho H(p_x) + \frac{\rho^2}{2}\alpha \qquad (9)$$

where $\alpha > 0$ is any upper bound on $E_0''(\rho_1)$ that holds $\forall \rho_1 \in [0,1]$. Substitute (9) into the definition of $E_r(R)$ to get

$$\begin{aligned}
E_r(R) &= \max_{0\le\rho\le1} \rho R - E_0(\rho) \\
&\ge \max_{0\le\rho\le1} \rho R - \rho H(p_x) - \frac{\rho^2}{2}\alpha \\
&= \max_{0\le\rho\le1} -\frac{\alpha}{2}\Big(\rho - \frac{R - H(p_x)}{\alpha}\Big)^2 + \frac{(R - H(p_x))^2}{2\alpha} \\
&= \frac{(R - H(p_x))^2}{2\alpha} \qquad (10)
\end{aligned}$$

for $R - H(p_x) \le \alpha$. In the last step, we note that $\rho = \frac{R - H(p_x)}{\alpha}$ is the maximizer, which is within $[0,1]$.

To find $\alpha$, an upper bound on $E_0''(\rho)$, $\forall\rho\in[0,1]$, we expand

$$\begin{aligned}
E_0''(\rho) &= \frac{dH(p^\rho_x)}{d\rho} = -\sum_x \big(1 + \ln p^\rho_x(x)\big)\frac{dp^\rho_x(x)}{d\rho} \\
&= \sum_x \big(1 + \ln p^\rho_x(x)\big)\frac{p^\rho_x(x)}{1+\rho}\big(\ln p^\rho_x(x) + H(p^\rho_x)\big) \\
&= \frac{1}{1+\rho}\sum_x \Big[ p^\rho_x(x)\big(\ln p^\rho_x(x)\big)^2 + p^\rho_x(x)\ln p^\rho_x(x) + p^\rho_x(x)H(p^\rho_x) + p^\rho_x(x)\ln p^\rho_x(x)\, H(p^\rho_x) \Big] \\
&= \frac{1}{1+\rho}\sum_x p^\rho_x(x)\big(\ln p^\rho_x(x)\big)^2 - \frac{H(p^\rho_x)^2}{1+\rho} \qquad (11)
\end{aligned}$$

Since the last term in (11) is negative² and $\rho \ge 0$, by the definition of $f_E(\cdot)$ in (5),

$$E_0''(\rho) \le \sum_x p^\rho_x(x)\big(\ln p^\rho_x(x)\big)^2 = f_0(p^\rho_x)$$

Lemma 1 tells us that $E_0''(\rho) \le \alpha$, where

$$\alpha = \begin{cases} (\ln|\mathcal{X}|)^2 & \text{if } |\mathcal{X}| \ge 3 \\ \ln 2 & \text{if } |\mathcal{X}| = 2 \end{cases} \qquad (12)$$

Here we replace the $T$ from Lemma 1 with the looser upper bound $\ln 2$. Since $(\ln|\mathcal{X}|)^2 > \ln|\mathcal{X}|$ for $|\mathcal{X}| \ge 3$,
²This is a loose analysis. For $|\mathcal{X}| \ge 3$, the upper bound on the first term is achieved when $p^\rho_x$ is uniform on $\mathcal{X}$, giving the maximum of $(\ln|\mathcal{X}|)^2$ shown in (12). The actual value of (11) is $0$ for the uniform distribution.
we have $R - H(p_x) \le \alpha$, $\forall R\in[H(p_x), \ln|\mathcal{X}|)$. Combining (10) and (12), the theorem is proved for the case without side-information.³
C. Proof in General
The general proof is parallel:

Proof: Once again we expand $\bar{E}_0(\rho)$, and basic calculus, as shown in the appendix of [3], reveals that

$$\bar{E}_0'(\rho) = \frac{d\bar{E}_0(\rho)}{d\rho} = H(\bar{p}^{\,\rho}_{x|y}) \qquad (13)$$

and hence $\bar{E}_0'(0) = H(p_{x|y})$. Thus:

$$\bar{E}_0(\rho) \le \rho H(p_{x|y}) + \frac{\rho^2}{2}\alpha \qquad (14)$$

where $\alpha > 0$ is any upper bound on $\bar{E}_0''(\rho_1)$ that holds $\forall\rho_1\in[0,1]$. Substituting as before shows that

$$E_r(R) \ge \max_{0\le\rho\le1} \rho R - \rho H(p_{x|y}) - \frac{\rho^2}{2}\alpha = \frac{(R - H(p_{x|y}))^2}{2\alpha} \qquad (15)$$

for $R - H(p_{x|y}) \le \alpha$. To find $\alpha$, an upper bound on $\bar{E}_0''(\rho)$, $\forall\rho\in[0,1]$, we expand

$$\begin{aligned}
\bar{E}_0''(\rho) &= \frac{dH(\bar{p}^{\,\rho}_{x|y})}{d\rho} = \frac{d}{d\rho}\sum_{y\in\mathcal{Y}} \bar{p}^{\,\rho}_y(y)\, H(\bar{p}^{\,\rho}_{x|y=y}) \\
&= \sum_{y\in\mathcal{Y}} \bar{p}^{\,\rho}_y(y)\frac{dH(\bar{p}^{\,\rho}_{x|y=y})}{d\rho} + \sum_{y\in\mathcal{Y}} \frac{d\bar{p}^{\,\rho}_y(y)}{d\rho}\, H(\bar{p}^{\,\rho}_{x|y=y}) \qquad (16)
\end{aligned}$$

By basic calculus⁴, we have:
$$\frac{dH(\bar{p}^{\,\rho}_{x|y=y})}{d\rho} = \frac{1}{1+\rho}\sum_x \bar{p}^{\,\rho}_{x|y}(x|y)\big(\ln\bar{p}^{\,\rho}_{x|y}(x|y)\big)^2 - \frac{1}{1+\rho}H(\bar{p}^{\,\rho}_{x|y=y})^2 \qquad (17)$$

and

$$\sum_{y\in\mathcal{Y}} \frac{d\bar{p}^{\,\rho}_y(y)}{d\rho}\, H(\bar{p}^{\,\rho}_{x|y=y}) = \sum_y \bar{p}^{\,\rho}_y(y)\, H(\bar{p}^{\,\rho}_{x|y=y})^2 - H(\bar{p}^{\,\rho}_{x|y})^2 \qquad (18)$$
³Although the upper bound on $E_0''(\rho)$ is not tight, as we drop the negative term in (11), it has the right order in $|\mathcal{X}|$. For a distribution $p = \{\frac{1}{2}, \frac{1}{2(|\mathcal{X}|-1)}, \ldots, \frac{1}{2(|\mathcal{X}|-1)}\}$, the evaluation of (11) is $\sim \frac{1}{4}(\ln|\mathcal{X}|)^2$ for large $|\mathcal{X}|$; thus the upper bound of $(\ln|\mathcal{X}|)^2$ in (12) has the right order.

⁴The tedious details of the derivation are in the proofs of Lemma 10 and Lemma 11, in the appendix of [3].
Substituting (17) and (18) into (16), we have

$$\begin{aligned}
\bar{E}_0''(\rho) &= \frac{1}{1+\rho}\sum_y \bar{p}^{\,\rho}_y(y)\Big[\sum_x \bar{p}^{\,\rho}_{x|y}(x|y)\big(\ln\bar{p}^{\,\rho}_{x|y}(x|y)\big)^2\Big] - \frac{1}{1+\rho}\sum_y \bar{p}^{\,\rho}_y(y)\, H(\bar{p}^{\,\rho}_{x|y=y})^2 \\
&\qquad + \sum_y \bar{p}^{\,\rho}_y(y)\, H(\bar{p}^{\,\rho}_{x|y=y})^2 - H(\bar{p}^{\,\rho}_{x|y})^2 \\
&= \frac{1}{1+\rho}\sum_y \bar{p}^{\,\rho}_y(y)\Big[\sum_x \bar{p}^{\,\rho}_{x|y}(x|y)\big(\ln\bar{p}^{\,\rho}_{x|y}(x|y)\big)^2\Big] + \frac{\rho}{1+\rho}\sum_y \bar{p}^{\,\rho}_y(y)\, H(\bar{p}^{\,\rho}_{x|y=y})^2 - H(\bar{p}^{\,\rho}_{x|y})^2 \qquad (19)
\end{aligned}$$

Since $\sum_x \bar{p}^{\,\rho}_{x|y}(x|y) = 1$ for any $y\in\mathcal{Y}$, Lemma 1 tells us

$$\sum_x \bar{p}^{\,\rho}_{x|y}(x|y)\big(\ln\bar{p}^{\,\rho}_{x|y}(x|y)\big)^2 \le \alpha \qquad (20)$$

where

$$\alpha = \begin{cases} (\ln|\mathcal{X}|)^2 & \text{if } |\mathcal{X}| \ge 3 \\ \ln 2 & \text{if } |\mathcal{X}| = 2 \end{cases}$$

It is clear that:

$$H(\bar{p}^{\,\rho}_{x|y=y})^2 \le (\ln|\mathcal{X}|)^2 \le \alpha, \quad \forall y \qquad (21)$$

Substituting (20) and (21) into (19) and dropping the last term in (19), which is negative, we have

$$\bar{E}_0''(\rho) \le \frac{1}{1+\rho}\sum_y \bar{p}^{\,\rho}_y(y)\,\alpha + \frac{\rho}{1+\rho}\sum_y \bar{p}^{\,\rho}_y(y)\,\alpha = \alpha \qquad (22)$$

Since $(\ln|\mathcal{X}|)^2 > \ln|\mathcal{X}|$ for $|\mathcal{X}| \ge 3$, we have $R - H(p_{x|y}) \le \alpha$, $\forall R\in[H(p_{x|y}), \ln|\mathcal{X}|)$. Combining (15) and (22), the general theorem is proved.
IV. CONCLUSIONS AND FUTURE WORK

In this paper we have derived a universal lower bound on random source coding error exponents. This bound has the quadratic form $a(R-h)^2$, where $a$, determining the shape of the quadratic function, is determined by the size of the source alphabet, and $R - h$ is the excess rate beyond the relevant entropy. It quantifies the intuitive idea that driving the probability of error to zero comes at the cost of either greater rate or longer block lengths. These results are the source coding counterparts to the quadratic bounds on channel coding error exponents in Exercise 5.23 of [7], which can also be tightened slightly by using Lemma 1, as shown in [4]. Interestingly, the side-information alphabet size plays no role in the bound.

Numerical investigation reveals that this bound is loose, and so it remains an open problem to see if it can be tightened while still maintaining an easy closed-form expression. This will involve solving the non-concave maximization problem in (11) exactly instead of dropping the negative term. We also suspect that similar universal bounds exist for all sorts of error exponents. It would be interesting to find a unified treatment that could also give a universal bound on the error exponent for lossy source coding investigated in [8].

APPENDIX
A. Proof of Lemma 1

Proof: We prove Lemma 1 by solving the following maximization problem for $f_E(\omega)$ with constraint $\omega\in\mathcal{S}^J$:

$$\max_{\omega\in\mathcal{S}^J} f_E(\omega) = \max_{\omega\in\mathcal{S}^J} \sum_{j=1}^{J} \omega_j(\ln\omega_j - E)^2$$
We have one equality constraint, $\sum_{j=1}^J \omega_j = 1$, and $J$ inequality constraints, $\omega_j \ge 0$, $\forall j = 1,2,\ldots,J$, for the maximization problem. Note that $f_E(\omega)$ is a bounded differentiable function and $\mathcal{S}^J$ is a compact set in $\mathbb{R}^J$, so there exists a point in $\mathcal{S}^J$ that maximizes it. We examine the necessary conditions for a point $\omega^*\in\mathcal{S}^J$ to maximize $f_E(\omega)$. By the Karush-Kuhn-Tucker necessary conditions [2], there exist $\gamma_j \ge 0$, $j = 1,2,\ldots,J$, and $\lambda \ge 0$, s.t.

$$\nabla f_E(\omega^*) + \sum_{j=1}^J \gamma_j \nabla\omega^*_j + \lambda\,\nabla\sum_{j=1}^J \omega^*_j = 0$$

$$\gamma_j\,\omega^*_j = 0,\ \forall j = 1,2,\ldots,J; \qquad \text{and} \qquad \sum_{j=1}^J \omega^*_j = 1$$

That is,

$$(\ln\omega^*_j)^2 + 2(1-E)\ln\omega^*_j + \gamma_j + \lambda - 2E = 0$$

$$\gamma_j\,\omega^*_j = 0,\ \forall j = 1,2,\ldots,J; \qquad \text{and} \qquad \sum_{j=1}^J \omega^*_j = 1$$

Note that

$$\frac{\partial f_E(\omega)}{\partial\omega_j}\Big|_{\omega_j=0} = (\ln\omega_j)^2 + 2(1-E)\ln\omega_j\Big|_{\omega_j=0} = \infty$$

$$\frac{\partial f_E(\omega)}{\partial\omega_j}\Big|_{\omega_j>0} = (\ln\omega_j)^2 + 2(1-E)\ln\omega_j\Big|_{\omega_j>0} < \infty$$

and thus, to maximize $f_E(\omega)$, the $\omega^*_j$ must be strictly positive. Hence $\gamma_j = 0$,

$$(\ln\omega^*_j)^2 + 2(1-E)\ln\omega^*_j + \lambda - 2E = 0,\ \forall j \qquad \text{and} \qquad \sum_{j=1}^J \omega^*_j = 1$$

Since $\ln\omega^*_j$ is a root of a quadratic equation $x^2 + 2(1-E)x + \lambda - 2E = 0$, this implies $\omega^*_j$ can only be either