Block-based graph-cut rate allocation for subband image compression and transmission over wireless networks
Maria Trocan
Institut Supérieur d’Electronique de Paris
21 rue d’Assas, 75006 Paris
maria.trocan@isep.fr
Beatrice Pesquet-Popescu
Telecom ParisTech
37-39 rue Dareau, 75013 Paris
beatrice.pesquet@telecomparistech.fr
James E. Fowler
Mississippi State University
U.S.A.
fowler@ece.msstate.edu
Charles Yaacoub
USEK
Lebanon
charlesyaacoub@usek.edu.lb
ABSTRACT
The compression of natural images and their transmission over multi-hop wireless networks still present many challenges for researchers and industry. In this paper, we present a new block-based rate-distortion optimization algorithm that can efficiently encode the coefficients of a critically sampled, non-orthogonal, or even redundant transform. The basic idea is to construct a specialized graph such that its minimum cut minimizes the energy functional. We propose to apply this technique to rate-distortion Lagrangian optimization in block-based subband image coding. The method yields good compression results compared to the state-of-the-art JPEG2000 codec, as well as a general improvement in visual quality.
1. INTRODUCTION
Nowadays, the majority of image-compression algorithms use wavelet transforms, attempting to exploit all the signal redundancy that can appear within and across the different subbands of a spatial decomposition. The wavelet transform has been successfully used for image representation [1], due to its energy-compaction capacity and compression efficiency [2]. However, the efficiency of a coding scheme also depends highly on bit allocation. In order to maximize the compression efficiency, high-complexity subband-based image-compression schemes, such as the state-of-the-art compression standard JPEG2000 [1], may be used in wireless networks. In this paper, we present a rate-distortion optimization based on graph cuts, which can efficiently compress the coefficients of a critically sampled or even redundant, non-orthogonal transform.

As described in [3, 4, 5], problems that arise in computer vision can be naturally expressed in terms of energy minimization. Each of these methods consists in modelling a graph for an energy type, such that the minimum cut minimizes that functional globally or locally. Usually, these graph constructions are dense and complex, designing the energy function at the pixel level. For example, in [6] the graph cut provides a clean, flexible formulation for image segmentation. With a grid design, the graph provides a convenient manner to encode simple local segmentation decisions and presents a set of powerful computational mechanisms to extract a global segmentation from these simple local (pairwise) pixel similarities. Good energy-optimization results based on graph cuts were obtained in image restoration [7], as well as in motion segmentation [8], texture synthesis in image and video [9], etc. As will be shown by the experimental results, the method gives good compression results compared to the state-of-the-art JPEG2000 codec.

The paper is organized as follows: Section 2 describes the solution for rate-distortion optimization using graph cuts, by modeling distortion energy interactions at the block level. Some experimental results obtained with the proposed methods for both wavelet-based and edge-oriented contourlet-based image coding are presented in Section 3. Finally, conclusions and future work directions are drawn in Section 4.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Mobimedia’09, September 7-9, 2009, London, UK. Copyright 2009 ICST 978-963-9799-62-2/00/0004 ... $5.00
2. IMAGE COMPRESSION USING GRAPH CUTS
As mentioned in the introduction, we propose to use the graph-cut mechanism for the minimization of the rate-distortion Lagrangian function and thus find the optimal set of quantizers satisfying the imposed constraints. To this aim, we have designed a specialized graph able to represent a subband decomposition, taking into consideration the correlations between subbands in a multiresolution approach. In the following, we express the Lagrangian functional as a discrete sum accumulating the contribution of each coding unit (subband or block) in terms of the rate and distortion induced by the quantization. Moreover, the graph model is planar, and the energy function we intend to optimize is convex, so the minimum graph cut can be found in polynomial time.
2.1 Graph design
Consider the weighted graph $G = (V, E, W)$, with vertices $V$, edges $E$ and positive edge weights $W$, which has not only two, but a set of terminal nodes, $Q \subset V$. Recall that a subset of edges $E_C \subset E$ is called a multiway cut if the terminal nodes are completely separated in the induced graph $G(E_C) = (V, E - E_C, W)$ and no proper subset of $E_C$ separates the terminals. If $C$ is the cost of the multiway cut, then the multiterminal min-cut problem is equivalent to finding the minimum-cost multiway cut. For our optimization problem, the terminals are given by a set of quantizers $Q$, and the coding units give the rest of the vertices, $V - Q$. The edges and their weights/capacities will be defined in the following, depending on the coding strategy (subband or block coding) and the distortion functional.

In [7], Y. Boykov et al. find the minimal multiway cut by successively finding the min-cut between a given terminal and the other terminals. This approximation guarantees a local minimization of the energy function that is close to the optimal solution for both concave and convex energy functionals. As the rate-distortion Lagrangian lies on a convex curve (i.e., $D(R)$), we propose to use the method in [7] for its optimization.
2.2 Lagrangian rate-distortion functional
Consider the problem of coding an image at a maximal rate $R_{\max}$ with a minimal distortion $D$. Each image consists of a fixed number of coding units (spatial subbands or blocks of coefficients), each of them coded with a different quantizer $q_i$, $q_i \in Q$ ($Q$ being the quantizer set). Let $D_i(q_i)$ be the distortion of the coding unit $i$ when quantized with $q_i$, and let $R_i(q_i)$ be the number of bits required for its encoding. The problem can now be formulated as: find $\min \sum_i D_i(q_i)$, such that $\sum_i R_i(q_i) = R \le R_{\max}$. In the Lagrange-multiplier framework, this constrained optimization is written as the equivalent problem:

$$\min \sum_i \left( D_i(q_i) + \lambda R_i(q_i) \right), \quad R \le R_{\max} \qquad (1)$$

where the Lagrangian parameter $\lambda > 0$ sets the relative importance of distortion and rate in the optimization and can be determined using a binary search. The advantage of the problem formulation in Eq. (1) is that the sum and the minimum operator can be exchanged to:

$$\sum_i \min_{q_i \in Q} \left( D_i(q_i) + \lambda R_i(q_i) \right), \quad R \le R_{\max} \qquad (2)$$

This formulation reveals that the global optimization can now be carried out independently for each coding unit, making an efficient implementation feasible.
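The per-unit minimization of Eq. (2), combined with a binary search on $\lambda$, can be sketched as follows. This is only an illustrative sketch: the data layout (one dictionary per coding unit, mapping a quantizer to a hypothetical `(distortion, rate)` pair), the function names and the search interval are assumptions, not part of the method description above.

```python
def allocate(units, lam):
    """Pick, independently for each coding unit, the quantizer minimizing
    D + lam * R (Eq. 2). `units` is a list of {quantizer: (D, R)} dicts."""
    choice = [min(u, key=lambda q: u[q][0] + lam * u[q][1]) for u in units]
    dist = sum(u[q][0] for u, q in zip(units, choice))
    rate = sum(u[q][1] for u, q in zip(units, choice))
    return choice, dist, rate

def search_lambda(units, r_max, lo=0.0, hi=1e6, iters=60):
    """Bisection on lam. A larger lam penalizes rate more, so the total rate
    does not increase with lam; we keep the smallest feasible lam found.
    If even lam = hi exceeds r_max, the returned allocation is infeasible."""
    best = allocate(units, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        result = allocate(units, mid)
        if result[2] <= r_max:
            best, hi = result, mid   # feasible: try a smaller lam
        else:
            lo = mid                 # too many bits: increase lam
    return best

# Hypothetical operational points {quantizer step: (distortion, rate in bits)}.
units = [
    {1: (0.0, 10.0), 2: (4.0, 6.0), 4: (16.0, 2.0)},
    {1: (0.0, 8.0), 2: (2.0, 5.0), 4: (10.0, 1.0)},
]
choice, dist, rate = search_lambda(units, r_max=11.0)
```

Because each unit is minimized on its own, one evaluation of `allocate` costs only as many comparisons as there are (unit, quantizer) pairs, which is what makes the outer search on $\lambda$ cheap.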
2.2.1 Rate estimation
For the rate estimation of the quantized coding units, we consider a non-contextual arithmetic coder [10], which uses a zero-order entropy model, where the $M$ quantized coefficients of a given coding unit are i.i.d. random variables following a Gaussian distribution. Thus, the zero-order entropy ($H$) estimate in bits/variable (i.e., per coefficient) is obtained as:

$$H = -\sum_{i=1}^{M} p_i \log_2 p_i, \qquad (3)$$

where $p_i$ is the probability of the $i$-th coefficient. The resulting entropy estimate per coding unit is weighted by the size of the coding unit in order to obtain the total entropy of the quantized image.
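As an illustration of Eq. (3) and of the size weighting, the sketch below estimates the zero-order entropy of a coding unit from the empirical frequencies of its quantized values. Using empirical frequencies in place of the Gaussian model is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math
from collections import Counter

def zero_order_entropy(symbols):
    """Empirical zero-order entropy, in bits per coefficient (Eq. 3),
    with probabilities taken as relative frequencies of the values."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def estimated_bits(units):
    """Total rate estimate: per-unit entropy weighted by the unit size."""
    return sum(len(u) * zero_order_entropy(u) for u in units)

# Two hypothetical coding units of quantized coefficients.
unit_a = [0, 0, 1, 1]   # two equiprobable values: 1 bit per coefficient
unit_b = [0, 0, 0, 0]   # a single value: 0 bits per coefficient
total = estimated_bits([unit_a, unit_b])
```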
2.2.2 Distortion estimation
The distortion $D$ between the original image, $x$, and the quantized one, $\hat{x}$, is estimated in the following as the $L_2$ norm, i.e.:

$$D = \|x - \hat{x}\|^2. \qquad (4)$$

This model will be further developed in order to obtain a good distortion estimate in the spatial domain, rather than in the transform domain, as is usually done for orthonormal transforms.
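A small numerical example may help to see why the distortion is measured in the spatial domain. For an orthonormal synthesis, the energy of the coefficient error equals the energy of the resulting spatial error, whereas for a non-orthogonal synthesis it does not. The two 2x2 synthesis matrices below are toy examples chosen here for illustration only; they are not the filter banks used in this paper.

```python
import math

def synthesize(basis, coeffs):
    """Spatial-domain signal; each row of `basis` produces one output sample."""
    return [sum(b * c for b, c in zip(row, coeffs)) for row in basis]

def sq_norm(v):
    """Squared L2 norm, as in Eq. (4)."""
    return sum(a * a for a in v)

S = 1 / math.sqrt(2)
ORTHONORMAL = [[S, S], [S, -S]]             # toy two-tap Haar synthesis
NON_ORTHOGONAL = [[1.0, 1.0], [0.0, 1.0]]   # hypothetical non-orthogonal synthesis

coeff_error = [1.0, 1.0]   # quantization error on the two coefficients

d_coeff = sq_norm(coeff_error)                                   # transform domain
d_ortho = sq_norm(synthesize(ORTHONORMAL, coeff_error))          # spatial domain
d_nonortho = sq_norm(synthesize(NON_ORTHOGONAL, coeff_error))    # spatial domain
```

Here `d_coeff` and `d_ortho` agree up to rounding, while `d_nonortho` is 5.0 against a coefficient-domain energy of 2.0, so a transform-domain estimate would misjudge the spatial distortion in the non-orthogonal case.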
2.3 Graph design with cross-correlation distortion at the block level
Recall that we have written the distortion $D$ between the original image, $x$, and the quantized one, $\hat{x}$, as the $L_2$ norm, i.e., $D = \|x - \hat{x}\|^2$. In a first approximation [11], we have considered only the diagonal terms, i.e.:

$$D_I \cong \sum_i \|x_i - \hat{x}_i\|^2 \qquad (5)$$

which amounts to estimating the distortion between the contribution to the image and to the quantized image of only the $i$-th subband. In a second approximation, we have also considered the cross-correlation terms, i.e.:

$$D \cong D_I + \sum_i \sum_{i' \in \mathcal{N}(i)} \langle x_i - \hat{x}_i, \, x_{i'} - \hat{x}_{i'} \rangle \qquad (6)$$

where $\mathcal{N}(i)$ is a neighborhood of $i$, containing closely correlated subbands. Indeed, given the limited support of the wavelets, the closer the subbands are in scale and frequency, the higher the correlation among them. In practice, this neighborhood could be described by the geometrical position of the subbands in a multiresolution decomposition (where only the vertical and horizontal directions are considered), or by simply linking the subbands in a chain, one after another (for example, in Fig. 1, the neighborhood relations are indicated by the black edges in the graph). Thus, Eq. (6) can be written as:

$$D = \sum_i \underbrace{\|x_i - \hat{x}_i\|^2}_{D_i} + \sum_i \sum_{i' \in \mathcal{N}(i)} \underbrace{\langle x_i - \hat{x}_i, \, x_{i'} - \hat{x}_{i'} \rangle}_{D_{i,i'}} \qquad (7)$$

Figure 1: Contourlet decomposition with three levels (left) and three-way graph-cut repartition (right) ($q_1$ partition in red, $q_2$ partition in green, $q_3$ partition in blue, where the regular edges are drawn with full black lines, terminal links in colors, and the cut edges in gray lines).

We have shown in [12] that, in this case, the function to be minimized is:

$$\min \sum_i \Big[ \underbrace{\|x_i - \hat{x}_i\|^2 + \lambda R(i)}_{E_{data}(i)} + \underbrace{\sum_{i' \in \mathcal{N}(i)} \langle x_i - \hat{x}_i, \, x_{i'} - \hat{x}_{i'} \rangle}_{E_{smooth}(i)} \Big] \qquad (8)$$

In the following, we propose to extend the subband-level distortion estimation presented in [12] to the block level (Fig. 2). This extension comes naturally, as the smaller the coding unit, the more correlated in amplitude are the coefficients within it. At the block level, Eq. (8) becomes:

$$\min \sum_{i=1}^{X} \sum_{j=1}^{N_b} \Big[ \underbrace{\|x_{i,j} - \hat{x}_{i,j}\|^2 + \lambda R(i,j)}_{E_{data}(i,j)} + \underbrace{\sum_{(i',j') \in \mathcal{N}(i,j)} \big| \langle x_{i,j} - \hat{x}_{i,j}, \, x_{i',j'} - \hat{x}_{i',j'} \rangle \big|}_{E_{smooth}(i,j)} \Big] \qquad (9)$$

where $X$ and $N_b$ represent the number of subbands and the number of blocks in each subband, respectively, $x_{i,j}$ denotes the image reconstructed only from the $j$-th block of the $i$-th subband, and $\langle x_{i,j} - \hat{x}_{i,j}, \, x_{i',j'} - \hat{x}_{i',j'} \rangle$ measures the correlation between the neighbour blocks.

The minimization of the energy function defined above is equivalent to finding the best partition of quantizers over the subband blocks. Note that for $E_{smooth}$ we have used the sum of absolute values of the cross-correlation terms, in order to ensure that our regular vertices will have associated positive weights. Our graph will therefore have $B = X \times N_b - 1$ regular vertices. The neighbourhood system, $\mathcal{N}$, now contains only position correlation links $E_N$ (i.e., edges between neighbour blocks, as described in Fig. 2). The geometrical model can be described as $G = (V, E)$, where $V = B \cup Q$, $E = E_N \cup E_Q$, and $Q$ / $E_Q$ represent the quantizer set / the links between block nodes and quantizers. For the terminal links, $E_Q$, the weights are given by the direct costs in terms of distortion and rate induced by the quantization (i.e., the edge between block $b$ and quantizer $q$, $(b, q)$, has the associated weight $w_{b,q} = D_b(q) + R_b(q)$). The capacity between two regular neighbour blocks ($(b_i, b_{i'}) \in E_N$) is defined as the absolute value of the cross-correlation distortion induced by the current quantization of these blocks.

Figure 2: Block graph design: two-level wavelet decomposition with four-block subband division and chain network design for the regular vertices.
3. APPLICATION TO SUBBAND IMAGE COMPRESSION
In the following, we apply the proposed graph-cut minimization model to subband image compression. Some results are drawn in the framework of classical separable wavelet image coding, as well as for a geometrical transform, namely the contourlet decomposition [13]. Note that the method can be applied to almost any existing decomposition (wavelets, X-lets, subbands, blocks, whether critically sampled or redundant, etc.).
3.1 Wavelet image compression with graph cuts
Due to their energy-compaction efficiency, biorthogonal filter banks are the most widely used in image compression [1]. For this reason, we consider in our simulation framework both the 5/3 and 9/7 filter banks for the spatial decomposition.
3.1.1 Experimental results
For our simulations, we have considered two representative test images: Barbara (512x512 pixels) and Mandrill (512x512 pixels), which have been selected because their texture characteristics are difficult to encode. We have used dead-zone scalar quantization, with $q \in \{2^0, \ldots, 2^{10}\}$. The dead zone has twice the width of the other quantization intervals. All the images have been decomposed over five spatial levels with the floating-point 5/3 and 9/7 filter banks. Note that for rate estimation in the allocation algorithm we have used a simple (non-contextual) arithmetic coder [10], while the JPEG2000 codec [1] uses a highly optimized contextual coder. The JPEG2000 results have been obtained with the Kakadu framework.
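The dead-zone quantizer described above can be sketched as follows. The index computation follows directly from a zero bin that is twice as wide as the others; the midpoint reconstruction rule and the function names are assumptions made here, since the reconstruction points are not specified in the text.

```python
import math

# Quantization steps considered in the wavelet experiments: 2^0, ..., 2^10.
STEPS = [2 ** n for n in range(11)]

def quantize(c, step):
    """Dead-zone scalar quantizer: the zero bin is (-step, step), i.e. twice
    the width of every other bin."""
    return int(math.copysign(math.floor(abs(c) / step), c))

def dequantize(k, step):
    """Midpoint reconstruction (an assumption made for this sketch)."""
    if k == 0:
        return 0.0
    return math.copysign((abs(k) + 0.5) * step, k)

# With step 8, every coefficient in (-8, 8) is mapped to index 0.
indices = [quantize(c, 8) for c in (-17.0, -7.9, 0.0, 7.9, 8.0)]
```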
Figure 3: Rate-distortion comparison (PSNR in dB versus bitrate in bpp) for the Mandrill image (512x512) with 9/7 wavelet subband decomposition. Curves: GCC-Simple distortion, GCC-Cross-correlated distortion, GCC-Block cross-correlated distortion, 9/7 JPEG2000.
As can be seen from Fig. 3 and Fig. 4, the results obtained with the 9/7 wavelet subband decomposition of JPEG2000 are between 0.5 and 1.5 dB higher than those obtained with the proposed graph-cut rate-distortion algorithm. This situation can be explained by the fact that the 9/7 filter bank is very close, from an energy-partition point of view, to an orthonormal decomposition. As illustrated in Fig. 5 and Fig. 6, our method seems to cope better with non-orthogonal decompositions at very low bitrates ($\le 0.1$ bpp).
Figure 4: Rate-distortion comparison (PSNR in dB versus bitrate in bpp) for the Barbara image (512x512) with 9/7 wavelet subband decomposition. Curves: GCC-Simple distortion, GCC-Cross-correlated distortion, GCC-Block cross-correlated distortion, 9/7 JPEG2000.
Figure 5: Rate-distortion comparison (PSNR in dB versus bitrate in bpp) for the Mandrill image (512x512) with 5/3 wavelet subband decomposition. Curves: GCC-First order subband distortion, GCC-Cross-correlated subband distortion, GCC-Cross-correlated block distortion, 5/3 JPEG2000.
One can remark that the distortion approximation at the subband level that takes into account the cross-correlation among subbands always leads to better results than the simple model without cross-correlation terms, since it uses a more realistic correlation model. Moreover, the finer the level of representation for the coding units, the higher the correlation among these units, as can be seen from the presented results: the block-level model has an average gain of 0.25 dB over the preceding rate-distortion curve obtained with a subband-level cross-correlated distortion model.
Figure 6: Rate-distortion comparison (PSNR in dB versus bitrate in bpp) for the Barbara image (512x512) with 5/3 wavelet subband decomposition. Curves: GCC-First order subband distortion, GCC-Cross-correlated subband distortion, GCC-Cross-correlated block distortion, 5/3 JPEG2000.
3.2 Contourlet image compression with graph cuts
The drawback of separable wavelets is their limited orientation selectivity, as they fail to capture the geometry of the image edges. In order to overcome the problem of edge representation, Minh N. Do and Martin Vetterli have defined a new family of geometrical wavelets, called contourlets [13]. With contourlets, one can represent the class of smooth images with discontinuities along smooth curves in a very efficient and sparse way. These decompositions have been successfully applied in image segmentation and noise removal, as well as in image compression: as shown in [14], the codec based on wedgelets gives better performance in image compression than the JPEG2000 standard at very low rates.
3.2.1 Experimental results
For a better comparison, we have considered the same test images: Barbara (512x512 pixels) and Mandrill (512x512 pixels). We have used dead-zone scalar quantization, with $q \in \{2^1, \ldots, 2^{10}\}$, and a 5-level contourlet decomposition, where the coarsest three decomposition levels consist of a 9/7 separable wavelet transform (i.e., 3 directions), and the finest two levels are represented with a 16- and 32-band biorthogonal directional filter. The efficiency of this hybrid scheme has been proved in [15]. As shown in Figs. 7 and 8, our method surpasses JPEG2000 at low bitrates, even though it employs a redundant transform. Note that for the rate estimation in the allocation algorithm we have used a simple (non-contextual) arithmetic coder [10], while the JPEG2000 codec uses a highly optimized contextual coder.
Figure 7: Rate-distortion comparison (PSNR in dB versus bitrate in bpp) for the Mandrill image (512x512) with contourlet subband decomposition. Curves: GCC-First order subband distortion approx., GCC-Cross-correlated subband distortion approx., GCC-Cross-correlated block distortion approx., JPEG2000.

Figure 8: Rate-distortion comparison (PSNR in dB versus bitrate in bpp) for the Barbara image (512x512) with contourlet subband decomposition. Curves: GCC-First order subband distortion, GCC-Cross-correlated subband distortion, GCC-Cross-correlated block distortion, Wavelets J2K.

4. CONCLUSION
In this paper, we have presented a block-based graph-cut method for rate-distortion optimization in image coding. Its great advantage is that it can be applied to decompositions which are not necessarily orthonormal. As shown by the experimental results, it can efficiently encode both wavelet and contourlet coefficients compared to standard RD coding tools, thus enhancing the wireless transmission efficiency. Moreover, the proposed method could be further used with vector quantizers.
5. REFERENCES
[1] “Information technology – JPEG 2000 image coding system,” Tech. Rep., ISO/IEC 15444-1, 2000.
[2] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 674–693, 1989.
[3] V. Kolmogorov and R. Zabih, “What energy functions can be minimized via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 147–159, 2004.
[4] Y. Boykov and V. Kolmogorov, “An experimental