A Multi-size Superpixel Approach for Salient Object Detection based on Multivariate Normal Distribution Estimation

Lei Zhu, Dominik A. Klein, Simone Frintrop, Zhiguo Cao*, and Armin B. Cremers
Abstract—This article presents a new method for salient object detection based on a sophisticated appearance comparison of multi-size superpixels. Those superpixels are modeled by multivariate normal distributions in CIELab color space, which are estimated from the pixels they comprise. This fitting facilitates an efficient application of the Wasserstein distance on the Euclidean norm ($W_2$) to measure perceptual similarity between elements. Saliency is computed in two ways: on the one hand, we compute global saliency by probabilistically grouping visually similar superpixels into clusters and rating their compactness. On the other hand, we use the same distance measure to determine local center-surround contrasts between superpixels. Then, an innovative locally constrained random walk technique that considers local similarity between elements balances the saliency ratings inside probable objects and background. The results of our experiments show the robustness and efficiency of our approach against 11 recently published state-of-the-art saliency detection methods on five widely used benchmark datasets.
Index Terms—Saliency detection, multi-size superpixels, Wasserstein distance, center-surround contrasts, cluster compactness, random walk.

EDICS Category: 5. SMR-HPM, 2. SMR-SMD, 4. SMR-Rep, 33. ARS-IIU, 8. TEC-MRS
I. INTRODUCTION

HUMAN vision is usually capable of locating the most salient parts of a scene with a selective attention mechanism [1]. From the perspective of computer vision, salient region detection is still challenging since the human attention system has not been fully understood. However, an important attribute which makes a region salient is that it stands out from its surroundings in one or more visual features. In recent years, saliency detection has become a major research area, and many computational attention systems based on this center-surround concept have been built during the last decade [2]. Applications of saliency detection include object detection [3], [4], image retrieval [5], [6], image and video compression [7], [8], as well as image segmentation [9], [10].
Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Lei Zhu and Zhiguo Cao are with the National Key Lab of Science and Technology on Multispectral Information Processing, School of Automation, Huazhong University of Science and Technology, 430074 Wuhan, China. E-mail: (zhulei.iprai@gmail.com, zgcao@mail.hust.edu.cn). Corresponding author is Zhiguo Cao.

Dominik A. Klein, Simone Frintrop, and Armin B. Cremers are with the Institute of Computer Science III, University of Bonn, 53117 Bonn, Germany. E-mail: ({kleind, frintrop, abc}@iai.uni-bonn.de).
The classical approaches for saliency computation stem from the simulation of human attention mechanisms. These approaches compute the saliency of a pixel as the difference of a center and a surround region, both of which are centered at the pixel and can be rectangular or circular [11]–[13]. Therefore, we call these methods the local saliency approaches. The selection of surrounding regions is always a difficult problem for pixel-based or region-based methods due to the ambiguity of unknown object scales. A reasonable solution is the multi-scale scheme that computes the center-surround response at several different scales [11], [12], [14], [15]. Some existing approaches also explore the local contrast on a single scale. In this case, the surroundings can be chosen as the maximum symmetric surround [16] or regions of the entire image with spatial weighting [17], [18].

Alternative approaches consider the occurrence frequency of certain features in the whole image, i.e., salient objects are more likely to belong to parts with rare observations in the frequency domain [19], [20]. We call these approaches the global saliency approaches. Zhai et al. [21] evaluate the pixel-level saliency by contrasting each pixel to all others. Achanta et al. [9] directly assign the salient value of a pixel as the difference from the average color. By abstracting the color components, the global contrast is efficiently computed in [17] at pixel level. Global saliency is also investigated via the visual organization rule, which can be computationally transformed into rating the color distribution [22].

Different from the methods based on local or global contrast, some researchers work on priors regarding the potential positions of foreground and background, derived mathematically or empirically. Gopalakrishnan et al. [23] represent an image as a graph and search for the most salient nodes and the background nodes using the random walk technique. By analyzing photographic images, Wei et al. [24] found that pixels located on the four boundaries of an image carry background attributes, and validated this prior on two popular datasets. Recently, the assumption of the boundary prior was investigated in several graph-based saliency models [25]–[28] and achieved impressive results.

In this work, a new segment-based saliency detection method is proposed. We mainly address two problems that are seldom discussed in previous work:

1) Saliency models which take color information as the primary feature often simply compute the region contrast as the Euclidean distance between the average colors of regions or as a histogram-based contrast. The former is efficient and
This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TIP.2014.2361024. Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
reasonable, especially when regions are organized as superpixels. However, it might be imprecise when large regions are considered. Conversely, the histogram-based contrast is more precise in this case but still suffers from parameter problems such as the number of bins and the selection of the metric space. Instead, we represent the color appearance of superpixels by multivariate normal distributions. This is based on the assumption that the color occurrences of the pixels in each region follow a multivariate normal distribution. This assumption is especially well suited for superpixels, since the clustered pixels have similar properties in the selected feature space. The difference between two superpixels is measured with the Wasserstein distance on the Euclidean norm ($W_2$ distance), which was first introduced to compute pixel-based saliency in our previous work [29]. Additionally, we also propose a fast algorithm to compute the $W_2$ distance on $N$-d ($N \leq 3$) normal distributions.

2) Holding a uniform saliency distribution over an object's interior is difficult in local saliency computation based on the center-surround principle. Typically, this problem can be alleviated by combining multi-layer saliency maps or by smoothing the single-layer saliency map at pixel level [18]. We propose a locally constrained random walk procedure to directly refine the local saliency map at region level and achieve a more balanced rating inside of probable proto-objects. On the one hand, this approach can improve the final combination results. On the other hand, compared to the Gaussian weight-based upsampling [18], it avoids spreading the error of saliency assignment to the background regions when inappropriate Gaussian weights for controlling the sensitivity to color and position are selected.

Thus, in a nutshell, the main contributions of this paper are
• A new representation of superpixels by multivariate normal distributions and their comparison with the Wasserstein distance, which is consistently used throughout the approach for local as well as global saliency computation. It enables combining the advantage of the rich information of probability distributions for representing feature statistics with a computationally efficient method for representation and comparison.

• A novel saliency flow method, which is a locally constrained random walk procedure to refine the local saliency map. It achieves a more balanced rating inside of probable proto-objects and improves the performance significantly.

II. RELATED WORK
The detection of visual saliency is one of the two aspects of human visual attention: bottom-up and top-down attention [1], [30]. Bottom-up attention relates to the detection of salient regions in the perceptual data by purely analyzing this data without any additional information. Top-down attention, on the other hand, considers prior knowledge about a target, the context, or the mental state of the agent. While top-down attention is an important aspect of human attention, prior knowledge is not always available, and many computational methods profit from purely determining the bottom-up saliency. Among these application areas are object detection and segmentation, which we will consider here. Thus, we concentrate on the following approaches that deal with bottom-up saliency detection.
A. Pixel-based Saliency

The local contrast principle assumes that the more different an image region is compared to its local surround, the more salient it is. One of the first pixel-based methods to detect saliency in a biologically motivated way was introduced by Itti et al. [11]. Their Neuromorphic Vision Toolkit (iNVT) computes the center-surround contrast at different levels in DoG scale space and searches the local maximum responses with a Winner-Take-All network. Harel et al. [31] extend the approach of Itti by evaluating the saliency with a graph-based method. In a recent approach, Goferman et al. [32] follow several basic principles of human attention and assume that the patches which are distinctive in colors or patterns are salient. The algorithm proposed by Achanta et al. [16] produces an original-scale saliency map which can keep the boundaries of salient objects by accumulating the information of the surrounding area of each pixel. Seo and Milanfar [33] compute the center-surround contrast of each pixel using a kind of local structure called LSK which is robust to noise and variations of luminance. The approach of Liu et al. [14] combines local, regional, and global features in a CRF-based framework. Li et al. [34] propose a method using the conditional entropy under distortion to measure the visual saliency, which also follows a center-surround scheme.

The pure global approaches assume that the more infrequently features occur in the whole image, the more salient they are. In [19] and [35], Hou et al. assign higher saliency values to those pixels which have higher responses to the rare magnitudes in the amplitude spectrum, and identify others as redundant components. However, Guo et al. [20] found the image's phase spectrum to be more essential than the amplitude spectrum for obtaining the saliency map. Achanta et al. [9] also assume that the background has lower frequencies, and directly compare each pixel with the entire image in color space.

The global principle only works well if the background is free of uncommon feature manifestations. On the other hand, the local contrast principle involves the difficulty of estimating the scale of a salient object. To avoid this problem, such methods usually define several ranges of a pixel's neighborhood or construct a multi-level scale space of the original image. However, these local methods suffer more from the boundary blurring problem, since on unsuitable scales the foreground/background relation cannot be clearly decided.
B. Segment-based Saliency

Segment-based methods take homogeneous regions as the basic elements rather than pixels. Cheng et al. [17] segment the image into regions with the algorithm proposed in [36], and obtain the saliency map by computing the distance between histograms which are generated by mapping the color vectors of each region into a 3D space. The same pre-segmentation method was also used in [13] and [37]. Instead of computing the dissimilarity between regions directly, Wei et al. [13]
Fig. 1. The overall algorithm flowchart of our method. The structure of the algorithm is exemplarily presented for two scales (size levels $t = 0, 1, \ldots, L$). (a) Multi-size superpixel extraction: each region surrounded by the red curves refers to one superpixel. (b) Multivariate normal distribution estimation: each superpixel is represented by the multivariate normal distribution estimated from its pixels. Based on the $L_2$-norm Wasserstein distance between every pair of superpixels, the local and global saliency are obtained. (c) Global saliency computation (spatial distribution scoring, normalized to [0, 1]): superpixels are clustered according to their color similarity and exemplar superpixels (cluster centers) are determined. The two images show exemplarily two of the clusters, the corresponding exemplar superpixels, and the cluster saliency scores that measure the spatial distribution of a cluster. (d) and (e) Local saliency: computed by a local contrast approach with boundary weight based on superpixels, which is further refined by a locally constrained saliency flow step. (h) The final saliency map is obtained by fusing the global and local saliency maps ((f) and (g), respectively) over all scales.
obtain the saliency of an image window by computing the cost of composing the window from the remaining image parts. Park et al. [37] merge regions with their neighbors repeatedly according to the similarity of their HSV histograms, and update the saliency of joint regions in every combination. Ren et al. [38] first extract superpixels from the image, which are further clustered with a GMM, and use the PageRank algorithm to obtain the superpixel-level saliency. Perazzi et al. [18] obtain the region-level saliency map by measuring the color uniqueness and spatial distribution of each extracted superpixel. A finer pixel-level saliency map is produced by comparing each pixel with all superpixels in both color space and location. Wei et al. [24] first proposed the background prior, which assumes that the boundaries of an image can effectively represent the background components. Following this idea, Yang et al. [25] consider saliency detection as a graph-based ranking problem and use label propagation to determine the region-level saliency. A similar graph model is employed in [26], which casts saliency detection into a random walk problem in the absorbing Markov chain.

III. MULTI-SIZE SUPERPIXEL-BASED SALIENCY DETECTION
We propose a superpixel-based method for bottom-up detection of salient image regions. An image is segmented into a compound of visually homogeneous regions at different scale levels to represent its fine details as well as its large-scale structures. On each scale, two complementary approaches for the determination of saliency are employed separately: 1) In a global way, we measure the spatial compactness of similar-looking parts. Superpixels are more salient if they form a more coherent cluster within the image when categorized by their color appearances. 2) In a local way, we compute the center-surround contrast at the superpixel level. The more a superpixel differs from its surrounding ones, the more salient it is. Local contrast approaches usually grasp every pop-out region whose scale fits the current center-surround structure. That is, isolated background regions with an appropriate scale are also emphasized. In our work, the boundary prior [24] is used to eliminate the highlighted background regions. Furthermore, the local saliency map is refined by a locally constrained random walk procedure that dilutes saliency in the background and likewise balances it inside potential objects.

We assume that the appearance of pixels grouped into one superpixel is well expressed by the associated ML-estimate of a multivariate normal distribution in CIELab space. This representation enables us to efficiently measure visual difference/similarity between superpixels using the Wasserstein distance on the Euclidean norm [29]. Figure 1 shows a flowchart of our system.
A. Multi-size Superpixel Extraction

We use the SLIC superpixel extraction method introduced in [39], which divides an image into adjacent segments of about the same size, containing colors as homogeneous as possible. For a given number of superpixels, the image is initially segmented into regularly sized grid cells. Then, iterative k-means clustering is performed on a feature space that combines CIELab colors and pixel locations. This clustering of nearby, similar-looking pixels refines the cells into superpixels. As mentioned in Section I, we extract superpixels at multiple size levels. This is achieved by repeating the SLIC algorithm with different numbers of desired clusters, thus initializing with a coarser or finer grid. In our method, we increase the number of superpixels in steps of factor 2 between scale levels. The images in Figure 2a show examples of segmentation results with different superpixel sizes. Notice that we ensure a minimal cell size of 100 pixels when initializing the grid, because with fewer pixels it becomes increasingly unlikely to obtain meaningful appearance distributions. In our experiments, we analyzed $L = 3$ scale levels.
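For illustration, the multi-size extraction loop can be sketched as below. This is a simplified stand-in that runs plain k-means on the joint color-position feature space; the function name, the fixed iteration count, and the compactness weighting are our own assumptions, and the actual system uses the optimized SLIC implementation of [39] rather than this naive clustering:

```python
import numpy as np

def multisize_superpixels(img, levels=3, iters=5, compactness=10.0):
    """Sketch of Sec. III-A: superpixel label maps at 'levels' size
    levels, doubling the number of segments per level and keeping at
    least 100 pixels per initial cell (naive k-means, not real SLIC)."""
    h, w, _ = img.shape
    max_k = (h * w) // 100                       # minimal cell size of 100 px
    yy, xx = np.mgrid[0:h, 0:w]
    pos = np.stack([yy, xx], axis=-1).reshape(-1, 2).astype(float)
    # joint feature space of colors and (down-weighted) pixel locations
    feats = np.concatenate([img.reshape(-1, 3),
                            compactness / max(h, w) * pos], axis=1)
    label_maps = []
    k = max(1, max_k // 2 ** (levels - 1))       # coarsest level first
    for _ in range(levels):
        k = min(k, max_k)
        # grid-like initialization: evenly spaced seed pixels
        centers = feats[np.linspace(0, h * w - 1, k).astype(int)]
        for _ in range(iters):                   # plain k-means updates
            dist = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
            labels = dist.argmin(1)
            for c in range(k):
                members = feats[labels == c]
                if len(members):
                    centers[c] = members.mean(0)
        label_maps.append(labels.reshape(h, w))
        k *= 2                                   # factor 2 between levels
    return label_maps
```

The factor-2 progression and the 100-pixel minimum follow the text above; everything else about the clustering is illustrative only.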
B. Superpixel Representation and Superpixel Contrast

We express the occurrence of low-level features in each superpixel by means of multivariate normal distributions. As argued in Section I, the unimodal distribution assumption is appropriate for superpixels. Different to [29], who split the feature space into a one-dimensional lightness plus a two-dimensional color distribution, we directly use the original three dimensions of CIELab color space. For conversion from RGB web images, we assume the D65 standard illuminant to be most suitable. For the notations in the following sections, the $i$-th superpixel of scale $t$ forms a set

$$S_i^t = \left\{ \mathcal{N}_S(\mu, \Sigma),\; c_S = \begin{pmatrix} x \\ y \end{pmatrix} \right\}_i^t \quad (1)$$

comprised of its feature distribution $\mathcal{N}_{S_i^t}$ and spatial center $c_{S_i^t}$ in image coordinates.¹
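The ML estimation behind Eq. (1) can be sketched as follows (our own minimal illustration; the label map is assumed to come from the superpixel extraction step, and variable names are not the paper's):

```python
import numpy as np

def fit_superpixel_models(lab_img, labels):
    """Sketch of Eq. (1): for every superpixel, ML-estimate a 3-D normal
    distribution N(mu, Sigma) over its CIELab values and record its
    spatial center c_S in image coordinates (x, y)."""
    h, w, _ = lab_img.shape
    colors = lab_img.reshape(-1, 3).astype(float)
    yy, xx = np.mgrid[0:h, 0:w]
    pos = np.stack([xx, yy], axis=-1).reshape(-1, 2).astype(float)  # (x, y)
    flat = labels.reshape(-1)
    models = {}
    for s in np.unique(flat):
        member = flat == s
        pix = colors[member]
        mu = pix.mean(axis=0)
        sigma = np.cov(pix, rowvar=False, bias=True)  # ML (biased) covariance
        center = pos[member].mean(axis=0)
        models[int(s)] = (mu, sigma, center)
    return models
```

Each entry then holds exactly the triple of Eq. (1): the distribution parameters $(\mu, \Sigma)$ and the spatial center $c_S$.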
¹Note that $\mathcal{N}_S$ denotes the normal distribution representing a superpixel, while $\mathcal{N}_C$, which will be introduced in Section III-C, denotes the normal distribution representing a cluster.

Several measuring techniques for distribution contrasts, such as the KL-divergence [12], the conditional entropy [34], and the Bhattacharyya distance [40], have been employed in previous methods to identify local differences. Recently, Klein and Frintrop [29] applied the Wasserstein distance on the $L_2$ norm between feature distributions gathered from Gaussian-weighted, local integration windows. We continue this idea, but instead employ the Wasserstein metric to score contrasts between superpixels. The Wasserstein distance on the Euclidean norm in real-valued vector space is defined as

$$W_2(\chi, \upsilon) = \left( \inf_{\gamma \in \Gamma(\chi, \upsilon)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \|X - Y\|^2 \, d\gamma(X, Y) \right)^{\frac{1}{2}}, \quad (2)$$

where $\chi$ and $\upsilon$ are probability measures on the metric space $(\mathbb{R}^n, L_2)$ and $\Gamma(\chi, \upsilon)$ denotes the set of all measures on $\mathbb{R}^n \times \mathbb{R}^n$ with marginals $\chi$ and $\upsilon$. Briefly worded, the Wasserstein distance represents the minimum cost of transforming one distribution into another, taking into account not only the individual difference in each point of the underlying metric space, but also how far one has to shift probability masses. In machine vision, the discretized $W_1$ distance is also well known as the Earth Mover's Distance and is widely used to compare histograms.

The calculation of Eq. (2) is very demanding for arbitrary, continuous distributions, but thankfully it can be solved to a more facile term in the case of normal distributions. As introduced in [41]², an explicit solution for multivariate normal distributions $\mathcal{N}_1(\mu_1, \Sigma_1)$ and $\mathcal{N}_2(\mu_2, \Sigma_2)$ is

$$\begin{aligned} W_2(\mathcal{N}_1, \mathcal{N}_2) &= \left( \|\mu_1 - \mu_2\|^2 + \mathrm{tr}\left(\Sigma_1 + \Sigma_2 - 2\sqrt{\Sigma_1 \Sigma_2}\right) \right)^{\frac{1}{2}} \\ &= \left( \|\mu_1 - \mu_2\|^2 + \mathrm{tr}(\Sigma_1) + \mathrm{tr}(\Sigma_2) - 2\,\mathrm{tr}\sqrt{\Sigma_1 \Sigma_2} \right)^{\frac{1}{2}}. \end{aligned} \quad (3)$$

In general, there is no explicit formula to obtain the square root of an arbitrary $n \times n$ matrix for $n > 2$, which would lead to an iterative algorithm for determining $\sqrt{\Sigma_1 \Sigma_2}$ in Eq. (3). However, noticing the relationship between the trace and the eigenvalues of a matrix, the trace of $\sqrt{\Sigma_1 \Sigma_2}$ can be represented as

$$\mathrm{tr}\sqrt{\Sigma_1 \Sigma_2} = \sum_{k=1}^{n} \left( \lambda_{\Sigma_1 \Sigma_2}(k) \right)^{\frac{1}{2}}, \quad (4)$$

where $\lambda_{\Sigma_1 \Sigma_2}(k)$ is the $k$-th eigenvalue of $\Sigma_1 \Sigma_2$.

Considering an $n = 3$ dimensional space such as CIELab, given a $3 \times 3$ matrix $A$, its characteristic polynomial can be represented as

$$\det(\lambda_A I - A) = \lambda_A^3 - \lambda_A^2\,\mathrm{tr}(A) - \frac{1}{2}\lambda_A\left(\mathrm{tr}(A^2) - \mathrm{tr}^2(A)\right) - \det(A), \quad (5)$$

where $\lambda_A$ is an eigenvalue of $A$. $\lambda_A$ can be directly determined using a trigonometric solution introduced in [43] by making an affine change from $A$ to $B$ as

$$A = pB + qI. \quad (6)$$

Thereby, $B$ is a matrix with the same eigenvectors as $A$,

$$\forall p \in \mathbb{R} \setminus \{0\},\; q \in \mathbb{R} \;\Rightarrow\; v_A = v_B, \quad (7)$$

thus from the definition of eigenvalues it holds that

$$\lambda_A = p \cdot \lambda_B + q, \quad (8)$$

where $\lambda_B$ is an eigenvalue of $B$. Choosing $p = \sqrt{\mathrm{tr}\left((A - qI)^2\right)/6}$ and $q = \mathrm{tr}(A)/3$³ as well as considering Eqs. (5) to (8), the characteristic equation of $B$ can be simplified to

$$\det(\lambda_B I - B) = \lambda_B^3 - 3\lambda_B - \det(B) = 0. \quad (9)$$

By directly solving Eq. (9), one can get all three eigenvalues of $B$ as

$$\lambda_B(k) = 2\cos\left(\frac{1}{3}\arccos\left(\frac{\det(B)}{2}\right) + \frac{2k\pi}{3}\right), \quad (10)$$

where $\lambda_B(k)$ is the $k$-th eigenvalue of $B$ with $k = 0, 1, 2$. Thus, Eq. (3) can be applied to quickly compute meaningful
²A slightly different term was later introduced in [42], claiming that Eq. (3) is only valid in the case of commuting covariances. However, we could show that both solutions are equivalent, because $\Sigma_1 \Sigma_2 = \sqrt{\Sigma_1}\left(\sqrt{\Sigma_1}\,\Sigma_2\right)$ has the same characteristic polynomial as $\left(\sqrt{\Sigma_1}\,\Sigma_2\right)\sqrt{\Sigma_1}$, and thus the same eigenvalues.

³This choice guarantees the validity of Eqs. (6) and (8) also in the special case $p = 0$, since this would imply $A = qI$, thus it has a triple eigenvalue $\lambda_A = q = \mathrm{tr}(qI)/3$.
Fig. 2. An example of multi-size superpixel segmentation and the corresponding global saliency maps. (a): from left to right, the initial grid area in superpixel extraction decreases approximately in steps of 2. (b): the images are the obtained global saliency maps corresponding to each scale in (a).
appearance distances between two superpixels using Eqs. (4), (6), (8), and (10).

In the following, we use the $W_2$ distance coherently in the different aspects of saliency computation: first, it serves as a similarity measure in the clustering approach for global saliency computation (Section III-C); second, it measures the local contrast of a superpixel to its neighbors (Section III-D); and third, it provides the similarity metric required for the random walk process that enables the saliency flow computation introduced in Section III-E.
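As a concrete illustration of Eqs. (3)-(10), the $W_2$ distance between two 3-D normal distributions can be computed in closed form. The following sketch is our own code (with clamping added for numerical safety); it uses the trigonometric eigenvalue solution instead of an iterative matrix square root:

```python
import numpy as np

def w2_gaussians(mu1, cov1, mu2, cov2):
    """W2 distance between N(mu1, cov1) and N(mu2, cov2) in 3-D,
    following Eq. (3) with tr(sqrt(cov1 @ cov2)) from Eqs. (4)-(10)."""
    A = cov1 @ cov2                # product of PSD matrices: real eigenvalues >= 0
    I = np.eye(3)
    q = np.trace(A) / 3.0          # affine change A = p*B + q*I, Eq. (6)
    p2 = np.trace((A - q * I) @ (A - q * I)) / 6.0
    if p2 <= 1e-12:                # special case A ~ q*I: triple eigenvalue q
        lam = np.full(3, q)
    else:
        p = np.sqrt(p2)
        B = (A - q * I) / p        # characteristic eq. becomes Eq. (9)
        c = np.clip(np.linalg.det(B) / 2.0, -1.0, 1.0)
        k = np.arange(3)
        lam_B = 2.0 * np.cos(np.arccos(c) / 3.0 + 2.0 * k * np.pi / 3.0)  # Eq. (10)
        lam = p * lam_B + q        # back-transform, Eq. (8)
    tr_sqrt = np.sqrt(np.clip(lam, 0.0, None)).sum()                      # Eq. (4)
    d2 = ((mu1 - mu2) ** 2).sum() + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt
    return np.sqrt(max(d2, 0.0))   # Eq. (3)
```

For commuting covariances the result matches the textbook closed form; e.g., with $\Sigma_1 = \mathrm{diag}(4, 1, 1)$, $\Sigma_2 = I$, and equal means, the distance is $1$.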
C. Global Saliency: The Spatial Distribution of Colors

In natural scenes, the colors of regions belonging to the background are usually more spatially scattered over the whole image than those of salient regions. In other words, the more a color is spread, the less salient it is [22]. To determine the spatial spreading, a further clustering is needed. This is computed much more efficiently on superpixels than would be possible on pixel level, since there are far fewer elements. Thereby, the spatial distribution of colors can be estimated on a higher cluster-of-superpixels level by comparing the spatial intra-cluster distances. GMMs are widely used to represent the probabilities of color appearance, such as in [14], [38], [44]. However, it may be inappropriate to assign a fixed number of clusters for different images, since this number should depend on the image complexity; e.g., a cluttered scene has many more dominant colors than one showing a monotonous background. We employ the Affinity Propagation Clustering (APC) algorithm introduced in [45] to identify clusters. Here, it is necessary to initialize neither the cluster centers nor the number of clusters.

APC is based on the similarities between elements (superpixels). It tries to minimize squared errors; thus, in our method, we use $-\left(W_2(\mathcal{N}_{S_i^t}, \mathcal{N}_{S_j^t})\right)^2$ obtained by Eq. (3) between each pair of superpixels $S_i^t$ and $S_j^t$. Figure 1(c) exemplarily shows two identified clusters. Compatible to superpixels, the $k$-th cluster on scale $t$ forms a set

$$C_k^t = \left\{ \mathcal{N}_C(\mu, \Sigma),\; c_{C_k^t} \right\}. \quad (11)$$

APC selects so-called exemplar superpixels to become cluster centers. Thus, we define the cluster appearance model $\mathcal{N}_C$ to equal the one of its corresponding exemplar superpixel. The spatial center of a cluster in image coordinates is computed from a linear combination of superpixel centers weighted by their cluster membership probabilities:

$$c_{C_k^t} = \frac{\sum_{i=1}^{M(t)} P_g(C_k^t \mid S_i^t) \cdot c_{S_i^t}}{\sum_{i=1}^{M(t)} P_g(C_k^t \mid S_i^t)}, \quad (12)$$

where $M(t)$ denotes the number of superpixels on scale $t$.

Note that APC is also employed to group GMMs in [46]. Different from that work, the inherent messages exchanged in APC are further explored to facilitate the computation of $P_g(C_k^t \mid S_i^t)$. The membership probability of a superpixel to each cluster can be computed from its visual similarity to the exemplar of that cluster. Converting distances to similarities using a Gaussian function has been widely adopted by numerous methods [18], [25], [26], [46], [47]. However, the fall-off rate
of the exponential function is often selected empirically. In this section, we take advantage of the messages that are propagated between superpixels for directly determining the membership probabilities [48].

Let $X_k^t$ denote the exemplar of cluster $C_k^t$, and let $r(S_i^t, X_k^t)$ denote the exchanged message named responsibility, which represents how well-suited superpixel $X_k^t$ is to serve as the exemplar for superpixel $S_i^t$. Actually, $r(S_i^t, X_k^t)$ implies the logarithmic form of the cluster membership probability [45]. Let $B^t$ denote the set composed of all non-exemplar superpixels. We first normalize all responsibilities between the superpixels in $B^t$ and exemplar $X_k^t$ to $[-1, 0]$ (denoted as $\hat{r}(B^t, X_k^t)$) and then exponentially scale them as

$$\hat{r}_e(B_i^t, X_k^t) = \exp\left( \frac{\hat{r}(B_i^t, X_k^t)}{\mathrm{Var}\left(\hat{r}(B^t, X_k^t)\right)} \right), \quad (13)$$

where $\hat{r}(B_i^t, X_k^t)$ refers to the normalized responsibility between the non-exemplar superpixel $B_i^t$ and exemplar $X_k^t$, and $\mathrm{Var}(\cdot)$ refers to the variance. For exemplars, we simply assign their scaled responsibilities as

$$\hat{r}_e(X_i^t, X_k^t) = \begin{cases} 1, & \text{if } i = k \\ 0, & \text{otherwise.} \end{cases} \quad (14)$$

Eqs. (13) and (14) construct the scaled responsibilities between all superpixels and each cluster. Then, the intra-cluster probabilities of each superpixel can be computed as
$$P_g(C_k^t \mid S_i^t) = \frac{\hat{r}_e(S_i^t, X_k^t)}{\sum_{k=1}^{K(t)} \hat{r}_e(S_i^t, X_k^t)}, \quad (15)$$

where $K(t)$ is the number of clusters on scale $t$. Next, we compute the probability of being salient for cluster $C_k^t$. This probability value is obtained by scoring the relative spatial spreading between the superpixels within the cluster:
$$P_g(\mathrm{sal} \mid C_k^t) = \frac{1}{K(t)} \sum_{j=1}^{K(t)} \frac{\sum_{i=1}^{M(t)} P_g(C_k^t \mid S_i^t) \cdot \left\| c_{S_i^t} - c_{C_j^t} \right\|_2}{\sum_{i=1}^{M(t)} P_g(C_k^t \mid S_i^t)}, \quad (16)$$

where $\mathit{Sal} = \{\mathrm{sal}, \neg\mathrm{sal}\}$ is a binary random variable, indicating whether something is salient, that means, whether
This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available athttp://dx.doi.org/10.1109/TIP.2014.2361024Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubspermissions@ieee.org.