A Relevance Feedback Approach for Content Based Image Retrieval Using Gaussian Mixture Models

S. Kollias et al. (Eds.): ICANN 2006, Part II, LNCS 4132, pp. 84–93, 2006. © Springer-Verlag Berlin Heidelberg 2006

Apostolos Marakakis 1, Nikolaos Galatsanos 2, Aristidis Likas 2, and Andreas Stafylopatis 1

1 School of Electrical and Computer Engineering, National Technical University of Athens, 15773 Athens, Greece
2 Department of Computer Science, University of Ioannina, 45110 Ioannina, Greece

el00077@central.ntua.gr, galatsanos@cs.uoi.gr, arly@cs.uoi.gr, andreas@cs.ntua.gr

Abstract. In this paper a new relevance feedback (RF) methodology for content based image retrieval (CBIR) is presented. This methodology is based on Gaussian Mixture (GM) models for images. According to this methodology, the GM model of the query is updated in a probabilistic manner based on the GM models of the relevant images, whose relevance degree (positive or negative) is provided by the user. This methodology uses a recently proposed distance metric between probability density functions (pdfs) that can be computed in closed form for GM models. The proposed RF methodology takes advantage of the structure of this metric and proposes a method to update it very efficiently based on the GM models of the relevant and irrelevant images characterized by the user. We show with experiments the merits of the proposed methodology.

1 Introduction

The target of content-based image retrieval (CBIR) is to retrieve relevant images from an image database based on their visual content. Users submit one or more example images for query. Then, the CBIR system ranks and displays the retrieved results in order of similarity. Most CBIR systems ([1]–[8]) represent each image as a combination of low-level features, and then define a distance metric that is used to quantify the similarity between images.
A lot of effort has been devoted to developing features and strategies that capture human perception of image similarity in order to enable efficient indexing and retrieval for CBIR, see for example [5], [9], [10] and [16]. Nevertheless, low-level image features have a hard time capturing the human perception of image similarity. In other words, it is difficult to describe the semantic content of an image using only low-level image features. This is known in the CBIR community as the semantic gap problem, and bridging it has for a number of years been considered the “holy grail” of CBIR [11].

Relevance feedback (RF) has been proposed as a methodology to ameliorate this problem, see for example [1]–[3] and [6]–[8]. RF attempts to insert the subjective human perception of image similarity into a CBIR system. Thus, RF is an interactive process that refines the distance metric of a query by interacting with the user and taking into account his/her preferences. To accomplish this, during a round of RF users are required to rate the retrieved images according to their preferences. Then, the retrieval system updates the matching criterion based on the user’s feedback, see for example [1]–[3], [6]–[8], [15] and [16].

Gaussian mixtures (GM) constitute a well-established methodology to model probability density functions (pdfs). The advantages of this methodology, such as adaptability to the data, modeling flexibility and robustness, have made GM models attractive for a wide range of applications ([17] and [18]). The histogram of the image features is a very succinct description of an image and has been used extensively in CBIR, see for example [4] and [9]. As mentioned previously, GMs provide a very effective approach to model histograms. Thus, GM models have been used for the CBIR problem ([4], [14] and [17]).
The main difficulty when using a GM model in CBIR is to define a distance metric between pdfs that separates different models well and that can be computed efficiently. The traditionally used distance metric between pdfs, the Kullback-Leibler (KL) distance, cannot be computed in closed form for GM models. Thus, we have to resort to random sampling Monte Carlo methods to compute KL for GMs. This makes it impractical for CBIR, where execution time is an important issue. In [14] the Earth Mover's Distance (EMD) was proposed as an alternative distance metric for GM models. Although the EMD metric has good separation properties and is much faster to compute than the KL distance (in the GM case), it still requires the solution of a linear program. Thus, it is not computable in closed form and is not fast enough for a CBIR system with RF.

In this paper we propose to use for RF an alternative distance metric between pdfs, which was recently proposed in [21]. This metric can be computed in closed form for GM models. We propose an efficient methodology to compute this metric in the context of RF. In other words, we propose a methodology to update the GM model of the image query based on the relevant images. Furthermore, we propose an effective strategy that requires very few computations to update this distance metric for RF.

The rest of this paper is organized as follows: in section 2 we describe the distance metric. In section 3 we present the proposed RF methodology based on this metric. In section 4 we present experiments of this RF methodology that demonstrate its merits. Finally, in section 5 we present conclusions and directions for future research.

2 Gaussian Mixture Models for Content-Based Image Retrieval

GM models have been used extensively in many data modeling applications.
Using them for the CBIR problem allows us to bring to bear all the known advantages and powerful features of the GM modeling methodology, such as adaptability to the data, modeling flexibility, and robustness, that make it attractive for a wide range of applications ([18] and [19]). GM models have been used previously for CBIR, see for example [4] and [14], as histogram models of the features that are used to describe images. A GM model is given by

$$p(x) = \sum_{j=1}^{K} \pi_j \, \phi(x \mid \theta_j) \tag{1}$$

where $K$ is the number of components in the model, $0 \le \pi_j \le 1$ are the mixing probabilities of the model with $\sum_{j=1}^{K} \pi_j = 1$, and $\phi(x \mid \theta_j) = N(x; \theta_j = [\mu_j, \Sigma_j])$ is a Gaussian pdf with mean $\mu_j$ and covariance $\Sigma_j$.

In order to describe the similarity between images in this context a distance metric must be defined. The Kullback-Leibler (KL) distance metric is the most commonly used distance metric between pdfs, see for example [10]. However, the KL distance cannot be computed in closed form for GMs. Thus, one has to resort to time-consuming random sampling Monte Carlo methods. For this purpose a few alternatives have been proposed. In [14] the Earth Mover's Distance (EMD) metric between GMs was proposed. This metric is based on considering the probability mass of one GM as piles of earth and of the other GM as holes in the ground, and then finding the least work necessary to fill the holes with the earth in the piles. EMD is an effective metric for CBIR; however, it cannot be computed in closed form and requires the solution of a linear program each time it has to be computed. This makes it slow and cumbersome to use for RF. In order to ameliorate this difficulty a new distance metric was proposed in [21].
This metric between two pdfs $p_1(x)$ and $p_2(x)$ is defined as

$$C2(p_1, p_2) = -\log\left[\frac{2\int p_1(x)\,p_2(x)\,dx}{\int p_1^2(x)\,dx + \int p_2^2(x)\,dx}\right] \tag{2}$$

and can be computed in closed form when $p_1(x)$ and $p_2(x)$ are GMs. In this case it is given by

$$C2(p_1, p_2) = -\log\left[\frac{2\sum_{i,j}\pi_{1i}\pi_{2j}\sqrt{\dfrac{|V_{12}(i,j)|}{e^{k_{12}(i,j)}\,|\Sigma_{1i}|\,|\Sigma_{2j}|}}}{\sum_{i,j}\pi_{1i}\pi_{1j}\sqrt{\dfrac{|V_{11}(i,j)|}{e^{k_{11}(i,j)}\,|\Sigma_{1i}|\,|\Sigma_{1j}|}} + \sum_{i,j}\pi_{2i}\pi_{2j}\sqrt{\dfrac{|V_{22}(i,j)|}{e^{k_{22}(i,j)}\,|\Sigma_{2i}|\,|\Sigma_{2j}|}}}\right] \tag{3}$$

where $V_{ml}(i,j) = \left(\Sigma_{mi}^{-1} + \Sigma_{lj}^{-1}\right)^{-1}$, $k_{ml}(i,j) = (\mu_{mi} - \mu_{lj})^{T}\,(\Sigma_{mi} + \Sigma_{lj})^{-1}\,(\mu_{mi} - \mu_{lj})$, $\pi_{mi}$ is the mixing weight of the $i$-th Gaussian kernel of $p_m$, and, finally, $\mu_{mi}$, $\Sigma_{mi}$ are the mean and covariance matrix of the $i$-th kernel of the Gaussian mixture $p_m$.

3 Relevance Feedback Based on the C2 Metric

For a metric to be useful in RF, it is crucial that it can be easily updated based on the relevant images provided by the user. Thus, assume we have a query modeled as $GMM(q)$, and the database images modeled by $GMM(d_i)$ for $i = 1, \ldots, N$. The search based on this query requires the calculation of an $N \times 1$ table of the distances $C2(q, d_i)$. Also assume that from the retrieved images the user decides that the images with models $GMM(r_m)$, $m = 1, 2, \ldots, M$, are the most relevant and desires to update the query based on them. One simple and intuitive way to go about it is to generate a new GM model given by

$$GMM(q') = (1 - \Lambda)\,GMM(q) + \sum_{m=1}^{M} \lambda_m\,GMM(r_m) \tag{4}$$

where $0 \le \lambda_m \le 1$, $\sum_{m=1}^{M} \lambda_m = \Lambda$, $0 < \Lambda < 1$, and $\lambda_m$ is the relevance that is assigned to the image $r_m$ by the user. The attractive feature of the model in Eq. (4) is that the relevance $\lambda_m$ has a physical meaning: it is proportional to the relevance degree assigned by the user, and this defines a “composite GM model” that also includes the user preferences.
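As a rough illustration of Eqs. (3) and (4), the following Python sketch computes the closed-form C2 distance and builds the composite query model. It is not the authors' implementation. The names `overlap`, `c2_distance` and `update_query`, the list-of-triples representation of a GM, and the example values are all assumptions made for this sketch. To stay within the standard library it also assumes diagonal covariances, whereas the experiments in the paper use full covariances. The code uses the identity $|V_{ml}(i,j)| / (|\Sigma_{mi}|\,|\Sigma_{lj}|) = 1 / |\Sigma_{mi} + \Sigma_{lj}|$ to avoid matrix inversions.

```python
import math

# Assumed representation: a GM is a list of (weight, mean, variances) triples,
# one per component. Covariances are diagonal here (a simplification).

def overlap(gm_a, gm_b):
    """Sum over component pairs that appears in Eq. (3) for models a and b."""
    total = 0.0
    for w_a, mu_a, var_a in gm_a:
        for w_b, mu_b, var_b in gm_b:
            # k = (mu_a - mu_b)^T (Sigma_a + Sigma_b)^-1 (mu_a - mu_b)
            k = sum((x - y) ** 2 / (s + t)
                    for x, y, s, t in zip(mu_a, mu_b, var_a, var_b))
            # |V| / (|Sigma_a| |Sigma_b|) equals 1 / |Sigma_a + Sigma_b|
            det_sum = math.prod(s + t for s, t in zip(var_a, var_b))
            total += w_a * w_b * math.sqrt(1.0 / (math.exp(k) * det_sum))
    return total

def c2_distance(gm_a, gm_b):
    """Closed-form C2 distance of Eq. (3)."""
    cross = overlap(gm_a, gm_b)
    return -math.log(2.0 * cross / (overlap(gm_a, gm_a) + overlap(gm_b, gm_b)))

def update_query(gm_q, relevant, lambdas):
    """Composite query of Eq. (4): (1 - Lambda) GMM(q) + sum_m lambda_m GMM(r_m)."""
    big_lambda = sum(lambdas)
    new_gm = [((1.0 - big_lambda) * w, mu, var) for w, mu, var in gm_q]
    for lam, gm_r in zip(lambdas, relevant):
        new_gm.extend((lam * w, mu, var) for w, mu, var in gm_r)
    return new_gm

# Illustrative values only: a two-component query and one relevant image.
query = [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (2.0, 2.0), (1.0, 1.0))]
relevant = [[(1.0, (5.0, 5.0), (1.0, 1.0))]]
new_query = update_query(query, relevant, lambdas=[0.4])
print(c2_distance(query, relevant[0]), c2_distance(new_query, relevant[0]))
```

In this toy setting the updated query is closer to the relevant image than the original query, which is the behaviour Eq. (4) is meant to produce. Note that the composite model simply concatenates the components of all models with rescaled weights, so its mixing weights still sum to one.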
Furthermore, it is desirable to be able to efficiently compute the distances between the entries $GMM(d_i)$ for $i = 1, 2, \ldots, N$ and the new query model $GMM(q')$. Based on Eq. (3), the distance C2 is composed of sums of the type

$$S_{ml} = \sum_{i,j} \pi_{mi}\,\pi_{lj}\,\sqrt{\frac{|V_{ml}(i,j)|}{e^{k_{ml}(i,j)}\,|\Sigma_{mi}|\,|\Sigma_{lj}|}}$$

where $m$, $l$ indicate the GM models and $i$, $j$ the Gaussian components. Based on Eq. (4), the update of the distance measure for the new query $q'$ is given by

$$C2(q', d_i) = -\log\left[\frac{2\left((1-\Lambda)\,S_{qi} + \sum_{r} \lambda_r S_{ri}\right)}{(1-\Lambda)^2 S_{qq} + 2(1-\Lambda)\sum_{r} \lambda_r S_{qr} + \sum_{r}\sum_{r'} \lambda_r \lambda_{r'} S_{rr'} + S_{ii}}\right] \tag{5}$$

The relevant images, indicated by $r$, are the database images selected by the user. Since we can a priori compute (and store) the $S_{ij}$ for all the images of the database, and since all $S_{qi}$ have already been computed in the previous query, the computation of the distance between $GMM(q')$ and the database image models is very fast: it involves only rescaling operations based on the relevance probabilities $\lambda_r$.

Another nice property (for relevance feedback) of the model in Eq. (4) is that it generalizes to any C2 distance between histogram models. In other words, the pdfs $p_1(x)$ and $p_2(x)$ need not be GMs and could even be simple histograms.

The images retrieved by the system at each retrieval epoch that are not selected by the user as relevant can be regarded as irrelevant. The determination of the irrelevant images could also be done in a more sophisticated manner that involves explicit selection by the user. Thus, using such images, negative feedback can be provided and exploited to update the query. We can thus define, in a way similar to Eq. (4), an updated query $GMM(n')$ for the irrelevant images:

$$GMM(n') = (1 - \Lambda_n)\,GMM(n) + \sum_{m} \lambda_{nm}\,GMM(r_m) \tag{6}$$

where $n$, $n'$ correspond to the negative query and $\Lambda_n$, $\lambda_{nm}$ are analogous to the previously mentioned $\Lambda$, $\lambda_m$.
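The rescaling in Eq. (5) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `c2_after_feedback`, its argument layout, and the numeric values of the $S$ terms are assumptions chosen for the example. The function only combines stored $S$ values, so no Gaussian component is touched during a feedback round. The same routine would apply to the negative query of Eq. (6) by passing the terms of $n$ and the irrelevant images with $\lambda_{nm}$ instead.

```python
import math

def c2_after_feedback(s_qi, s_ri, s_qq, s_qr, s_rr, s_ii, lambdas):
    """Eq. (5): C2 between the updated query q' and database image d_i.

    s_qi, s_qq, s_ii : stored scalars S_qi, S_qq, S_ii
    s_ri[r]          : S_ri for each relevant image r
    s_qr[r]          : S_qr for each relevant image r
    s_rr[r][t]       : S_rt between relevant images r and t
    lambdas[r]       : relevance lambda_r assigned by the user
    """
    keep = 1.0 - sum(lambdas)  # the factor (1 - Lambda)
    numerator = 2.0 * (keep * s_qi
                       + sum(lam * s for lam, s in zip(lambdas, s_ri)))
    denominator = (keep ** 2 * s_qq
                   + 2.0 * keep * sum(lam * s for lam, s in zip(lambdas, s_qr))
                   + sum(lr * lt * s_rr[r][t]
                         for r, lr in enumerate(lambdas)
                         for t, lt in enumerate(lambdas))
                   + s_ii)
    return -math.log(numerator / denominator)

# Made-up S values for one database image and two relevant images.
before = -math.log(2.0 * 0.05 / (0.28 + 0.40))  # C2(q, d_i) before feedback
after = c2_after_feedback(
    s_qi=0.05, s_ri=[0.30, 0.10], s_qq=0.28, s_qr=[0.04, 0.06],
    s_rr=[[0.50, 0.20], [0.20, 0.45]], s_ii=0.40, lambdas=[0.3, 0.2])
print(before, after)
```

With $M$ relevant images, each database image costs on the order of $M$ multiplications in the numerator, while the first three denominator terms do not depend on $d_i$ and could be computed once per feedback round.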
The best images to retrieve can be found by combining both positive and negative RF. This can be done by minimizing the following distance metric:

$$c(i) = a_{pos}\,d(q,i) + (1 - a_{pos})\,\bigl(1 - d(n,i)\bigr) \tag{7}$$

where $d(q,i) = \dfrac{C2(q, d_i)}{\max_{j} C2(q, d_j)}$ and $d(n,i) = \dfrac{C2(n, d_i)}{\max_{j} C2(n, d_j)}$, with $0 \le a_{pos} \le 1$. After computing the metric $c(i)$ for every database image, we can retrieve the images with the smallest value for this measure. These images will have the property of being near to the user's ideal query, which is determined by the initial query and the positive examples, and far from the user's negative examples.

4 Experimental Results

In order to test the validity of this approach we used about 1000 annotated low-resolution images from the image database in [22]. These images have been manually separated into 12 semantic categories according to their content (e.g. bears, butterflies, earth pictures etc.). The features extracted from the image pixels correspond to the CIE-Lab color space ([14]). The GM parameters for each image were estimated with the very popular EM algorithm, which for robustness was initialized with multiple runs of the k-means algorithm. The number of components for every image was chosen empirically to be 5. In all the experiments, we chose to use full covariance for the GM components.

A simple graphical user interface has been developed in order to visualize the results of our relevance feedback scheme. The user can choose the number of images which the system will retrieve at each round, the value of the parameters $\Lambda$, $\Lambda_n$, the positive examples weight $a_{pos}$ and the database image which