International Journal of Fuzzy Systems, Vol. 13, No. 2, June 2011. © 2011 TFSA.

Different Objective Functions in Fuzzy c-Means Algorithms and Kernel-Based Clustering

Sadaaki Miyamoto

Abstract

An overview of fuzzy c-means clustering algorithms is given, focusing on three different objective functions: one using a regularized dissimilarity, one using an entropy-based term, and one designed for possibilistic clustering. Classification functions for these objective functions and their properties are studied. Fuzzy c-means algorithms using kernel functions are also discussed, together with kernelized cluster validity measures and numerical experiments. New kernel functions derived from the classification functions are moreover studied.

Keywords: cluster validity measure, fuzzy c-means clustering, kernel functions, possibilistic clustering.

Corresponding Author: Sadaaki Miyamoto is with the Department of Risk Engineering, University of Tsukuba, Ibaraki 305-8573, Japan. E-mail: miyamoto@risk.tsukuba.ac.jp. Manuscript received June 2010; revised Nov. 2010; accepted Dec. 2010.

1. Introduction

Fuzzy clustering is well known not only in the fuzzy community but also in the related fields of data analysis, neural networks, and other areas of computational intelligence. Among the various techniques of clustering using fuzzy concepts [16, 23, 30, 37], the term fuzzy clustering mostly refers to the fuzzy c-means clustering of Dunn and Bezdek [1, 2, 6, 7, 8, 13]. This paper gives an overview of this method. Nevertheless, we adopt a nonstandard formulation: we begin from three different objective functions, none of which is exactly the same as the one by Dunn and Bezdek.

By comparing the different objective functions and their solutions, we find theoretical properties of fuzzy c-means clustering: different fuzzy classifiers are derived from different solutions. Moreover, a generalization including a "cluster size" variable and a "covariance" variable is developed. This generalization is shown to be closely related to mixture distributions.

Kernel-based fuzzy c-means clustering is moreover studied with associated cluster validity measures. Many numerical simulations are used to evaluate whether or not the kernelized measures are adequate for ordinary ball-shaped clusters.

Finally, a new class of kernel functions is proposed; they are derived from fuzzy c-means solutions. Illustrative examples are given.

2. Fuzzy c-Means Clustering

We first give three objective functions. Possibilistic clustering [18] is included as a variation of fuzzy c-means clustering.

A. Preliminary Consideration

Let the objects for clustering be points in the p-dimensional Euclidean space $R^p$, denoted by $x_k = (x_k^1, \ldots, x_k^p) \in R^p$ ($k = 1, \ldots, N$). A generic point $x = (x^1, \ldots, x^p)$ denotes a variable in $R^p$. We assume $c$ clusters; the cluster centers are denoted by $v_i$ ($i = 1, \ldots, c$), and we write $V = (v_1, \ldots, v_c)$ for the collection of all cluster centers.

The dissimilarity between an object and a cluster center is the squared Euclidean distance:

$$D(x_k, v_i) = \|x_k - v_i\|^2. \qquad (1)$$

We sometimes write $D_{ki} = D(x_k, v_i)$ for simplicity. Moreover, $D(x, v_i)$ means that the variable $x$ is substituted for the object $x_k$.

$U = (u_{ki})$ is the membership matrix: $u_{ki}$ is the degree of belongingness of $x_k$ to cluster $i$. Crisp and fuzzy c-means clustering are based on the minimization of objective functions.
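The notation above translates directly into array operations. The following sketch (an illustration assumed for this overview, not code from the paper) computes the dissimilarity matrix $D_{ki} = \|x_k - v_i\|^2$ of Eq. (1) for all objects and centers at once; the names `X`, `V`, and `dissimilarity_matrix` are ours.

```python
import numpy as np

def dissimilarity_matrix(X, V):
    """Squared Euclidean distances D[k, i] = ||x_k - v_i||^2 of Eq. (1).

    X is an (N, p) array of objects; V is a (c, p) array of cluster centers.
    """
    # Broadcast pairwise differences to shape (N, c, p), then sum squares over p.
    diff = X[:, None, :] - V[None, :, :]
    return np.sum(diff ** 2, axis=2)
```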
Crisp c-means clustering [21] uses the following:

$$J_H(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} D(x_k, v_i). \qquad (2)$$

Alternate minimization with respect to one of $(U, V)$, while the other variable is fixed, is repeated until convergence [1]. Minimization with respect to $U$ uses the following constraint:

$$M = \left\{ U = (u_{ki}) : \sum_{j=1}^{c} u_{kj} = 1;\ u_{kj} \ge 0,\ \forall k, j \right\}. \qquad (3)$$

We consider three objective functions:

$$J_B(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m \{ D(x_k, v_i) + \varepsilon \}, \quad (m > 1,\ \varepsilon \ge 0), \qquad (4)$$

$$J_E(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ u_{ki} D(x_k, v_i) + \lambda^{-1} u_{ki} (\log u_{ki} - 1) \}, \quad (\lambda > 0), \qquad (5)$$

$$J_P(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ (u_{ki})^m D(x_k, v_i) + \zeta^{-1} (1 - u_{ki})^m \}, \quad (\zeta > 0). \qquad (6)$$

All of the above are different from the original function proposed by Dunn [7, 8] and Bezdek [1, 2]. $J_B(U, V)$ has a nonnegative parameter $\varepsilon$ proposed by Ichihashi [28]; when $\varepsilon = 0$, $J_B(U, V)$ is the original objective function. $J_E(U, V)$ has an additional entropy term; the use of entropy in fuzzy c-means clustering has been proposed by a number of researchers, e.g., [19, 20, 24]. $J_P(U, V)$ has been proposed by Krishnapuram and Keller [18] for possibilistic clustering. This function can also be used for fuzzy c-means with constraint (3) when $m = 2$.

We use the following alternate minimization procedure FCM, where $J(U, V)$ is either $J_B(U, V)$, $J_E(U, V)$, or $J_P(U, V)$. Minimization with respect to $U$ is subject to constraint (3).

FCM Algorithm of Alternate Optimization.
FCM1: Set an initial value of $V$ randomly.
FCM2: Minimize $J(U, V)$ with respect to $U$. Let the optimal solution be the new $U$.
FCM3: Minimize $J(U, V)$ with respect to $V$. Let the optimal solution be the new $V$.
FCM4: If $(U, V)$ is convergent, stop. Otherwise go to FCM2.
End FCM.

We show the solutions of FCM2 and FCM3 for each objective function; the derivations are omitted.

Solution for $J_B$:

$$u_{ki} = \frac{\left( \dfrac{1}{D(x_k, v_i) + \varepsilon} \right)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \dfrac{1}{D(x_k, v_j) + \varepsilon} \right)^{\frac{1}{m-1}}}, \qquad (7)$$

$$v_i = \frac{\sum_{k=1}^{N} (u_{ki})^m x_k}{\sum_{k=1}^{N} (u_{ki})^m}. \qquad (8)$$

Solution for $J_E$:

$$u_{ki} = \frac{\exp(-\lambda D(x_k, v_i))}{\sum_{j=1}^{c} \exp(-\lambda D(x_k, v_j))}, \qquad (9)$$

$$v_i = \frac{\sum_{k=1}^{N} u_{ki} x_k}{\sum_{k=1}^{N} u_{ki}}. \qquad (10)$$

Solution for $J_P$:

$$u_{ki} = \frac{\left( \dfrac{1}{1 + \zeta D(x_k, v_i)} \right)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \dfrac{1}{1 + \zeta D(x_k, v_j)} \right)^{\frac{1}{m-1}}}, \qquad (11)$$

$$v_i = \frac{\sum_{k=1}^{N} (u_{ki})^m x_k}{\sum_{k=1}^{N} (u_{ki})^m}, \qquad (12)$$

where $m = 2$.

B. Basic Functions

We introduce what we call basic functions in this paper:

$$g_B(x, y) = \left( \frac{1}{D(x, y) + \varepsilon} \right)^{\frac{1}{m-1}}, \qquad (13)$$

$$g_E(x, y) = \exp(-\lambda D(x, y)), \qquad (14)$$

$$g_P(x, y) = \left( \frac{1}{1 + \zeta D(x, y)} \right)^{\frac{1}{m-1}}. \qquad (15)$$

We also assume that $g(x, y)$ is one of $g_B(x, y)$, $g_E(x, y)$, and $g_P(x, y)$. A unified representation is then obtained for the optimal $u_{ki}$:

$$u_{ki} = \frac{g(x_k, v_i)}{\sum_{j=1}^{c} g(x_k, v_j)} \qquad (16)$$

for all three objective functions.

C. Possibilistic Clustering

Possibilistic clustering [18] uses $J_P(U, V)$ but with a different constraint:

$$M = \{ U = (u_{ki}) : u_{kj} > 0,\ \forall k, j \}.$$

Note that $J_P(U, V)$ and $M$ in this paper are simpler than the original formulation [18], but the essential discussion is the same.

We cannot use $J_B(U, V)$, which leads to a trivial solution in possibilistic clustering, but $J_E(U, V)$ can be used [4]. The solution of possibilistic clustering for $J_E(U, V)$ is

$$u_{ki} = g_E(x_k, v_i) \qquad (17)$$

using the basic function $g_E$, with $v_i$ given by (10), while the solution for $J_P(U, V)$ is

$$u_{ki} = g_P(x_k, v_i) \qquad (18)$$

using the basic function $g_P$, with $v_i$ given by (12). Note that $m = 2$ is not assumed for possibilistic clustering.
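To make the alternate optimization concrete, here is a minimal sketch of FCM1-FCM4 for the entropy-based objective $J_E$, using the update rules (9) and (10). It is an assumed illustration rather than the paper's implementation; `lam` stands for $\lambda$, and convergence is checked on the movement of the centers.

```python
import numpy as np

def fcm_entropy(X, c, lam=1.0, n_iter=100, tol=1e-9, seed=0):
    """Alternate optimization FCM1-FCM4 for the entropy-based objective J_E."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]   # FCM1: random initial centers
    for _ in range(n_iter):
        # D[k, i] = ||x_k - v_i||^2
        D = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=2)
        # FCM2, Eq. (9): u_ki proportional to exp(-lam * D_ki); the row-wise
        # shift by min_i D_ki avoids underflow and cancels in the normalization.
        W = np.exp(-lam * (D - D.min(axis=1, keepdims=True)))
        U = W / W.sum(axis=1, keepdims=True)
        # FCM3, Eq. (10): centers are membership-weighted means.
        V_new = (U.T @ X) / U.sum(axis=0)[:, None]
        # FCM4: stop when the centers no longer move.
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return U, V
```

The same loop serves $J_B$ or $J_P$ if the membership update is replaced by (7) or (11) and the center update by (8) or (12).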
D. Fuzzy Classifiers

There have been many discussions on fuzzy classifiers derived from fuzzy clustering; here we show a standard classifier that is naturally derived from the optimal solutions. Note that $u_{ki}$ is given only on the objects $x_k$, while what we need are fuzzy classification rules by which the solutions are provided.

To understand classification rules clearly, let us consider crisp c-means, where we use the nearest prototype allocation rule: when the set of cluster prototypes has been determined, we allocate an object to its nearest prototype, i.e.,

$$u_{ki} = \begin{cases} 1 & (i = \arg\min_{1 \le j \le c} D(x_k, v_j)), \\ 0 & (\text{otherwise}). \end{cases}$$

Note that the objective function is $J_H$. This allocation rule is applied to all points in the space, and the result is the collection of Voronoi regions [17] whose centers are the cluster prototypes. Specifically, we define

$$S_i(V) = \{ x \in R^p : \|x - v_i\| < \|x - v_j\|,\ \forall j \ne i \}$$

as the Voronoi region for a given set of cluster prototypes $V$. We then have

$$\bigcup_{i=1}^{c} \overline{S_i(V)} = R^p, \quad S_i(V) \cap S_j(V) = \emptyset \ (i \ne j),$$

where $\overline{S_i(V)}$ is the closure of $S_i(V)$. The nearest prototype allocation rule is then as follows:

if $x \in S_i(V)$ then $x \to$ cluster $i$.

When we consider fuzzy rules, a function $U_i(x; V)$ that interpolates $u_{ki}$ is used. We define the following function using the basic function:

$$U_i(x; V) = \frac{g(x, v_i)}{\sum_{j=1}^{c} g(x, v_j)}, \quad x \in R^p, \qquad (19)$$

where $g(x, y)$ is one of $g_B(x, y)$, $g_E(x, y)$, and $g_P(x, y)$.

Fuzzy rules are simpler in possibilistic clustering:

$$U_i(x; v_i) = g(x, v_i), \quad x \in R^p, \qquad (20)$$

where $g(x, y)$ is either $g_E(x, y)$ or $g_P(x, y)$. The rule is thus the same as the basic function in possibilistic clustering.

We show a number of theoretical properties of the fuzzy rules defined by the above functions. The proofs are given in [25, 28] and omitted here.

Proposition 1: Let $U_i(x; V)$ be defined with the function $g_B$; in other words, $J_B$ is used. Suppose $\varepsilon \to 0$. Then the maximizer of $U_i(x; V)$ approaches $x = v_i$:

$$\arg\max_{x \in R^p} U_i(x; V) \to v_i, \quad \text{as } \varepsilon \to 0.$$

Moreover, for all $\varepsilon \ge 0$, we have $\lim_{\|x\| \to \infty} U_i(x; V) = \dfrac{1}{c}$.

Proposition 2: Let $U_i(x; V)$ be defined with the function $g_P$; in other words, $J_P$ is used with $m = 2$. Suppose $\zeta \to +\infty$. Then the maximizer of $U_i(x; V)$ approaches $x = v_i$:

$$\arg\max_{x \in R^p} U_i(x; V) \to v_i, \quad \text{as } \zeta \to +\infty.$$

Moreover, for all $\zeta \ge 0$, we have $\lim_{\|x\| \to \infty} U_i(x; V) = \dfrac{1}{c}$.

Hence the fuzzy rules for $J_B$ and $J_P$ behave similarly as the point $x$ goes far away, while the maximum point approaches the cluster center as the respective parameters tend to their limits. In contrast, the fuzzy rule $U_i(x; V)$ for $J_E$ has a quite different property. To describe it, we discuss Voronoi regions again.

In many cases, fuzzy clusters are made crisp by the maximum membership rule:

if $i = \arg\max_{1 \le j \le c} U_j(x; V)$ then $x \to$ cluster $i$.

Accordingly, we can define the set of points that belong to cluster $i$:

$$T_i(V) = \{ x \in R^p : i = \arg\max_{1 \le j \le c} U_j(x; V) \}.$$

We then have the next proposition.

Proposition 3: For all choices $g = g_B$, $g = g_E$, and $g = g_P$,

$$T_i(V) = \overline{S_i(V)}.$$

Thus $T_i(V)$ is the closure of the Voronoi region with center $v_i$, and $T_i(V)$ is the same for all three objective functions $J_B$, $J_E$, and $J_P$.
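The classification function (19) applies to any point of the space, not only to the objects $x_k$. The sketch below (an illustration assumed for this overview, taking $g_E$ of Eq. (14) as the basic function) evaluates $U_i(x; V)$ at an arbitrary point and applies the maximum membership rule; by Proposition 3, the crisp regions it produces coincide with the closures of the Voronoi regions $S_i(V)$.

```python
import numpy as np

def classify(x, V, lam=1.0):
    """Fuzzy rule (19) with g_E of Eq. (14), plus the maximum membership rule."""
    D = np.sum((V - x) ** 2, axis=1)        # D(x, v_i) for each center v_i
    g = np.exp(-lam * (D - D.min()))        # g_E(x, v_i) up to a common factor
    U = g / g.sum()                         # U_i(x; V), Eq. (19)
    return U, int(np.argmax(U))             # crisp cluster via max membership
```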
Let us now consider $U_i(x; V)$ for $J_E$.

Proposition 4: Let $U_i(x; V)$ be defined with the function $g_E$; in other words, $J_E$ is used. Assume the $v_i$ are in general position in the sense that no three of them are on a line. If a Voronoi region $T_i(V)$ is bounded, then

$$\lim_{\|x\| \to \infty} U_i(x; V) = 0.$$

If a Voronoi region $T_i(V)$ is unbounded and $x$ moves inside $T_i(V)$, then

$$\lim_{\|x\| \to \infty} U_i(x; V) = 1.$$

In both cases, $0 < U_i(x; V) < 1$ for all $x \in R^p$. The proof is given in [25] and omitted here.

Possibilistic clustering. As the fuzzy rules in possibilistic clustering are bell-shaped functions, we have the same property:

$$\arg\max_{x \in R^p} U_i(x; v_i) = v_i, \quad \lim_{\|x\| \to \infty} U_i(x; v_i) = 0$$

for both $g_E$ and $g_P$. If possibilistic clusters should be made crisp, we define

$$T_i'(V) = \{ x \in R^p : i = \arg\max_{1 \le j \le c} U_j(x; v_j) \}.$$

We have the next proposition.

Proposition 5: For both $g = g_E$ and $g = g_P$,

$$T_i'(V) = \overline{S_i(V)}.$$

The Voronoi regions are thus derived again.

3. Size and Covariance of a Cluster

We frequently need to recognize an elongated cluster, but the original fuzzy c-means cannot do this, as the Voronoi regions cannot separate such an elongated region. To solve this problem, cluster covariances in fuzzy c-means have been considered by Gustafson and Kessel [11]. However, there is another problem, that of separating a dense cluster from a sparse cluster, for which "density" or "cluster size" has to be considered.

To solve both problems, a generalized objective function with a Kullback-Leibler information term has been proposed by Ichihashi and his colleagues [15, 28]. That is, the following function is used:

$$J_{KL}(U, V, A, S) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} D(x_k, v_i; S_i) + \nu \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} \left\{ \log \frac{u_{ki}}{\alpha_i} + \frac{1}{2} \log |S_i| \right\}, \qquad (23)$$

where the variable $A = (\alpha_1, \ldots, \alpha_c)$ controls cluster sizes with the constraint

$$\mathcal{A} = \left\{ A : \sum_{j=1}^{c} \alpha_j = 1,\ \alpha_j \ge 0,\ j = 1, \ldots, c \right\}. \qquad (24)$$

Another variable is $S = (S_1, \ldots, S_c)$; each $S_i$ ($i = 1, \ldots, c$) is a $p \times p$ positive-definite matrix with determinant $|S_i|$. In addition,

$$D(x, v_i; S_i) = (x - v_i)^T S_i^{-1} (x - v_i) \qquad (25)$$

is the squared Mahalanobis distance for cluster $i$.

Since this objective function has four variables, the alternate optimization means minimization with respect to one variable while the other three are fixed: after giving initial values for $V$, $A$, $S$, we repeat

$$U = \arg\min_{U} J_{KL}(U, V, A, S), \quad V = \arg\min_{V} J_{KL}(U, V, A, S),$$
$$A = \arg\min_{A} J_{KL}(U, V, A, S), \quad S = \arg\min_{S} J_{KL}(U, V, A, S),$$

until convergence. The solutions are as follows [28].

Solutions for $J_{KL}$:

$$u_{ki} = \frac{\dfrac{\alpha_i}{|S_i|^{1/2}} \exp\left( -\dfrac{D(x_k, v_i; S_i)}{\nu} \right)}{\sum_{j=1}^{c} \dfrac{\alpha_j}{|S_j|^{1/2}} \exp\left( -\dfrac{D(x_k, v_j; S_j)}{\nu} \right)}, \qquad (26)$$

$$v_i = \frac{\sum_{k=1}^{N} u_{ki} x_k}{\sum_{k=1}^{N} u_{ki}}, \qquad (27)$$

$$\alpha_i = \frac{1}{N} \sum_{k=1}^{N} u_{ki}. \qquad (28)$$
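As a concrete illustration of the alternate optimization for $J_{KL}$, the sketch below performs one round of the updates (26)-(28). It is an assumption of this presentation, not code from the paper; in particular, the excerpt ends before the update rule for $S$, so the covariance matrices $S_i$ are simply held fixed.

```python
import numpy as np

def kl_fcm_step(X, V, alpha, S, nu=1.0):
    """One round of the updates (26)-(28) for J_KL; the matrices S_i are held
    fixed here, since the update rule for S lies outside this excerpt."""
    N, _ = X.shape
    c = len(V)
    # Eq. (25): squared Mahalanobis distances D(x_k, v_i; S_i).
    D = np.empty((N, c))
    for i in range(c):
        diff = X - V[i]
        D[:, i] = np.einsum('kj,jl,kl->k', diff, np.linalg.inv(S[i]), diff)
    # Eq. (26): u_ki proportional to alpha_i |S_i|^{-1/2} exp(-D_ki / nu);
    # the row-wise shift of D is a common factor that cancels on normalization.
    dets = np.array([np.linalg.det(Si) for Si in S])
    W = alpha / np.sqrt(dets) * np.exp(-(D - D.min(axis=1, keepdims=True)) / nu)
    U = W / W.sum(axis=1, keepdims=True)
    # Eq. (27): centers are membership-weighted means.
    V_new = (U.T @ X) / U.sum(axis=0)[:, None]
    # Eq. (28): cluster-size variables are the mean memberships.
    alpha_new = U.mean(axis=0)
    return U, V_new, alpha_new
```

Note how (26) resembles the posterior probabilities of a Gaussian mixture model, which reflects the close relation to mixture distributions mentioned in the introduction.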