
A new unsupervised approach for fuzzy clustering

Fuzzy Sets and Systems 158 (2007) 2118–2133
www.elsevier.com/locate/fss

Efendi N. Nasibov∗, Gözde Ulutagay

Department of Statistics, Faculty of Science & Arts, Dokuz Eylul University, Kaynaklar Campus, 35160 Buca, Izmir, Turkey

Received 8 September 2005; received in revised form 10 August 2006; accepted 27 February 2007
Available online 12 March 2007

∗ Corresponding author. Tel.: +905365097969; fax: +902324534265. E-mail address: efendi_nasibov@yahoo.com (E.N. Nasibov).
0165-0114/$ - see front matter. doi:10.1016/j.fss.2007.02.019

Abstract

In this paper, a new level-based (hierarchical) approach to the fuzzy clustering problem for spatial data is proposed. In this approach, each point of the initial set is handled as a fuzzy point of the multidimensional space. The conical form of a fuzzy point, fuzzy $\alpha$-neighbor points and fuzzy $\alpha$-joint points are defined, and their properties are explored. It is well known that in classical fuzzy clustering, fuzziness usually means the possibility that each element belongs to different classes with different positive degrees from $[0,1]$. In this study, the fuzziness of clustering is evaluated as the level of detail at which the properties of the classified elements are investigated. To this end, a new Fuzzy Joint Points (FJP) method that is robust to noise is proposed. An algorithm for the FJP method is developed and some of its properties are explored. A sufficient condition for recognizing a hidden optimal cluster structure is also proven. The main advantage of the FJP algorithm is that it combines the determination of initial clusters, cluster validity and direct clustering, which are the fundamental stages of a clustering process. The FJP method makes it possible to handle fuzzy properties at various levels of detail and to recognize individual outlier elements as independent classes. This method could be important in biological, medical, geographical information, mapping and similar problems.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Clustering; Neighborhood relation; Fuzzy Joint Points (FJP); Fuzzy joint set

1. Introduction

Problems such as clustering, identification and optimization play an important part among decision-making problems. Fuzzy set theory can be widely used in solving these kinds of problems [6,7,14,15]. Among them, clustering is the most important one in modern data mining technology, which is used for processing large databases [3]. The general philosophy of clustering is to divide the initial set into homogeneous groups based on the similarity of their properties. Patterns in the same group tend to be as similar as possible to each other, while patterns in different groups tend to be as dissimilar as possible.

In classical clustering, the boundary between clusters is crisp, so that each pattern is assigned to a unique class. In real life, however, the boundary between clusters often cannot be precisely defined, and some patterns may belong to more than one cluster with different positive degrees of membership. Such situations are modeled by fuzzy clustering instead of classical clustering [7]. Various methods have been proposed for these problems in the literature [8,10,20,21,23]. Most of the earlier work is based on the Fuzzy c-Means (FCM) algorithm, which interprets the fuzziness of clustering as the possibility that some elements belong to various classes. In our research, by contrast, a different interpretation of fuzziness, based on a new Fuzzy Joint Points (FJP) method, is proposed [17]. The basic difference between the FJP method and the others is that it understands fuzziness from a level-based (hierarchical) point of view.
This point of view concerns how much detail is used when the elements are considered in the construction of homogeneous groups. Clearly, the more detail in which the elements are examined, the more dissimilar they appear to each other; conversely, the fuzzier the view of the elements, the more similar they are. In this sense, the fuzziness of clustering expresses the level of detail at which the considered properties are investigated. At the minimal fuzziness degree of zero, all elements are dissimilar from each other, so each element can be considered as an individual cluster. At the maximal degree of fuzziness, on the other hand, all elements can be considered similar to each other, so that they belong to one class. Elements that are more similar to each other belong to one class, while elements that are more dissimilar belong to different classes, with different membership degrees from the interval $[0,1]$. In other words, the FJP algorithm can be regarded as a level-based clustering algorithm. At each iteration of the clustering process, instead of determining the membership degrees of the points to the clusters as classical fuzzy clustering does, the FJP algorithm determines the points that constitute the $\alpha$-level sets.

There are other clustering algorithms in the literature that are similar to FJP, e.g. DBSCAN, GDBSCAN and OPTICS [4,5,9,13,19]. DBSCAN is a clustering algorithm based on inter-cluster densities. In this algorithm, a distance query is made for each point in the data set with a predetermined value $\varepsilon$, and the algorithm checks whether the number of points in the $\varepsilon$-neighborhood of the point exceeds the given MinPts value [13]. The MinPts value is used to assign points to clusters. The GDBSCAN algorithm was proposed for the case of skewed densities [19]. In this method, the $\varepsilon$ and MinPts values are determined by the user according to the densities. The sets are sorted in increasing order of density, and the sets with lower densities are merged using a greedy algorithm.
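For comparison with FJP, the per-point $\varepsilon$-neighborhood query and MinPts test that DBSCAN performs can be sketched as follows. This is a minimal illustration only, not the DBSCAN implementation of [13]: the function names are hypothetical, the query is brute force, and we assume the usual convention that the neighborhood contains the point itself and that a core point needs at least MinPts points in it.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def eps_neighborhood(points: List[Point], i: int, eps: float) -> List[int]:
    """Brute-force epsilon-neighborhood query: indices of all points within eps of points[i].

    The result includes i itself, because d(p, p) = 0 <= eps.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def core_points(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Indices of core points: those whose eps-neighborhood holds at least min_pts points."""
    return [
        i for i in range(len(points))
        if len(eps_neighborhood(points, i, eps)) >= min_pts
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
    # One query per point, each scanning all points: O(n^2) distance evaluations in total.
    print(core_points(data, eps=1.0, min_pts=3))  # [0, 1, 2]
```

The isolated point (5, 5) has only itself in its neighborhood and is therefore not a core point. Running one such query for every point is what makes the naive approach quadratic in the number of points.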
DBSCAN performs many distance computations, which increases the complexity of the algorithm. The OPTICS algorithm was proposed to reduce this complexity [4,5]. In this algorithm, distance queries are made for values $\varepsilon'$ smaller than $\varepsilon$, and distinct distance functions are used only when the real clustering is to be obtained. A data set can be represented in OPTICS, whereas multidimensional projection is not possible in DBSCAN.

Finding the optimal number of clusters, specifying the initial clusters and direct clustering with iterative refinement are fundamental problems of FCM-type clustering algorithms. Among the methods addressing them, the K-nearest neighbor (KNN) and Mountain methods are widely used [20,21,23]. However, these methods have some disadvantages. For instance, the basic disadvantages of KNN are that the number of clusters must be given a priori and that an equal number of elements is assigned to each class. The basic disadvantage of the Mountain method is its computational complexity. The FJP method presented in our study does not have these disadvantages [17]. Another significant feature of the proposed method is that its noise robustness can be fine-tuned and outliers can be treated as individual classes, which is not the case in most known methods. This could be important in biological, medical and similar problems in order to recognize new forms of living objects.

The fundamental idea of the FJP method is to compute the fuzzy relation matrix based on the distances between points. Then, for certain $\alpha \in [0,1]$, $\alpha$-level sets and equivalence classes are constructed. These $\alpha$-degree equivalence classes in turn determine each $\alpha$-level set of the fuzzy clusters.
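The pipeline just described (relation matrix from distances, transitive closure, $\alpha$-level equivalence classes) can be sketched as below. This is a minimal illustration under our own assumptions, not the FJP algorithm as specified later in the paper: all points share one radius $R$, the relation uses the formula $T(A,B) = 1 - d(a,b)/(2R)$ that Section 2 introduces as (2.4), clipped at 0 so that it stays in $[0,1]$, the max–min closure is computed with a Floyd–Warshall-style sweep, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def relation_matrix(points: Sequence[Sequence[float]], radius: float) -> List[List[float]]:
    """Fuzzy neighborhood relation T(A, B) = 1 - d(a, b) / (2R) for points sharing radius R.

    Values are clipped at 0 (our assumption) so that T stays in [0, 1].
    """
    return [
        [max(0.0, 1.0 - math.dist(p, q) / (2.0 * radius)) for q in points]
        for p in points
    ]


def max_min_closure(t: List[List[float]]) -> List[List[float]]:
    """Max-min transitive closure of a fuzzy relation (Floyd-Warshall-style sweep)."""
    n = len(t)
    c = [row[:] for row in t]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Keep the stronger of the current value and the chain i -> k -> j,
                # whose strength is its weakest link.
                c[i][j] = max(c[i][j], min(c[i][k], c[k][j]))
    return c


def alpha_classes(closure: List[List[float]], alpha: float) -> List[List[int]]:
    """Equivalence classes of the crisp alpha-cut {(i, j) : closure[i][j] >= alpha}.

    Relies on the closure being reflexive, symmetric and max-min transitive.
    """
    n = len(closure)
    assigned = [False] * n
    classes: List[List[int]] = []
    for i in range(n):
        if not assigned[i]:
            members = [j for j in range(n) if closure[i][j] >= alpha]
            for j in members:
                assigned[j] = True
            classes.append(members)
    return classes


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    closure = max_min_closure(relation_matrix(pts, radius=1.0))
    print(alpha_classes(closure, 0.5))  # [[0, 1, 2], [3]]
    print(alpha_classes(closure, 0.6))  # [[0], [1], [2], [3]]
```

Points 0 and 2 are not direct neighbors at $\alpha = 0.5$ (their relation value is 0), yet they fall into one class because the chain through point 1 has strength 0.5; raising $\alpha$ to 0.6 puts every point into its own class.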
Note also that the $\alpha$-level sets are not computed for all degrees $\alpha \in [0,1]$; instead, they are computed only for the $\alpha$-levels at which the number of clusters changes. The final level set is then computed based on the maximal change interval of the $\alpha$'s. In other words, the $\alpha$-level degree that optimally reflects the cluster structure and the $\alpha$-level set appropriate for this level are found simultaneously. The FJP method is explained in detail in the third section of the paper.

2. Basic definitions and properties

Most distance-based clustering methods use the following classical Euclidean distance between the points $a$ and $b$ of the $p$-dimensional space $E^p$:

$$d(a,b) = \sqrt{\sum_{i=1}^{p} (a_i - b_i)^2}. \tag{2.1}$$

[Fig. 1. Fuzzy conical point $A=(a,R)\in F(E^2)$ on the space $E^2$.]

[Fig. 2. Triangular fuzzy number as a point $A=(a,R)\in F(E^1)$ on the space $E^1$.]

Note that there are methods that use other distances. For example, in [11] a clustering problem solved with the FCM algorithm based on a scaled distance is studied and its advantages are demonstrated. In our work, however, we use the classical distance given in (2.1).

Let $F(E^p)$ denote the set of all fuzzy sets of the $p$-dimensional space $E^p$, and let $\mu_A : E^p \to [0,1]$ denote the membership function of a fuzzy set $A \in F(E^p)$.

Definition 1. A conical fuzzy point $A=(a,R)\in F(E^p)$ of the space $E^p$ is a fuzzy set with membership function (Fig. 1)

$$\mu_A(x) = \begin{cases} 1 - \dfrac{d(x,a)}{R}, & \text{if } d(x,a) \le R,\\[4pt] 0, & \text{otherwise}, \end{cases} \tag{2.2}$$

where $a\in E^p$ is the center of the fuzzy point $A$ and $R\in E^1$ is the radius of its support $\operatorname{supp} A = \{x\in E^p \mid \mu_A(x) > 0\}$. The $\alpha$-level set of the conical fuzzy point $A=(a,R)$ is calculated as

$$A_\alpha = \{x\in E^p \mid \mu_A(x) \ge \alpha\} = \{x\in E^p \mid d(x,a) \le R\,(1-\alpha)\}. \tag{2.3}$$

Note that the analogue of a conical fuzzy point $A=(a,R)\in F(E^1)$ in the space $E^1$ is the symmetric triangular fuzzy number $A=(a,R,R)$ (Fig. 2).

Other definitions of a fuzzy point exist in the literature. For example, in [22] a clustering problem with multidimensional fuzzy point data such as (2.2) is considered and a robust modification of the FCM algorithm is proposed. In this study, we use the short term "fuzzy point" for the conical fuzzy point defined in (2.2).

[Fig. 3. Fuzzy $\alpha$-neighbor points $A=(a,R)$ and $B=(b,R)$ on the space $E^2$.]

Let $A=(a,R)$ and $B=(b,R)$ be fuzzy points from the set $X\subset F(E^p)$. Define a fuzzy neighborhood relation $T : X\times X \to [0,1]$ on the set $X$ as follows:

$$T(A,B) = 1 - \frac{d(a,b)}{2R}, \tag{2.4}$$

where $a\in E^p$ and $b\in E^p$ are the centers of the fuzzy points $A$ and $B$, respectively (Fig. 3). Eq. (2.4) may be rewritten as

$$d(a,b) = 2R\,(1 - T(A,B)). \tag{2.5}$$

Clearly, the relation $T$ is reflexive, i.e. $T(A,A) = 1$ for all $A\in X$.

Definition 2. Let $A$ and $B$ be fuzzy points of the set $X\subset F(E^p)$. If

$$T(A,B) \ge \alpha \tag{2.6}$$

holds for a fixed $\alpha\in(0,1]$, then $A$ and $B$ are called $\alpha$-neighbor fuzzy points, denoted $A \sim_\alpha B$ (Fig. 3). This $\alpha$-neighborhood notion corresponds to the $\alpha$-degree similarity of the points.

Lemma 1. Fuzzy points $A=(a,R)$ and $B=(b,R)$ are $\alpha$-neighbor fuzzy points if and only if

$$d(a,b) \le 2R\,(1-\alpha) \tag{2.7}$$

holds, where $d(a,b)$ denotes the distance between the centers of the fuzzy points $A$ and $B$.

Proof. Let the fuzzy points $A=(a,R)$ and $B=(b,R)$ be $\alpha$-neighbors. Then (2.6) holds by Definition 2, and therefore (2.8) holds by (2.4).
$$1 - \frac{d(a,b)}{2R} \ge \alpha \;\Rightarrow\; d(a,b) \le 2R\,(1-\alpha). \tag{2.8}$$

Now suppose that (2.7) holds. Then $\alpha \le 1 - \frac{d(a,b)}{2R} = T(A,B)$, i.e. (2.6) holds, which completes the proof. ∎

Definition 3. If, for a fixed $\alpha\in(0,1]$, there is a sequence of fuzzy points $C_1,\dots,C_k$, $k\ge 0$, between the points $A$ and $B$ such that

$$A \sim_\alpha C_1,\quad C_1 \sim_\alpha C_2,\quad \dots,\quad C_{k-1} \sim_\alpha C_k,\quad C_k \sim_\alpha B,$$

then the fuzzy points $A$ and $B$ are called $\alpha$-joint fuzzy points. (For $k=0$ the condition reduces to $A \sim_\alpha B$.)

[Fig. 4. Illustration of Lemma 2.]

Definition 4. Let $X\subset F(E^p)$ be a set of fuzzy points. If, for a fixed $\alpha\in(0,1]$, every pair of fuzzy points $A,B\in X$ is $\alpha$-joint, then $X$ is called a fuzzy $\alpha$-joint set.

Let $d(A_\alpha,B_\alpha)$ denote the classical distance between the level sets $A_\alpha$ and $B_\alpha$, i.e. $d(A_\alpha,B_\alpha) = \min\{d(x,y) \mid x\in A_\alpha,\ y\in B_\alpha\}$.

Lemma 2. The fuzzy points $A$ and $B$ are $\alpha$-neighbors if and only if

$$A_\alpha \cap B_\alpha \ne \varnothing. \tag{2.9}$$

Proof. Let the fuzzy points $A$ and $B$ be $\alpha$-neighbors, so that (2.6) holds. Assume, to the contrary, that (2.9) does not hold, i.e. $A_\alpha \cap B_\alpha = \varnothing$. Then on the line segment that joins the points $a\in E^p$ and $b\in E^p$ there is a point $x\in E^p$ with $x\notin A_\alpha$ and $x\notin B_\alpha$ (Fig. 4), which satisfies

$$d(a,x) > R\,(1-\alpha) \quad\text{and}\quad d(b,x) > R\,(1-\alpha). \tag{2.10}$$

Since $x$ lies on the segment between $a$ and $b$, (2.10) gives

$$d(a,b) = d(a,x) + d(x,b) > 2R\,(1-\alpha).$$

By Lemma 1, this inequality contradicts the $\alpha$-neighborhood of the points $A$ and $B$.

Now assume that (2.9) holds. Then there exists $x$ with $x\in A_\alpha$ and $x\in B_\alpha$. Consequently, by (2.3),

$$d(x,a) \le R\,(1-\alpha) \quad\text{and}\quad d(x,b) \le R\,(1-\alpha). \tag{2.11}$$

By the triangle inequality, it follows from (2.11) that

$$d(a,b) \le d(a,x) + d(x,b) \le 2R\,(1-\alpha) \;\Rightarrow\; d(a,b) \le 2R\,(1-\alpha).$$
According to Lemma 1, the last inequality shows that the fuzzy points $A$ and $B$ are $\alpha$-neighbors. This completes the proof. ∎

Let the relation $\hat T : X\times X \to [0,1]$ be the transitive closure of the relation $T : X\times X \to [0,1]$. Note that the transitive closure is taken with respect to max–min composition.

Theorem 1. Any points $A,B\in X$ of a finite set $X$ are fuzzy $\alpha$-joint points if and only if

$$\hat T(A,B) \ge \alpha. \tag{2.12}$$

Proof. First, suppose that the fuzzy points $A$ and $B$ are $\alpha$-joint points. Then there exists a sequence of fuzzy points $C_1,\dots,C_k$, $k\ge 0$, between $A$ and $B$ such that

$$T(A,C_1) \ge \alpha,\quad T(C_1,C_2) \ge \alpha,\quad \dots,\quad T(C_{k-1},C_k) \ge \alpha,\quad T(C_k,B) \ge \alpha. \tag{2.13}$$
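The statement of Theorem 1 can also be checked numerically on a small random sample. The sketch below compares the chain condition of Definition 3, tested by breadth-first search over the $\alpha$-neighbor graph, with the threshold test $\hat T(A,B) \ge \alpha$ of (2.12). It is an illustration under our own assumptions (a common radius $R$, $T$ from (2.4) clipped at 0, the max–min closure computed by a Floyd–Warshall-style sweep, hypothetical function names), not a substitute for the proof.

```python
import math
import random
from collections import deque
from typing import List, Sequence


def relation_matrix(points: Sequence[Sequence[float]], radius: float) -> List[List[float]]:
    """T(A, B) = 1 - d(a, b) / (2R), clipped at 0 (assumption) to stay in [0, 1]."""
    return [[max(0.0, 1.0 - math.dist(p, q) / (2.0 * radius)) for q in points] for p in points]


def max_min_closure(t: List[List[float]]) -> List[List[float]]:
    """Max-min transitive closure T-hat via a Floyd-Warshall-style sweep."""
    n = len(t)
    c = [row[:] for row in t]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                c[i][j] = max(c[i][j], min(c[i][k], c[k][j]))
    return c


def alpha_joint(t: List[List[float]], i: int, j: int, alpha: float) -> bool:
    """Definition 3: is there a chain i ~ C1 ~ ... ~ Ck ~ j of alpha-neighbors (T >= alpha)?"""
    seen = {i}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return True
        for v, strength in enumerate(t[u]):
            if v not in seen and strength >= alpha:
                seen.add(v)
                queue.append(v)
    return False


def check_theorem_1(n: int = 20, radius: float = 1.5, seed: int = 0) -> bool:
    """Compare the chain test with the threshold test T-hat(A, B) >= alpha on random points."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)) for _ in range(n)]
    t = relation_matrix(pts, radius)
    t_hat = max_min_closure(t)
    for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
        for i in range(n):
            for j in range(n):
                if alpha_joint(t, i, j, alpha) != (t_hat[i][j] >= alpha):
                    return False
    return True


if __name__ == "__main__":
    print(check_theorem_1())  # True
```

Because every entry of the closure is a copy of some entry of $T$ (max and min only select existing values), both tests compare exactly the same floating-point numbers, so the check is not sensitive to rounding.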