Multiobjective Partitional Clustering for Fuzzy and Mixed data through Hill Climbing

Description
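In this paper we have designed and implemented multiple possible stochastic hill climbing alternatives, applied to mixed (continuous and categorical) data in a fuzzy context, as proposed solutions to a multiobjective partitional clustering problem. To validate the efficacy of this approach we selected the external validity indexes Adjusted Rand Index (ARI) and Minkowski Score (MS).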
Transcript
International Journal of Artificial Intelligence and Soft Computing (IJAISC) Vol. 1, No. 1

Multiobjective Partitional Clustering for Fuzzy and Mixed data through Hill Climbing

Pablo Barbaro Martinez Pedroso, Hotel Melia Cayo Coco, Ciego de Ávila, Cuba, +53 58053324, martinezpedroso@gmail.com
Darian Horacio Grass Boada, Centro de Aplicaciones de Tecnologías Avanzadas (CENATAV), La Habana, Cuba, +53 54500567, dgrass@cenatav.co.cu
Amanda Robinson, Provalis Research, Montreal, Canada, +514 268 5…88, a.robinson@provalisresearch.com

ABSTRACT
In this paper we have designed and implemented multiple possible stochastic hill climbing alternatives, applied to mixed (continuous and categorical) data in a fuzzy context, as proposed solutions to a multiobjective partitional clustering problem. To validate the efficacy of this approach we selected the external validity indexes Adjusted Rand Index (ARI) and Minkowski Score (MS). An approach capable of performing multiobjective partitional clustering with mixed data, which also provides solutions modeled with fuzzy logic, allowing for a better description of the distribution of objects among the clusters, was obtained as a result of the research.

CCS Concepts
• Theory of computation → Unsupervised learning and clustering • Applied computing → Multi-criterion optimization and decision-making • Computing methodologies → Cluster analysis.

Keywords
Partitional clustering; multiobjective hill climbing; fuzzy domain; mixed data.

1. INTRODUCTION
The large volume of information stored in enterprises, entities, institutions, etc. surpasses the human capability for data analysis and comprehension. The process of knowledge discovery from databases can be applied in order to extract unknown and interesting trends. Partitional clustering is a relevant unsupervised task of this process, which is defined as follows.

Given a set of objects $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ and $x_{il}$ is a feature of the object, $X$ is divided into $k$ partitions (clusters, groups) $C_1, C_2, \ldots, C_k$, where:

$$C_i \neq \emptyset, \quad i = 1, \ldots, k \tag{1}$$

$$\bigcup_{i=1}^{k} C_i = X \tag{2}$$

$$C_i \cap C_j = \emptyset, \quad i, j = 1, \ldots, k, \; i \neq j \tag{3}$$

Equation (3) defines crisp partitional clustering. However, there exist domains where the frontiers of groups are not very clear. Modeling the problem as fuzzy partitional clustering allows more accuracy with respect to the memberships of objects among the groups, in contrast to crisp partitional clustering, which considers complete belongingness of objects to clusters. This information plays an important role for the decision maker. In order to model fuzzy partitional clustering, Equation (3) is replaced by a data structure known as a membership matrix, defined in (Xu 2009) as $U = [u_{ij}]$, where $u_{ij} \in [0, 1]$ is the membership coefficient of object $x_j$ in the $i$-th group, satisfying the following restrictions:

$$\sum_{i=1}^{k} u_{ij} = 1, \quad \forall j \tag{4}$$

$$0 < \sum_{j=1}^{n} u_{ij} < n, \quad \forall i \tag{5}$$

where $k$ is the number of clusters and $n$ the total number of objects. Equation (4) is the membership distribution of each object among the clusters, whereas Equation (5) prevents obtaining empty groups.

As (Xu 2009) outlines, optimal partitioning cannot be obtained due to the extreme computational cost. Thus heuristics are needed: although optimal solutions cannot be guaranteed, at least near-optimal solutions can be found.

A well-known technique is the k-means algorithm, whose procedure is as follows. First, $k$ objects are randomly selected as centers of clusters. All other objects are grouped to the nearest center, based on a distance metric (Euclidean distance). Then the centers of clusters are updated through Equation (7). This procedure iterates until no new centers are computed or an iteration limit is reached. The final centers obtained represent and characterize the clusters.

$$D_{E}(x_i, x_j) = \sqrt{\sum_{l=1}^{d} |x_{il} - x_{jl}|^{2}} \tag{6}$$

$$c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \tag{7}$$

The k-means algorithm is only suitable for a numeric domain, since the Euclidean distance is purely numeric. Nevertheless, many real-life data sets are categorical in nature, as is pointed out in (Anirban Mukhopadhyay 2007), and a variation of k-means is needed. In this variation the dissimilarity between objects is measured by Equation (8), extracted from (Huang 1998):

$$D_{C}(x_i, x_j) = \sum_{l=1}^{d} \delta(x_{il}, x_{jl}) \tag{8}$$

where:

$$\delta(x_{il}, x_{jl}) = \begin{cases} 0 & \text{if } x_{il} = x_{jl} \\ 1 & \text{if } x_{il} \neq x_{jl} \end{cases} \tag{9}$$

Centers are constructed with the mode of each feature of the cluster objects; the resulting method is known as k-modes.

However, most real-world data sets are mixed in nature (numeric and categorical features). In such domains none of the previous alternatives can be applied in their original design. To overcome this limitation, (Huang 1998) proposes an integration of both techniques in a procedure called k-prototypes. In this procedure the distance function of Equation (6) and the dissimilarity function of Equation (8) are used to compare numerical and categorical features respectively. Thus the difference between two objects $x_i$ and $x_j$ with mixed features, denoted as the vector of attributes $(x_{i1}, \ldots, x_{ip}, x_{i,p+1}, \ldots, x_{id})$ with the first $p$ features numeric and the remaining ones categorical, is calculated as follows:

$$D(x_i, x_j) = \sqrt{\sum_{l=1}^{p} |x_{il} - x_{jl}|^{2}} + \gamma \sum_{l=p+1}^{d} \delta(x_{il}, x_{jl}) \tag{10}$$

where $\gamma$ is used to avoid favoritism toward either feature type. To compute the representative object, Equation (7) is used for numerical data and the mode for categorical data.

Note that in these methods the representative objects of clusters are constructed. In (Kamber 2006) the authors state that k-means and its variations, as previously presented, are sensitive to outliers, i.e. data extremely distant from a cluster can substantially degrade the solution. An alternative is to select an existing object as the representative, with all other objects grouped to the most similar one, computed with Equation (11):

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} D(x, z_i) \tag{11}$$

where $E$ is the total sum of error, $x$ an object of cluster $C_i$ and $z_i$ its corresponding representative object. This strategy is known as k-medoids, a medoid being the most centrally located object in a cluster, i.e. the representative object.

Methods based on k-medoids are not restricted to a specific data type, thus the distance metric of Equation (6) or the dissimilarity measure of Equation (8) can be adapted without limitation to the k-medoids procedure. Nevertheless, Equation (10) is selected in order to cover a mixed data domain. So far a mixed data domain has been tackled; however, most real data sets do not have clear enough frontiers between clusters. In order to overcome this limitation, a variation of the previously mentioned technique is adopted, known as fuzzy k-medoids. It partitions the entire data set into $k$ clusters considering that each object belongs to all clusters with a degree of belongingness or membership, and is defined in (Mukhopadhyay 2013) as follows:

$$J_m(U, Z) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m} \, D(x_j, z_i) \tag{12}$$

where $U = [u_{ij}]$ represents the matrix of the fuzzy partition, $u_{ij}$ the membership degree of object $x_j$ to cluster $C_i$, $Z = (z_1, \ldots, z_k)$ the vector of medoids and $m$ the fuzzy exponent.

$$u_{ij} = \frac{1}{\sum_{l=1}^{k} \left( \dfrac{D(x_j, z_i)}{D(x_j, z_l)} \right)^{\frac{1}{m-1}}} \tag{13}$$

The method starts with $k$ randomly selected medoids. In each iteration, after the membership matrix is calculated with Equation (13), it is used to re-compute the medoids with Equation (14). The medoid $z_i$ of the $i$-th cluster satisfies $z_i = x_q$ such that:

$$q = \arg\min_{1 \le j \le n} \sum_{l=1}^{n} u_{il}^{m} \, D(x_j, x_l) \tag{14}$$

The techniques presented so far optimize only one criterion (compactness) over the entire data set. For this type of distribution, optimizing compactness can yield good solutions. However, since clustering is an unsupervised technique, there is no previous knowledge about the distribution of the objects. Moreover, one criterion alone cannot uncover groups of distinct types; therefore, as (Hruschka 2009) suggests, the quality of clusters should be measured by multiple criteria instead of a single criterion.

Optimizing more than one criterion has been proposed in two main approaches: ensemble and multiobjective (Hruschka 2009). (Handl J. 2007) and (Hruschka 2009) point out that, although the ensemble approach tends to be more robust and provides better solutions than single objective optimization, it does not entirely exploit the potential of using various criteria. Since the ensemble approach is restricted to integrating solutions provided by multiple single objective optimization techniques, it does not exploit solutions that are simultaneously optimized. On the other hand, such solutions are explored by the multiobjective approach.

The multiobjective approach, introduced in (Handl J. 2004a), simultaneously optimizes various objectives that conflict with each other, so that optimizing one degrades another. In such an approach, many objective functions are considered in the problem, each with the same level of priority.

The formalization of the multiobjective optimization (MOO) problem is extracted from (Mukhopadhyay 2007a): find the vector $\bar{x}^{*} = [x_1^{*}, x_2^{*}, \ldots, x_N^{*}]^{T}$ of decision variables that will satisfy the $r$ inequality constraints

$$g_i(\bar{x}) \geq 0, \quad i = 1, 2, \ldots, r \tag{15}$$

the $s$ equality constraints

$$h_i(\bar{x}) = 0, \quad i = 1, 2, \ldots, s \tag{16}$$

and optimizes the vector function

$$F(\bar{x}) = [f_1(\bar{x}), f_2(\bar{x}), \ldots, f_t(\bar{x})]^{T} \tag{17}$$

To clarify when a solution is considered optimal, the principles of Pareto are applied in this research. Related concepts can be found in (Coello Coello 2007) and are defined as follows.

A solution $\bar{x} \in \Omega$ is said to be Pareto optimal with respect to $\Omega$ if and only if there is no $\bar{x}' \in \Omega$ for which $F(\bar{x}') = (f_1(\bar{x}'), \ldots, f_t(\bar{x}'))$ dominates $F(\bar{x}) = (f_1(\bar{x}), \ldots, f_t(\bar{x}))$. A vector $a = (a_1, \ldots, a_t)$ dominates another vector $b = (b_1, \ldots, b_t)$, denoted by $a \preceq b$, if and only if $a$ is partially less than $b$, that is, $\forall i \in \{1, \ldots, t\}$, $a_i \leq b_i$ $\wedge$ $\exists i \in \{1, \ldots, t\}$ such that $a_i < b_i$. Applying the principles of Pareto to MOO, a set of solutions rather than a single one is obtained, known as the Pareto optimal set, which is in fact the aim of the process. For a given MOO problem $F(\bar{x})$, the Pareto optimal set $P^{*}$ is defined as:

$$P^{*} := \{ \bar{x} \in \Omega \mid \neg \exists \, \bar{x}' \in \Omega : F(\bar{x}') \preceq F(\bar{x}) \} \tag{18}$$

The objective of this paper is to develop a multiobjective optimization procedure for partitional clustering capable of covering mixed and fuzzy data.
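The building blocks described in this introduction can be illustrated with a small Python sketch. It is not the authors' implementation; it only assumes the standard forms of the mixed dissimilarity (Equation 10, with a Euclidean numeric part), the fuzzy k-medoids membership update (Equation 13) and Pareto dominance under minimisation (Equation 18). The function names, the default fuzzy exponent `m=2.0`, the weight `gamma=1.0` and the toy data are all hypothetical choices made for illustration.

```python
from math import sqrt

def mixed_distance(x, y, num_idx, cat_idx, gamma=1.0):
    """Eq. (10)-style dissimilarity (assumed form): Euclidean distance on the
    numeric features plus gamma times the number of categorical mismatches."""
    numeric = sqrt(sum((x[j] - y[j]) ** 2 for j in num_idx))
    categorical = sum(1 for j in cat_idx if x[j] != y[j])
    return numeric + gamma * categorical

def fuzzy_memberships(objects, medoids, dist, m=2.0):
    """Eq. (13)-style membership matrix: u[i][j] is the degree to which
    object j belongs to the cluster represented by medoid i."""
    k, n = len(medoids), len(objects)
    u = [[0.0] * n for _ in range(k)]
    for j, x in enumerate(objects):
        d = [dist(x, z) for z in medoids]
        zero = [i for i, di in enumerate(d) if di == 0.0]
        if zero:
            # The object coincides with a medoid: give it full membership
            # there (shared equally if several medoids coincide).
            for i in zero:
                u[i][j] = 1.0 / len(zero)
            continue
        for i in range(k):
            u[i][j] = 1.0 / sum((d[i] / d[l]) ** (1.0 / (m - 1.0)) for l in range(k))
    return u

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

def pareto_set(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy mixed data: two numeric features and one categorical feature.
data = [(1.0, 2.0, "a"), (1.5, 1.8, "a"), (8.0, 8.0, "b"), (9.0, 9.5, "b")]
dist = lambda x, y: mixed_distance(x, y, num_idx=(0, 1), cat_idx=(2,), gamma=1.0)
U = fuzzy_memberships(data, medoids=[data[0], data[2]], dist=dist)

# Toy objective vectors for four candidate solutions (both minimised).
front = pareto_set([(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)])
```

In this toy run each column of `U` sums to one, as Equation (4) requires, and `front` keeps the three candidates that no other candidate dominates. A hill climbing search over medoid sets could use `pareto_set` to retain non-dominated candidates, but that step is left out here because it depends on objectives defined later in the paper.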