International Journal of Fuzzy Systems, Vol. 13, No. 2, June 2011
© 2011 TFSA 89
Different Objective Functions in Fuzzy c-Means Algorithms and Kernel-Based Clustering

Sadaaki Miyamoto
Abstract

An overview of fuzzy c-means clustering algorithms is given, focusing on different objective functions: a regularized dissimilarity, an entropy-based function, and a function for possibilistic clustering. Classification functions for the objective functions and their properties are studied. Fuzzy c-means algorithms using kernel functions are also discussed, with kernelized cluster validity measures and numerical experiments. New kernel functions derived from the classification functions are moreover studied.

Keywords: cluster validity measure, fuzzy c-means clustering, kernel functions, possibilistic clustering.
1. Introduction

Fuzzy clustering is well known not only in the fuzzy community but also in the related fields of data analysis, neural networks, and other areas of computational intelligence. Among various techniques of clustering using fuzzy concepts [16, 23, 30, 37], the term fuzzy clustering mostly refers to fuzzy c-means clustering by Dunn and Bezdek [1, 2, 6, 7, 8, 13]. This paper gives an overview of this method. Nevertheless, we adopt a non-standard formulation. That is, we begin from three different objective functions, none of which is exactly the same as the one by Dunn and Bezdek. Comparing different objective functions and their solutions, we find theoretical properties of fuzzy c-means clustering: different fuzzy classifiers are derived from different solutions. Moreover, a generalization including a "cluster size" variable and a "covariance" variable is developed. This generalization is shown to be closely related to mixture distributions. Kernel-based fuzzy c-means clustering is moreover studied with associated cluster validity measures. Many numerical simulations are used to evaluate whether or not the kernelized measures are adequate for ordinary ball-shaped clusters. Finally, a new class of kernel functions is proposed; they are derived from fuzzy c-means solutions. Illustrative examples are given.

Corresponding Author: Sadaaki Miyamoto is with the Department of Risk Engineering, the University of Tsukuba, Ibaraki 305-8573, Japan. E-mail: miyamoto@risk.tsukuba.ac.jp. Manuscript received June 2010; revised Nov. 2010; accepted Dec. 2010.
2. Fuzzy c-Means Clustering

We first give three objective functions. Possibilistic clustering [18] is included as a variation of fuzzy c-means clustering.

A. Preliminary consideration
Let objects for clustering be points in the $p$-dimensional Euclidean space. They are denoted by $x_k = (x_k^1, \ldots, x_k^p) \in R^p$ ($k = 1, \ldots, N$). A generic point $x = (x^1, \ldots, x^p)$ implies a variable in $R^p$. We assume $c$ clusters; cluster centers are denoted by $v_i$ ($i = 1, \ldots, c$). We write $V = (v_1, \ldots, v_c)$ as the collection of all cluster centers.

The dissimilarity between an object and a cluster center is the squared Euclidean distance:

$$D(x_k, v_i) = \| x_k - v_i \|^2. \qquad (1)$$

We sometimes write $D_{ki} = D(x_k, v_i)$ for simplicity. Moreover, $D(x, v_i)$ means that the variable $x$ is substituted for the object $x_k$. $U = (u_{ki})$ is the membership matrix: $u_{ki}$ means the degree of belongingness of $x_k$ to cluster $i$.
Crisp and fuzzy c-means clustering are based on the minimization of objective functions. Crisp c-means clustering [21] uses the following:

$$J_H(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} D(x_k, v_i). \qquad (2)$$

Alternate minimization with respect to one of $(U, V)$, while the other variable is fixed, is repeated until convergence [1]. Minimization with respect to $U$ uses the following constraint:

$$M = \left\{ U = (u_{ki}) : \sum_{i=1}^{c} u_{ki} = 1; \; u_{kj} \geq 0, \; \forall k, j \right\}. \qquad (3)$$
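As an illustration, the alternate minimization of $J_H$ under constraint (3) is the classical k-means iteration. A minimal Python sketch follows; the function name, toy data, and stopping test are illustrative assumptions, not from the paper:

```python
import numpy as np

def crisp_c_means(X, c, n_iter=100, seed=0):
    """Alternate minimization of J_H (2) under constraint (3):
    assign each object to its nearest center (crisp memberships),
    then recompute each center as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # minimize over U: nearest-center (crisp) assignment
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        labels = D.argmin(axis=1)
        # minimize over V: cluster means (keep old center if a cluster empties)
        V_new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                          else V[i] for i in range(c)])
        if np.allclose(V_new, V):
            break
        V = V_new
    return labels, V
```

On two well-separated point clouds this converges in a few iterations to the crisp partition that the nearest prototype rule of Section 2.D describes.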
We consider three objective functions:

$$J_B(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m \{ D(x_k, v_i) + \varepsilon \}, \quad (m > 1, \; \varepsilon \geq 0), \qquad (4)$$

$$J_E(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ u_{ki} D(x_k, v_i) + \lambda^{-1} u_{ki} (\log u_{ki} - 1) \}, \quad (\lambda > 0), \qquad (5)$$

$$J_P(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ (u_{ki})^m D(x_k, v_i) + \zeta^{-1} (1 - u_{ki})^m \}, \quad (\zeta > 0). \qquad (6)$$
All of the above are different from the original function proposed by Dunn [7, 8] and Bezdek [1, 2]. $J_B(U, V)$ has a nonnegative parameter $\varepsilon$ proposed by Ichihashi [28]. When $\varepsilon = 0$, $J_B(U, V)$ is the original objective function. $J_E(U, V)$ has an additional term of entropy. The use of entropy in fuzzy c-means clustering has been proposed by a number of researchers, e.g., [19, 20, 24]. $J_P(U, V)$ has been proposed by Krishnapuram and Keller [18] for possibilistic clustering. This function can also be used for fuzzy c-means with constraint (3) when $m = 2$.

We use the alternate minimization procedure FCM in the following, where $J(U, V)$ is either $J_B(U, V)$, $J_E(U, V)$, or $J_P(U, V)$. Minimization with respect to $U$ is with constraint (3).
FCM Algorithm of Alternate Optimization.

FCM1: Put an initial value of $V$ randomly.
FCM2: Minimize $J(U, V)$ with respect to $U$. Let the optimal solution be $U$.
FCM3: Minimize $J(U, V)$ with respect to $V$. Let the optimal solution be $V$.
FCM4: If $(U, V)$ is convergent, stop. Otherwise go to FCM2.

End FCM.
We show the solutions of FCM2 and FCM3 for each objective function, where the derivations are omitted.

Solution for $J_B$:

$$u_{ki} = \frac{ \left( \dfrac{1}{D(x_k, v_i) + \varepsilon} \right)^{\frac{1}{m-1}} }{ \displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{D(x_k, v_j) + \varepsilon} \right)^{\frac{1}{m-1}} }, \qquad (7)$$

$$v_i = \frac{ \displaystyle\sum_{k=1}^{N} (u_{ki})^m x_k }{ \displaystyle\sum_{k=1}^{N} (u_{ki})^m }. \qquad (8)$$

Solution for $J_E$:

$$u_{ki} = \frac{ \exp(-\lambda D(x_k, v_i)) }{ \displaystyle\sum_{j=1}^{c} \exp(-\lambda D(x_k, v_j)) }, \qquad (9)$$

$$v_i = \frac{ \displaystyle\sum_{k=1}^{N} u_{ki} x_k }{ \displaystyle\sum_{k=1}^{N} u_{ki} }. \qquad (10)$$

Solution for $J_P$:

$$u_{ki} = \frac{ \left( \dfrac{1}{1 + \zeta D(x_k, v_i)} \right)^{\frac{1}{m-1}} }{ \displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{1 + \zeta D(x_k, v_j)} \right)^{\frac{1}{m-1}} }, \qquad (11)$$

$$v_i = \frac{ \displaystyle\sum_{k=1}^{N} (u_{ki})^m x_k }{ \displaystyle\sum_{k=1}^{N} (u_{ki})^m }, \qquad (12)$$

where $m = 2$.
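As a sketch of the entropy-based variant, the closed-form updates (9) and (10) can be alternated directly inside the FCM procedure. The function name, parameter values, and convergence test below are illustrative assumptions:

```python
import numpy as np

def fcm_entropy(X, c, lam=1.0, n_iter=100, tol=1e-6, seed=0):
    """Entropy-regularized fuzzy c-means for J_E: alternate the
    closed-form updates (9) for U and (10) for V until V stabilizes."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]  # FCM1: random centers
    U = np.full((len(X), c), 1.0 / c)
    for _ in range(n_iter):
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # D(x_k, v_i)
        W = np.exp(-lam * D)
        U = W / W.sum(axis=1, keepdims=True)        # FCM2: update (9)
        V_new = (U.T @ X) / U.sum(axis=0)[:, None]  # FCM3: update (10)
        if np.linalg.norm(V_new - V) < tol:         # FCM4: convergence test
            V = V_new
            break
        V = V_new
    return U, V
```

Larger $\lambda$ makes the memberships (9) crisper; as $\lambda \to \infty$ the update approaches the nearest-center assignment of crisp c-means.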
B. Basic Functions

We introduce what we call basic functions in this paper:

$$g_B(x, y) = \left( \frac{1}{D(x, y) + \varepsilon} \right)^{\frac{1}{m-1}}, \qquad (13)$$

$$g_E(x, y) = \exp(-\lambda D(x, y)), \qquad (14)$$

$$g_P(x, y) = \left( \frac{1}{1 + \zeta D(x, y)} \right)^{\frac{1}{m-1}}. \qquad (15)$$
We also assume that $g(x, y)$ is either $g_B(x, y)$, $g_E(x, y)$, or $g_P(x, y)$. A unified representation is now obtained for the optimal $u_{ki}$:

$$u_{ki} = \frac{ g(x_k, v_i) }{ \displaystyle\sum_{j=1}^{c} g(x_k, v_j) } \qquad (16)$$

for all three objective functions, since $g(x, y)$ represents either $g_B(x, y)$, $g_E(x, y)$, or $g_P(x, y)$.
C. Possibilistic Clustering

Possibilistic clustering [18] uses $J_P(U, V)$ but with a different constraint:

$$\overline{M} = \{ U = (u_{ki}) : u_{kj} > 0, \; \forall k, j \}.$$

Note that $J_P(U, V)$ and $\overline{M}$ in this paper are simpler than the original formulation [18], but the essential discussion is the same. We cannot use $J_B(U, V)$, which leads to a trivial solution in possibilistic clustering, but $J_E(U, V)$ can be used [4]. We have the solution of possibilistic clustering for $J_E(U, V)$:

$$u_{ki} = g_E(x_k, v_i) \qquad (17)$$

using basic function $g_E$, with $v_i$ given by (10), while the solution for $J_P(U, V)$ is the following:

$$u_{ki} = g_P(x_k, v_i) \qquad (18)$$

using basic function $g_P$, with $v_i$ given by (12). Note that $m = 2$ is not assumed for possibilistic clustering.
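Since the possibilistic optimum is the basic function itself, by (17) and (18), memberships are not normalized across clusters. A small sketch (centers, $\lambda$, and function names are illustrative assumptions):

```python
import numpy as np

def g_E(x, y, lam=1.0):
    """Basic function (14): exp of minus lambda times squared distance."""
    return np.exp(-lam * np.sum((x - y) ** 2))

def possibilistic_U(X, V, g):
    """Possibilistic memberships (17)/(18): u_ki = g(x_k, v_i),
    with no sum-to-one normalization across clusters."""
    return np.array([[g(x, v) for v in V] for x in X])
```

At $x = v_i$ the membership equals one, and far from all centers every membership decays toward zero, in contrast to the constrained rule (16), which always sums to one over the clusters.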
D. Fuzzy Classifiers

There have been many discussions of fuzzy classifiers derived from fuzzy clustering, but we show a standard classifier that is naturally derived from the optimal solutions. Note that $u_{ki}$ is given only on the objects $x_k$, while what we need is fuzzy classification rules whereby the solutions are provided.

To understand classification rules clearly, let us consider crisp c-means, where we use the nearest prototype allocation rule: when the set of cluster prototypes is determined, we allocate an object to its nearest prototype, i.e.,

$$u_{ki} = \begin{cases} 1 & (i = \arg\min_{1 \leq j \leq c} D(x_k, v_j)), \\ 0 & (\text{otherwise}). \end{cases}$$

Note that the objective function is $J_H$. This allocation rule is applied to all points in the space, and the result is the Voronoi regions [17] with centers at the cluster prototypes. Specifically, we define

$$S_i(V) = \{ x \in R^p : \| x - v_i \| < \| x - v_j \|, \; \forall j \neq i \}$$

as a Voronoi region for a given set of cluster prototypes $V$. We then have

$$R^p = \bigcup_{i=1}^{c} \overline{S_i(V)}, \quad S_i(V) \cap S_j(V) = \emptyset \; (i \neq j),$$

where $\overline{S_i(V)}$ is the closure of $S_i(V)$. The nearest allocation rule then is as follows:

$$\text{if } x \in S_i(V) \text{ then } x \to \text{cluster } i.$$
When we consider fuzzy rules, a function $U_i(x; V)$ that interpolates $u_{ki}$ is used. We define the following function using the basic function:

$$U_i(x; V) = \frac{ g(x, v_i) }{ \displaystyle\sum_{j=1}^{c} g(x, v_j) }, \quad x \in R^p, \qquad (19)$$

where $g(x, y)$ is either $g_B(x, y)$, $g_E(x, y)$, or $g_P(x, y)$. Fuzzy rules are simpler in possibilistic clustering:

$$U_i(x; v_i) = g(x, v_i), \quad x \in R^p, \qquad (20)$$

where $g(x, y)$ is either $g_E(x, y)$ or $g_P(x, y)$. The rule is thus the same as the basic functions in possibilistic clustering.

We show a number of theoretical properties of the fuzzy rules defined by the above functions. The proofs are given in [25, 28] and omitted here.
Proposition 1: Let $U_i(x; V)$ be defined with function $g_B$; in other words, $J_B$ is used. Suppose $\varepsilon \to 0$. Then the maximum point of $U_i(x; V)$ approaches $v_i$:

$$\arg\max_{x \in R^p} U_i(x; V) \to v_i, \quad \text{as } \varepsilon \to 0.$$

Moreover, for all $\varepsilon \geq 0$, we have

$$\lim_{\|x\| \to \infty} U_i(x; V) = \frac{1}{c}.$$

Proposition 2: Let $U_i(x; V)$ be defined with function $g_P$; in other words, $J_P$ is used with $m = 2$. Suppose $\zeta \to +\infty$. Then the maximum point of $U_i(x; V)$ approaches $v_i$:

$$\arg\max_{x \in R^p} U_i(x; V) \to v_i, \quad \text{as } \zeta \to +\infty.$$

Moreover, for all $\zeta \geq 0$, we have

$$\lim_{\|x\| \to \infty} U_i(x; V) = \frac{1}{c}.$$
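The limit $1/c$ in Propositions 1 and 2 can be illustrated numerically with the classification function (19) built from $g_B$; the centers and parameter values below are illustrative assumptions:

```python
import numpy as np

def classifier_B(x, V, eps=0.1, m=2.0):
    """Classification function U_i(x; V) of (19) with basic function g_B."""
    d = ((V - x) ** 2).sum(axis=1)              # D(x, v_i) for each center
    w = (1.0 / (d + eps)) ** (1.0 / (m - 1.0))  # g_B(x, v_i), equation (13)
    return w / w.sum()

V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])   # c = 3 centers
for r in (10.0, 100.0, 1000.0):
    print(r, classifier_B(np.array([r, r]), V))      # tends toward 1/3 each
```

Near a center the corresponding membership dominates; as the query point recedes from all centers, the three memberships flatten toward $1/c = 1/3$, exactly as the propositions state.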
Hence the functions of the fuzzy rules for $J_B$ and $J_P$ behave similarly when the point $x$ goes far, while the maximum point approaches the cluster center as the respective parameters tend to their limits. In contrast, the fuzzy rule $U_i(x; V)$ for $J_E$ has a quite different property. To describe this, we should discuss Voronoi regions again.

In many cases, fuzzy clusters are made crisp by the maximum membership rule:

$$\text{if } i = \arg\max_{1 \leq j \leq c} U_j(x; V) \text{ then } x \to \text{cluster } i.$$
Accordingly, we can define the set of points that belong to cluster $i$:

$$T_i(V) = \{ x \in R^p : i = \arg\max_{1 \leq j \leq c} U_j(x; V) \}.$$

We then have the next proposition.
Proposition 3: For all choices of $g = g_B$, $g = g_E$, and $g = g_P$,

$$T_i(V) = \overline{S_i(V)}.$$

Thus $T_i(V)$ is the closure of the Voronoi region with center $v_i$, and $T_i(V)$ is the same for all three objective functions $J_B$, $J_E$, and $J_P$.

Let us now consider $U_i(x; V)$ for $J_E$.
Proposition 4: Let $U_i(x; V)$ be defined with function $g_E$; in other words, $J_E$ is used. Assume the $v_i$'s are in general position in the sense that no three of them are on a line. If a Voronoi region $T_i(V)$ is bounded, then

$$\lim_{\|x\| \to \infty} U_i(x; V) = 0.$$

If a Voronoi region $T_i(V)$ is unbounded and $x$ moves inside $T_i(V)$, then

$$\lim_{\|x\| \to \infty} U_i(x; V) = 1.$$

In both cases, $0 < U_i(x; V) < 1$ for all $x \in R^p$. The proof is given in [25] and omitted here.
Possibilistic clustering

As the fuzzy rules in possibilistic clustering are bell-shaped functions, we have the same property:

$$\arg\max_{x \in R^p} U_i(x; v_i) = v_i, \quad \lim_{\|x\| \to \infty} U_i(x; v_i) = 0$$

for both $g_E$ and $g_P$. If possibilistic clusters should be made crisp, we define

$$T_i'(V) = \{ x \in R^p : i = \arg\max_{1 \leq j \leq c} U_j(x; v_j) \}.$$

We have the next proposition:

Proposition 5: For both $g = g_E$ and $g = g_P$,

$$T_i'(V) = \overline{S_i(V)}.$$

The Voronoi regions are thus derived again.
3. Size and Covariance of a Cluster

We frequently need to recognize an elongated cluster, but the original fuzzy c-means cannot do this, as the Voronoi regions cannot separate such an elongated region. To solve this problem, cluster covariances in fuzzy c-means have been considered by Gustafson and Kessel [11]. However, there is another problem: to separate a dense cluster from a sparse cluster, a "density" or "cluster size" has to be considered. To solve both problems, a generalized objective function with a Kullback-Leibler information term has been proposed by Ichihashi and his colleagues [15, 28]. That is, the following function is used for this purpose:
$$J_{KL}(U, V, A, S) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} D(x_k, v_i; S_i) + \nu \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} \left\{ \log \frac{u_{ki}}{\alpha_i} + \log |S_i|^{\frac{1}{2}} \right\}, \qquad (23)$$

where the variable $A = (\alpha_1, \ldots, \alpha_c)$ controls cluster sizes with the constraint

$$\mathcal{A} = \left\{ A : \sum_{i=1}^{c} \alpha_i = 1, \; \alpha_j \geq 0, \; j = 1, \ldots, c \right\}. \qquad (24)$$
Another variable is $S = (S_1, \ldots, S_c)$; $S_i$ ($i = 1, \ldots, c$) is a $p \times p$ positive-definite matrix with determinant $|S_i|$. In addition,

$$D(x, v_i; S_i) = (x - v_i)^T S_i^{-1} (x - v_i) \qquad (25)$$

is the squared Mahalanobis distance for cluster $i$.
Since this objective function has four variables, alternate optimization means minimization with respect to one variable while the other three are fixed: after giving initial values for $V, A, S$, we repeat

$$U = \arg\min_U J_{KL}(U, V, A, S), \quad V = \arg\min_V J_{KL}(U, V, A, S),$$
$$A = \arg\min_A J_{KL}(U, V, A, S), \quad S = \arg\min_S J_{KL}(U, V, A, S),$$

until convergence. The solutions are as follows [28].
Solutions for $J_{KL}$:

$$u_{ki} = \frac{ \alpha_i |S_i|^{-\frac{1}{2}} \exp\left( -\dfrac{D(x_k, v_i; S_i)}{\nu} \right) }{ \displaystyle\sum_{j=1}^{c} \alpha_j |S_j|^{-\frac{1}{2}} \exp\left( -\dfrac{D(x_k, v_j; S_j)}{\nu} \right) }, \qquad (26)$$

$$v_i = \frac{ \displaystyle\sum_{k=1}^{N} u_{ki} x_k }{ \displaystyle\sum_{k=1}^{N} u_{ki} }, \qquad (27)$$

$$\alpha_i = \frac{1}{N} \sum_{k=1}^{N} u_{ki}. \qquad (28)$$
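One round of the updates (26)-(28) can be sketched as follows. The update for $S$ is not shown here, and the function names and toy inputs are illustrative assumptions:

```python
import numpy as np

def kl_fcm_U(X, V, A, S, nu=1.0):
    """Membership update (26): u_ki proportional to
    alpha_i |S_i|^(-1/2) exp(-D(x_k, v_i; S_i) / nu)."""
    N, c = len(X), len(V)
    U = np.empty((N, c))
    for k, x in enumerate(X):
        w = np.empty(c)
        for i in range(c):
            diff = x - V[i]
            maha = diff @ np.linalg.inv(S[i]) @ diff       # distance (25)
            w[i] = A[i] * np.linalg.det(S[i]) ** -0.5 * np.exp(-maha / nu)
        U[k] = w / w.sum()
    return U

def kl_fcm_V_A(X, U):
    """Updates (27) and (28): weighted-mean centers and cluster sizes."""
    V = (U.T @ X) / U.sum(axis=0)[:, None]
    A = U.mean(axis=0)
    return V, A
```

Note how (26) has the same form as the posterior probability in a Gaussian mixture, which is the sense in which this generalization is closely related to mixture distributions.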