Description

A new C× K nearest neighbor algorithm with a new point of view is proposed. In this algorithm K neighbors from each of the classes are taken into account instead of the well-known K neighbor algorithm in which only the total number of neighbors are

All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.

Related Documents

Share

Transcript

1
A NEW
C
x
K
- NEAREST NEIGHBOR LINKAGE APPROACH TO THE CLASSIFICATION PROBLEM
*
GÖZDE ULUTAGAY
†
Department of Industrial Engineering, Izmir University, Gursel Aksel Blv. o.!" Izmir, #$#$%, &urkey
EFENDI NASIBOV
Department of Computer '(ien(e, Dokuz Eylul University, &inaztepe Campus, Bu(a Izmir, #$!)%, &urkey
A ne
CK
×
ne!"e#$ ne%&'()" !&)"%$'+ %$' ! ne ,)%n$ ) .%e %# ,"),)#e/0 In $'%# !&)"%$'+
K
ne%&'()"# ")+ e!1' ) $'e 1!##e# !"e $!2en %n$) !11)3n$ %n#$e!/ ) $'e e-2n)n
K
ne%&'()" !&)"%$'+ %n '%1' )n4 $'e $)$! n3+(e" ) ne%&'()"# !"e 1)n#%/e"e/0 A$e" ex,e"%+en$# %$' e-2n)n 1!##%%1!$%)n /!$!#e$#5 e 1)n13/e $'!$
K
-NN5 e%&'$e/
K
-NN5 !n/ !.e"!&e %n2!&e ne%&'()"# "e#3$# !"e (e$een $'e #%n&e-%n2!&e !n/ 1)+,e$e-%n2!&e !&)"%$'+#0 A$e" $'e e.!3!$%)n ) $'e !.e"!&e !113"!14 "e#3$#5 e "e!%6e/ $'!$ $'e (e#$ "e#3$# !"e )($!%ne/ )" $'e .!3e# )
K
(e$een 7 !n/ 780 On $'e )$'e" '!n/5 %$ %# /e$e"+%ne/ $'!$ 3#%n& #%n&e-%n2!&e #$"!$e&4 ,").%/e# '%&' .!3e# ) $'e "e#3$# !$ +)#$ ) $'e $%+e#0
1. Introduction
K
-nearest neighbor (
K
-NN) algorithm is a well-known non-parametric approach in which a new unknown data is assigned to the closest class in the learning set [1,2]. The similarity is determined by using distance measures.
K
-NN algorithm gives equal importance to each of the objects in assigning class label to the input vector. This is one of the challenges of the
K
-NN algorithm since such an assignment could reduce the accuracy of the algorithm if there is a strong overlapping degree amongst the data vectors. There are some other approaches that increase the accuracy of the algorithm by weighting the nearest neighbors in various ways.
*
T'%# )"2 %# #3,,)"$e/ (4 TUBITAK G"!n$ N)0777T9:;0
†
C)""e#,)n/%n& !3$')"5 e-+!%< &)6/e033$!&!4=%6+%"0e/30$"
2
In this study, a new
CK
×
nearest neighbor algorithm reflecting a different kind of viewpoint is proposed. By using
K
-NN algorithm, a new point is assigned to the class in which most of the neighbors exist. In the proposed approach,
K
neighbors from each of the classes are taken into account instead of only the total number of
K
neighbors. Such a point of view results in a better classification since it considers totally
CK
×
number of points. The aggregated distance of the classified point between its
K
-nearest neighbors in each class could be calculated by using different strategies and such a property increases the flexibility of the proposed algorithm [3,6]. In the study, single-linkage, complete-linkage, and average-linkage approaches which are widely used in hierarchical clustering are considered. After conducting some experiments with well-known classification datasets, we conclude that
K
-nearest neighbors, weighted k-nearest neighbors, and average linkage neighbors results are between the single-linkage and complete-linkage algorithms. The experiments are conducted on the well-known datasets such as Wine, Glass, and Iris for
K
=1 through
K
=50. For these values of
K
, the experiments are repeated for 10 times on randomly selected learning and test sets. After the evaluation of the average accuracy results, we realized that the best results are obtained for the values of
K
between 1 and 10. On the other hand, it is determined that using single-linkage strategy provides high values of the results at most of the times.
2. Single-Linkage Approach
The inputs for the single linkage algorithm could be distances or similarities between pairs of objects. Classes are constructed from the individual entities by joining nearest neighbors, where the term nearest neighbor connotes the largest similarity or smallest distance [1,2]. Let
(,)
DistAB
be the distance between clusters
A
and
B
,and
i
y
and
j
z
be the elements of clusters
A
and
B
, respectively. Then the single-linkage method defines the inter-cluster distance as the distance between the elements in each of the two clusters that are nearest:
,
(,)min[]
ij
ij yAzB
DistABd
∈ ∈
=
. (1) Since single-linkage joins clusters by the shortest link between them, the technique cannot discern poorly separated clusters. On the other hand, single-linkage is one of the few clustering methods that can delineate non-ellipsoidal clusters. The tendency of single-linkage to pick out long string-like clusters is known as chaining. Chaining can be misleading if items at opposite ends of the chain are, in fact, quite similar [7].
3
The clusters formed by the single-linkage method will be unchanged by any assignment of distance or similarity that gives the same relative orderings as the initial distances.
3. Complete-Linkage Approach
Complete-linkage works similar to single-linkage clustering, except that at each stage of clustering process, the distance or similarity between clusters is determined by the distance between the two elements from different clusters that are most distant [1,2,4]. Similar to the single-linkage, complete-linkage method defines the inter-cluster distance as the distance between the elements in each of the two clusters that are most distant [8]:
,
(,)max[]
ij
ij yAzB
DistABd
∈ ∈
=
(2) Thus, complete-linkage ensures that all items in a cluster are within some maximum distance or minimum similarity of each other.
4. Average-Linkage Approach
Average-linkage treats the distance between two clusters as the average distance between all pairs of items where one member of a pair belongs to each cluster: 11(,)
ijabiAjB
DistABd nn
∈ ∈
=
. (3) For average-linkage clustering, changes in the assignment of distances or similarities can affect the arrangement of the final configuration of clusters, even though the changes preserve relative orderings.
5.
CK
×
–Nearest Neighbor Algorithm
In
K
-NN algorithm, a new unknown data is put in to the closest class in the learning set. The similarity is determined by using distance measures. Let
12
{,,...,}
n
Xxxx
=
be a set of
n
labeled samples.
K
-NN algorithm gives equal importance to each of the objects in assigning class label to the input vector which is one of the challenges of the
K
-NN algorithm since such an assignment could reduce the accuracy of the algorithm if there is a strong overlapping degree amongst the data vectors.
4
In the proposed approach totally
CK
×
numbers of neighbors are being considered for the new data with unknown class label. The difference of the
CK
×
-nearest neighbor algorithm is that a point is assigned to a class, which is closest to the point among its other nearest neighbors. The distance between the point and its nearest neighbor could be handled as single-linkage, average-linkage, complete-linkage or any other aggregated distance. Let
12
{,,...,}
n
Xxxx
=
be a set of
n
labeled samples,
12
{,,...,}
j
jjj jn
Cxxx
=
1,2,...,
jC
=
be the a priori known classes,
j
n
be the number of elements in class
j
C
where
12
...
c
nnnn
+ + + =
. The pseudo code of the
CK
×
nearest neighbors algorithm is given below:
BEGIN
An 3n1!##%%e/ %n,3$ #!+,e
x
Se$
,1
KKn
≤ ≤
FOR EACH CLASS
j
C
DO UNTIL (
K
-ne!"e#$ ne%&'()" )3n/ %n 1!##
j
C
)
Se$
1
i
=
0 C!13!$e $'e /%#$!n1e (e$een
x
!n/
C i
x
IF
>
iK
≤
!n/
j
C i
x
%# 1)#e" $)
x
$'!n !n4 )$'e" ,"e.%)3# ne%&'()" %n 1!##
j
C
?
THEN
Dee$e $'e !"$'e#$ #!+,e %n $'e #e$ )
K
-ne!"e#$ ne%&'()"# %n 1!##
j
C
0 A##%&n
j
C i
x
$) $'e #e$ )
K
-ne!"e#$ ne%&'()"# )" 1!##
j
C
0
END IF
1
ii
= +
END DO UNTIL
C!13!$e $'e /%#$!n1e )
x
")+
K
-ne!"e#$ ne%&'()"# ) 1!##
j
C
END FOR
M!"2 $'e 1!## %$' +%n%+3+ /%#$!n1e !#
*
j
A##%&n
x
$)
*
j
C
END
6. Experimental Results
In our study, we compare single-linkage, complete-linkage, and average-linkage based
CK
×
nearest neighbor classification results for known datasets such as Glass, Iris, and Wine from UCI machine learning repository. Note that approximately 2/3 of data for each data set is used as learning, and the rest is as test dataset. First, a model is constructed on learning set, then this model is used
5
on test data set and classification accuracy of the results is measured. We handle the following formula as the classification accuracy rate [6]:
correctlyclassifiednumberofdataCAtotalnumberofdata
=
. (4) In Fig. 1, the classification accuracy results of single-linkage, complete-linkage, and average-linkage approaches are shown. It is obvious that
CK
×
nearest neighbor approach with single-linkage provides better results than those of other methods.
(a) (b) (c)
Figure 1. Average accuracy versus
K
for (a) Iris, (b) Wine, (c) Glass data set

Search

Similar documents

Related Search

A Phenomenological Approach to the RelationshQualitative Approach to the Study of Politica2007. An Ichnofabric Approach to the DepositiA new approach to denoising EEG signals-mergeA Linguistic Approach to NarrativeA Genre Approach to Syllabus DesignA Sociocultural Approach to Urban AestheticsA Systemic Approach to LeadershipHealth Promotion Approach in ECD: A New DiscoA New Method to N-Arylmethylene Pyrroles from

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks