Implementation of a modified Fuzzy C-Means clustering algorithm for real-time applications
Jesús Lázaro*, Jagoba Arias, José L. Martín, Carlos Cuadrado, Armando Astarloa
Department of Electronics and Telecommunications, University of the Basque Country, Alameda Urquijo s/n, 48013 Bilbao, Spain
Received 30 September 2003; revised 13 February 2004; accepted 29 September 2004
Available online 10 November 2005
Abstract
Every month new applications of fuzzy logic to image processing appear. The lightly tight nature of fuzzy algorithms simulates human vision and thus, the field of applications widens. This paper implements in hardware a very popular fuzzy algorithm, the Fuzzy C-Means algorithm. The version of the algorithm allows a high degree of parallelism, which makes the hardware implementation suited for real-time video applications.
© 2005 Elsevier B.V. All rights reserved.
Keywords: FPGA; Segmentation; Fuzzy C-Means; Image processing
1. Introduction
The Fuzzy C-Means (FCM) algorithm is used in a great variety of image processing designs. Its fields of application range from feature selection [1], document clustering [2] and river quality [3] to medical applications [4–6]. Since the Fuzzy C-Means algorithm is very time consuming, most of the applications work offline. This is the reason why, since its introduction, several particular implementations have been developed to boost its efficiency [7–12].

This paper describes an implementation of the modifications proposed in Ref. [12]. While that paper described several modifications that should improve the efficiency of the algorithm, no actual hardware implementation was described. Furthermore, this article describes hardware improvements in the generation of the mathematical terms that lead to smaller area footprints when using only two clusters.

The circuits described in this paper achieve real-time performance, defined as the capability of clustering a grey-scale video stream with a resolution of up to 256 × 256 pixels per image at 50 images per second (both fields). Several implementations are described using Xilinx and Altera devices, with comparisons, in terms of area, between different approaches in the hardware description of the algorithm.

Although a two-cluster approach is not very useful in medical or geological applications, it can be used for object detection. This approach can be used together with real-time motion estimation [13] to detect fast-moving targets in a noisy environment. Since motion estimation using optical flow produces a great amount of data, a high-speed clustering algorithm is required, which makes a software implementation of the FCM algorithm unsuitable.
2. Fuzzy C-Means
A clustering approach that involves minimization of some objective function, or error criterion, belongs to a family of objective function clustering algorithms [14]. The purpose of these algorithms is to partition the space of a given set of data samples. When the algorithm minimizes an error function, it is often called C-Means, c being the number of classes or clusters. If the classes are allowed to be fuzzy, the Fuzzy C-Means (FCM) clustering algorithm may be used [15].

0141-9331/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.micpro.2004.09.002
Microprocessors and Microsystems 29 (2005) 375–380
www.elsevier.com/locate/micpro
* Corresponding author.
E-mail addresses: jtplaarj@bi.ehu.es (J. Lázaro), jtparpej@bi.ehu.es (J. Arias), jtpmagoj@bi.ehu.es (J.L. Martín), jtpcuvic@bi.ehu.es (C. Cuadrado), jtpascua@bi.ehu.es (A. Astarloa).
2.1. FCM algorithm
The Fuzzy C-Means algorithm minimizes the least-squares functional given by a generalized within-groups sum of squared errors function:

$$J_m(U, z) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{i,k}^{m} \, d_{i,k}^{2} \qquad (1)$$

where $U \in M_{fcn}$ is a fuzzy c-partition of $X$; $z = (z_1, z_2, \ldots, z_c) \in \mathbb{R}^{cp}$, with $z_i \in \mathbb{R}^{p}$ as the cluster center or prototype of the $i$-th class; $d_{i,k}^{2} = \|x_k - z_i\|^{2}$, with $\|\cdot\|$ being any inner-product-induced norm on $\mathbb{R}^{p}$; and the weighting or fuzzy exponent $m \in (1, \infty)$. Clearly, $J_m : M_{fcn} \times \mathbb{R}^{cp} \to \mathbb{R}^{+}$. The optimum is reached when the fuzzy partition matrix $U^{*}$ and a collection of prototypes $z^{*}$ are found such that $J_m$ is minimized, that is, when the weighted within-groups sum of distances between the samples and the prototypes is the smallest possible.

The solutions of the minimization are least-squared-error stationary points of $J_m$. The necessary conditions for the minimization of $J_m$ are derived in Ref. [14] and can be written as:
$$u_{i,k} = \left( \sum_{j=1}^{c} \left( \frac{d_{i,k}}{d_{j,k}} \right)^{2/(m-1)} \right)^{-1}, \quad \forall i, k \qquad (2)$$

and

$$z_i = \frac{\sum_{k=1}^{n} u_{i,k}^{m} \, x_k}{\sum_{k=1}^{n} u_{i,k}^{m}} \qquad (3)$$

The convergence theory of the FCM algorithm was initially studied in Refs. [14,16] and later improved in Refs. [17,18].

Pseudocode for the algorithm is given below for clarification.

    begin
      Fix c, 2 ≤ c < n;
      Choose any inner product norm metric for R^p;
      Fix m, 1 < m < ∞;
      Initialize U ∈ M_fcn;
      for l := 0 step 1 until maxiter do
      begin
        Calculate the c fuzzy cluster centers (z_i) with (3) and U;
        Using (2) and (z_i) obtain U_new;
        If ||U_new − U|| < ε then break;
        U := U_new;
      end
    end
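As an illustration of the loop above, the following is a minimal software sketch of batch FCM for scalar samples, written in plain Python. It is not the authors' code: the function name `fcm_1d`, the random initialization with a fixed seed, the max-norm stopping test and the small floor on the distances (to avoid a division by zero when a sample coincides with a center) are all assumptions made for this example. Note that the partition matrix is stored sample-major here, so `u[k][i]` corresponds to $u_{i,k}$ in Eqs. (2) and (3).

```python
import random

def fcm_1d(data, c=2, m=2.0, max_iter=100, eps=1e-6, seed=0):
    """Batch Fuzzy C-Means on scalar samples; u[k][i] plays the role of u_{i,k}."""
    rng = random.Random(seed)
    n = len(data)
    # Random fuzzy c-partition: the memberships of each sample sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [0.0] * c
    for _ in range(max_iter):
        # Eq. (3): each center is a membership-weighted mean of the samples.
        for i in range(c):
            num = sum((u[k][i] ** m) * data[k] for k in range(n))
            den = sum(u[k][i] ** m for k in range(n))
            centers[i] = num / den
        # Eq. (2): new memberships from the distances to the new centers.
        u_new = []
        for k in range(n):
            d = [max(abs(data[k] - z), 1e-12) for z in centers]  # assumed floor
            u_new.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for i in range(c)
            ])
        # Stop when the partition matrix no longer changes appreciably.
        change = max(abs(u_new[k][i] - u[k][i])
                     for k in range(n) for i in range(c))
        u = u_new
        if change < eps:
            break
    return centers, u

if __name__ == "__main__":
    centers, _ = fcm_1d([0.1, 0.12, 0.15, 0.8, 0.85, 0.9])
    print(sorted(centers))
```

Each iteration reads the whole data set twice (once for the centers, once for the memberships), which is the cost the hardware-oriented version of the algorithm tries to avoid.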
3. Modified Fuzzy C-Means
The Fuzzy C-Means algorithm explained above has several drawbacks for hardware implementation. Several improvements have been made to the algorithm over the years. Some of them have reworked the formulation [7] to simplify it, while others have focused on the relation between the algorithm and the actual way of implementing it [11] to reduce the time complexity. One of the key points in the text of Kolen and Hutcheson [11] is its definition of the algorithm to convert it into a real-time algorithm. In their text, real time meant one image per second, although with bigger images and more groups.
3.1. Proposed modifications
The key points of the modification used in this paper include the number of clusters. This number must be fixed in a hardware implementation. The minimum number of clusters allowed is two (valid for applications such as binarization); other applications use three [4] or more, as in satellite imaging. In our hardware applications it has been fixed to two clusters, which leads to the smallest memory requirement.

Another particularization of the hardware implementation concerns the fuzziness factor m. It appears as an exponent at several points of the algorithm: in Eq. (2) in the denominator of an exponent and in Eq. (3) as the exponent itself. The fuzziness factor must be chosen empirically depending on the actual application. Since the implementation of fractional exponents is such a difficult task, we adopt an 'optimum' m of 2. This choice makes Eq. (2) easier to calculate, since obtaining the square of a number is feasible in hardware.

The third particularization is the initialization of the $U \in M_{fcn}$ matrix. In software implementations, the matrix is initialized randomly (this is the method used by the Matlab algorithm). Such a circuit would add complexity without any improvement. Instead of randomly generating the U matrix, the input image is used as the initial U matrix.

The main modification of the algorithm lies in the loop section (line seven of the pseudocode). It is necessary to iterate through all the input data and the U matrix in order to obtain the fuzzy centers. In addition, the new U matrix is calculated from the old U matrix, the data and the fuzzy centers. This means that, for each picture, two iterations through the input data are needed or, in other words, the input data should be stored and read twice in the period of time between images. This is practically impossible in a real-time application. To solve this problem, the old centers can be used to obtain the new U matrix. This means that the new U matrix and the centers can be obtained in the same iteration without needing to store the input image. Another implication of this particularity is that only the value of a single pixel is needed to obtain the corresponding element of the U matrix. Thus, the clustering can be performed as the image arrives, and it is not necessary to store the whole image to start the processing.
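To make the effect of fixing c = 2 and m = 2 concrete, Eq. (2) can be expanded for the first cluster. The short derivation below uses only the symbols of Section 2.1; it is a worked illustration rather than a formula quoted from the paper.

```latex
u_{1,k}
  = \left[ \left(\frac{d_{1,k}}{d_{1,k}}\right)^{2}
         + \left(\frac{d_{1,k}}{d_{2,k}}\right)^{2} \right]^{-1}
  = \frac{1}{1 + d_{1,k}^{2} / d_{2,k}^{2}}
  = \frac{d_{2,k}^{2}}{d_{1,k}^{2} + d_{2,k}^{2}},
\qquad
u_{2,k} = 1 - u_{1,k}
```

Under these two particularizations, each membership needs only two squared distances, one addition and one division, and it depends on a single pixel and the two current centers.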
4. Matlab implementation
4.1. Modified Matlab function
In order to make the algorithm suitable for real-time applications and synthesizable in hardware, several particularizations, modifications and simplifications have been performed.

The particularization comes from the fact that the number of clusters and the fuzzy exponent have both been set to two.

The simplification comes from one of the particularizations. When dividing into two clusters, the U matrix has two values for each input pixel, each value being the membership of the pixel in one cluster. Since the membership function is normalized, the second value contains no additional information. As a result, a one-dimensional array is stored instead of the two-dimensional matrix $U_{i,j}$ (here i indexes the pixel and j the cluster, as in the Matlab code):

$$\sum_{j=1}^{c} U_{i,j} = \sum_{j=1}^{2} U_{i,j} = 1; \qquad U_{i,1} + U_{i,2} = 1 \;\Rightarrow\; U_{i,1} = 1 - U_{i,2} \qquad (4)$$

This means that the circuit only needs enough memory for one image.

A second simplification is the way of storing the input data. Instead of storing each image of the video stream in memory and iterating through it, a different frame is used in each iteration. This requires assuming that each frame in a video stream is not very different from the preceding one. This is not a strong assumption: the algorithm converges quickly, needing less than a second after initialization, and afterwards it only needs to 'follow' the signal in the same fashion as a PLL. This simplification implies that an output pixel is obtained for each input pixel.

The modification comes from the fact that two iterations must be performed through all the input: the first obtains the fuzzy centers and the next obtains the distances. The proposed modification performs both operations at the same time by using the previous iteration's centers to calculate the distances. The centers for the first iteration are 0 and 1, assuring convergence. The convergence time is increased, but the reward is parallel code.

A second modification is made in the initialization of the U matrix. Instead of using a random number generator, the input image is used. This modification makes the convergence faster, since the membership of a pixel in a cluster is directly related to its luminance.

The resulting algorithm would be:
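    U = data;                 % initial memberships: the input image itself
    center = [0 1];           % initial cluster centers
    while true                % forever: one pass per incoming frame
        n = [0 0];            % numerator accumulators of Eq. (3), cleared each frame (assumed)
        d = [0 0];            % denominator accumulators of Eq. (3), cleared each frame (assumed)
        for i = 1:numel(data) % each pixel in the image
            u = [U(i), 1 - U(i)];                 % both memberships from the stored one, Eq. (4)
            mf = u .^ 2;                          % u^m with m = 2
            d = d + mf;
            n = n + mf * data(i);
            dist = abs(center - data(i));         % distances to the previous centers
            tmp = dist .^ (-2);                   % a pixel equal to a center needs special handling
            U(i) = tmp(1) / (tmp(1) + tmp(2));    % Eq. (2) with m = 2 and c = 2
        end
        center = n ./ d;      % Eq. (3): new centers, once per frame
        % data = next frame of the video stream
    end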
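The single-pass scheme can be tried in software with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `process_frame`, the normalization of pixel values to [0, 1], the per-frame clearing of the accumulators and the small floor on the distances are all choices made for this example. In addition, the initial centers are ordered [1, 0] here, so that the first center matches the stored membership, which starts as the luminance; the source lists them as [0 1], and this ordering is an assumption of the sketch.

```python
def process_frame(frame, u, centers):
    """One pass over a frame: accumulate the new centers from the stored
    memberships while computing the next memberships from the old centers."""
    num = [0.0, 0.0]  # numerator accumulators of Eq. (3)
    den = [0.0, 0.0]  # denominator accumulators of Eq. (3)
    u_new = []
    for pixel, u1 in zip(frame, u):
        memb = (u1, 1.0 - u1)        # Eq. (4): the second membership is implied
        for j in range(2):
            mf = memb[j] ** 2        # u^m with m = 2
            den[j] += mf
            num[j] += mf * pixel
        # Membership for the next pass, from the *previous* centers.
        dist = [max(abs(c - pixel), 1e-9) for c in centers]  # assumed floor
        tmp = [x ** -2 for x in dist]
        u_new.append(tmp[0] / (tmp[0] + tmp[1]))
    new_centers = [num[j] / den[j] if den[j] > 0 else centers[j]
                   for j in range(2)]
    return u_new, new_centers

if __name__ == "__main__":
    # A static synthetic "video": the same six-pixel frame repeated.
    frame = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
    u, centers = list(frame), [1.0, 0.0]   # U starts as the input image
    for _ in range(20):
        u, centers = process_frame(frame, u, centers)
    print(centers)
```

Only one stored value per pixel (`u`) and two pairs of accumulators are carried from frame to frame, which mirrors the memory argument made around Eq. (4).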
5. Hardware design
The main block diagram of the hardware implementation can be seen in Fig. 1. The design is divided into five blocks.

Fig. 1. Main block diagram.

5.1. U_new generating block
The block in charge of generating the output U matrix has two different parts. First, the distance between the pixel and each center is calculated and registered. Second, the actual value of U_{i,1} is calculated. The full block diagram is depicted in Fig. 2. As can be seen in the block diagram, the dividend of the division block is multiplied by 256, so that the result can be expressed as an 8 bit integer.

Fig. 2. U_new generating block diagram.

The distance block diagram is shown in Fig. 3. The block begins with a multiplexer to account for the initial state, where the center is fixed to 00h in one cluster and to FFh in the other. The distance between pixels is calculated by subtracting the smaller from the bigger, followed by a squaring circuit.
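The arithmetic of this block can be mimicked with integers only. The sketch below is an assumption-laden illustration, not a description of the actual circuit: the function name `membership_u8`, the saturation of the quotient to 255 and the value returned when both distances are zero are hypothetical choices. It relies on the fact that, for two clusters and m = 2, Eq. (2) reduces to the squared distance to the other center divided by the sum of both squared distances.

```python
def membership_u8(pixel, center_a, center_b):
    """8 bit membership of `pixel` in cluster A using integer arithmetic only."""
    da = abs(pixel - center_a) ** 2   # absolute-value subtractor, then squarer
    db = abs(pixel - center_b) ** 2
    if da + db == 0:                  # both centers equal the pixel (assumed handling)
        return 128
    # u_A = db / (da + db); the dividend is scaled by 256 so that the
    # quotient is an integer, then saturated to fit in 8 bits (assumed).
    return min(255, (256 * db) // (da + db))

if __name__ == "__main__":
    # Initial state: centers fixed to 00h and FFh.
    for p in (0, 64, 128, 192, 255):
        print(p, membership_u8(p, 0x00, 0xFF))
```

With 8 bit pixels the squared distances fit in 16 bits, so the scaled dividend fits in 24 bits in this sketch.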
5.2. Denominator generating block
The denominator generating block is in charge of generating the denominator used to calculate the new center, and of generating some of the signals needed by the numerator generating block.

The denominator block has two different parts. The first is depicted in Fig. 4. It is in charge of the initialization of the U matrix (the first time, the input data is taken as the U matrix) and of generating both U_{i,x} values from the only one stored in memory.

The second part can be seen in Fig. 5. It is pipelined in three stages. The first register captures mf(x). This value will be used by the numerator generating block. The second register is found in the accumulator. A third register captures the output; it is inserted to compensate for the different lengths of the numerator and denominator pipes.
5.3. Numerator generating block
The numerator generating block generates the numerator in a fashion very similar to the denominator generating block. The block is shown in Fig. 6.

The block takes as inputs the pixel value and the mf(x) value that comes from the denominator generating block. The pixel value is registered so that it arrives together with the correct mf(x) value. In this case, the accumulator output is not registered, to compensate for the first register.
5.4. Center generating block
The center generating block is shown in Fig. 7. The block is composed of a fully pipelined divider and a register. The register is in charge of keeping the center value constant through the entire image. This is the most time-consuming operation of the algorithm, but it is performed only once per image. Thus, the vertical synchronization interval is available to perform this operation.
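A rough software model of this step, together with a worst-case estimate of the accumulator sizes, might look as follows. The names `accumulator_bits` and `new_center` are hypothetical, and the figures assume 8 bit pixels, 8 bit memberships and the 256 × 256 frame quoted in the introduction; they are bounds derived under those assumptions, not bit widths reported by the authors.

```python
FRAME_PIXELS = 256 * 256   # resolution quoted in the introduction
MAX_PIXEL = 255            # 8 bit grey level (assumed)
MAX_MF = 255 ** 2          # square of an 8 bit membership (assumed)

def accumulator_bits():
    """Worst-case widths of the denominator and numerator accumulators."""
    den_max = FRAME_PIXELS * MAX_MF
    num_max = den_max * MAX_PIXEL
    return den_max.bit_length(), num_max.bit_length()

def new_center(pixels, memberships):
    """Integer form of Eq. (3) for one cluster: a single division per frame."""
    num = den = 0
    for x, u in zip(pixels, memberships):
        mf = u * u         # u^m with m = 2
        den += mf
        num += mf * x
    return num // den if den else 0

if __name__ == "__main__":
    print(accumulator_bits())
    print(new_center([100, 200], [255, 255]))
```

Because the division runs once per frame rather than once per pixel, its latency only has to fit in the gap between frames.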
5.5. Control unit
The control unit must:
• Generate the necessary signals to make the FIFO work.
• Manage the initialization of the system, providing the necessary interface with the video signal.
• Control the pipeline system.
• Decide when the system should work, depending on whether the data from the video stream is valid or is a synchronization signal.
Fig. 3. Distance calculating block.
Fig. 4. U_{i,x} generating block.
Fig. 5. Denominator generating pipeline.
Fig. 6. Numerator generating pipeline.
Fig. 7. Center generating block.
6. Hardware implementation
One of the key points in the hardware implementation is the way in which the hardware design explained in Section 5 is efficiently mapped onto a programmable logic device. The project has been developed using Altera's MAX+PLUS II software [19] and the Xilinx ISE development platform [20].

The description of the circuit has been performed in two different ways.
• The high-level design and blocks have been connected using graphic design.
• The control unit and particular blocks such as the absolute value subtractor have been described using VHDL.

As can be seen in the previous sections, the algorithm performs the same operations on two different data at the same time, one on U_{i,1} and one on U_{i,2}. To use silicon area efficiently, both operations are performed with the same hardware. To do so, the pipelined parts are clocked at double rate and the special registers (such as those in the accumulators) are doubled as well. The ends of these pipes are two registers, one for each different input data. The U_new generating block (from the absolute value subtractor to the squaring circuit), the pipeline depicted in Fig. 5 of the denominator generating block and the pipeline depicted in Fig. 6 of the numerator generating block have been treated that way.
7. Results
The project has been translated into two different technologies, Altera and Xilinx.

When compiling the project into Altera components, the main concern is the use of embedded array blocks (EABs) to implement the multipliers. Depending on how many of the different multipliers are implemented this way, we obtain different resource allocations, which are summarized in Table 1. The first implementation places all the multipliers in EABs except for the 8 bit multipliers. The second implementation uses EABs for the 8 bit multipliers but not for the 24 bit one. Both designs lead to similar resource allocations, the second option being slightly more hardware consuming.

Table 2 shows the resource allocation for three different implementations using Xilinx devices. The first implementation uses the internal tristate buffers to implement the different multiplexers. The second implementation tries to lower the number of buffers by using logic to implement the 8 bit multiplexer. The third implementation uses LUTs for every multiplexer except for a 42 bit multiplexer. In this case, the critical factor is the multiplexer, since the devices include hardware multipliers.

From the Xilinx results, we can try a different approach to reduce area. We could think that using two hardware pipes would reduce the number of tristate buffers. Since Xilinx devices have embedded multipliers, the impact of duplicating them could be reduced. As Table 3 shows, although the number of Tbufs is reduced, more multipliers are used and the number of slices required increases accordingly, partially because of the dividers.
Table 1
Results using Altera devices

     Device         Input pins   Output pins   Bidirectional pins   Memory bits   Percentage of memory (%)   LCs    Percentage of LCs (%)
1    EP1K100FC484   22           8             0                    24,576        50                         4205   84
2    EP1K100FC484   22           8             0                    24,576        50                         4310   86

Table 2
Results using Xilinx devices

               x2v250 (1)             x2v250 (2)             X2v80 (3)
               Number   Percentage    Number   Percentage    Number   Percentage
Slices         475      30            484      31            510      99
LUTs           344      11            362      11            646
IOBs           30       17            30       17            30       25
Tbufs          264      34            216      28            100      39
MULT18X18s     3        12            3        12            3        37
GCLKs          3        18            3        18            3        18

Table 3
Results using Xilinx devices with no shared hardware

               x2v250 (2)
               Number   Percentage
Slices         924      60
LUTs           864      28
Tbufs          32       4
MULT18X18s     6        25