
Implementation of a modified Fuzzy C-Means clustering algorithm for real-time applications

Jesús Lázaro*, Jagoba Arias, José L. Martín, Carlos Cuadrado, Armando Astarloa
Department of Electronics and Telecommunications, University of the Basque Country, Alameda Urquijo s/n, 48013 Bilbao, Spain
Received 30 September 2003; revised 13 February 2004; accepted 29 September 2004. Available online 10 November 2005
Microprocessors and Microsystems 29 (2005) 375–380. doi:10.1016/j.micpro.2004.09.002. www.elsevier.com/locate/micpro. 0141-9331/$ - see front matter
* Corresponding author. E-mail addresses: jtplaarj@bi.ehu.es (J. Lázaro), jtparpej@bi.ehu.es (J. Arias), jtpmagoj@bi.ehu.es (J.L. Martín), jtpcuvic@bi.ehu.es (C. Cuadrado), jtpascua@bi.ehu.es (A. Astarloa).

Abstract

Every month new applications of fuzzy logic to image processing appear. The not-too-strict nature of fuzzy algorithms mimics human vision, and so their field of application keeps widening. This paper implements in hardware a very popular fuzzy algorithm, the Fuzzy C-Means algorithm. The version of the algorithm used allows a high degree of parallelism, which makes the hardware implementation suited for real-time video applications. © 2005 Elsevier B.V. All rights reserved.

Keywords: FPGA; Segmentation; Fuzzy C-Means; Image processing

1. Introduction

The Fuzzy C-Means (FCM) algorithm is used in a great variety of image processing designs. Its fields of application range from feature selection [1], document clustering [2] and river quality [3] to medical applications [4–6]. Since the FCM algorithm is very time consuming, most of these applications work offline. For this reason, several particular implementations have been developed since its introduction to boost its efficiency [7–12].

This paper describes an implementation of the modifications proposed in Ref. [12]. That paper described several modifications that should improve the efficiency of the algorithm, but no actual hardware implementation. Furthermore, this article describes hardware improvements in the generation of the mathematical terms that lead to smaller area footprints when only two clusters are used.

The circuits described in this paper achieve real-time performance, defined as the capability of clustering a grey-scale video stream with a resolution of up to 256 × 256 pixels per image at 50 images per second (both fields). Several implementations are described using Xilinx and Altera devices, with comparisons in terms of area between different approaches to the hardware description of the algorithm.

Although a two-cluster approach is not very useful in medical or geological applications, it can be used for object detection. It can be combined with real-time motion estimation [13] to detect fast-moving targets in a noisy environment. Since motion estimation using optical flow produces a great amount of data, a high-speed clustering algorithm is required, which makes a software implementation of the FCM algorithm unsuitable.

2. Fuzzy C-Means

A clustering approach that involves the minimization of some objective function, or error criterion, belongs to the family of objective function clustering algorithms [14]. The purpose of these algorithms is to partition the space of a given set of data samples. When the algorithm minimizes an error function it is often called C-Means, c being the number of classes or clusters. If the classes are allowed to be fuzzy, the Fuzzy C-Means (FCM) clustering algorithm may be used [15].

2.1. FCM algorithm

The Fuzzy C-Means algorithm minimizes the least-squares functional given by a generalized within-groups sum of squared errors function:

$$J_m(U, z) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{i,k}^{m} \, d_{i,k}^{2}, \qquad (1)$$

where $U \in M_{fcn}$ is a fuzzy c-partition of $X$; $z = (z_1, z_2, \ldots, z_c) \in R^{cp}$, with $z_i \in R^{p}$ the cluster center or prototype of the $i$th class; $d_{i,k}^{2} = \|x_k - z_i\|^{2}$, with $\|\cdot\|$ being any inner product induced norm on $R^{p}$; and $m \in (1, \infty)$ is the weighting or fuzzy exponent. Clearly, $J_m : M_{fcn} \times R^{cp} \to R^{+}$. The optimum is reached when a fuzzy partition matrix $U^{*}$ and a collection of prototypes $z^{*}$ are found such that $J_m$ is minimized, that is, when the weighted within-groups sum of distances between the samples and the prototypes is the smallest possible.

The solutions of the minimization are least-squared error stationary points of $J_m$. The necessary conditions for the minimization of $J_m$ are derived in Ref. [14] and can be written as:

$$u_{i,k} = \left( \sum_{j=1}^{c} \left( \frac{d_{i,k}}{d_{j,k}} \right)^{2/(m-1)} \right)^{-1}, \quad \forall\, i, k, \qquad (2)$$

and

$$z_i = \frac{\sum_{k=1}^{n} u_{i,k}^{m} \, x_k}{\sum_{k=1}^{n} u_{i,k}^{m}}. \qquad (3)$$

The convergence theory of the FCM algorithm was initially studied in Refs. [14,16] and later improved in Refs. [17,18].

The following pseudocode shows the algorithm for clarification.

begin
    Fix c, 2 ≤ c < n;
    Choose any inner product norm metric for R^p;
    Fix m, 1 < m < ∞;
    Initialize U ∈ M_fcn;
    for l := 0 step 1 until maxiter do
    begin
        Calculate the c fuzzy cluster centers (z_i) with (3) and U;
        Using (2) and (z_i) obtain U_new;
        If ||U_new − U|| < ε then break;
        U = U_new;
    end
end

3. Modified Fuzzy C-Means

The Fuzzy C-Means algorithm explained above has several drawbacks for hardware implementation. Several improvements have been made to the algorithm over the years. Some of them have reworked the formulation [7] to simplify it, while others have focused on the relation between the algorithm and the actual way of implementing it [11] in order to reduce the time complexity. One of the key points in the text of Kolen and Hutcheson [11] is its definition of the algorithm so as to convert it into a real-time algorithm.
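As a point of reference for the modifications discussed in this section, the unmodified loop of Section 2.1 can be sketched in plain Python for scalar (grey-level) samples. This is only an illustration and not the authors' code: the names `memberships` and `fcm_1d`, the maximum-absolute-difference stopping test, the guard for samples that coincide with a center, and the hand-written initial partition `U0` are all assumptions made for the sketch.

```python
def memberships(data, centers, m):
    """Eq. (2): membership of every sample to every cluster."""
    p = 2.0 / (m - 1.0)
    U = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        d = [abs(x - z) for z in centers]
        hits = [i for i, di in enumerate(d) if di == 0.0]
        if hits:
            # Assumption: a sample lying exactly on a center belongs fully
            # to it, which avoids a division by zero in Eq. (2).
            for i in hits:
                U[i][k] = 1.0 / len(hits)
            continue
        for i, di in enumerate(d):
            U[i][k] = 1.0 / sum((di / dj) ** p for dj in d)
    return U


def fcm_1d(data, U, m=2.0, max_iter=100, eps=1e-6):
    """Alternate Eq. (3) and Eq. (2) until U stops changing."""
    centers = []
    for _ in range(max_iter):
        # Eq. (3): centers are means of the data weighted by u^m.
        centers = []
        for row in U:
            w = [u ** m for u in row]
            centers.append(sum(wk * x for wk, x in zip(w, data)) / sum(w))
        U_new = memberships(data, centers, m)
        # Assumption: the norm ||U_new - U|| is taken as the largest
        # absolute change of any single membership value.
        change = max(abs(a - b)
                     for r_new, r_old in zip(U_new, U)
                     for a, b in zip(r_new, r_old))
        U = U_new
        if change < eps:
            break
    return centers, U


# Hypothetical data: two well separated groups of grey levels.
samples = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
U0 = [[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
      [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]]
centers, U = fcm_1d(samples, U0)
```

Each pass of the loop reads all the samples twice, once for the centers and once for the memberships, which is the cost the modifications aim to remove.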
In Kolen and Hutcheson's text, real time meant one image per second, although on larger images and with more clusters.

3.1. Proposed modifications

The first key point of the modification used in this paper concerns the number of clusters. This number must be fixed in a hardware implementation. The minimum number of clusters allowed is two (valid for applications such as binarization); other applications use three [4] or more, as in satellite imaging. In our hardware applications it has been fixed to two clusters, which leads to the smallest memory requirement.

Another particularization of the hardware implementation concerns the fuzziness factor m. It appears as an exponent at several points of the algorithm: in Eq. (2) in the denominator of an exponent and in Eq. (3) as the exponent itself. The fuzziness factor must be chosen empirically depending on the actual application. Since implementing fractional exponents in hardware is difficult, we take m = 2 as the 'optimum' value. With this choice the exponent 2/(m − 1) in Eq. (2) becomes 2, so Eq. (2) is easier to calculate, since squaring a number is feasible in hardware.

The third particularization is the initialization of the matrix $U \in M_{fcn}$. In software implementations, the matrix is initialized randomly (this is the method used by the Matlab algorithm). Such a circuit would add complexity without any improvement. Instead of randomly generating the U matrix, the input image is used as the initial U matrix.

The main modification of the algorithm lies in the loop section (line seven of the pseudocode). It is necessary to iterate through all the input data and the U matrix in order to obtain the fuzzy centers. In addition, the new U matrix is calculated from the old U matrix, the data and the fuzzy centers. This means that, for each picture, two iterations through the input data are needed or, in other words, the input data would have to be stored and read twice in the period of time between images. This is practically impossible in a real-time application.
To solve this problem, the old centers can be used to obtain the new U matrix. This means that the new U matrix and the new centers can be obtained in the same iteration without needing to store the input image. Another implication is that only the value of a single pixel is needed to obtain the corresponding element of the U matrix. Thus, the clustering can be performed as the image arrives, and it is not necessary to store the whole image before starting the processing.

4. Matlab implementation

4.1. Modified Matlab function

In order to make the algorithm suitable for real-time applications and synthesizable in hardware, several particularizations, modifications and simplifications have been performed.

The particularization comes from the fact that both the number of clusters and the fuzzy exponent have been set to two.

The first simplification comes from one of the particularizations. When dividing into two clusters, the U matrix has two values for each input pixel, each of them being the membership of the pixel to one of the clusters. Since the membership function is normalized, the second value contains no additional information. As a result, a one-dimensional array is stored instead of the two-dimensional matrix $U_{i,j}$:

$$\sum_{j=1}^{c} U_{i,j} = \sum_{j=1}^{2} U_{i,j} = 1; \qquad U_{i,1} + U_{i,2} = 1 \;\Rightarrow\; U_{i,1} = 1 - U_{i,2}, \qquad (4)$$

where c is the number of clusters. This means that the circuit only needs enough memory for one image.

A second simplification is the way of storing the input data. Instead of storing each image of the video stream in memory and iterating through it, a different frame is used in each iteration. This requires assuming that each frame in a video stream is not very different from the preceding one. This is not a strong assumption: thanks to its fast convergence, the algorithm needs less than a second to converge after initialization, and afterwards it only needs to 'follow' the signal in the same fashion as a PLL.
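The single-pass idea described so far (two clusters, m = 2, one stored membership per pixel, memberships computed from the previous centers, one new frame per iteration) can be sketched as a per-frame step in plain Python. This is a hedged illustration rather than the authors' Matlab function: the name `fcm2_frame_step`, the grey levels normalized to [0, 1], the guards against division by zero, the synthetic drifting frames, and the starting centers (1.0, 0.0), ordered so that cluster 1 is the bright one when memberships are initialized from luminance, are all assumptions of the sketch.

```python
def fcm2_frame_step(frame, u1, centers):
    """One pass over a frame; returns (new_u1, new_centers).

    frame   -- flat list of grey levels normalized to [0, 1]
    u1      -- stored membership of each pixel to cluster 1
    centers -- (z1, z2) obtained while processing the previous frame
    """
    num = [0.0, 0.0]
    den = [0.0, 0.0]
    new_u1 = []
    for x, u in zip(frame, u1):
        # Eq. (4): the second membership is implied by the first one.
        mf = (u * u, (1.0 - u) * (1.0 - u))   # u^m with m = 2
        for j in (0, 1):
            den[j] += mf[j]                   # denominator of Eq. (3)
            num[j] += mf[j] * x               # numerator of Eq. (3)
        # Eq. (2) with m = 2, evaluated with the OLD centers so that the
        # pixel never has to be read a second time.
        d1 = (x - centers[0]) ** 2
        d2 = (x - centers[1]) ** 2
        new_u1.append(d2 / (d1 + d2) if d1 + d2 > 0.0 else 0.5)
    # The centers are updated once per frame; keep the old one if a
    # cluster received no weight (assumption, avoids dividing by zero).
    new_centers = tuple(num[j] / den[j] if den[j] > 0.0 else centers[j]
                        for j in (0, 1))
    return new_u1, new_centers


# Hypothetical stream: the bright level drifts slowly from frame to frame.
dark = 0.2
u1 = None
centers = (1.0, 0.0)
for t in range(20):
    bright = 0.8 + 0.005 * t
    frame = [dark, dark, bright, bright, dark, bright]
    if u1 is None:
        u1 = list(frame)      # memberships initialized from the image itself
    u1, centers = fcm2_frame_step(frame, u1, centers)
```

In this toy stream the centers settle within a few frames and then follow the slow drift of the bright level, which is the PLL-like behaviour the text refers to.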
This simplification implies that an output pixel is obtained for each input pixel.

The first modification comes from the fact that two iterations must be performed through all the input: the first obtains the fuzzy centers and the next obtains the distances. The proposed modification performs both operations at the same time by using the centers of the previous iteration to calculate the distances. The centers for the first iteration are 0 and 1, which assures convergence. The convergence time is increased, but the reward is a parallel code.

A second modification is made in the initialization of the U matrix. Instead of using a random number generator, the input image is used. This modification makes the convergence faster, since the membership of a pixel to a cluster is directly related to its luminance.

The resulting algorithm would be:

U = data;
center = [0 1];
while (forever)
    d = [0 0]; n = [0 0];                     % accumulators restart with each image
    for (each pixel i in image)
        U(i,2) = 1 - U(i,1);
        for j = 1, 2
            mf(j) = U(i,j)^2;
            d(j) = d(j) + mf(j);              % denominator of Eq. (3)
            n(j) = n(j) + mf(j)*data(i);      % numerator of Eq. (3)
            dist(j) = abs(center(j) - data(i));
            tmp(j) = dist(j)^(-2);
        end
        U(i,1) = tmp(1)/(tmp(1) + tmp(2));    % Eq. (2) with the previous centers
    end
    center(j) = n(j)/d(j), j = 1, 2;          % once per image
end

5. Hardware design

The main block diagram of the hardware implementation can be seen in Fig. 1. The design is divided into five blocks.

Fig. 1. Main block diagram.

5.1. U_new generating block

The block in charge of generating the output U matrix has two different parts. First, the distance between the pixel and each center is calculated and registered. Second, the actual value of $U_{i,1}$ is calculated. The full block diagram is depicted in Fig. 2. As can be seen in the block diagram, the dividend of the division block is multiplied by 256, so that the result can be expressed as an 8 bit integer.

Fig. 2. U_new generating block diagram.

The distance block diagram is shown in Fig. 3.
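The integer arithmetic of the U_new generating block can be illustrated with a short Python sketch. It assumes, consistently with Eq. (2) for m = 2, that the dividend scaled by 256 is the squared distance to the second center and that the divisor is the sum of both squared distances; the function name `u_new_8bit`, the saturation to 255 and the value returned when both distances are zero are assumptions of the sketch, not details given in the paper.

```python
def u_new_8bit(pixel, center1, center2):
    """Membership of `pixel` to cluster 1 as an integer in 0..255.

    All three arguments are 8 bit grey levels (0..255).
    """
    # Absolute difference followed by squaring.
    d1 = abs(pixel - center1) ** 2
    d2 = abs(pixel - center2) ** 2
    if d1 + d2 == 0:
        # Assumption: both centers coincide with the pixel, so report an
        # undecided membership instead of dividing by zero.
        return 128
    # The dividend is multiplied by 256 before the integer division so
    # that the quotient is an integer instead of a fraction below one.
    u = (256 * d2) // (d1 + d2)
    # Assumption: saturate, since 256 itself does not fit in 8 bits.
    return min(u, 255)


# With the centers at 00h and FFh:
full = u_new_8bit(0x00, 0x00, 0xFF)   # pixel sitting on center 1
none = u_new_8bit(0xFF, 0x00, 0xFF)   # pixel sitting on center 2
```

A pixel that coincides with the first center gets the saturated value 255, and a pixel that coincides with the second center gets 0.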
The distance block begins with a multiplexer to account for the initial state, in which the center is fixed to 00h for one cluster and to FFh for the other. The distance between pixel and center is calculated by subtracting the smaller value from the bigger one, followed by a squaring block.

Fig. 3. Distance calculating block.

5.2. Denominator generating block

The denominator generating block is in charge of generating the denominator used to calculate the new center, and of generating some of the signals needed by the numerator generating block.

The denominator block has two different parts. The first of them is depicted in Fig. 4. This part is in charge of the initialization of the U matrix (the first time, the input data is taken as the U matrix) and of generating both $U_{i,x}$ values from the only one stored in memory.

Fig. 4. U_{i,x} generating block.

The second part can be seen in Fig. 5. It is pipelined in three stages. The first register captures mf(x); this value will also be used by the numerator generating block. The second register is found in the accumulator. A third register captures the output; it is inserted to compensate for the different lengths of the numerator and denominator pipes.

Fig. 5. Denominator generating pipeline.

5.3. Numerator generating block

The numerator generating block generates the numerator in a very similar fashion to the denominator generating block. The block is shown in Fig. 6.

The block takes as inputs the pixel value and the mf(x) value that comes from the denominator generating block. The pixel value is registered so that it arrives together with the correct mf(x) value. In this case, the accumulator output is not registered, to compensate for the first register.

Fig. 6. Numerator generating pipeline.

5.4. Center generating block

The center generating block is shown in Fig. 7. The block is composed of a fully pipelined divider and a register. The register is in charge of keeping the center value constant throughout the entire image. The division is the most time-consuming operation of the algorithm, but it is performed only once per image. Thus, the vertical synchronization interval is available to perform this operation.

Fig. 7. Center generating block.

5.5. Control unit

The control unit must:
• Generate the necessary signals to make the FIFO work.
• Manage the initialization of the system, providing the necessary interface with the video signal.
• Control the pipeline system.
• Decide when the system should work, depending on whether the data from the video stream is valid or is a synchronization signal.

6. Hardware implementation

One of the key points in the hardware implementation is the way in which the hardware design explained in Section 5 is efficiently mapped onto a programmable logic device. The project has been developed using Altera's MAX+PLUS II software [19] and the Xilinx ISE development platform [20].

The description of the circuit has been performed in two different ways:
• The high-level design and the blocks have been connected using graphic design.
• The control unit and particular blocks, such as the absolute value subtractor, have been described using VHDL.

As can be seen in the previous sections, the algorithm performs the same operations over two different data at the same time, one over $U_{i,1}$ and one over $U_{i,2}$. To use silicon area efficiently, both operations are performed with the same hardware. To do so, those pipelined parts are clocked at double rate, and the special registers (such as those in the accumulators) have been doubled as well. Each of these pipes ends in two registers, one for each input data. The U_new generating block (from the absolute value subtractor to the squaring block), the pipeline of the denominator generating block depicted in Fig. 5 and the pipeline of the numerator generating block depicted in Fig. 6 have been treated this way.

7. Results

The project has been translated into two different technologies, Altera and Xilinx.

When compiling the project into Altera components, the main concern is the use of embedded array blocks (EABs) to implement the multipliers. Depending on how many of the different multipliers are implemented this way, different resource allocations are obtained; they are summarized in Table 1. The first implementation places all the multipliers in EABs except for the 8 bit multipliers. The second implementation uses EABs for the 8 bit multipliers but not for the 24 bit one. Both designs lead to similar resource allocations, the second option being slightly more hardware consuming.

Table 2 shows the resource allocation for three different implementations using Xilinx devices. The first implementation uses the internal tristate buffers to implement the different multiplexers. The second implementation tries to lower the number of buffers by using logic to implement the 8 bit multiplexer. The third implementation uses LUTs for every multiplexer except for a 42 bit multiplexer. In this case, the critical factor is the multiplexers, since the devices include hardware multipliers.

The Xilinx results suggest a different approach to reducing area: using two hardware pipes could reduce the number of tristate buffers and, since Xilinx devices have embedded multipliers, the impact of the extra multipliers could be reduced. As can be seen in Table 3, although the number of Tbufs is reduced, more multipliers are used and the number of slices required increases accordingly, partially because of the dividers.

Table 1
Results using Altera devices

     Device         Input pins   Output pins   Bidirectional pins   Memory bits   Memory (%)   LCs    LCs (%)
1    EP1K100FC484   22           8             0                    24,576        50           4205   84
2    EP1K100FC484   22           8             0                    24,576        50           4310   86

Table 2
Results using Xilinx devices

              x2v250 (1)             x2v250 (2)             x2v80 (3)
              Number   Percentage    Number   Percentage    Number   Percentage
Slices        475      30            484      31            510      99
LUTs          344      11            362      11            646
IOBs          30       17            30       17            30       25
Tbufs         264      34            216      28            100      39
MULT18X18s    3        12            3        12            3        37
GCLKs         3        18            3        18            3        18

Table 3
Results using Xilinx devices with no shared hardware

              x2v250 (2)
              Number   Percentage
Slices        924      60
LUTs          864      28
Tbufs         32       4
MULT18X18s    6        25