Journal of Circuits, Systems, and Computers
© World Scientific Publishing Company

A FPGA CORE GENERATOR FOR EMBEDDED CLASSIFICATION SYSTEMS

DAVIDE ANGUITA, LUCA CARLINO, ALESSANDRO GHIO*, SANDRO RIDELLA
Dept. of Biophysical and Electronic Engineering, University of Genova,
Via Opera Pia 11a, Genova, I-16145, Italy

* Corresponding author. E-mail: Alessandro.Ghio@unige.it.

Received (received date)
Revised (revised date)
Accepted (accepted date)

We describe in this work a Core Generator for Pattern Recognition tasks. This tool is able to generate, according to user requirements, the hardware description of a digital architecture which implements a Support Vector Machine, one of the current state-of-the-art algorithms for pattern recognition. The output of the Core Generator consists of a high-level language hardware core description, suitable to be mapped on a reconfigurable device such as a Field Programmable Gate Array (FPGA). As an example of the use of our tool, we compare different solutions by targeting several reconfigurable devices and implement the recognition part of a machine vision system for automotive applications.

1. Introduction

A desired goal for building more effective embedded systems for real-world applications is the increase of on-board intelligence through, for example, Pattern Recognition or Machine Learning algorithms [1]. These algorithms require a significant amount of computational power, which is not always available on embedded systems due to their severe resource constraints (e.g. size, dissipation, etc.). One possibility for overcoming this problem consists in reducing the complexity of the algorithms at the expense of their accuracy [2,3]. An alternative option consists in providing sufficient computational capabilities, supporting the main processing unit, through a special-purpose co-processor, which is, in general, more efficient than the general-purpose one. A good candidate for this task is a Field Programmable Gate Array (FPGA), which can be easily configured to implement the desired computation: the main processing unit deals with most of the work, while the FPGA is dedicated to the resource-consuming algorithm. Several examples of this approach have recently appeared in the literature, thanks to the development, in the last years, of high-performance FPGAs and the corresponding programming environments: they target robot control [4], digital signal processing [5], DNA sequencing [6], etc.

Among the algorithms that are of interest for embedding recognition capabilities on a processing system, we focus our attention on the Support Vector Machine (SVM) [7,8]. The SVM has been developed mainly for pattern recognition tasks and can be seen as a next-generation Artificial Neural Network (ANN). In particular, it resembles the Radial Basis Function (RBF) and Multi-Layer Perceptron (MLP) networks [9,10], but gives up their biological plausibility in favour of solid statistical foundations [11]. The algorithm builds a classifier during a training phase, using a set of labeled patterns (the training set); then, the parameters of the trained classifier are frozen and the classifier is used to predict the most plausible class of any new pattern (the on-line or feedforward phase).
Even if some solutions have been proposed in the literature to perform the training phase on the device itself [12], usually this step is performed off-line, on a conventional computer, and the trained classifier is then downloaded to the embedded system [13,14,15]. While the original SVM has been mostly implemented on general-purpose computers [16], several variants have been proposed to target special-purpose digital architectures, therefore allowing its realization on FPGAs and embedded systems [17,18].

Unfortunately, the hardware implementation of a SVM is not straightforward, even when implementing only the feedforward phase: in fact, the user faces several options and can choose between different architectures, each one having different advantages and drawbacks. Depending on the user requirements and the characteristics of the target device, the architecture must be designed to guarantee the best trade-off between resource utilization (memory, logic gates, adders, embedded multipliers, etc.) and performance (reliability of the output, throughput, maximum clock frequency, latency, etc.). For this task, a tool that generates an optimized hardware core, according to the user requirements, can be particularly useful.

The idea of using a tool for hardware design and optimization, starting from an algorithmic description, has been quite successful in the past, when Silicon Compilers [19,20] were developed to automatically generate the layout of an integrated circuit, taking user specifications as inputs. Core Generators and Hardware Compilers [21,22,23] represent the modern evolution of Silicon Compilers. Their customizations can be set by the user through a Graphical User Interface (GUI) or high-level source code, and the output consists of the behavioural or structural description of the architecture in a high-level Hardware Description Language (e.g. VHDL or Verilog). Our proposal belongs to this framework and allows application developers to easily include a state-of-the-art pattern recognition module in their design, optimized according to their needs and system constraints.

In particular, the main objective of this work is to describe a Core Generator for SVMs or, in other words, a tool that, according to user requirements, generates an optimized hardware description of a digital architecture, which targets a current-generation FPGA and implements the feedforward phase of a trained SVM.

This paper is organized as follows: first, we briefly review a hardware-friendly version of the SVM, that is, a reformulation of the SVM using fixed-point arithmetic, which is more suitable for resource-constrained systems where floating-point units are usually avoided. Then, we propose several digital architectures that can be exploited for implementing the SVM: each architecture is optimal according to some criteria, and the main scope of the Core Generator is to choose the best one with respect to the application, the user's requirements and the main characteristics of the target device. Finally, we show the use of the Core Generator by targeting four different FPGAs, from 300K up to 4M equivalent logic gates. Both the performance and the resource requirements for implementing the SVM core are detailed and compared, along with the best architecture selected by the Core Generator.
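Before detailing the hardware-friendly SVM, the sketch below illustrates the off-line training / on-line deployment split mentioned at the beginning of this section: a classifier is trained on a conventional computer and only its frozen parameters are exported for the embedded feedforward core. It is a minimal example, not the paper's method: scikit-learn's SVC is used purely as a stand-in trainer, whereas the fixed-point formulation actually targeted by the Core Generator is the one reviewed in Section 2.

```python
# Minimal sketch of the off-line training / on-line deployment workflow
# described above.  scikit-learn's SVC is only a stand-in trainer; the
# paper's own fixed-point (hardware-friendly) training is covered in Sec. 2.
import numpy as np
from sklearn.svm import SVC

# Off-line training phase, on a conventional computer (toy data).
rng = np.random.default_rng(0)
X_train = rng.random((200, 8))                    # 200 patterns, m = 8 features
y_train = np.where(X_train[:, 0] > 0.5, 1, -1)    # toy binary labels

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)

# Freeze the parameters needed by the feedforward phase: the support
# vectors x_i and the signed coefficients y_i * alpha_i.
support_vectors = clf.support_vectors_            # shape (n_SV, m)
signed_alphas = clf.dual_coef_.ravel()            # y_i * alpha_i, one per SV

# These arrays are what would be quantized and downloaded to the embedded
# system, which then only evaluates the feedforward sum.
np.save("support_vectors.npy", support_vectors)
np.save("signed_alphas.npy", signed_alphas)
```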
2. A Support Vector Machine for digital hardware

The algorithm targeted by our analysis is the homogeneous SVM: this version is theoretically equivalent to the conventional one [24] and is more amenable to hardware implementation [18].

Let us suppose that the dataset which must be learned by the SVM is composed of l patterns {x_1, ..., x_l}, where x_i ∈ ℜ^m. The information stored in each pattern depends on the particular application and can be, for example, a digital image, a signal or, in general, a set of features. Note that it is also possible to deal with non-numerical or structured patterns [25] but, given the kind of applications targeted in this work, we do not address this possibility here.

Each pattern is labeled according to its class, which is indicated by a binary value y_i = ±1, ∀i ∈ [1, l]; the extension to more than two classes is easily addressed by combining several binary classifiers [26].

The homogeneous SVM learning phase consists in solving the following Constrained Quadratic Programming (CQP) problem:

    \min_{\alpha} \; \tfrac{1}{2}\,\alpha^T Q\,\alpha + r^T \alpha    (1)
    0 \le \alpha_i \le C \quad \forall i \in [1, \ldots, l]

where q_ij = K(x_i, x_j), r_i = -1 ∀i, C is a user-defined hyperparameter and K(·,·) is a positive semidefinite Mercer's kernel [7,24].

The solution of the above problem produces the parameters of the classifier α, which are used in the feedforward phase:

    f(x) = \sum_{i=1}^{l} y_i \,\alpha_i \, K(x_i, x)    (2)

where x is the new pattern to be classified. The predicted classification label corresponds to the sign of f(x). It is easy to see from Eq. (2) that only the training patterns corresponding to α_i ≠ 0 are involved in the feedforward phase: they are called Support Vectors (SVs), since they are the only data supporting the classification.

The modification of the conventional SVM to a hardware-friendly version has been the objective of recent research activity [18]. Most of the effort lies in mapping the SVM solution from the real domain (i.e. floating-point) to the integer one (i.e. fixed-point), while retaining the good generalization ability of the original classifier. In this sense, Eq. (1) can be reformulated as a Mixed-Integer Quadratic Programming (MIQP) problem, which can be solved through advanced optimization techniques [18,27].

Similar attention must be paid to selecting a Mercer's kernel that can be easily implemented in hardware. A hardware-friendly Mercer's kernel is:

    K(x_i, x) = 2^{-\gamma \| x_i - x \|_1}    (3)

where ‖z‖_1 = Σ_j |z_j| is the Manhattan norm (i.e. 1-norm), γ = 2^p, and p is a signed integer value. This kernel achieves remarkable classification accuracy in many practical pattern recognition tasks and is particularly suited for digital hardware implementations, because it can be computed through a CORDIC-like algorithm (i.e. using only shifts and adds) [17,18]. Its mapping to the fixed-point domain is straightforward, as the value of the kernel can be uniformly quantized with u bits [28]:

    0 \le K(x_i, x_j) \le 1 - 2^{-u} \quad \forall i \in [1, l]    (4)

where we suppose x_i ≠ x_j (in fact, if this assumption does not hold, the correct classification is trivially y_i).
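To make Eqs. (3) and (4) concrete, the following is a minimal Python sketch of the kernel evaluation on quantized data. It is purely illustrative: the helper names, the example bit widths and the floating-point exponentiation are our own stand-ins, while the actual core computes the same quantities with shifts and adds only. The inputs are assumed to be already normalized to [0, 1) and quantized with v bits, as detailed just below.

```python
# Illustrative sketch of the hardware-friendly kernel of Eq. (3) and its
# u-bit quantization of Eq. (4).  Names and bit widths are assumptions made
# for this example; the generated core evaluates the same quantities with
# shifts and adds only (the CORDIC-like scheme cited in the text).
import numpy as np

def quantize_pattern(x, v):
    """Map components of x in [0, 1) to v-bit unsigned integers."""
    return np.clip((x * (1 << v)).astype(np.int64), 0, (1 << v) - 1)

def hw_friendly_kernel(xi_q, x_q, v, u, p):
    """K(x_i, x) = 2^(-gamma * ||x_i - x||_1), gamma = 2^p, quantized to u bits.

    The Manhattan distance is a sum of absolute differences of v-bit
    integers, and multiplying by gamma = 2^p only shifts the exponent:
    both operations map directly onto adders and shifters in hardware.
    """
    sad = int(np.abs(xi_q - x_q).sum())       # ||x_i - x||_1 in units of 2^-v
    exponent = sad * 2.0 ** (p - v)           # gamma * ||x_i - x||_1
    k = 2.0 ** (-exponent)                    # stands in for the shift-and-add step
    k_q = min(int(k * (1 << u)), (1 << u) - 1)
    return k_q                                # integer code in [0, 2^u - 1], Eq. (4)

# Example with m = 8 features, v = 8 input bits, u = 10 kernel bits, p = -3.
rng = np.random.default_rng(1)
xi_q = quantize_pattern(rng.random(8), v=8)
x_q = quantize_pattern(rng.random(8), v=8)
print(hw_friendly_kernel(xi_q, x_q, v=8, u=10, p=-3))
```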
Analogously, it is possible to normalize and quantize the data patterns with v ≥ 1 bits

    0 \le x_i \le 1 - 2^{-v} \quad \forall i \in [1, m]    (5)

so that the feedforward phase of the SVM can be computed as

    f(x) = \sum_{i=1}^{l} y_i \,\beta_i \, K(x_i, x)    (6)

where all the variables are in fixed-point format and, therefore, easily mapped to a resource-constrained digital device.

3. Architectural building blocks for the feedforward SVM

In Fig. 1 the high-level view of the SVM core is presented. The data needed to compute the feedforward phase (i.e. the coefficients β_i and the training patterns x_i) can be stored in the FPGA itself or in an external memory. In both cases the architecture for the computation of the feedforward phase is the same and can be divided into three blocks (Fig. 2).

Fig. 1. The SVM core.
Fig. 2. The architecture for the feedforward phase of the SVM.

The first block computes the Manhattan norm ‖x_i − x‖_1 between the input pattern x and the i-th training pattern x_i. Since the values of the hyperparameter
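Putting the pieces together, the sketch below is an algorithmic mock-up of the fixed-point feedforward phase of Eq. (6), reusing hw_friendly_kernel() from the previous sketch. It mirrors the per-pattern loop that the blocks of Fig. 2 realize in hardware (distance, kernel, multiply-accumulate); the names are ours and the code is only illustrative, not the generated HDL.

```python
# Algorithmic mock-up of the fixed-point feedforward phase of Eq. (6),
# reusing hw_friendly_kernel() from the previous sketch.  The coefficients
# beta_i are assumed to be signed fixed-point integers produced off-line;
# all names here are illustrative, not taken from the generated core.
def svm_feedforward(x_q, train_q, y, beta_q, v, u, p):
    """Return the predicted class, i.e. the sign of f(x) in Eq. (6).

    x_q     : v-bit quantized input pattern
    train_q : v-bit quantized training patterns (one row per stored pattern)
    y       : class labels (+1 / -1)
    beta_q  : fixed-point coefficients of the trained machine
    """
    acc = 0                                            # wide accumulator
    for xi_q, yi, bi in zip(train_q, y, beta_q):
        k_q = hw_friendly_kernel(xi_q, x_q, v, u, p)   # u-bit kernel value
        acc += int(yi) * int(bi) * k_q                 # multiply-accumulate
    return 1 if acc >= 0 else -1                       # class = sign of f(x)
```

As noted in Section 2, only the patterns with non-zero coefficients (the support vectors) contribute to this sum, so only they need to be stored and iterated over by the core.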