Journal of Circuits, Systems, and Computers
© World Scientific Publishing Company
A FPGA CORE GENERATOR FOR EMBEDDED CLASSIFICATION SYSTEMS
DAVIDE ANGUITA, LUCA CARLINO, ALESSANDRO GHIO∗, SANDRO RIDELLA
Dept. of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11a Genova, I16145, Italy
Received (received date)
Revised (revised date)
Accepted (accepted date)
We describe in this work a Core Generator for Pattern Recognition tasks. This tool is able to generate, according to user requirements, the hardware description of a digital architecture, which implements a Support Vector Machine, one of the current state-of-the-art algorithms for pattern recognition. The output of the Core Generator consists of a high-level language hardware core description, suitable to be mapped on a reconfigurable device, like a Field Programmable Gate Array (FPGA). As an example of the use of our tool, we compare different solutions, by targeting several reconfigurable devices, and implement the recognition part of a machine vision system for automotive applications.
1. Introduction
A desired goal for building more effective embedded systems for real-world applications is the increase of on-board intelligence through, for example, Pattern Recognition or Machine Learning algorithms [1]. These algorithms require a significant amount of computational power, which is not always available on embedded systems due to their severe resource constraints (e.g. size, dissipation, etc.). One possibility for overcoming this problem consists in reducing the complexity of the algorithms at the expense of their accuracy [2, 3]. An alternative option consists in providing sufficient computational capabilities, supporting the main processing unit, through a special-purpose co-processor, which is, in general, more efficient than the general-purpose one. A good candidate for this task is a Field Programmable Gate Array (FPGA), which can be easily configured to implement the desired computation: the main processing unit deals with most of the work, while the FPGA is dedicated to the resource-consuming algorithm. Several examples of this approach have recently appeared in the literature, thanks to the development, in recent years, of high-performance FPGAs and the corresponding programming environments: they target robot control [4], digital signal processing [5], DNA sequencing [6], etc.
∗ Corresponding author. Email: Alessandro.Ghio@unige.it.
Among the algorithms that are of interest for embedding recognition capabilities on a processing system, we focus our attention on the Support Vector Machine (SVM) [7, 8]. The SVM has been developed mainly for pattern recognition tasks and can be seen as a next-generation Artificial Neural Network (ANN). In particular, it resembles the Radial Basis Function (RBF) and Multi-Layer Perceptron (MLP) networks [9, 10], but gives up their biological plausibility in favour of solid statistical foundations [11]. The algorithm builds a classifier during a training phase, using a set of labeled patterns (the training set); then, the parameters of the trained classifier are frozen and the classifier is used to predict the most plausible class of any new pattern (the on-line or feedforward phase). Even if some solutions have been proposed in the literature to perform the training phase on the device itself [12], usually this step is performed off-line, on a conventional computer, then the trained classifier is downloaded to the embedded system [13, 14, 15].
While the original SVM has been mostly implemented on general-purpose computers [16], several variants have been proposed to target special-purpose digital architectures, therefore allowing its realization on FPGAs and embedded systems [17, 18]. Unfortunately, the hardware implementation of an SVM is not straightforward, even when implementing only the feedforward phase: in fact, the user faces several options and can choose between different architectures, each one having different advantages and drawbacks. Depending on the user requirements and the characteristics of the target device, the architecture must be designed to guarantee the best trade-off between resource utilization (memory, logic gates, adders, embedded multipliers, etc.) and performance (reliability of the output, throughput, maximum clock frequency, latency, etc.). For this task, a tool that generates an optimized hardware core, according to the user requirements, can be particularly useful.
The idea of using a tool for hardware design and optimization, starting from an algorithmic description, has been quite successful in the past, when Silicon Compilers [19, 20] were developed to automatically generate the layout of an integrated circuit, taking user specifications as inputs. Core Generators and Hardware Compilers [21, 22, 23] represent the modern evolution of Silicon Compilers. Their customizations can be set by the user through a Graphical User Interface (GUI) or a high-level source code, and the output consists of the behavioural or structural description of the architecture in a high-level Hardware Description Language (e.g. VHDL or Verilog). Our proposal belongs to this framework and allows application developers to easily include a state-of-the-art pattern recognition module in their design, which is optimized according to their needs and system constraints.
In particular, the main objective of this work is to describe a Core Generator for SVMs or, in other words, a tool that, according to user requirements, generates an optimized hardware description of a digital architecture, which targets a current-generation FPGA and implements the feedforward phase of a trained SVM.
This paper is organized as follows. First, we briefly review a hardware-friendly version of the SVM, i.e. a reformulation of the SVM using fixed-point arithmetic, which is more suitable for resource-constrained systems where floating-point units are usually avoided. Then, we propose several digital architectures that can be exploited for implementing the SVM: each architecture is optimal according to some criteria, and the main purpose of the Core Generator is to choose the best one with respect to the application, the user's requirements and the main characteristics of the target device. Finally, we show the use of the Core Generator by targeting four different FPGAs, from 300K up to 4M equivalent logic gates. Both the performance and the resource requirements for implementing the SVM core are detailed and compared, along with the best architecture selected by the Core Generator.
2. A Support Vector Machine for digital hardware
The algorithm targeted by our analysis is the homogeneous SVM: this version is theoretically equivalent to the conventional one [24] and is more amenable to hardware implementations [18].
Let us suppose that the dataset, which must be learned by the SVM, is composed of $l$ patterns $\{\mathbf{x}_1, \ldots, \mathbf{x}_l\}$, where $\mathbf{x}_i \in \Re^m$. The information stored in each pattern depends on the particular application and can be, for example, a digital image or a signal or, in general, a set of features. Note that it is also possible to deal with non-numerical or structured patterns [25] but, given the kind of applications targeted in this work, we do not address this possibility here.
Each pattern is labeled according to its class, which is indicated by a binary value $y_i = \pm 1$, $\forall i \in [1, l]$; the extension to more than two classes is easily addressed by combining several binary classifiers [26].
The homogeneous SVM learning phase consists in solving the following Constrained Quadratic Programming (CQP) problem:
$$\min_{\boldsymbol{\alpha}} \; \frac{1}{2} \boldsymbol{\alpha}^T Q \boldsymbol{\alpha} + \mathbf{r}^T \boldsymbol{\alpha} \qquad (1)$$
$$0 \le \alpha_i \le C \quad \forall i \in [1, \ldots, l]$$
where $q_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$, $r_i = -1 \;\forall i$, $C$ is a user-defined hyperparameter and $K(\cdot, \cdot)$ is a positive semidefinite Mercer's kernel [7, 24].
The solution of the above problem produces the parameters of the classifier $\boldsymbol{\alpha}$, which are used in the feedforward phase:
$$f(\mathbf{x}) = \sum_{i=1}^{l} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) \qquad (2)$$
where $\mathbf{x}$ is the new pattern to be classified. The predicted classification label corresponds to the sign of $f(\mathbf{x})$.
It is easy to see from Eq. (2) that only the training patterns corresponding to $\alpha_i \neq 0$ are involved in the feedforward phase: they are called Support Vectors (SVs) since they are the only data supporting the classification.
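As an illustration of Eq. (2), the following minimal Python sketch evaluates the feedforward phase in floating point and skips every pattern with a zero coefficient, so that only the Support Vectors contribute. The function names, the Gaussian kernel used as a stand-in for a generic Mercer's kernel, and the convention of returning +1 when $f(\mathbf{x}) = 0$ are assumptions made for this example and are not taken from the original work.

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    """Illustrative Mercer kernel (an assumption; Eq. (2) holds for any valid K)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svm_feedforward(x, patterns, labels, alphas, kernel):
    """Evaluate f(x) = sum_i y_i * alpha_i * K(x_i, x), as in Eq. (2)."""
    total = 0.0
    for x_i, y_i, alpha_i in zip(patterns, labels, alphas):
        if alpha_i != 0:  # only Support Vectors take part in the sum
            total += y_i * alpha_i * kernel(x_i, x)
    return total


def predict(x, patterns, labels, alphas, kernel):
    """Predicted label = sign of f(x); f(x) == 0 is mapped to +1 (assumed convention)."""
    return 1 if svm_feedforward(x, patterns, labels, alphas, kernel) >= 0 else -1


# Toy data: the third pattern has alpha = 0, so it is not a Support Vector.
patterns = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
labels = [-1, 1, 1]
alphas = [1.0, 1.0, 0.0]

print(predict([0.9, 0.9], patterns, labels, alphas, gaussian_kernel))
print(predict([0.1, 0.1], patterns, labels, alphas, gaussian_kernel))
```

Since patterns with a zero coefficient never enter the sum, they can be dropped from storage altogether, which is why only the Support Vectors need to be kept on the target device.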
The modification of the conventional SVM to a hardware-friendly version has been the objective of recent research activity [18]. Most of the effort lies in mapping the SVM solution from the real domain (i.e. floating-point) to the integer one (i.e. fixed-point), while retaining the good generalization ability of the original classifier. In this sense, Eq. (1) can be reformulated as a Mixed-Integer Quadratic Programming (MIQP) problem, which can be solved through advanced optimization techniques [18, 27].
Similar attention must be paid to selecting a Mercer's kernel that can be easily implemented in hardware. A hardware-friendly Mercer's kernel is:
$$K(\mathbf{x}_i, \mathbf{x}) = 2^{-\gamma \|\mathbf{x}_i - \mathbf{x}\|_1} \qquad (3)$$
where $\|\mathbf{z}\|_1 = \sum_j |z_j|$ is the Manhattan norm (i.e. the 1-norm), $\gamma = 2^p$, and $p$ is a signed integer value. This kernel achieves remarkable classification accuracy in many practical pattern recognition tasks and is particularly suited for digital hardware implementations because it can be computed through a CORDIC-like algorithm (i.e. using only shifts and adds) [17, 18]. Its mapping to the fixed-point domain is straightforward, as the value of the kernel can be uniformly quantized with $u$ bits [28]:
$$0 \le K(\mathbf{x}_i, \mathbf{x}_j) \le 1 - 2^{-u} \quad \forall i \in [1, l] \qquad (4)$$
where we suppose $\mathbf{x}_i \neq \mathbf{x}_j$ (in fact, if this assumption does not hold, the correct classification is trivially $y_i$). Analogously, it is possible to normalize and quantize the data patterns with $v \ge 1$ bits
$$0 \le x_i \le 1 - 2^{-v} \quad \forall i \in [1, m] \qquad (5)$$
so that the feedforward phase of the SVM can be computed as
$$f(\mathbf{x}) = \sum_{i=1}^{l} y_i \beta_i K(\mathbf{x}_i, \mathbf{x}) \qquad (6)$$
where all the variables are in fixed-point format and, therefore, easily mapped to a resource-constrained digital device.
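The fixed-point feedforward phase of Eq. (6) can be sketched in Python as follows. This is only a behavioural model under stated assumptions: patterns are given as $v$-bit integer codes (Eq. (5)), the kernel of Eq. (3) is evaluated in floating point and then truncated to a $u$-bit code saturated at $1 - 2^{-u}$ (Eq. (4)), and the coefficients $\beta_i$ are taken to be non-negative integers. The truncation rule, the function names and the toy values are hypothetical, and the sketch does not reproduce the CORDIC-like shift-and-add computation used in hardware.

```python
def quantize(value, bits):
    """Map a value in [0, 1] to an integer code on `bits` bits (truncation, assumed)."""
    levels = 1 << bits
    code = int(value * levels)
    return max(0, min(levels - 1, code))  # saturate at 1 - 2^-bits, cf. Eq. (4)


def kernel_code(sv_codes, x_codes, p, u, v):
    """u-bit code of K = 2^(-gamma * ||x_i - x||_1) with gamma = 2^p, cf. Eq. (3)."""
    # Integer Manhattan norm, expressed in units of 2^-v.
    l1 = sum(abs(a - b) for a, b in zip(sv_codes, x_codes))
    exponent = l1 * 2.0 ** (p - v)  # gamma * ||x_i - x||_1 in real units
    return quantize(2.0 ** -exponent, u)


def feedforward(x_codes, svs, labels, betas, p, u, v):
    """Integer accumulation of Eq. (6); the result is in units of 2^-u."""
    acc = 0
    for sv, y, beta in zip(svs, labels, betas):
        acc += y * beta * kernel_code(sv, x_codes, p, u, v)
    return acc


def classify(x_codes, svs, labels, betas, p, u, v):
    """Predicted label = sign of the accumulator (0 mapped to +1, assumed)."""
    return 1 if feedforward(x_codes, svs, labels, betas, p, u, v) >= 0 else -1


# Toy example with v = 4 bit patterns, u = 8 bit kernel values, gamma = 2^0.
svs = [[0, 0], [15, 15]]
labels = [-1, 1]
betas = [3, 3]
print(classify([14, 15], svs, labels, betas, p=0, u=8, v=4))
print(classify([1, 0], svs, labels, betas, p=0, u=8, v=4))
```

Because the accumulator only sums products of small integers, its width can be bounded in advance from $u$, the range of $\beta_i$ and the number of Support Vectors, which is the kind of sizing decision a hardware description has to make explicitly.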
3. Architectural building blocks for the feedforward SVM
In Fig. 1 the high-level view of the SVM core is presented. The data needed to compute the feedforward phase (i.e. the coefficients $\beta_i$ and the training patterns $\mathbf{x}_i$) can be stored in the FPGA itself or in an external memory. In both cases the architecture for the computation of the feedforward phase is the same and can be divided into three blocks (Fig. 2).
Fig. 1. The SVM core.
Fig. 2. The architecture for the feedforward phase of the SVM.
The first block computes the Manhattan norm $\|\mathbf{x}_i - \mathbf{x}\|_1$ between the input pattern $\mathbf{x}$ and the $i$-th training pattern $\mathbf{x}_i$. Since the values of the hyperparameter