A SLIDING-WINDOW KERNEL RLS ALGORITHM AND ITS APPLICATION TO NONLINEAR CHANNEL IDENTIFICATION

Steven Van Vaerenbergh, Javier Vía and Ignacio Santamaría

Dept. of Communications Engineering, University of Cantabria, Spain
E-mail: {steven,jvia,nacho}@gtas.dicom.unican.es
ABSTRACT

In this paper we propose a new kernel-based version of the recursive least-squares (RLS) algorithm for fast adaptive nonlinear filtering. Unlike other previous approaches, we combine a sliding-window approach (to fix the dimensions of the kernel matrix) with conventional L2-norm regularization (to improve generalization). The proposed kernel RLS algorithm is applied to a nonlinear channel identification problem (specifically, a linear filter followed by a memoryless nonlinearity), which typically appears in satellite communications or digital magnetic recording systems. We show that the proposed algorithm is able to operate in a time-varying environment and tracks abrupt changes in either the linear filter or the nonlinearity.
1. INTRODUCTION
In recent years a number of kernel methods, including support vector machines [1], kernel principal component analysis [2], kernel Fisher discriminant analysis [3] and kernel canonical correlation analysis [4, 5], have been proposed and successfully applied to classification and nonlinear regression problems. In their original forms, most of these algorithms cannot be used to operate online, since the kernel methods introduce a number of difficulties, such as the time and memory complexities (because of the growing kernel matrix) and the need to avoid overfitting of the problem.

Recently a kernel RLS algorithm was proposed that dealt with both difficulties [6]: by applying a sparsification procedure, the kernel matrix size was limited and the order of the problem was reduced. In this paper we present a different approach, combining a sliding window with conventional regularization. This way the size of the kernel matrix can be fixed rather than merely limited, allowing the algorithm to operate online in time-varying environments.

(This work was supported by MEC (Ministerio de Educación y Ciencia) under grant TEC2004-06451-C05-02.)

The basic idea of kernel methods is to transform the data x_i from the input space to a high-dimensional feature space of vectors Φ(x_i), where the inner products can be calculated using a positive definite kernel function satisfying Mercer's condition [1]: κ(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩. This simple and elegant idea, also known as the "kernel trick", allows inner products in the feature space to be computed without making direct reference to the feature vectors.

A common nonlinear kernel is the Gaussian kernel
κ(x, y) = exp(−||x − y||^2 / (2σ^2)).    (1)

This kernel will be used to calculate the elements of a kernel matrix. In the sliding-window approach, updating this kernel matrix means first removing its first row and first column and then adding a new row and column at the end, based on new observations. Calculation of its inverse is needed to update the algorithm's solution. To this end, two matrix inversion formulas are derived in Appendix A. One of them has already been used in kernel methods [7], but in order to fix the dimensions of the kernel matrix we also introduce a complementary formula.

The rest of this paper is organized as follows: in Section 2 a kernel transformation is introduced into linear least-squares regression theory. A detailed description of the proposed algorithm is given in Section 3, and in Section 4 it is applied to a nonlinear channel identification problem. Finally, Section 5 summarizes the main conclusions of this work.
2. LEAST-SQUARES REGRESSION

2.1. Linear Methods

The least-squares (LS) criterion [8] is a widely used method in signal processing. Given a vector y ∈ R^{N×1} and a data matrix X ∈ R^{N×M} of observations, it consists in seeking the optimal vector h ∈ R^{M×1} that solves

J = min_h ||y − Xh||^2.    (2)

It should be clear that the solution h can be represented in the basis defined by the rows of X. Hence it can also be written as h = X^T a, making it a linear combination of the input patterns (this is sometimes denoted as the "dual representation").

For many problems, however, not all data are known in advance and the solution has to be re-calculated as new observations become available. An online algorithm is then needed, which in the case of linear problems is given by the well-known recursive least-squares (RLS) algorithm [8].
V - 789    1-4244-0469-X/06/$20.00 ©2006 IEEE    ICASSP 2006
2.2. Kernel Methods
The linear LS methods can be extended to nonlinear versions by transforming the data into a feature space. Using the transformed vector h̃ ∈ R^{M×1} and the transformed data matrix X̃ ∈ R^{N×M}, the LS problem (2) can be written in feature space as

J = min_{h̃} ||y − X̃h̃||^2.    (3)

The transformed solution h̃ can now also be represented in the basis defined by the rows of the (transformed) data matrix X̃, namely as

h̃ = X̃^T α.    (4)

Moreover, introducing the kernel matrix K = X̃X̃^T, the LS problem in feature space (3) can be rewritten as

J = min_α ||y − Kα||^2,    (5)

in which the solution α is an N × 1 vector. The advantage of writing the nonlinear LS problem in the dual notation is that, thanks to the "kernel trick", we only need to compute K, which is done as

K(i, j) = κ(X_i, X_j),    (6)

where X_i and X_j are the i-th and j-th rows of X. As a consequence, the computational complexity of operating in this high-dimensional space is not necessarily larger than that of working in the original low-dimensional space.
2.3. Measures Against Overfitting

For most useful kernel functions, the dimension of the feature space, M, will be much higher than the number of available data points N (for instance, in case the Gaussian kernel is used the feature space will have dimension M = ∞). In these cases, Eq. (5) could have an infinite number of solutions, representing an overfit problem.

Various techniques to handle this overfitting have been presented. One possible method is to reduce the order of the feature space [6, 4, 5]. A second method, used here, is to regularize the solution. More specifically, the norm of the solution h̃ is penalized to obtain the following problem:

J = min_{h̃} ||y − X̃h̃||^2 + c||h̃||^2    (7)
  = min_α ||y − Kα||^2 + c α^T K α,    (8)

whose solution is given by

α = K_reg^{-1} y    (9)

with K_reg = (K + cI), c a regularization constant and I the identity matrix.
3. THE ONLINE ALGORITHM
In various situations an online, i.e. recursive, version is preferred over a batch algorithm. In particular, if the data points y are the result of a time-varying process, an online algorithm able to track these time variations can be designed. In any case, the key feature of an online algorithm is that the number of computations required per new sample must not increase as the number of samples increases.
3.1. A Sliding-Window Approach
The presented algorithm is a regularized kernel version of the RLS algorithm. An online prediction setup assumes we are given a stream of input-output pairs {(x_1, y_1), (x_2, y_2), ...}. The sliding-window approach consists in only taking the last N pairs of this stream into account. For window n, the observation vector y_n = [y_n, y_{n−1}, ..., y_{n−N+1}]^T and observation matrix X_n = [x_n, x_{n−1}, ..., x_{n−N+1}]^T are formed, and the corresponding regularized kernel matrix K_n = X̃_n X̃_n^T + cI can be calculated.

Note that it is necessary to limit the number of data vectors x_n, N, for which the kernel matrix is calculated. Contrary to standard linear RLS, for which the correlation matrices have fixed sizes depending on the (fixed) dimension of the input vectors M, the size of the kernel matrix in an online scenario depends on the number of observations N.

In [6], a kernel RLS algorithm is designed that limits the matrix sizes by means of a sparsification procedure, which maps the samples to a (limited) dictionary. It allows both to reduce the order of the feature space (which prevents overfitting) and to keep the complexity of the algorithm bounded. In our approach these two measures are obtained by two different mechanisms. On one hand, the regularization against overfitting is done by penalizing the solutions, as in (9). On the other hand, the complexity of the algorithm is reduced by considering only the observations in a window with fixed length. The advantage of the latter approach is that it is able to track time variations without any extra computational burden.
3.2. Updating the Inverse of the Kernel Matrix
The calculation of the updated solution α_n requires the calculation of the N × N inverse matrix K_n^{-1} for each window. This is costly both computationally and memory-wise (requiring O(N^3) operations). Therefore an update algorithm is developed that can compute K_n^{-1} solely from knowledge of the data of the current window {X_n, y_n} and the previous K_{n−1}^{-1}. The updated solution α_n can then be calculated in a straightforward way using Eq. (9).

Given the regularized kernel matrix K_{n−1}, the new regularized kernel matrix K_n can be constructed by removing the first row and column of K_{n−1}, referred to as K̂_{n−1}, and adding kernels of the new data as the last row and column:

K_n = [ K̂_{n−1}         k_{n−1}(x_n)
        k_{n−1}(x_n)^T   k_nn + c ]    (10)
Algorithm 1. Summary of the proposed adaptive algorithm.

Initialize K_0 as (1 + c)I and K_0^{-1} as I/(1 + c)
for n = 1, 2, ... do
    Obtain K̂_{n−1} out of K_{n−1}
    Calculate K̂_{n−1}^{-1} according to Eq. (12)
    Obtain K_n according to Eq. (10)
    Calculate K_n^{-1} according to Eq. (11)
    Obtain the updated solution α_n = K_n^{-1} y_n
end for
where k_{n−1}(x_n) = [κ(x_{n−N+1}, x_n), ..., κ(x_{n−1}, x_n)]^T and k_nn = κ(x_n, x_n).

Calculating the inverse kernel matrix K_n^{-1} is done in two steps, using the two inversion formulas derived in Appendices A.1 and A.2. Note that these formulas do not calculate the inverse matrices explicitly, but rather derive them from known matrices, maintaining an overall time and memory complexity of O(N^2) for the algorithm.

First, given K_{n−1} and K_{n−1}^{-1}, the inverse of the (N−1) × (N−1) matrix K̂_{n−1} is calculated according to Eq. (12). Then K_n^{-1} can be calculated by applying the matrix inversion formula from Eq. (11), based on the knowledge of K̂_{n−1}^{-1} and K_n. The complete algorithm is summarized in Alg. 1.
4. EXAMPLE PROBLEM: IDENTIFICATION OF A NONLINEAR WIENER SYSTEM

In this section we consider the identification problem for a nonlinear Wiener system, and compare the performance of the proposed kernel RLS algorithm to the standard approach using a multilayer perceptron (MLP). Since kernel methods provide a natural nonlinear extension of linear regression methods, the proposed system is expected to perform well compared to the MLP [7].
4.1. Experimental Setup
The nonlinear Wiener system is a well-known and simple nonlinear system which consists of a series connection of a linear filter and a memoryless nonlinearity (see Fig. 1). Such a nonlinear channel can be encountered in digital satellite communications [9] and in digital magnetic recording [10]. Traditionally, the problem of blind nonlinear equalization or identification has been tackled by considering nonlinear structures such as MLPs [11], recurrent neural networks [12], or piecewise linear networks [13].

Here we consider a supervised identification problem in which, moreover, at a given time instant the linear channel coefficients are changed abruptly to compare the tracking capabilities of both algorithms: during the first part of the simulation, the linear channel is H_1(z) = 1 + 0.0668 z^{-1} − 0.4764 z^{-2} + 0.8070 z^{-3}, and after receiving 500 symbols it is changed into H_2(z) = 1 − 0.4326 z^{-1} − 0.6656 z^{-2} + 0.7153 z^{-3}. A binary signal is sent through this channel and then the nonlinear function y = tanh(x) is applied to it, where x is the linear channel output. Finally, white Gaussian noise is added to match an SNR of 20 dB. The Wiener system is then treated as a black box of which only the input and output are known.

Fig. 1. A nonlinear Wiener system: s[n] → H(z) → x[n] → f(·) → y[n], with noise v[n] added to produce the output z[n].
4.2. Simulation Results
System identification was first performed by an MLP with 8 neurons in its hidden layer, and then using the sliding-window kernel RLS algorithm for two different window sizes N. For both methods we applied time-embedding techniques in which the length L of the linear channel was known. More specifically, the MLP used was a time-delay MLP with L inputs, and the input vectors for the kernel RLS algorithm were time-delayed vectors of length L, s(n) = [s(n − L + 1), ..., s(n)]. In each iteration, system identification was performed by estimating the output sample corresponding to the next input sample and comparing it to the actual output. The mean square error (MSE) for both approaches is shown in Fig. 2. Most noticeable is the fast convergence of the kernel RLS algorithm: convergence time is of the order of the window length.

Further note that the structure of the nonlinear system has not been exploited while performing identification. Obviously, the presented kernel RLS method can be extended and used as the basis of a more complex algorithm that better models the known system structure. For instance, the solution to the nonlinear Wiener identification problem could be found as the solution to two coupled LS problems, where the first one applies a linear kernel on the input data and the second one applies a nonlinear kernel on the output data. Also, the implications of not knowing the correct linear channel length remain to be studied. We will consider this and other extensions as future research lines.
5. CONCLUSIONS
A kernel-based version of the RLS algorithm was presented. Its main features are the introduction of regularization against overfitting (by penalizing the solutions) and the combination of a sliding-window approach with efficient matrix inversion formulas to keep the complexity of the problem bounded. Thanks to the use of a sliding window, the algorithm is able to provide tracking in a time-varying environment.

First results of this algorithm are promising, and suggest it can be extended to deal with the nonlinear extensions of most problems that are classically solved by linear RLS. Future research lines also include its direct application to online kernel canonical correlation analysis (kernel CCA).
Fig. 2. MSE of the identification of the nonlinear Wiener system of Fig. 1, for the standard method using an MLP and for the window-based kernel RLS algorithm with window length N = 150 (thick curve) and N = 75 (thin curve). A change in filter coefficients of the nonlinear Wiener system was introduced after 500 iterations. The results were averaged over 250 Monte Carlo simulations.
6. REFERENCES
[1] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., New York, USA, 1995.
[2] B. Schölkopf, A. J. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[3] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," in Proc. NNSP'99, Y.-H. Hu, J. Larsen, E. Wilson, and S. Douglas, Eds., Jan. 1999, pp. 41–48, IEEE.
[4] F. R. Bach and M. I. Jordan, "Kernel independent component analysis," Journal of Machine Learning Research, vol. 3, pp. 1–48, 2003.
[5] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical correlation analysis: An overview with application to learning methods," Technical Report CSD-TR-03-02, Royal Holloway University of London, 2003.
[6] Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Transactions on Signal Processing, vol. 52, no. 8, Aug. 2004.
[7] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, MA, 2002.
[8] A. H. Sayed, Fundamentals of Adaptive Filtering, Wiley, New York, USA, 2003.
[9] G. Kechriotis, E. Zarvas, and E. S. Manolakos, "Using recurrent neural networks for adaptive communication channel equalization," IEEE Trans. on Neural Networks, vol. 5, pp. 267–278, Mar. 1994.
[10] N. P. Sands and J. M. Cioffi, "Nonlinear channel models for digital magnetic recording," IEEE Trans. Magn., vol. 29, pp. 3996–3998, Nov. 1993.
[11] D. Erdogmus, D. Rende, J. C. Principe, and T. F. Wong, "Nonlinear channel equalization using multilayer perceptrons with information-theoretic criterion," in Proc. IEEE Workshop on Neural Networks and Signal Processing XI, Falmouth, MA, Sept. 2001, pp. 401–451.
[12] T. Adali and X. Liu, "Canonical piecewise linear network for nonlinear filtering and its application to blind equalization," Signal Process., vol. 61, no. 2, pp. 145–155, Sept. 1997.
[13] P. W. Holland and R. E. Welch, "Robust regression using iterative reweighted least squares," Commun. Statist. Theory Methods, vol. A6, no. 9, pp. 813–827, 1997.
A. MATRIX INVERSION FORMULAS

A.1. Adding a row and a column

To a given non-singular matrix A a row and a column are added as shown below, resulting in matrix K. The inverse matrix K^{-1} can then be expressed in terms of the known elements and A^{-1} as follows:

K = [ A     b
      b^T   d ],   K^{-1} = [ E     f
                              f^T   g ]

⇒  AE + bf^T = I,   Af + bg = 0,   b^T f + dg = 1

⇒  K^{-1} = [ A^{-1}(I + bb^T A^{-1} g)   −A^{-1}b g
              −(A^{-1}b)^T g               g ]    (11)

with g = (d − b^T A^{-1} b)^{-1}.
A.2. Removing the first row and column

From a given non-singular matrix K a row and a column are removed as shown below, resulting in matrix D. The inverse matrix D^{-1} can then easily be expressed in terms of the known elements of K^{-1} as follows:

K = [ a   b^T
      b   D ],   K^{-1} = [ e   f^T
                            f   G ]

⇒  be + Df = 0,   bf^T + DG = I

⇒  D^{-1} = G − ff^T / e.    (12)
