A Functional Density-Based Nonparametric Approach for Statistical Calibration

Noslen Hernández¹, Rolando J. Biscay²,³, Nathalie Villa-Vialaneix⁴,⁵, and Isneri Talavera¹

¹ Advanced Technology Application Centre, CENATAV, Cuba
² Institute of Mathematics, Physics and Cybernetics, Cuba
³ Departamento de Estadística de la Universidad de Valparaíso, CIMFAV, Chile
⁴ Institut de Mathématiques de Toulouse, Université de Toulouse, France
⁵ IUT de Perpignan, Département STID, Carcassonne, France
Abstract. In this paper a new nonparametric functional method is introduced for predicting a scalar random variable $Y$ from a functional random variable $X$. The resulting prediction has the form of a weighted average of the training data set, where the weights are determined by the conditional probability density of $X$ given $Y$, which is assumed to be Gaussian. In this way, such a conditional probability density is incorporated as key information into the estimator. Contrary to some previous approaches, no assumption about the dimensionality of $E(X|Y=y)$ is required. The new proposal is computationally simple and easy to implement. Its performance is shown through its application to both simulated and real data.
1 Introduction
The fast development of instrumental analysis equipment and modern measurement devices provides huge amounts of data as high-resolution digitized functions. As a consequence, Functional Data Analysis (FDA) has become a growing research field. In the FDA setting, each individual is treated as a single entity described by a continuous real-valued function rather than by a finite-dimensional vector: functional data (FD) are then supposed to have values in an infinite-dimensional space, often particularized as a Hilbert space. An extensive review of the methods developed for FD can be found in the monograph of Ramsay and Silverman [1]. In the case of functional regression, where one intends to estimate a random scalar variable $Y$ from a functional variable $X$ taking values in a functional space $\mathcal{X}$, earlier works were focused on linear methods such as the functional linear model with scalar response [2–8] or the functional Partial Least Squares [9]. More recently, the problem has also been addressed nonparametrically with smoothing kernel estimates [10], multilayer perceptrons [11], and support vector regression [12,13]. Another point of view between these two approaches is to use a semiparametric approach, such as SIR (Sliced Inverse Regression [14]), which has been extended to functional data (FIR) in [15–17]. In this approach, the functional regression problem is addressed
I. Bloch and R.M. Cesar, Jr. (Eds.): CIARP 2010, LNCS 6419, pp. 450–457, 2010.
© Springer-Verlag Berlin Heidelberg 2010
through the opposite regression problem, i.e., the estimation of $E(X|Y=y)$, by assuming that this quantity belongs to a finite-dimensional subspace of $\mathcal{X}$.

In this paper, a new functional regression method to estimate $\gamma(X) = E(Y|X)$ is introduced that also relies on the inverse regression model $X = F(Y) + e$. Its main practical motivation arises from calibration problems in Chemometrics, specifically in spectroscopy, where some chemical variable $Y$ (e.g., a concentration) needs to be predicted from a digitized function $X$ (e.g., a spectrum). In this setting, said "inverse" model represents the physical data generation process in which the output spectrum $X$ is determined by the input chemical concentration $Y$, and $e$ is a functional random perturbation mainly due to the measurement procedure. The specific form of the conditional density of $X$ given $Y$, which is assumed to be Gaussian, is incorporated as key information into the estimator. This regression estimate will be referred to as functional Density-Based Nonparametric Regression (DBNR). Unlike the FIR approach, few assumptions are required: in particular, $\gamma$ does not need to be a function of a finite number of projections, nor does $X$ have to follow an elliptical distribution (or any other given distribution). DBNR is computationally very easy to use.

This paper is organized as follows. Section 2 presents the functional Density-Based Nonparametric Regression method. Sections 3 and 4 illustrate the use of this approach on simulated and real data. Conclusions are given in Section 5.
2 Functional Density-Based Nonparametric Regression

2.1 Definition of DBNR in a General Setting
Let $(X, Y)$ be a pair of random variables taking values in $\mathcal{X} \times \mathbb{R}$, where $(\mathcal{X}, \langle\cdot,\cdot\rangle)$ is a Hilbert space. Suppose also that $n$ i.i.d. realizations of $(X, Y)$ are given, denoted by $(x_i, y_i)_{i=1,\dots,n}$. The goal is to build, from $(x_i, y_i)_i$, a way to predict a new value for $Y$ from a given (observed) value of $X$. This problem is usually addressed by the estimation of the regression function $\gamma(x) = E(Y|X=x)$.

The functional density-based nonparametric regression implicitly supposes that the inverse model makes sense; this inverse model is:
$$X = F(Y) + \epsilon \qquad (1)$$
where $\epsilon$ is a random process (perturbation or noise) with zero mean, independent of $Y$, and $y \mapsto F(y)$ is a function from $\mathbb{R}$ into $\mathcal{X}$. As was stated in Section 1, this is a common background for calibration problems, among others.

Additionally, the following assumptions are made: first, there exists a probability measure $P_0$ on $\mathcal{X}$ (not depending on $y$) such that the conditional probability measure of $X$ given $Y = y$, say $P(\cdot|y)$, has a density $f(\cdot|y)$ with respect to $P_0$:
$$P(A|y) = \int_A f(x|y)\, P_0(dx)$$
for any measurable set $A$ in $\mathcal{X}$. Furthermore, it is assumed that $Y$ is a continuous random variable, i.e., that its distribution has a density $f_Y(y)$ (with respect to the Lebesgue measure on $\mathbb{R}$).
Under these assumptions, the regression function is:
$$\gamma(x) = \frac{\int_{\mathbb{R}} f(x|y)\, f_Y(y)\, y\, dy}{f_X(x)}, \quad \text{where} \quad f_X(x) = \int_{\mathbb{R}} f(x|y)\, f_Y(y)\, dy.$$

Hence, given an estimate $\hat{f}(x|y)$ of $f(x|y)$, the following estimate of $\gamma(x)$ can be constructed from the previous equation:
$$\hat{\gamma}(x) = \frac{\sum_{i=1}^{n} \hat{f}(x|y_i)\, y_i}{\hat{f}_X(x)}, \quad \text{where} \quad \hat{f}_X(x) = \sum_{i=1}^{n} \hat{f}(x|y_i). \qquad (2)$$
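Concretely, Equation (2) is a weighted average of the training responses, with weights given by any plug-in estimate of the conditional density. A minimal Python sketch (the names `dbnr_predict` and `cond_density` are illustrative, not from the paper):

```python
import numpy as np

def dbnr_predict(x, y_train, cond_density):
    """Equation (2): predict E(Y | X = x) as a weighted average of the
    training responses y_i, the weight of y_i being f_hat(x | y_i).

    cond_density(x, y) stands for any estimate f_hat(x | y); a Gaussian
    specification of it is given in Section 2.2.
    """
    weights = np.array([cond_density(x, yi) for yi in y_train])
    f_x = weights.sum()                 # estimate of f_X(x) (denominator)
    return float(weights @ y_train / f_x)
```

For instance, a constant density estimate reduces the prediction to the sample mean of the responses, while a density concentrated around one $y_i$ returns that $y_i$.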
2.2 Specification in the Gaussian Case
The general estimation scheme given in Equation (2) will now be specified for the case in which $P(\cdot|y)$ is a Gaussian measure on $\mathcal{X} = L^2[0,1]$ for each $y \in \mathbb{R}$. $P(\cdot|y)$ is then supposed to have a mean function $\mu(\cdot|y) \in \mathcal{X}$ (which is then equal to $F(y)(\cdot)$ according to Equation (1)) and a covariance operator $r$ (not depending on $y$), which is a Hilbert-Schmidt operator on the space $\mathcal{X}$. Then, there exists an eigenvalue decomposition of $r$, $(\varphi_j, \lambda_j)_{j \geq 1}$, such that $(\lambda_j)_j$ is a decreasing sequence of positive real numbers, $(\varphi_j)_j$ take values in $\mathcal{X}$, and $r = \sum_j \lambda_j \varphi_j \otimes \varphi_j$, where $\varphi_j \otimes \varphi_j (h) = \langle \varphi_j, h \rangle \varphi_j$ for any $h \in \mathcal{X}$.

Denote by $P_0$ the Gaussian measure on $\mathcal{X}$ with zero mean and covariance operator $r$. Assume the following usual regularity condition holds: for each $y \in \mathbb{R}$,
$$\sum_{j=1}^{\infty} \frac{\mu_j^2(y)}{\lambda_j} < \infty, \quad \text{with} \quad \mu_j(y) = \langle \mu(\cdot|y), \varphi_j \rangle.$$
Then, $P(\cdot|y)$ and $P_0$ are equivalent Gaussian measures, and the density $f(\cdot|y)$ has the explicit form:
$$f(x|y) = \exp\left\{ \sum_{j=1}^{\infty} \frac{\mu_j(y)}{\lambda_j} \left( x_j - \frac{\mu_j(y)}{2} \right) \right\},$$
where $x_j = \langle x, \varphi_j \rangle$ for all $j \geq 1$. This leads to the following estimation scheme for $f(x|y)$:

1. Obtain an estimate $\hat{\mu}(\cdot|y)$ of $t \mapsto \mu(t|y)$ for all $y \in \mathbb{R}$. This may be carried out through any standard nonparametric regression from $\mathbb{R}$ to $\mathbb{R}$, based on the learning set $(y_i, x_i(t))_{i=1,\dots,n}$; e.g., a smoothing kernel method.
2. Obtain estimates $(\hat{\varphi}_j, \hat{\lambda}_j)_j$ of the eigenfunctions and eigenvalues $(\varphi_j, \lambda_j)_j$ of the covariance $r$ on the basis of the empirical covariance of the residuals $x_i - \hat{\mu}(\cdot|y_i)$, $i = 1,\dots,n$. Only the first $p$ eigenvalues and eigenfunctions are estimated, where $p = p(n)$ is a given integer, smaller than $n$.
3. Estimate $f(x|y)$ by
$$\hat{f}(x|y) = \exp\left\{ \sum_{j=1}^{p} \frac{\hat{\mu}_j(y)}{\hat{\lambda}_j} \left( \hat{x}_j - \frac{\hat{\mu}_j(y)}{2} \right) \right\} \qquad (3)$$
where $\hat{\mu}_j(y) = \langle \hat{\mu}(\cdot|y), \hat{\varphi}_j \rangle$ and $\hat{x}_j = \langle x, \hat{\varphi}_j \rangle$.
Finally, substituting (3) into (2) leads to an estimate $\hat{\gamma}(x)$ of $\gamma(x)$. Under some technical assumptions, the consistency of the DBNR method can be proved:
$$\lim_{n \to \infty} \hat{\gamma}(x) = \gamma(x) \quad \text{in probability.}$$
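The three steps above, combined with Equation (2), can be sketched in Python. This is an illustrative discretized implementation under simplifying assumptions (a Gaussian smoothing kernel for step 1, curves sampled on a regular grid of $[0,1]$, fixed rather than cross-validated $h$ and $p$); the function names are hypothetical:

```python
import numpy as np

def fit_dbnr(X, y, p=3, h=0.5):
    """Illustrative sketch of the three-step DBNR scheme plus Equation (2).

    X: (n, T) array of curves x_i discretized on a regular grid of [0, 1];
    y: (n,) vector of responses. The bandwidth h and the number of
    eigenpairs p are fixed here but would be cross-validated in practice.
    """
    n = len(y)

    # Step 1: Nadaraya-Watson estimate of the conditional mean curve mu(.|y0).
    def mu_hat(y0):
        w = np.exp(-0.5 * ((y - y0) / h) ** 2)  # Gaussian kernel in y
        return w @ X / w.sum()

    M = np.array([mu_hat(yi) for yi in y])      # fitted mean curves, (n, T)

    # Step 2: leading p eigenpairs of the empirical covariance of residuals.
    R = X - M
    lam, V = np.linalg.eigh(R.T @ R / n)        # ascending eigenvalues
    lam, V = lam[::-1][:p], V[:, ::-1][:, :p]   # keep the p largest

    # Step 3 + Equation (2): density-based weights and weighted average.
    mu_scores = M @ V                           # scores of mu(.|y_i), (n, p)

    def predict(x):
        xj = V.T @ x                            # scores of the new curve x
        logw = np.sum(mu_scores / lam * (xj - mu_scores / 2), axis=1)
        w = np.exp(logw - logw.max())           # the shift cancels in the ratio
        return float(w @ y / w.sum())

    return predict
```

On a regular grid the grid-spacing factor of the $L^2$ inner product cancels inside the exponent of Equation (3), which is why plain Euclidean scores and matrix eigenvalues can be used; likewise, the constant subtracted before exponentiation cancels in the ratio of Equation (2) and only stabilizes the computation.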
3 A Simulation Study
The feasibility and the performance of the introduced nonparametric functional regression method are first explored through a simulation study. For comparison, results obtained by the functional Nadaraya-Watson kernel (NWK) estimator [10] are also shown.
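For reference, the functional NWK estimator with the usual $L^2$ metric reduces to a kernel-weighted average of the training responses. A minimal sketch with a Gaussian kernel (the estimator of [10] admits general kernels and semi-metrics; `nwk_predict` is an illustrative name, and the grid spacing of the discretized $L^2$ distance is absorbed into the bandwidth):

```python
import numpy as np

def nwk_predict(x, X_train, y_train, h=1.0):
    """Functional Nadaraya-Watson kernel estimate: a kernel-weighted
    average of the responses, with distances between discretized curves
    playing the role of the L2 metric."""
    d2 = np.sum((X_train - x) ** 2, axis=1)  # squared distances to x
    w = np.exp(-0.5 * d2 / h ** 2)           # Gaussian kernel weights
    return float(w @ y_train / w.sum())
```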
3.1 Data Generation
The data were simulated in the following way: values for the real random variable $Y$ were drawn from a uniform distribution on the interval $[0, 10]$. Then, $X$ was generated by 4 different models or settings:

M1: $X = Y e_1 + 2Y e_2 + 3Y e_5 + 4Y e_{10} + \epsilon$
M2: $X = (\exp(Y)/\exp(10))\, e_1 + (Y^2/100)\, e_2 + (Y^3/1000)\, e_5 + \log(Y+1)\, e_{10} + \epsilon$
M3: $X = \sin(Y)\, e_1 + \log(Y+1)\, e_5 + \epsilon$
M4: $X = \alpha \exp(Y/10)\, e_1 + \epsilon$

where $(e_i)_{i \geq 1}$ is the trigonometric basis of $\mathcal{X} = L^2([0,1])$ (i.e., $e_{2k-1} = \sqrt{2}\cos(2\pi k t)$ and $e_{2k} = \sqrt{2}\sin(2\pi k t)$), and $\epsilon$ is a Gaussian process independent of $Y$ with zero mean and covariance operator $\Gamma_\epsilon = \sum_{j \geq 1} \frac{1}{j} e_j \otimes e_j$. More precisely, $\epsilon$ was simulated by using a truncation of $\Gamma_\epsilon$, $\Gamma_\epsilon(s,t) \approx \sum_{j=1}^{q} \frac{1}{j} e_j(t) e_j(s)$, with $q = 500$.

A sample of size $n_L = 300$ was simulated for training and a sample of size $n_T = 200$ for testing. Figure 1 gives examples of $X$ obtained for model M3 for three different values of $y$, together with the underlying (non-noisy) function $F(y)(\cdot)$. In this example, the simulated data have a high level of noise, so that the regression estimation is a rather hard statistical task.
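As an illustration, model M3 can be simulated along the lines described above (a sketch using the stated truncation $q = 500$; the function name and the grid size $T$ are illustrative choices, not from the paper):

```python
import numpy as np

def simulate_m3(n, q=500, T=200, seed=0):
    """Simulate (Y, X) under model M3: X = sin(Y) e1 + log(Y+1) e5 + eps,
    with eps built from a q-term truncation of the covariance Gamma_eps.
    Returns Y as an (n,) vector and X as an (n, T) array on a grid of [0, 1].
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, T)

    def e(i):  # trigonometric basis: e_{2k-1} = sqrt(2)cos, e_{2k} = sqrt(2)sin
        k = (i + 1) // 2
        return (np.sqrt(2) * np.cos(2 * np.pi * k * t) if i % 2 == 1
                else np.sqrt(2) * np.sin(2 * np.pi * k * t))

    Y = rng.uniform(0.0, 10.0, n)
    F = np.outer(np.sin(Y), e(1)) + np.outer(np.log(Y + 1), e(5))

    # Karhunen-Loeve form: eps = sum_j sqrt(1/j) Z_j e_j with Z_j iid N(0,1),
    # a zero-mean Gaussian process with covariance sum_j (1/j) e_j (x) e_j.
    Z = rng.normal(size=(n, q)) / np.sqrt(np.arange(1, q + 1))
    eps = Z @ np.array([e(j) for j in range(1, q + 1)])
    return Y, F + eps
```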
3.2 Simulation Results
To apply the DBNR method, the discretized functions $X$ were approximated by continuous functions using a functional basis expansion. Specifically, the data were approximated using 128 B-spline basis functions of order 4, as shown in Figure 1. The conditional mean $\mu(\cdot|y)$ was estimated by a kernel smoothing in which the bandwidth parameter $h$ was selected by 10-fold cross-validation minimizing the mean squared error (MSE) criterion. A similar procedure was used to select the parameter $p$ (the number of eigenvalues and eigenfunctions used in (3)).

Finally, DBNR performance was compared with that obtained by the functional NWK estimate with two kinds of metrics for the kernel: the usual $L^2$ norm and the PCA-based semi-metric (see [10] for further details about these
[Figure 1 comprises three panels, for samples 44 (Y = 3.7948), 248 (Y = 1.9879), and 41 (Y = 8.3812); vertical axes: functions; horizontal axes: argument (t).]

Fig. 1. True function, $F(y)(\cdot)$ (smooth continuous line), simulated data, $X$ (gray rough line), and approximation of $X$ using B-splines (rough black line) in M3 for three different values of $y$
Table 1. RMSE for all the methods and all generating models

Model   DBNR   NWK (PCA)   NWK (L2)
M1      0.08   0.10        0.09
M2      1.47   1.60        1.77
M3      1.79   1.79        2.00
M4      0.94   2.16        1.91
methods). The resulting root mean squared errors (RMSE) are presented in Table 1. The results show that DBNR is a good alternative to common NWK methods. Indeed, DBNR outperforms the NWK methods in all the cases considered in this simulation study, which includes both linear (M1) and nonlinear (M2–M4) models.

Figures 2 and 3 show how the method performs at each step of the estimation scheme (described in Section 2.2) for model M3. In particular, Figure 2 gives the result of the first step by displaying the true value and the estimate of $F(y)(\cdot)$ for various values of $y$ (top) and the true value and the estimate of $F(\cdot)(t)$ for various values of $t$ (bottom). The results are very satisfactory given the fact that the data have a high level of noise (which is stressed in the bottom of the figure): a minor estimation problem appears at the boundaries of $F(\cdot)(t)$, which is a known drawback of the kernel smoothing method. Also, those estimates are smoother than the estimates of $F(y)(\cdot)$: this can be explained by the fact that the kernel estimator is used with respect to $y$ and not with respect to $t$, but this aspect can be improved in the future.

Figure 3 shows the results of steps 2–3 of the estimation scheme: the estimated eigendecomposition of $r$ is compared to the true one and, finally, the predicted values for $Y$ are compared to the true ones, both on training and test sets. The estimation of the eigendecomposition is, once again, very satisfactory given the high level of noise, and the comparison between training and test sets shows that the method does not overfit the data.