A novel family of nonparametric cumulative based divergences for point processes
Sohan Seth
University of Florida
Il “Memming” Park
University of Texas at Austin
Austin J. Brockmeier
University of Florida
Mulugeta Semework
SUNY Downstate Medical Center
John Choi, Joseph T. Francis
SUNY Downstate Medical Center & NYU-Poly
José C. Príncipe
University of Florida
Abstract
Hypothesis testing on point processes has several applications such as model fitting, plasticity detection, and nonstationarity detection. Standard tools for hypothesis testing include tests on mean firing rate and time varying rate function. However, these statistics do not fully describe a point process, and therefore, the conclusions drawn by these tests can be misleading. In this paper, we introduce a family of nonparametric divergence measures for hypothesis testing. A divergence measure compares the full probability structure and, therefore, leads to a more robust test of hypothesis. We extend the traditional Kolmogorov–Smirnov and Cramér–von Mises tests to the space of spike trains via stratification, and show that these statistics can be consistently estimated from data without any free parameter. We demonstrate an application of the proposed divergences as a cost function to find optimally matched point processes.
1 Introduction
Neurons communicate mostly through noisy sequences of action potentials, also known as spike trains. A point process captures the stochastic properties of such sequences of events [1]. Many neuroscience problems such as model fitting (goodness-of-fit), plasticity detection, change point detection, nonstationarity detection, and neural code analysis can be formulated as statistical inference on point processes [2, 3]. To avoid the complication of dealing with spike train observations, neuroscientists often use summarizing statistics such as mean firing rate to compare two point processes. However, this approach implicitly assumes a model for the underlying point process, and therefore, the choice of the summarizing statistic fundamentally restricts the validity of the inference procedure.

One alternative to mean firing rate is to use the distance between the inhomogeneous rate functions, i.e. $\int |\lambda_1(t) - \lambda_2(t)|\,\mathrm{d}t$, as a test statistic, which is sensitive to the temporal fluctuation of the means of the point processes. In general the rate function does not fully specify a point process, and therefore, ambiguity occurs when two distinct point processes have the same rate function. Although physiologically meaningful change is often accompanied by a change in rate, there has been evidence that the higher order statistics can change without a corresponding change of rate [4, 5]. Therefore, statistical tools that capture higher order statistics, such as divergences, can improve the state-of-the-art hypothesis testing framework for spike train observations, and may encourage new scientific discoveries.

In this paper, we present a novel family of divergence measures between two point processes. Unlike firing rate function based measures, a divergence measure is zero if and only if the two point processes are identical. Applying a divergence measure for hypothesis testing is, therefore, more appropriate in a statistical sense. We show that the proposed measures can be estimated from data without any assumption on the underlying probability structure. However, a distribution-free (nonparametric) approach often suffers from having free parameters, e.g. the choice of kernel in nonparametric density estimation, and these free parameters often need to be chosen using computationally expensive methods such as cross validation [6]. We show that the proposed measures can be consistently estimated in a parameter-free manner, making them particularly useful in practice.

One of the difficulties of dealing with continuous-time point processes is the lack of a well structured space on which the corresponding probability laws can be described. In this paper we follow a rather unconventional approach, describing the point process on a direct sum of Euclidean spaces of varying dimensionality, and show that the proposed divergence measures can be expressed in terms of cumulative distribution functions (CDFs) in these disjoint spaces. To be specific, we represent the point process by the probability of having a finite number of spikes and the probability of the spike times given that number of spikes, and since these time values are reals, we can represent them in a Euclidean space using a CDF. We follow this particular approach since, first, CDFs can be consistently estimated using empirical CDFs without any free parameter, and second, standard tests on CDFs such as the Kolmogorov–Smirnov (KS) test [7] and the Cramér–von Mises (CM) test [8] are well studied in the literature. Our work extends the conventional KS test and CM test on the real line to the space of spike trains.

The rest of the paper is organized as follows: in section 2 we introduce the measure space on which the point process is defined as a probability measure; in section 3 and section 4 we introduce the extended KS and CM divergences, derive their respective estimators, and prove the consistency of the proposed estimators; in section 5 we compare various point process statistics in a hypothesis testing framework; in section 6 we show an application of the proposed measures in selecting the optimal stimulus parameter; in section 7 we conclude the paper with some relevant discussion and future work guidelines.
2 Basic point process
We define a point process to be a probability measure over all possible spike trains. Let $\Omega$ be the set of all finite spike trains, that is, each $\omega \in \Omega$ can be represented by a finite set of action potential timings $\omega = \{t_1 \leq t_2 \leq \ldots \leq t_n\} \in \mathbb{R}^n$, where $n$ is the number of spikes. Let $\Omega_0, \Omega_1, \ldots$ denote the partitions of $\Omega$ such that $\Omega_n$ contains all possible spike trains with exactly $n$ events (spikes); hence $\Omega_n = \mathbb{R}^n$. Note that $\Omega = \bigcup_{n=0}^{\infty} \Omega_n$ is a disjoint union, and that $\Omega_0$ has only one element, representing the empty spike train (no action potential). See Figure 1 for an illustration.

Define a $\sigma$-algebra on $\Omega$ as the $\sigma$-algebra generated by the union of the Borel sets defined on the Euclidean spaces: $\mathcal{F} = \sigma\left(\bigcup_{n=0}^{\infty} \mathcal{B}(\Omega_n)\right)$. Note that any measurable set $A \in \mathcal{F}$ can be partitioned into $\{A_n = A \cap \Omega_n\}_{n=0}^{\infty}$, such that each $A_n$ is measurable in the corresponding measurable space $(\Omega_n, \mathcal{B}(\Omega_n))$. Here $A$ denotes a collection of spike trains involving varying numbers of action potentials and corresponding action potential timings, whereas $A_n$ denotes the subset of these spike trains involving exactly $n$ action potentials each.

A (finite) point process is defined as a probability measure $P$ on the measurable space $(\Omega, \mathcal{F})$ [1]. Let $P$ and $Q$ be two probability measures on $(\Omega, \mathcal{F})$; we are interested in finding the divergence $d(P,Q)$ between $P$ and $Q$, where a divergence measure is characterized by $d(P,Q) \geq 0$ and $d(P,Q) = 0 \iff P = Q$.
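To make the stratification concrete, the following short Python sketch (our illustration, not code from the paper; the function names are ours) groups a sample of spike trains by spike count and computes the empirical masses of each stratum $\Omega_n$:

```python
from collections import defaultdict

def stratify(spike_trains):
    """Group spike trains (tuples of spike times) by spike count n,
    i.e. by the stratum Omega_n each train falls in."""
    strata = defaultdict(list)
    for train in spike_trains:
        strata[len(train)].append(tuple(sorted(train)))
    return dict(strata)

def empirical_mass(strata, total):
    """Empirical probability of observing exactly n spikes, per stratum."""
    return {n: len(trains) / total for n, trains in strata.items()}

# Example: four spike trains, including the empty train (in Omega_0)
X = [(), (0.1,), (0.2, 0.5), (0.3,)]
S = stratify(X)
mass = empirical_mass(S, len(X))
```

Here `mass[n]` plays the role of the stratum probability used throughout the divergence definitions below.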
3 Extended KS divergence
A Kolmogorov–Smirnov (KS) type divergence between $P$ and $Q$ can be derived from the $L_1$ distance between the probability measures, following the equivalent representation,
\[
d_1(P,Q) = \int_\Omega \mathrm{d}|P - Q| \geq \sup_{A \in \mathcal{F}} |P(A) - Q(A)|. \tag{1}
\]
Figure 1: (Left) Illustration of how the point process space is stratified. (Right) Example of spike trains stratified by their respective spike count.

Since (1) is difficult and perhaps impossible to estimate directly without a model, our strategy is to use the stratified spaces $(\Omega_0, \Omega_1, \ldots)$ defined in the previous section, and take the supremum only within the corresponding conditioned probability measures. Let $\mathcal{F}_i = \mathcal{F} \cap \Omega_i := \{F \cap \Omega_i \mid F \in \mathcal{F}\}$. Since $\bigcup_i \mathcal{F}_i \subset \mathcal{F}$,
\[
d_1(P,Q) \geq \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| = \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} \left| P(\Omega_n) P(A \mid \Omega_n) - Q(\Omega_n) Q(A \mid \Omega_n) \right|.
\]
Since each $\Omega_n$ is a Euclidean space, we can induce the traditional KS test statistic by further reducing the search space to $\tilde{\mathcal{F}}_n = \{\times_i (-\infty, t_i] \mid t = (t_1, \ldots, t_n) \in \mathbb{R}^n\}$. This results in the following inequality,
\[
\sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| \geq \sup_{A \in \tilde{\mathcal{F}}_n} |P(A) - Q(A)| = \sup_{t \in \mathbb{R}^n} \left| F_P^{(n)}(t) - F_Q^{(n)}(t) \right|, \tag{2}
\]
where $F_P^{(n)}(t) = P[T_1 \leq t_1 \wedge \ldots \wedge T_n \leq t_n]$ is the cumulative distribution function (CDF) corresponding to the probability measure $P$ in $\Omega_n$. Hence, we define the KS divergence as
\[
d_{KS}(P,Q) = \sum_{n \in \mathbb{N}} \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right|. \tag{3}
\]
Given a finite number of samples $X = \{x_i\}_{i=1}^{N_P}$ and $Y = \{y_j\}_{j=1}^{N_Q}$ from $P$ and $Q$ respectively, we have the following estimator for equation (3):
\[
\hat{d}_{KS}(P,Q) = \sum_{n \in \mathbb{N}} \sup_{t \in \mathbb{R}^n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| = \sum_{n \in \mathbb{N}} \sup_{t \in X_n \cup Y_n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right|, \tag{4}
\]
where $X_n = X \cap \Omega_n$, and $\hat{P}$ and $\hat{F}_P$ are the empirical probability and the empirical CDF, respectively. Notice that we only search for the supremum over the locations of the realizations $X_n \cup Y_n$ and not the whole $\mathbb{R}^n$, since the empirical CDF difference $\hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t)$ only changes values at those locations.
Theorem 1 ($d_{KS}$ is a divergence).
\[
d_1(P,Q) \geq d_{KS}(P,Q) \geq 0 \tag{5}
\]
\[
d_{KS}(P,Q) = 0 \iff P = Q \tag{6}
\]
Proof. The first property and the $\Leftarrow$ direction of the second property are trivial. From the definition of $d_{KS}$ and the properties of CDFs, $d_{KS}(P,Q) = 0$ implies that $P(\Omega_n) = Q(\Omega_n)$ and $F_P^{(n)} = F_Q^{(n)}$ for all $n \in \mathbb{N}$. Given probability measures for each $(\Omega_n, \mathcal{F}_n)$, denoted $P_n$ and $Q_n$, there exist corresponding unique extended measures $P$ and $Q$ on $(\Omega, \mathcal{F})$ such that their restrictions to $(\Omega_n, \mathcal{F}_n)$ coincide with $P_n$ and $Q_n$; hence $P = Q$.
Theorem 2 (Consistency of the KS divergence estimator). As the sample size approaches infinity,
\[
\left| d_{KS} - \hat{d}_{KS} \right| \xrightarrow{a.u.} 0 \tag{7}
\]
Proof. Note that $|\sup f - \sup g| \leq \sup |f - g|$. Due to the triangle inequality of the supremum norm,
\[
\left| \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right| - \sup_{t \in \mathbb{R}^n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| \right|
\leq \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right|.
\]
Again, using the triangle inequality we can show the following:
\[
\begin{aligned}
& \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| \\
&= \Big| P(\Omega_n) F_P^{(n)}(t) - P(\Omega_n) \hat{F}_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) + Q(\Omega_n) \hat{F}_Q^{(n)}(t) \\
&\qquad + P(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) - Q(\Omega_n) \hat{F}_Q^{(n)}(t) \Big| \\
&\leq P(\Omega_n) \left| F_P^{(n)}(t) - \hat{F}_P^{(n)}(t) \right| + Q(\Omega_n) \left| F_Q^{(n)}(t) - \hat{F}_Q^{(n)}(t) \right| \\
&\qquad + \hat{F}_P^{(n)}(t) \left| P(\Omega_n) - \hat{P}(\Omega_n) \right| + \hat{F}_Q^{(n)}(t) \left| Q(\Omega_n) - \hat{Q}(\Omega_n) \right|.
\end{aligned}
\]
Then the theorem follows from the Glivenko–Cantelli theorem, and $\hat{P}, \hat{Q} \xrightarrow{a.s.} P, Q$.

Notice that the inequality in (2) can be made tighter by considering the supremum over not just the products of the segments $(-\infty, t_i]$ but over all $2^n - 1$ possible products of the segments $(-\infty, t_i]$ and $[t_i, \infty)$ in $n$ dimensions [7]. However, the latter approach is computationally more expensive, and therefore, in this paper we only explore the former approach.
4 Extended CM divergence
We can extend equation (3) to derive a Cramér–von Mises (CM) type divergence for point processes. Let $\mu = (P + Q)/2$; then $P$ and $Q$ are absolutely continuous with respect to $\mu$. Note that $F_P^{(n)}, F_Q^{(n)} \in L_2(\Omega_n, \mu|_n)$, where $|_n$ denotes the restriction on $\Omega_n$; i.e., the CDFs are $L_2$ integrable, since they are bounded. Analogous to the relation between the KS test and the CM test, we would like to use the integrated squared deviation statistic in place of the maximal deviation statistic. By integrating over the probability measure $\mu$ instead of taking the supremum, and using the $L_2$ instead of the $L_\infty$ distance, we define
\[
d_{CM}(P,Q) = \sum_{n \in \mathbb{N}} \int_{\mathbb{R}^n} \left( P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right)^2 \mathrm{d}\mu|_n(t). \tag{8}
\]
This can be seen as a direct extension of the CM criterion. The corresponding estimator can be derived using the strong law of large numbers,
\[
\hat{d}_{CM}(P,Q) = \sum_{n \in \mathbb{N}} \left[ \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n) \hat{F}_P^{(n)}\big(x_i^{(n)}\big) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}\big(x_i^{(n)}\big) \right)^2 + \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n) \hat{F}_P^{(n)}\big(y_i^{(n)}\big) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}\big(y_i^{(n)}\big) \right)^2 \right]. \tag{9}
\]
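A matching Python sketch of estimator (9) follows (again our illustration, not the paper's code). We use the empirical-mean normalization, dividing each inner sum by its sample size, so that the two sums behave as averages over the realizations of each stratum:

```python
import numpy as np

def ecdf_at(points, data):
    """Fraction of rows of `data` coordinate-wise <= each row of `points`."""
    if len(data) == 0:
        return np.zeros(len(points))
    return np.array([(data <= t).all(axis=1).mean() for t in points])

def cm_divergence(X, Y):
    """Estimate d_CM: per stratum, average the squared weighted-CDF gap
    over the realizations of both samples."""
    d = 0.0
    for n in {len(x) for x in X} | {len(y) for y in Y}:
        Xl = [x for x in X if len(x) == n]
        Yl = [y for y in Y if len(y) == n]
        pn, qn = len(Xl) / len(X), len(Yl) / len(Y)
        if n == 0:  # the CDF on Omega_0 is identically 1
            d += (pn - qn) ** 2
            continue
        An, Bn = np.array(Xl).reshape(-1, n), np.array(Yl).reshape(-1, n)
        gap = lambda pts: pn * ecdf_at(pts, An) - qn * ecdf_at(pts, Bn)
        if len(An):
            d += 0.5 * np.mean(gap(An) ** 2)
        if len(Bn):
            d += 0.5 * np.mean(gap(Bn) ** 2)
    return d
```

As with the KS sketch, the estimate is exactly zero when both samples coincide, since every stratum weight and empirical CDF agrees.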
Theorem 3 ($d_{CM}$ is a divergence). For $P$ and $Q$ with square integrable CDFs,
\[
d_{CM}(P,Q) \geq 0 \tag{10}
\]
\[
d_{CM}(P,Q) = 0 \iff P = Q. \tag{11}
\]
Proof. Similar to Theorem 1.
Theorem 4 (Consistency of the CM divergence estimator). As the sample size approaches infinity,
\[
\left| d_{CM} - \hat{d}_{CM} \right| \xrightarrow{a.u.} 0 \tag{12}
\]
Proof. Similar to (7), we find an upper bound and show that the bound converges uniformly to zero. To simplify the notation, we define $g_n(x) = P(\Omega_n) F_P^{(n)}(x) - Q(\Omega_n) F_Q^{(n)}(x)$ and $\hat{g}_n(x) = \hat{P}(\Omega_n) \hat{F}_P^{(n)}(x) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(x)$. Note that $\hat{g}_n \xrightarrow{a.u.} g_n$ by the Glivenko–Cantelli theorem and $\hat{P} \xrightarrow{a.s.} P$ by the strong law of large numbers.
\[
\begin{aligned}
\left| d_{CM} - \hat{d}_{CM} \right| &= \frac{1}{2} \left| \sum_{n \in \mathbb{N}} \int g_n^2 \,\mathrm{d}P|_n + \sum_{n \in \mathbb{N}} \int g_n^2 \,\mathrm{d}Q|_n - \sum_{n \in \mathbb{N}} \sum_i \hat{g}_n(x_i)^2 - \sum_{n \in \mathbb{N}} \sum_i \hat{g}_n(y_i)^2 \right| \\
&= \frac{1}{2} \left| \sum_{n \in \mathbb{N}} \left( \int g_n^2 \,\mathrm{d}P|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{P}|_n \right) + \left( \int g_n^2 \,\mathrm{d}Q|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{Q}|_n \right) \right| \\
&\leq \frac{1}{2} \sum_{n \in \mathbb{N}} \left( \left| \int g_n^2 \,\mathrm{d}P|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{P}|_n \right| + \left| \int g_n^2 \,\mathrm{d}Q|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{Q}|_n \right| \right),
\end{aligned}
\]
where $\hat{P} = \sum_i \delta(x_i)$ and $\hat{Q} = \sum_i \delta(y_i)$ are the corresponding empirical measures. Without loss of generality, we only find the bound on $\left| \int g_n^2 \,\mathrm{d}P|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{P}|_n \right|$; the rest is bounded similarly for $Q$.
\[
\begin{aligned}
\left| \int g_n^2 \,\mathrm{d}P|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{P}|_n \right| &= \left| \int g_n^2 \,\mathrm{d}P|_n - \int \hat{g}_n^2 \,\mathrm{d}P|_n + \int \hat{g}_n^2 \,\mathrm{d}P|_n - \int \hat{g}_n^2 \,\mathrm{d}\hat{P}|_n \right| \\
&\leq \int \left| g_n^2 - \hat{g}_n^2 \right| \mathrm{d}P|_n + \left| \int \hat{g}_n^2 \,\mathrm{d}\big( P|_n - \hat{P}|_n \big) \right|.
\end{aligned}
\]
Applying the Glivenko–Cantelli theorem and the strong law of large numbers, these two terms converge since $\hat{g}_n^2$ is bounded. Hence, we show that the CM test estimator is consistent.
5 Results
We present a set of two-sample problems and apply various statistics to perform hypothesis testing. As a baseline measure, we consider the widely used Wilcoxon rank-sum test (or equivalently, the Mann–Whitney U test) on the count distribution (e.g. [9]), which is a nonparametric median test for the total number of action potentials, and the integrated squared deviation statistic $\lambda_{L_2} = \int (\lambda_1(t) - \lambda_2(t))^2 \,\mathrm{d}t$, where $\lambda(t)$ is estimated by smoothing the spike timings with a Gaussian kernel, evaluated on a uniform grid with spacing at least an order of magnitude smaller than the standard deviation of the kernel. We report the performance of the test with varying kernel sizes.

All tests are quantified by the power of the test given a significance threshold (type-I error) of $0.05$. The null hypothesis distribution is empirically computed by either generating independent samples or by permuting the data to create at least 1000 values.
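The permutation-based null distribution described above can be sketched generically (our code, not the paper's; `stat` stands for any scalar two-sample statistic, e.g. one of the divergence estimators):

```python
import random

def permutation_p_value(X, Y, stat, n_perm=1000, seed=0):
    """Monte Carlo p-value: the observed statistic is compared against its
    null distribution, built by randomly relabeling the pooled samples."""
    rng = random.Random(seed)
    observed = stat(X, Y)
    pooled = list(X) + list(Y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(X)], pooled[len(X):]) >= observed:
            hits += 1
    return hits / n_perm
```

The power at the 0.05 level is then the fraction of independent two-sample draws whose p-value falls below 0.05.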
5.1 Stationary renewal processes
A renewal process is a widely used point process model that captures deviations from the Poisson process [10]. We consider two stationary renewal processes with gamma interval distributions. Since the mean rates of the two processes are the same, the rate function statistic and the Wilcoxon test does