A novel family of non-parametric cumulative based divergences for point processes

Sohan Seth (University of Florida), Il “Memming” Park (University of Texas at Austin), Austin J. Brockmeier (University of Florida), Mulugeta Semework (SUNY Downstate Medical Center), John Choi and Joseph T. Francis (SUNY Downstate Medical Center & NYU-Poly), José C. Príncipe (University of Florida)

Abstract

Hypothesis testing on point processes has several applications such as model fitting, plasticity detection, and non-stationarity detection. Standard tools for hypothesis testing include tests on the mean firing rate and the time-varying rate function. However, these statistics do not fully describe a point process, and therefore, the conclusions drawn by these tests can be misleading. In this paper, we introduce a family of non-parametric divergence measures for hypothesis testing. A divergence measure compares the full probability structure and, therefore, leads to a more robust test of hypothesis. We extend the traditional Kolmogorov–Smirnov and Cramér–von Mises tests to the space of spike trains via stratification, and show that these statistics can be consistently estimated from data without any free parameter. We demonstrate an application of the proposed divergences as a cost function to find optimally matched point processes.

1 Introduction

Neurons communicate mostly through noisy sequences of action potentials, also known as spike trains. A point process captures the stochastic properties of such sequences of events [1]. Many neuroscience problems such as model fitting (goodness-of-fit), plasticity detection, change point detection, non-stationarity detection, and neural code analysis can be formulated as statistical inference on point processes [2, 3]. To avoid the complication of dealing with spike train observations, neuroscientists often use summarizing statistics such as the mean firing rate to compare two point processes.
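As a concrete illustration of why count-based summaries can mislead, the following sketch (not from the paper; all names and parameters are illustrative) simulates a Poisson process and a gamma renewal process with identical mean firing rates: the rates match almost exactly, while the interval statistics differ markedly.

```python
import numpy as np

rng = np.random.default_rng(0)

def renewal_train(draw_isis, t_max):
    """One spike train on [0, t_max] built from i.i.d. inter-spike intervals."""
    times = np.cumsum(draw_isis())
    return times[times < t_max]

T, rate, n_trials = 100.0, 5.0, 200
n_isi = int(3 * rate * T)  # draw more ISIs than needed, then truncate at T

# P: Poisson process (exponential ISIs with mean 1/rate).
# Q: gamma renewal process, shape k=4, scale 1/(k*rate) -- same mean ISI,
#    hence the same mean firing rate, but different higher-order structure.
poisson = [renewal_train(lambda: rng.exponential(1.0 / rate, n_isi), T)
           for _ in range(n_trials)]
gamma = [renewal_train(lambda: rng.gamma(4.0, 1.0 / (4.0 * rate), n_isi), T)
         for _ in range(n_trials)]

mean_rate_p = np.mean([len(s) for s in poisson]) / T
mean_rate_q = np.mean([len(s) for s in gamma]) / T
isi_p, isi_q = np.diff(poisson[0]), np.diff(gamma[0])
cv_p = np.std(isi_p) / np.mean(isi_p)  # coefficient of variation, near 1.0
cv_q = np.std(isi_q) / np.mean(isi_q)  # near 1/sqrt(4) = 0.5
print(mean_rate_p, mean_rate_q, cv_p, cv_q)
```

Any test built solely on spike counts or mean rates treats these two processes as identical, even though their interval distributions, and hence the underlying probability laws, are clearly different.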
However, this approach implicitly assumes a model for the underlying point process, and therefore, the choice of the summarizing statistic fundamentally restricts the validity of the inference procedure.

One alternative to the mean firing rate is to use the distance between the inhomogeneous rate functions, i.e. $\int |\lambda_1(t) - \lambda_2(t)| \, \mathrm{d}t$, as a test statistic, which is sensitive to the temporal fluctuation of the means of the point processes. In general the rate function does not fully specify a point process, and therefore, ambiguity occurs when two distinct point processes have the same rate function. Although physiologically meaningful change is often accompanied by a change in rate, there is evidence that higher order statistics can change without a corresponding change of rate [4, 5]. Therefore, statistical tools that capture higher order statistics, such as divergences, can improve the state-of-the-art hypothesis testing framework for spike train observations, and may encourage new scientific discoveries.

In this paper, we present a novel family of divergence measures between two point processes. Unlike measures based on the firing rate function, a divergence measure is zero if and only if the two point processes are identical. Applying a divergence measure for hypothesis testing is, therefore, more appropriate in a statistical sense. We show that the proposed measures can be estimated from data without any assumption on the underlying probability structure. However, a distribution-free (non-parametric) approach often suffers from having free parameters, e.g. the choice of kernel in non-parametric density estimation, and these free parameters often need to be chosen using computationally expensive methods such as cross validation [6].
We show that the proposed measures can be consistently estimated in a parameter-free manner, making them particularly useful in practice.

One of the difficulties of dealing with continuous-time point processes is the lack of a well structured space on which the corresponding probability laws can be described. In this paper we follow a rather unconventional approach, describing the point process on a direct sum of Euclidean spaces of varying dimensionality, and show that the proposed divergence measures can be expressed in terms of cumulative distribution functions (CDFs) in these disjoint spaces. To be specific, we represent the point process by the probability of having a finite number of spikes and the probability of the spike times given that number of spikes, and since these time values are reals, we can represent them in a Euclidean space using a CDF. We follow this particular approach since, first, CDFs can be consistently estimated using empirical CDFs without any free parameter, and second, standard tests on CDFs such as the Kolmogorov–Smirnov (K-S) test [7] and the Cramér–von Mises (C-M) test [8] are well studied in the literature. Our work extends the conventional K-S test and C-M test on the real line to the space of spike trains.

The rest of the paper is organized as follows: in Section 2 we introduce the measure space on which point processes are defined as probability measures; in Sections 3 and 4 we introduce the extended K-S and C-M divergences, derive their respective estimators, and prove the consistency of these estimators; in Section 5 we compare various point process statistics in a hypothesis testing framework; in Section 6 we show an application of the proposed measures in selecting the optimal stimulus parameter; and in Section 7 we conclude the paper with some relevant discussion and future work guidelines.

2 Basic point process

We define a point process to be a probability measure over all possible spike trains.
Let $\Omega$ be the set of all finite spike trains, that is, each $\omega \in \Omega$ can be represented by a finite set of action potential timings $\omega = \{t_1 \le t_2 \le \ldots \le t_n\} \in \mathbb{R}^n$, where $n$ is the number of spikes. Let $\Omega_0, \Omega_1, \ldots$ denote the partitions of $\Omega$ such that $\Omega_n$ contains all possible spike trains with exactly $n$ events (spikes), hence $\Omega_n = \mathbb{R}^n$. Note that $\Omega = \bigcup_{n=0}^{\infty} \Omega_n$ is a disjoint union, and that $\Omega_0$ has only one element, representing the empty spike train (no action potential). See Figure 1 for an illustration.

[Figure 1: (Left) Illustration of how the point process space is stratified. (Right) Example of spike trains (inhomogeneous Poisson firing) stratified by their respective spike count.]

Define a $\sigma$-algebra on $\Omega$ as the $\sigma$-algebra generated by the union of the Borel sets defined on the Euclidean spaces, $\mathcal{F} = \sigma\left(\bigcup_{n=0}^{\infty} \mathcal{B}(\Omega_n)\right)$. Note that any measurable set $A \in \mathcal{F}$ can be partitioned into $\{A_n = A \cap \Omega_n\}_{n=0}^{\infty}$, such that each $A_n$ is measurable in the corresponding measurable space $(\Omega_n, \mathcal{B}(\Omega_n))$. Here $A$ denotes a collection of spike trains involving varying numbers of action potentials and corresponding action potential timings, whereas $A_n$ denotes the subset of these spike trains involving exactly $n$ action potentials each.

A (finite) point process is defined as a probability measure $P$ on the measurable space $(\Omega, \mathcal{F})$ [1]. Let $P$ and $Q$ be two probability measures on $(\Omega, \mathcal{F})$; we are interested in finding the divergence $d(P,Q)$ between $P$ and $Q$, where a divergence measure is characterized by $d(P,Q) \ge 0$ and $d(P,Q) = 0 \iff P = Q$.

3 Extended K-S divergence

A Kolmogorov–Smirnov (K-S) type divergence between $P$ and $Q$ can be derived from the $L_1$ distance between the probability measures, using the equivalent representation

$$d_1(P,Q) = \int_{\Omega} \mathrm{d}|P - Q| \ge \sup_{A \in \mathcal{F}} |P(A) - Q(A)|. \quad (1)$$

Since (1) is difficult and perhaps impossible to estimate directly without a model, our strategy is to use the stratified spaces $(\Omega_0, \Omega_1, \ldots)$ defined in the previous section, and take the supremum only over the corresponding conditioned probability measures. Let $\mathcal{F}_i = \mathcal{F} \cap \Omega_i := \{F \cap \Omega_i \mid F \in \mathcal{F}\}$. Since $\bigcup_i \mathcal{F}_i \subset \mathcal{F}$,

$$d_1(P,Q) \ge \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| = \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} |P(\Omega_n)\,P(A \mid \Omega_n) - Q(\Omega_n)\,Q(A \mid \Omega_n)|.$$

Since each $\Omega_n$ is a Euclidean space, we can induce the traditional K-S test statistic by further reducing the search space to $\tilde{\mathcal{F}}_n = \{\times_i (-\infty, t_i] \mid t = (t_1, \ldots, t_n) \in \mathbb{R}^n\}$. This results in the following inequality:

$$\sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| \ge \sup_{A \in \tilde{\mathcal{F}}_n} |P(A) - Q(A)| = \sup_{t \in \mathbb{R}^n} \left| F_P^{(n)}(t) - F_Q^{(n)}(t) \right|, \quad (2)$$

where $F_P^{(n)}(t) = P[T_1 \le t_1 \wedge \ldots \wedge T_n \le t_n]$ is the cumulative distribution function (CDF) corresponding to the probability measure $P$ on $\Omega_n$. Hence, we define the K-S divergence as

$$d_{KS}(P,Q) = \sum_{n \in \mathbb{N}} \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n)\,F_P^{(n)}(t) - Q(\Omega_n)\,F_Q^{(n)}(t) \right|. \quad (3)$$

Given a finite number of samples $X = \{x_i\}_{i=1}^{N_P}$ and $Y = \{y_j\}_{j=1}^{N_Q}$ from $P$ and $Q$ respectively, we have the following estimator for equation (3).
$$\hat{d}_{KS}(P,Q) = \sum_{n \in \mathbb{N}} \sup_{t \in \mathbb{R}^n} \left| \hat{P}(\Omega_n)\,\hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n)\,\hat{F}_Q^{(n)}(t) \right| = \sum_{n \in \mathbb{N}} \sup_{t \in X_n \cup Y_n} \left| \hat{P}(\Omega_n)\,\hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n)\,\hat{F}_Q^{(n)}(t) \right|, \quad (4)$$

where $X_n = X \cap \Omega_n$, and $\hat{P}$ and $\hat{F}_P$ are the empirical probability and empirical CDF, respectively. Notice that we only search for the supremum over the locations of the realizations $X_n \cup Y_n$ and not the whole of $\mathbb{R}^n$, since the empirical CDF difference $\hat{P}(\Omega_n)\,\hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n)\,\hat{F}_Q^{(n)}(t)$ only changes value at those locations.

Theorem 1 ($d_{KS}$ is a divergence).
$$d_1(P,Q) \ge d_{KS}(P,Q) \ge 0 \quad (5)$$
$$d_{KS}(P,Q) = 0 \iff P = Q \quad (6)$$

Proof. The first property and the $\Leftarrow$ direction of the second property are trivial. From the definition of $d_{KS}$ and the properties of CDFs, $d_{KS}(P,Q) = 0$ implies that $P(\Omega_n) = Q(\Omega_n)$ and $F_P^{(n)} = F_Q^{(n)}$ for all $n \in \mathbb{N}$. Given probability measures for each $(\Omega_n, \mathcal{F}_n)$, denoted $P_n$ and $Q_n$, there exist corresponding unique extended measures $P$ and $Q$ on $(\Omega, \mathcal{F})$ such that their restrictions to $(\Omega_n, \mathcal{F}_n)$ coincide with $P_n$ and $Q_n$; hence $P = Q$.

Theorem 2 (Consistency of the K-S divergence estimator). As the sample size approaches infinity,
$$\left| d_{KS} - \hat{d}_{KS} \right| \xrightarrow{a.u.} 0 \quad (7)$$

Proof. Note that $\left| \sum_n \sup(\cdot) - \sum_n \sup(\cdot) \right| \le \sum_n \left| \sup(\cdot) - \sup(\cdot) \right|$. Due to the triangle inequality for the supremum norm,

$$\left| \sup_{t \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right| - \sup_{t \in \mathbb{R}^n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| \right| \le \sup_{t \in \mathbb{R}^n} \left| \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right| - \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| \right|.$$
Again, using the triangle inequality, we can show the following:

$$\left| \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) \right| - \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right| \right| \le \left| P(\Omega_n) F_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) \right|$$
$$= \left| P(\Omega_n) F_P^{(n)}(t) - P(\Omega_n) \hat{F}_P^{(n)}(t) - Q(\Omega_n) F_Q^{(n)}(t) + Q(\Omega_n) \hat{F}_Q^{(n)}(t) + P(\Omega_n) \hat{F}_P^{(n)}(t) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(t) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(t) - Q(\Omega_n) \hat{F}_Q^{(n)}(t) \right|$$
$$\le P(\Omega_n) \left| F_P^{(n)}(t) - \hat{F}_P^{(n)}(t) \right| + Q(\Omega_n) \left| F_Q^{(n)}(t) - \hat{F}_Q^{(n)}(t) \right| + \hat{F}_P^{(n)}(t) \left| P(\Omega_n) - \hat{P}(\Omega_n) \right| + \hat{F}_Q^{(n)}(t) \left| Q(\Omega_n) - \hat{Q}(\Omega_n) \right|.$$

The theorem then follows from the Glivenko–Cantelli theorem and the fact that $\hat{P}, \hat{Q} \xrightarrow{a.s.} P, Q$.

Notice that the inequality in (2) can be made tighter by considering the supremum not just over the products of the segments $(-\infty, t_i]$ but over all $2^n - 1$ possible products of the segments $(-\infty, t_i]$ and $[t_i, \infty)$ in $n$ dimensions [7]. However, the latter approach is computationally more expensive, and therefore, in this paper we only explore the former approach.

4 Extended C-M divergence

We can extend equation (3) to derive a Cramér–von Mises (C-M) type divergence for point processes. Let $\mu = (P + Q)/2$; then $P$ and $Q$ are absolutely continuous with respect to $\mu$. Note that $F_P^{(n)}, F_Q^{(n)} \in L_2(\Omega_n, \mu|_n)$, where $|_n$ denotes the restriction to $\Omega_n$, i.e. the CDFs are $L_2$ integrable, since they are bounded. Analogous to the relation between the K-S test and the C-M test, we would like to use an integrated squared deviation statistic in place of the maximal deviation statistic.
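Before turning to the integrated statistic, the parameter-free K-S estimator of equation (4) lends itself to a short sketch. The following is a minimal illustration under our own conventions (spike trains as sorted NumPy arrays, one list of trials per sample), not the authors' implementation:

```python
import numpy as np
from collections import defaultdict

def ks_divergence(X, Y):
    """Stratified K-S divergence estimator between two samples of spike trains.

    X, Y: lists of sorted 1-D arrays of spike times, one array per trial.
    """
    strata = defaultdict(lambda: ([], []))  # spike count n -> (X_n, Y_n)
    for s in X:
        strata[len(s)][0].append(np.asarray(s))
    for s in Y:
        strata[len(s)][1].append(np.asarray(s))

    d = 0.0
    for n, (Xn, Yn) in strata.items():
        p_n = len(Xn) / len(X)  # empirical P(Omega_n)
        q_n = len(Yn) / len(Y)  # empirical Q(Omega_n)
        best = abs(p_n - q_n) if n == 0 else 0.0  # empty trains: CDF is 1
        for t in Xn + Yn:
            # Weighted empirical CDFs conditioned on n spikes, evaluated only
            # at the realizations, where the difference can change value.
            Fp = np.mean([np.all(x <= t) for x in Xn]) if Xn else 0.0
            Fq = np.mean([np.all(y <= t) for y in Yn]) if Yn else 0.0
            best = max(best, abs(p_n * Fp - q_n * Fq))
        d += best
    return d
```

As in the text, the supremum in each stratum is taken only over the observed spike trains of that spike count; no kernel, bin size, or other free parameter is required.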
By integrating over the probability measure $\mu$ instead of taking the supremum, and using the $L_2$ instead of the $L_\infty$ distance, we define

$$d_{CM}(P,Q) = \sum_{n \in \mathbb{N}} \int_{\mathbb{R}^n} \left( P(\Omega_n)\,F_P^{(n)}(t) - Q(\Omega_n)\,F_Q^{(n)}(t) \right)^2 \mathrm{d}\mu|_n(t). \quad (8)$$

This can be seen as a direct extension of the C-M criterion. The corresponding estimator can be derived using the strong law of large numbers:

$$\hat{d}_{CM}(P,Q) = \sum_{n \in \mathbb{N}} \left[ \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n)\,\hat{F}_P^{(n)}(x_i^{(n)}) - \hat{Q}(\Omega_n)\,\hat{F}_Q^{(n)}(x_i^{(n)}) \right)^2 + \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n)\,\hat{F}_P^{(n)}(y_i^{(n)}) - \hat{Q}(\Omega_n)\,\hat{F}_Q^{(n)}(y_i^{(n)}) \right)^2 \right]. \quad (9)$$

Theorem 3 ($d_{CM}$ is a divergence). For $P$ and $Q$ with square integrable CDFs,
$$d_{CM}(P,Q) \ge 0 \quad (10)$$
$$d_{CM}(P,Q) = 0 \iff P = Q. \quad (11)$$

Proof. Similar to Theorem 1.

Theorem 4 (Consistency of the C-M divergence estimator). As the sample size approaches infinity,
$$\left| d_{CM} - \hat{d}_{CM} \right| \xrightarrow{a.u.} 0 \quad (12)$$

Proof. Similar to (7), we find an upper bound and show that the bound uniformly converges to zero. To simplify the notation, we define $g_n(x) = P(\Omega_n) F_P^{(n)}(x) - Q(\Omega_n) F_Q^{(n)}(x)$ and $\hat{g}_n(x) = \hat{P}(\Omega_n) \hat{F}_P^{(n)}(x) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(x)$. Note that $\hat{g}_n \xrightarrow{a.u.} g_n$ by the Glivenko–Cantelli theorem and $\hat{P} \xrightarrow{a.s.} P$ by the strong law of large numbers.

$$\left| d_{CM} - \hat{d}_{CM} \right| = \frac{1}{2} \left| \sum_{n \in \mathbb{N}} \int g_n^2 \, \mathrm{d}P|_n + \sum_{n \in \mathbb{N}} \int g_n^2 \, \mathrm{d}Q|_n - \sum_{n \in \mathbb{N}} \sum_i \hat{g}_n(x_i)^2 - \sum_{n \in \mathbb{N}} \sum_i \hat{g}_n(y_i)^2 \right| = \frac{1}{2} \left| \sum_{n \in \mathbb{N}} \left( \int g_n^2 \, \mathrm{d}P|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{P}|_n \right) + \left( \int g_n^2 \, \mathrm{d}Q|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{Q}|_n \right) \right| \le \frac{1}{2} \sum_{n \in \mathbb{N}} \left| \int g_n^2 \, \mathrm{d}P|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{P}|_n \right| + \left| \int g_n^2 \, \mathrm{d}Q|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{Q}|_n \right|,$$

where $\hat{P} = \sum_i \delta(x_i)$ and $\hat{Q} = \sum_i \delta(y_i)$ are the corresponding empirical measures.
Without loss of generality, we bound $\left| \int g_n^2 \, \mathrm{d}P|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{P}|_n \right|$; the remaining term is bounded similarly for $Q$.

$$\left| \int g_n^2 \, \mathrm{d}P|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{P}|_n \right| = \left| \int g_n^2 \, \mathrm{d}P|_n - \int \hat{g}_n^2 \, \mathrm{d}P|_n + \int \hat{g}_n^2 \, \mathrm{d}P|_n - \int \hat{g}_n^2 \, \mathrm{d}\hat{P}|_n \right| \le \int \left| g_n^2 - \hat{g}_n^2 \right| \mathrm{d}P|_n + \left| \int \hat{g}_n^2 \, \mathrm{d}\left( P|_n - \hat{P}|_n \right) \right|.$$

Applying the Glivenko–Cantelli theorem and the strong law of large numbers, both terms converge to zero since $\hat{g}_n^2$ is bounded. Hence, the C-M divergence estimator is consistent.

5 Results

We present a set of two-sample problems and apply various statistics to perform hypothesis testing. As baseline measures, we consider the widely used Wilcoxon rank-sum test (or equivalently, the Mann–Whitney U test) on the count distribution (e.g. [9]), which is a non-parametric median test on the total number of action potentials, and the integrated squared deviation statistic $\lambda_{L_2} = \int (\lambda_1(t) - \lambda_2(t))^2 \, \mathrm{d}t$, where $\lambda(t)$ is estimated by smoothing the spike timings with a Gaussian kernel, evaluated on a uniform grid at least an order of magnitude finer than the standard deviation of the kernel. We report the performance of the test with varying kernel sizes.

All tests are quantified by the power of the test given a significance threshold (type-I error) of 0.05. The null hypothesis distribution is computed empirically, either by generating independent samples or by permuting the data, to create at least 1000 values.

5.1 Stationary renewal processes

The renewal process is a widely used point process model that captures deviations from the Poisson process [10]. We consider two stationary renewal processes with gamma interval distributions. Since the mean rates of the two processes are the same, the rate function statistic and the Wilcoxon test does
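The baseline $\lambda_{L_2}$ statistic described in Section 5 can be sketched as follows. This is a rough illustration under our own simplifications (a plain Gaussian kernel estimate averaged over trials, with illustrative kernel width and interval), not the authors' code:

```python
import numpy as np

def smoothed_rate(trains, grid, sigma):
    """Estimate lambda(t) by smoothing spike times with a Gaussian kernel,
    averaged over trials."""
    rate = np.zeros_like(grid)
    for train in trains:
        for t in train:
            rate += np.exp(-0.5 * ((grid - t) / sigma) ** 2)
    rate /= len(trains) * sigma * np.sqrt(2 * np.pi)
    return rate

def lambda_l2(trains1, trains2, t_max, sigma):
    """Integrated squared deviation between two smoothed rate estimates.
    The grid spacing is an order of magnitude smaller than the kernel width."""
    dt = sigma / 10.0
    grid = np.arange(0.0, t_max, dt)
    l1 = smoothed_rate(trains1, grid, sigma)
    l2 = smoothed_rate(trains2, grid, sigma)
    return np.sum((l1 - l2) ** 2) * dt  # Riemann-sum approximation

# Identical samples give a statistic of exactly zero.
A = [np.array([0.2, 0.5, 0.8]), np.array([0.3, 0.7])]
print(lambda_l2(A, A, 1.0, 0.05))
```

Note the free parameter $\sigma$: unlike the proposed divergences, this baseline requires choosing a kernel width, and its power varies with that choice.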