A general proposal construction for reversible jump MCMC

Michail Papathomas*, Petros Dellaportas† and Vassilis G. S. Vasdekis†

*Department of Epidemiology and Public Health, Imperial College London, UK
†Department of Statistics, Athens University of Economics and Business, Greece

June 18, 2009

Abstract

We propose a general methodology to construct proposal densities in reversible jump MCMC algorithms so that consistent mappings across competing models are achieved. Unlike nearly all previous approaches, our proposals are not restricted to moves between local models, but are applicable even to models that do not share any common parameters. We focus on linear regression models and produce concrete guidelines on proposal choices for moves between any models. These guidelines can be applied immediately to any regression model after some standard data transformations to near-normality. We illustrate our methodology by providing concrete guidelines for model determination problems in logistic regression and log-linear graphical models. Two real data analyses illustrate how our suggested proposal densities, together with the resulting freedom to propose moves between any models, improve the mixing of the reversible jump Metropolis algorithm.

Keywords: Bayesian inference; Graphical models; Linear regression; Log-linear models; Logistic regression

1 Introduction

The reversible jump MCMC algorithm was introduced by Green (1995) as an extension of the standard Metropolis-Hastings algorithm to variable-dimension spaces; see also Tierney (1998). It is based on creating a Markov chain which can 'jump' between models with parameter spaces of different dimension. In a Bayesian inference framework, its great impact stems from the fact that it allows the calculation of posterior model probabilities for a large number of competing models. Here the key issue is not the calculation of marginal densities per se, but the ability to search, via a Markov chain simulation, in a large space of models in which marginal densities are not available. Although reversible jump has been used extensively in many applied model determination problems, its widespread applicability has been hindered by the difficulty of achieving proposal moves between models that employ some notion of inter-model consistency facilitating good mixing across models. We provide a methodology that constructs moves between any models in model space in a general regression setting, and we illustrate its applicability in logistic regression and log-linear graphical models.

The general reversible jump algorithm can be described as follows. Assume that a data vector y is generated by model i ∈ M, where M is a finite set of competing models. Each model specifies a likelihood f(y | θ_i, i), subject to an unknown parameter vector θ_i ∈ Θ_i of size p_i, where Θ_i ⊆ R^{p_i} is the parameter space for model i. Let (θ_i, i) be the current state of the Markov chain. Then, the reversible jump algorithm consists of the following steps:

1. Propose a new model j with probability π(i, j).
2. Generate u from a proposal density q(u | θ_i, i, j, y).
3. Set (θ_j, u*) = g_{i,j}(θ_i, u), where the deterministic transformation function g_{i,j} and its inverse are differentiable. Note that p_j + dim(u*) = p_i + dim(u) and that g_{i,j} = g_{j,i}^{-1}.
4. Accept the proposed move from model i to model j with probability α_{i,j} = min(1, A), where

A = [ f(y | θ_j, j) f(θ_j | j) f(j) π(j, i) q(u* | θ_j, j, i, y) ] / [ f(y | θ_i, i) f(θ_i | i) f(i) π(i, j) q(u | θ_i, i, j, y) ] × | ∂(θ_j, u*) / ∂(θ_i, u) |,

and f(i) and f(θ_i | i) denote the prior densities of model i and of the parameter vector θ_i respectively.
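To fix the bookkeeping of steps 1-4, the following is a minimal sketch of one reversible jump iteration in Python, computed on the log scale for numerical stability. The `models` object and all of its methods (`propose_model`, `sample_q`, `logpdf_q`, `g`, `log_lik`, `log_prior`, `log_model_prior`) are hypothetical placeholders for the model-specific ingredients; they are not part of the paper.

```python
import numpy as np

def rj_step(theta_i, i, models, rng):
    """One reversible jump move (steps 1-4 above), on the log scale.
    `models` is a hypothetical container for the model-specific pieces."""
    # Step 1: propose a new model j with probability pi(i, j).
    j, log_pi_ij, log_pi_ji = models.propose_model(i, rng)
    # Step 2: generate u from the proposal density q(u | theta_i, i, j, y).
    u, log_q_u = models.sample_q(theta_i, i, j, rng)
    # Step 3: deterministic map (theta_j, u_star) = g_{i,j}(theta_i, u);
    # log_jac is log |d(theta_j, u_star) / d(theta_i, u)|.
    theta_j, u_star, log_jac = models.g(theta_i, u, i, j)
    # Density of u_star under the reverse proposal q(. | theta_j, j, i, y).
    log_q_ustar = models.logpdf_q(u_star, theta_j, j, i)
    # Step 4: accept with probability min(1, A), with A as displayed above.
    log_A = (models.log_lik(theta_j, j) + models.log_prior(theta_j, j)
             + models.log_model_prior(j) + log_pi_ji + log_q_ustar
             - models.log_lik(theta_i, i) - models.log_prior(theta_i, i)
             - models.log_model_prior(i) - log_pi_ij - log_q_u
             + log_jac)
    if np.log(rng.uniform()) < log_A:
        return theta_j, j  # move accepted
    return theta_i, i      # move rejected
```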
Step 1 of the algorithm seems to offer complete freedom of choice, but unfortunately proposed models should be chosen carefully so that θ_j in step 3 belongs to a relatively high region of the posterior density f(θ_j | j, y); otherwise, the proposed moves will often be rejected. This, in turn, implies that the functions q and g in steps 2 and 3 are key elements of a successful application of the algorithm. Brooks et al. (2003) have reviewed and suggested various ways to choose q efficiently. However, the requirement for some consistency in the mapping between models has limited all of these methods to 'local' moves in the model space. This means that θ_i and θ_j often have many common elements, and in fact in most cases one is a subset of the other, resulting in attempted moves between nested models. A notable exception is the moment matching technique of Richardson and Green (1997), which retains the desired compatibility between models. Also, Ehlers and Brooks (2008) construct moves between non-nested autoregressive models by approximating proposals from the relevant posterior conditional densities, setting the first and higher order derivatives of the acceptance ratio with respect to u equal to zero.

An intuitive description of our proposed methodology is based on the following three points. First, it is sensible to specify the proposal density q by exploiting some structural form of the residuals of the current model i in relation to the expected residuals in model j. This provides the required consistency between models. Second, when model jumps are proposed, it is desirable to propose parameter values θ_j that lie in as high a posterior region in model j as θ_i does in model i. Finally, these moves should be general enough to be available even when θ_i and θ_j do not have any common elements.

After specifying the mathematical formulation that satisfies the three key points above, we assume that q is a multivariate normal density and derive exact solutions for its mean vector and covariance matrix in the case of linear regression models. We then investigate the applicability of our method to some binomial and contingency table data in which the data are transformed to approximate normality. Although this approximation might not be accurate, the derived proposal densities are still appealing and in fact provide an impressive improvement over the currently available reversible jump algorithm of Dellaportas and Forster (1999).

All currently available ways to choose q and g are described in great detail in the paper by Brooks et al. (2003) and the accompanying discussion; see also Sisson (2005) and Ehlers and Brooks (2008). As pointed out earlier, the majority of them refer to 'local' moves in M.
An interesting, different approach, closely in line with our suggested proposal densities, is given by Green (2003), who develops an empirical method for constructing proposal distributions that is similar in spirit to the random walk Metropolis sampler of Roberts (2003). He considers normal proposal densities and suggests that their means and variances should be functions of the mean and variances of the target density, which can be estimated adaptively with a pilot run. This requirement reduces the appeal of the method when the number of models is large.

In an unpublished report, Green (2000) obtains empirical results similar to those presented here. In fact, our empirical findings show that, in some instances, the resulting reversible jump efficiency of the two approaches is comparable. Therefore, one can view our work as an effort to provide theoretical justification for the intuitive empirical approach of Green (2000).

The rest of the paper proceeds as follows. Section 2 gives the mathematical derivation of our proposed methodology. In Section 3 we search, via reversible jump MCMC, through graphical models for a large contingency table and through a series of logistic regression models for a data set containing binomial observations. In Section 4 we conclude with a short discussion.

2 The proposed approach

We consider an n-dimensional vector y of normal observations and competing linear models N(η_i, V_i), i ∈ M, where η_i = X_i θ_i, X_i is the design matrix of model i and θ_i is of dimension p_i. If p covariates are available, then p_i ≤ p and M contains 2^p models. We assume that the prior densities of the parameters in each model are conjugate and non-informative, in the sense that they are constant over the important region of the corresponding likelihood functions. Assume that the reversible jump MCMC algorithm has a current state (θ_i, V_i, i) and that a move is proposed to (θ_j, V_j, j) such that V_i = V_j = V. The equality-of-variances constraint between moves is plausible, and it does not affect the stationary distributions of the variances V_i, since their values are updated in the within-model parameter updates of the MCMC algorithm. Then, our key idea is that the proposal density for θ_j, q(u | θ_i, i, j, y), should satisfy the relationship

f(θ_i | i, V, y) = E_u{ f(u | j, V, y) }    (1)

where f(· | i, V, y) and f(· | j, V, y) denote the conditional posterior densities of θ_i and θ_j under models i and j respectively, and E_u denotes expectation with respect to the proposal density q(u | θ_i, i, j, y). Intuitively, (1) expresses the desire to propose a θ_j that should, on average with respect to the proposal density, achieve f(θ_i | i, V, y) = f(θ_j | j, V, y).

We attack (1) by assuming that q(u | θ_i, i, j, y) is a normal density N(μ, Σ). Then, under the usual conjugate prior assumptions, the conditional posterior densities in (1) are multivariate normal, and it remains to solve this equation with respect to μ and Σ. There are clearly many values of μ and Σ that satisfy (1), and consequently many proposal densities q(u | θ_i, i, j, y) with that property. This fact is taken care of in our theoretical development below.
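Under these normality assumptions, the expectation in (1) is available in closed form. As a sketch of the calculation behind Theorem 1 below (the full proof is given in the Appendix), write f(u | j, V, y) = N(u; θ̂_j, Q_{jj}), where θ̂_j = (X_j' V^{-1} X_j)^{-1} X_j' V^{-1} y and Q_{jj} = (X_j' V^{-1} X_j)^{-1} are the posterior mean and covariance under model j; this notation is defined formally below. Because the expectation of one normal density with respect to another is a Gaussian convolution,

E_u{ f(u | j, V, y) } = ∫ N(u; θ̂_j, Q_{jj}) N(u; μ, Σ) du = N(μ; θ̂_j, Q_{jj} + Σ),

so (1) reduces to matching two normal density heights, f(θ_i | i, V, y) = N(μ; θ̂_j, Q_{jj} + Σ), which can then be solved for μ once Σ is fixed.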
When these solutions are available, they provide a yardstick for constructing proposal densities for other linear regression models with non-normal responses; we provide such examples in Section 3.

Our approach bears a similarity to the centering functions approach suggested by Brooks et al. (2003), but the two methods are inherently different. The centering functions approach imposes exact equality between the likelihood functions of models i and j so that a deterministic mapping can be constructed. The function g_{i,j} is predetermined, defined for the case where moves are attempted between nested models, and common parameters are kept fixed. In contrast, we aim to exploit (1) and construct proposals for complex moves between models that do not necessarily share parameters, with proposed values that change adaptively in accordance with the current state of the chain, and with no parameters kept fixed. The following theorem provides the required solution to (1):

Theorem 1: Under the model determination setup defined above, one solution for the mean μ of the proposal distribution N(μ, Σ) is given by

μ = (X_j' V^{-1} X_j)^{-1} X_j' V^{-1} [ y + B^{-1} V^{-1/2} (X_i θ_i − P_i y) ],    (2)

where B = (V + X_j Σ X_j')^{-1/2} and P_i = X_i (X_i' V^{-1} X_i)^{-1} X_i' V^{-1} is the projection matrix onto the space generated by the columns of X_i, weighted by V^{-1}.

The proof of Theorem 1 is given in the Appendix.

This result has an interesting interpretation. The mean of the proposal density is the maximum likelihood estimate of the new model plus a correction term that depends upon the difference between the fitted values under the maximum likelihood estimate for model i, P_i y, and the fitted values under the currently accepted θ_i. Intuitively, the difference X_i θ_i − P_i y measures the distance of the current value θ_i from the mode of its posterior density, so the proposed value of θ_j lies, in expectation, in a comparably high posterior region in model j.
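As a concrete illustration, the following Python sketch evaluates the mean (2). It is our own illustrative translation of the formula under the Section 2 setup, not code from the paper; the helper names `sym_power` and `proposal_mean` are ours, and V and Σ are assumed symmetric positive definite.

```python
import numpy as np

def sym_power(A, p):
    """A**p for a symmetric positive definite matrix, via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * w**p) @ U.T

def proposal_mean(y, Xi, Xj, V, theta_i, Sigma):
    """Proposal mean (2) for a jump from model i to model j.
    Sigma is the proposal covariance; one choice is given in (3) below."""
    Vinv = sym_power(V, -1.0)
    # P_i = X_i (X_i' V^-1 X_i)^-1 X_i' V^-1, the V^-1-weighted projection.
    Pi = Xi @ np.linalg.solve(Xi.T @ Vinv @ Xi, Xi.T @ Vinv)
    # B = (V + X_j Sigma X_j')^(-1/2), so B^-1 = (V + X_j Sigma X_j')^(1/2).
    B_inv = sym_power(V + Xj @ Sigma @ Xj.T, 0.5)
    # Correction: how far the current fit X_i theta_i is from the GLS fit P_i y.
    corrected = y + B_inv @ sym_power(V, -0.5) @ (Xi @ theta_i - Pi @ y)
    # GLS estimate of model j applied to the corrected response.
    XtVinv = Xj.T @ Vinv
    return np.linalg.solve(XtVinv @ Xj, XtVinv @ corrected)
```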
We now turn to the determination of Σ. Note that Σ appears in Theorem 1 through the matrix B, and any valid choice of Σ must make B invertible. However, it should be recognized that when jumping from model i to model j some elements of θ_i and θ_j may be common to both models, so it would be desirable to propose a move with reduced variability for these elements. Assume that the last t parameters in θ_j are common to both models. There are at least two possible choices for the form of Σ. Setting Q_{ij} = (X_i' V^{-1} X_j)^{-1}, the first choice involves the matrix Q_{jj}, which is the covariance matrix associated with f(θ_j | j, V, y): Σ is formed from the rows and columns of Q_{jj} that correspond to the p_j − t uncommon parameters between models i and j, whilst all other elements of Q_{jj} are replaced by zero. The second choice involves the matrix Q_{jj} − Q_{jj} Q_{ji}^{-1} Q_{ii} Q_{ij}^{-1} Q_{jj}, a simplified version of which was proposed by Green (2000) in an unpublished report. This suggestion has two advantages. The first is that it is smaller than Q_{jj} in the Löwner sense (Harville, 1997), providing small variances for our proposals. The second is that the rank of this matrix is p_j − t, which matches the idea of using the already gathered information about the t common parameters. Therefore, we suggest that a reasonable choice for Σ is

Σ = Q_{jj} − Q_{jj} Q_{ji}^{-1} Q_{ii} Q_{ij}^{-1} Q_{jj} + c I_{p_j}    (3)

with any scalar c > 0 that makes Σ invertible. Thus, the proposed θ_j is constructed as

θ_j = (X_j' V^{-1} X_j)^{-1} X_j' V^{-1} [ y + B^{-1} V^{-1/2} (X_i θ_i − P_i y) ] + Σ^{1/2} u,

where u ~ N(0, I_{p_j}).

The constant c is clearly a tuning parameter that determines the variability of the proposals for the common parameters of models i and j. If c > 0, then dim(u) = p_j and dim(u*) = p_i, even if some of the parameters of the two models are common. In all the analyses we have performed, small values of c gave very robust mixing performance.
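Putting (2) and (3) together, the following sketch draws a proposed θ_j. It reuses `sym_power` and `proposal_mean` from the previous snippet, and, as before, it is an illustrative rendering of the formulas rather than the authors' code. Note that Q_{ji}^{-1} = X_j' V^{-1} X_i and Q_{ij}^{-1} = X_i' V^{-1} X_j, so (3) requires no inversion of the cross-product matrices, which need not be square.

```python
import numpy as np  # sym_power and proposal_mean as defined above

def propose_theta_j(y, Xi, Xj, V, theta_i, c, rng):
    """Draw theta_j = mu + Sigma^(1/2) u with mu from (2) and Sigma from (3)."""
    Vinv = sym_power(V, -1.0)
    Qii = np.linalg.inv(Xi.T @ Vinv @ Xi)  # covariance of f(theta_i | i, V, y)
    Qjj = np.linalg.inv(Xj.T @ Vinv @ Xj)  # covariance of f(theta_j | j, V, y)
    # Q_ji^-1 = X_j' V^-1 X_i and Q_ij^-1 = X_i' V^-1 X_j, so no inversion
    # of the (possibly non-square) cross-product matrices is needed.
    shrink = Qjj @ (Xj.T @ Vinv @ Xi) @ Qii @ (Xi.T @ Vinv @ Xj) @ Qjj
    Sigma = Qjj - shrink + c * np.eye(Xj.shape[1])  # equation (3)
    mu = proposal_mean(y, Xi, Xj, V, theta_i, Sigma)
    u = rng.standard_normal(Xj.shape[1])            # u ~ N(0, I_{p_j})
    return mu + sym_power(Sigma, 0.5) @ u
```

In a full sampler, the returned θ_j would enter the acceptance ratio A exactly as in the rj_step sketch of Section 1, with q(u | θ_i, i, j, y) the N(μ, Σ) density just constructed.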