A revised Tobit procedure for mitigating bias in the presence of non-zero censoring with an application to milk-market participation in the Ethiopian highlands

Agricultural Economics 31 (2004) 97-106

Garth Holloway a,*, Charles Nicholson b, Chris Delgado c, Steve Staal d, Simeon Ehui e

a Reader in Agricultural Economics and Statistics, Department of Agricultural and Food Economics, University of Reading, Earley Gate, Whiteknights Road, P.O. Box 237, Reading RG6 6AR, UK
b Senior Research Associate, Department of Applied Economics and Management, Cornell University, Ithaca, New York, USA
c Senior Research Fellow, International Food Policy Research Institute and International Livestock Research Institute, Washington, DC, USA
d Economist, Livestock Policy Analysis Program, International Livestock Research Institute, Nairobi, Kenya
e Economist, The World Bank, Washington, DC, USA

Received 10 October 2002; received in revised form 10 January 2003; accepted 11 December 2003

Abstract

Fixed transactions costs that prohibit exchange engender bias in supply analysis due to censoring of the sample observations. The associated bias in conventional regression procedures applied to censored data and the construction of robust methods for mitigating bias have been preoccupations of applied economists since Tobin [Econometrica 26 (1958) 24]. This literature assumes that the true point of censoring in the data is zero and, when this is not the case, imparts a bias to parameter estimates of the censored regression model. We conjecture that this bias can be significant; affirm this from experiments; and suggest techniques for mitigating this bias using Bayesian procedures.
The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model; are easy to implement; work well in both small and large samples; and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread use of the zero-censored Tobit regression and we investigate their consequences using data on milk-market participation in the Ethiopian highlands.

© 2004 Elsevier B.V. All rights reserved.

JEL classification: C24; O12; Q01; Q12

Keywords: Revised Tobit procedure; Milk-market development

* Corresponding author. Tel.: +44 118 378 6775; fax: +44 118 975 6467. E-mail address: garth.holloway (G. Holloway).
0169-5150/ see front matter © 2004 Elsevier B.V. doi:10.1016/j.agecon.2003.12.001

1. Introduction

Often, non-negligible fixed costs are associated with market transactions and for some economic agents these costs prohibit exchange. When fixed costs are prohibitive, conventional supply analysis contains bias due to censoring of the sample observations. The associated bias in conventional regression procedures and the construction of robust methods for mitigating such bias have been preoccupations of applied economists since Tobin (1958). However, bias resulting from incorrectly assuming that the true point of censoring in the sample is zero appears to have gone largely unnoticed. The objective of this paper is to study this second source of bias and propose procedures for mitigating it. Ignoring the true point of censoring imparts significant inaccuracy to estimates of censored regression parameters and we present three alternative techniques for mitigating this bias using Bayesian procedures.
The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model. These procedures are straightforward to implement; work well in both small and large samples; and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread application of the Tobit regression in which the zero-censoring assumption is applied. The preferred and conventional procedures are compared and contrasted in an application to milk-market participation in the Ethiopian highlands.

2. Conventional, censored regression and three alternative procedures

Greene (1993, p. 691) lists a number of diverse situations in which the Tobit model has been applied, including household purchasing decisions, extramarital behaviour, labour supply and criminal activity. In agricultural economics, important classes of applications involve commodity supply decisions (Mundlak, 2002), disequilibrium models (Fair and Jaffee, 1972), production economics (Paris, 1992), and development economics (Goetz, 1992). In the latter category a key interest lies in expanding the density of market participation (Stiglitz, 1989). The agricultural economics literature is replete with examples in which the censored regression has been usefully adapted. Recent examples that have appeared in this Journal include Woldehanna et al. (2000), Angulo et al. (2001) and Kosarek et al. (2001), to mention a few. The basic model structure which this literature applies is

$$ z_i = x_i'\beta + \varepsilon_i, \tag{1} $$

where $z_i$ denotes a latent economic quantity of interest; $x_i \equiv (x_{i1}, x_{i2}, \ldots, x_{iK})'$ denotes a vector of characteristics associated with the latent $z_i$; $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_K)'$ depicts the relationship between the characteristics and the latent $z_i$; $\varepsilon_i$ denotes random error, which, we assume, is normally distributed with mean zero and variance $\sigma^2$, that is, $\varepsilon_i \sim N(0, \sigma^2)$; and for each $i$

$$ y_i = \max(z_i, 0). \tag{2} $$

Eqs. (1) and (2) comprise the standard Tobit model in which the point of censoring (henceforth, $n$) is assumed to be zero. The situation we are interested in is when Eq. (1) is applied but, instead of Eq. (2) governing the censoring of the data, they are instead governed by the rule

$$ y_i = z_i \ \text{if } z_i \ge n, \quad \text{and } y_i = 0 \text{ otherwise}. \tag{3} $$

We refer to Eqs. (1) and (2) as the conventional model and refer to Eqs. (1) and (3) as the true model. The two models are, of course, identical when $n$ equals zero and, although it may be possible to infer the exact point of censoring in rare situations, usually the value of $n$ will not be known a priori. When $n$ is random and unknown, three principal issues arise. The first issue is the magnitude of bias arising from the practice of incorrectly assuming that $n$ equals zero when, in fact, $n$ is greater than zero. The second issue is the derivation of procedures that may mitigate any bias, and the third issue is the gainful understanding of the economic implications of the bias.

Even when the censoring point is actually known, ordinary least squares applied to the censored data leads to biased and inefficient parameter estimates (Tobin, 1958). Classical procedures for correcting this bias rely on one of two approaches. A first approach is to correct, using iterative procedures, the bias in least-squares estimation. A second approach relies on maximum-likelihood techniques and local approximation, for example Newton-Raphson. As an alternative, following seminal work on the Bayesian censored regression (Chib, 1992), we employ a data-augmented, Gibbs-sampling algorithm to simulate draws for the parameters from their intractable joint posterior distribution.
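To make the distinction between the conventional and the true model concrete, the following minimal Python sketch simulates the latent regression in Eq. (1) and censors it at a chosen point. The function name `simulate_censored`, the parameter values, and the weak inequality at the boundary are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def simulate_censored(X, beta, sigma, censor, rng):
    """Simulate Eq. (1) and censor at `censor` (Eq. (2) when censor == 0)."""
    # Latent quantity z_i = x_i' beta + e_i with e_i ~ N(0, sigma^2)
    z = X @ beta + rng.normal(0.0, sigma, size=X.shape[0])
    # Recorded quantity: z_i when it reaches the censor value, zero otherwise
    y = np.where(z >= censor, z, 0.0)
    return y, z

rng = np.random.default_rng(0)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant plus one covariate
beta = np.array([1.0, 2.0])                             # illustrative values only
y, z = simulate_censored(X, beta, sigma=1.0, censor=1.5, rng=rng)

positive = y[y > 0]
print("share censored:", np.mean(y == 0))
print("smallest positive observation:", positive.min())
```

With a censor value above zero, no recorded positive quantity falls below that value, even though the data are coded exactly as they would be under zero censoring.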
In Chib (1992) this approach is compared to more conventional procedures and is shown to lead to accurate estimates of regression parameters.

In the remainder of the paper we pursue the Bayesian approach to estimation. There are four main justifications. First, as demonstrated in Chib (1992), the non-informative Bayesian and sampling theory approaches lead to estimates that are very similar, both in terms of their locations and their scales. Second, the Bayesian approach is conceptually appealing and, we feel, somewhat simpler to implement. Third, in view of the paucity of applications in agricultural economics exploiting the Bayesian approach, our demonstration has the ancillary appeal of highlighting the power of an under-exploited technique in solving a problem with a considerable heritage in agriculture. Fourth, the development of alternatives to the traditional zero-censored regression relies on an idea embedded in the crucial step enabling application of the Gibbs sampler to the censored regression model.

Gibbs sampling (Gelfand and Smith, 1990) and data augmentation (Tanner and Wong, 1987) are parts of a broader set of techniques in Bayesian inference known as Markov-chain, Monte-Carlo (MCMC) methods.^1 Examples of their application to censored-, discrete-, and truncated-regression problems are Albert and Chib (1993), George and McCulloch (1993) and Dorfman (1996, 1997, 1998). Because the ideas underlying improvements to the traditional approach rely on an understanding of the Gibbs algorithm, it is useful to examine its application in general terms. Note that the censored regression framework in Eqs. (1) and (2) can, instead, be written

$$ z = X\beta + \varepsilon, \tag{4} $$

where $z \equiv (z_1, z_2, \ldots, z_N)'$; $X \equiv (x_1, x_2, \ldots, x_K)$, $x_1 \equiv (x_{11}, x_{21}, \ldots, x_{N1})'$, $x_2 \equiv (x_{12}, x_{22}, \ldots, x_{N2})'$, ..., $x_K \equiv (x_{1K}, x_{2K}, \ldots, x_{NK})'$; $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_K)'$; $\varepsilon \equiv (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)' \sim N(0_N, \sigma^2 I_N)$; and we observe $y_i = \max\{z_i, 0\}$. Ordering the data so that the first $N_1$ observations are the observed, positive, quantities and the remaining $N_2$ ($= N - N_1$) observations correspond to the censored data, we write $y = (y_1', y_2')'$, where $y_1 \equiv (y_1, y_2, \ldots, y_{N_1})'$ are the observations associated with positive quantities and $y_2 \equiv (y_{N_1+1}, y_{N_1+2}, \ldots, y_N)' = (0, 0, \ldots, 0)'$ are the censored data. Also, in $z = (z_1', z_2')'$, $z_1 \equiv (z_1, z_2, \ldots, z_{N_1})' = y_1$ and $z_2 \equiv (z_{N_1+1}, z_{N_1+2}, \ldots, z_N)'$. The interchange $z_1 = y_1$ is purely for notational convenience.

^1 Readable introductions to Gibbs sampling, data augmentation and the Metropolis-Hastings method (of which Gibbs sampling is a special case) are Casella and George (1992), Tanner (1993) and Chib and Greenberg (1995).

Simplification of the model is now made possible by working with the latent $z_2$ rather than the observed $y_2$. The essential recognition (Chib, 1992, p. 88) is that the data-augmented posterior distributions conditioned by the complete data (y), and the observed data (y ), converge in distribution. The former distribution is difficult to work with because it involves censoring, but no censoring is involved in the latter formulation, making it easy to characterise in terms of its fully conditional component forms. Conditioned by the regression parameters, the latent, dependent variable has a normal distribution, truncated to be negative. Conditioned by the latent data and the error standard deviation, the regression parameters have a normal distribution; and, conditioned by the data and the regression parameters, the error standard deviation has an inverse gamma distribution.

The MCMC approach to estimation samples sequentially from these three sets of fully conditional distributions and, in so doing, simulates draws from the marginal posterior densities of interest.
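The three fully conditional draws described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the exact distributions are given in the paper's Appendix A, which is not reproduced here, so the code assumes the standard non-informative prior $p(\beta, \sigma^2) \propto 1/\sigma^2$. The names `draw_truncated_above` and `gibbs_tobit`, the least-squares starting values, and the inverse-CDF method for the truncated draw are all assumptions.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the inverse-CDF draw

def draw_truncated_above(mean, sd, upper, rng):
    """Draw z_i ~ N(mean_i, sd^2) restricted to z_i < upper (inverse CDF)."""
    p_upper = np.array([_STD.cdf((upper - m) / sd) for m in mean])
    p = np.clip(rng.uniform(size=mean.shape) * p_upper, 1e-12, 1.0 - 1e-12)
    z = mean + sd * np.array([_STD.inv_cdf(q) for q in p])
    return np.minimum(z, upper)  # guard against the clipping in the far tail

def gibbs_tobit(y, X, censor=0.0, n_iter=1500, burn_in=500, seed=0):
    """Data-augmented Gibbs sampler for the censored regression (sketch)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    cens = (y == 0)                       # censored observations are recorded as 0
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y              # assumed starting values: least squares
    sigma2 = np.var(y - X @ beta)
    z = y.astype(float).copy()
    betas, sigma2s = [], []
    for it in range(n_iter):
        # Latent data for the censored part, truncated at the censor value
        z[cens] = draw_truncated_above(X[cens] @ beta, np.sqrt(sigma2), censor, rng)
        # Coefficients: multivariate normal centred on least squares for z
        beta = rng.multivariate_normal(XtX_inv @ X.T @ z, sigma2 * XtX_inv)
        # Error variance: inverse gamma, drawn as SSR / chi-square(N)
        resid = z - X @ beta
        sigma2 = (resid @ resid) / rng.chisquare(N)
        if it >= burn_in:                 # keep draws only after the burn-in phase
            betas.append(beta)
            sigma2s.append(sigma2)
    return np.array(betas), np.array(sigma2s)
```

Calling `gibbs_tobit(y, X)` with the default `censor=0.0` corresponds to the conventional zero-censored model; the retained draws can then be averaged or plotted to summarise the posterior.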
The algorithm is implemented as follows:

Step 1: Select starting values for the regression coefficients and the error variance.
Step 2: For each of the observations in the censored part of the data draw a normal random variable truncated according to the appropriate censor value.
Step 3: Draw the regression coefficients from a multivariate normal distribution.
Step 4: Draw the error variance from an inverse gamma distribution.
Step 5: Repeat steps 1-4 for a 'burn-in' phase until convergence is achieved.
Step 6: Repeat steps 1-4 and collect the outputs of the respective draws.
(5)

Details of the distributions in question are contained in Appendix A. An important feature of this approach is that the outputs in the last step can be used to plot histograms, compute means and standard deviations, or estimate any desired posterior characteristics of interest (Gelfand and Smith, 1990).

An important step in the revised procedure involves specification of the sampling interval for the censor value, $n$. We develop three alternative approaches to constraining the choice of $n$. To do so, it is useful to denote the censored observations through a generic symbol, and we use $c \equiv \{i \mid y_i = 0\}$ to denote the censor set.

2.1. Alternative one

The first approach to the censored regression relies on the logic that the minimum of the observed supply values defines a maximum for the censor value. In particular, an upper bound on $n$ is the minimum of the strictly positive sample quantities, or, the minimum of the set $\{y_i, i \notin c\}$. By similar reasoning, because the observed net supplies can never be negative, a logical lower bound on $n$ is zero. In other words, logic constrains the feasible choice for $n$ to the closed interval

$$ n \in [0, \min\{y_i, i \notin c\}]. \tag{6} $$

This interval provides the basis for an estimation algorithm in which the true point of censoring is permitted to vary, but vary only within the range of values that lie below the minimum of the observed, positive quantities. Implementation requires three modifications to the algorithm in Eq. (5). First, a starting value for the true point of censoring, $n$, must be added in the first step. We recommend using the minimum of the uncensored observations as this value. Second, instead of zero, the draws for the latent data are now truncated to be less than the value $n$. Third, a draw for $n$ is appended as an additional step in the algorithm. In the absence of additional information, we recommend making this draw from a uniform distribution.

2.2. Alternative two

Two problems arise with the interval in Eq. (6). First, estimates of $n$, although improved from assuming $n = 0$, are likely to be quite imprecise in the event that the true censor value is large. Second, neither the upper nor the lower bound for $n$ in Eq. (6) is tied in any way to estimates of the Tobit regression. A second interval for estimating $n$ arises from considering probit estimation on the zero-one, discrete outcome data. The rationale for this approach is that no censoring is involved in probit regression. The typical probit regression estimates a linear relationship between a truncated-normal random variable and a linear combination of the covariates, including a constant term. The Bayesian approach to estimation is outlined in Albert and Chib (1993, Eqs. (3)-(6), p. 671) and is almost identical to the Tobit algorithm in Eq. (5). There are two differences. First, due to the well-known problem that the probit regression is identified only up to a scale-normalized transformation of the linear function (see, for example, Greene, 1993, Section 21.3.2, p. 642), a parameter restriction must be imposed on the latent-variable regression.
The typical restriction is to peg the error variance at a specified value, usually assumed to be one. Second, latent quantities are estimated for both partitions of the data, with the set of latent quantities pertaining to non-participants constrained to be negative and the remaining quantities pertaining to the participating observations constrained to be positive. For later reference, let $\{v_i < 0, i \in c\}$ denote the latent quantities corresponding to the censored observations and let $\{v_i \ge 0, i \notin c\}$ denote those corresponding to the uncensored observations. Reconsidering both of these aspects of probit estimation leads ultimately to an improved range for the censor value, $n$.

If instead of a forced separation at zero, the latent quantities could be transformed to produce a separation point endogenously, as part of the estimation itself, this separation point would provide a natural alternative to the bounds presented in Eq. (6). Using this fact, we derive estimates of the censor value by forcing the probit draws to mimic the actual observed data. However, in order to do so we require the latent data to possess the same error variance as the data in the Tobit regression. With this variance modification at hand, regressing the probit model through the origin, using the Tobit variance as its fixed scale, we derive a potentially improved range for the censor value, namely the interval

$$ n \in [\max\{v_i, i \in c\}, \min\{y_i, i \notin c\}]. \tag{7} $$

Comparing the intervals in Eqs. (6) and (7), the latter interval is suspected to provide possible improvement due to the fact that the lower bound in Eq. (7) is estimated from the given covariates. Conceivably, this range could result in a latent draw $\max\{v_i, i \in c\}$ that is negative, but this will only be the case when the estimate of the propensity to participate by the agent 'closest' to participation (that is, the largest latent value) is negative. In this case the interval defined by Eq. (7) provides less precise information than the one in Eq. (6). However, such an occurrence is unlikely to arise whenever fixed cost constrains participation by making positive latent quantities economically infeasible.

2.3. Alternative three

The link to the Tobit regression in Eq. (7) is an important one, but it is indirect. A third approach that provides a more direct linkage is to use the maximum values of the latent draws for the censored observations, in other words, draws from the Tobit regression itself. This approach is a natural extension of the logic developed for the use of the probit latent variables and consists of drawing values for $n$ from the interval

$$ n \in [\max\{z_i, i \in c\}, \min\{y_i, i \notin c\}]. \tag{8} $$

This approach links the interval for the censor value directly to the Tobit regression and, thus, eliminates the need for probit estimation.

2.4. Experimental evidence

With three alternatives to the conventional Tobit available, there appears to be considerable scope for deriving 'improvements' over the traditional model and a possibility for identifying a 'preferred procedure' from the pool of available alternatives. These questions are pursued in the context of some fairly comprehensive experiments using simulated data and a wide range of censor values. Space limits reporting the experiments and their results (details are available upon request), except to say that two features of the exercise are particularly noteworthy. First, some fairly definitive conclusions emerge from the experiments, including a consistent ranking among the three alternatives. Compared to the conventional Tobit regression, in which the censor value is assumed to be zero, the first alternative procedure (Eq. (6)) generates considerable improvements in estimation accuracy; compared to the first alternative, estimation accuracy is further enhanced by combining the probit and Tobit models (Eq. (7)); and a further improvement is evident when the draws for the censor value are restricted by the latent data generated solely from the Tobit regression (Eq. (8)). Second, these experiments suggest a clear candidate for comparison with the traditional model and raise considerable scope for empirical enquiry. Thus, the interval in Eq. (8) is the one applied in the empirical application that follows.

3. Empirical application

As noted earlier, assessing factors influencing market participation in developing countries is a common application of the conventional model (Goetz, 1992). Where data on marketable surplus (that quantity of food product not consumed by the household itself) are available, a standard entry-analysing procedure regresses marketable surplus on a set of relevant household characteristics. Because nonparticipation is often at issue, some data are censored and Tobit estimation is relevant. The estimated regression is capable of identifying the subset of covariates that impact the entry decision and, also, predicting the levels of these covariates that are required for entry. Our concern lies in the extent to which the bias arising in this situation leads to significant biases in policy recommendations; leads, in turn, to false inferences about reform; and leads, therefore, to incorrect prescriptions for economic policy. The bias arises from a basic analogy to the theory of the firm: when fixed costs are relevant, there exist finite, non-negligible, quantities of marketable surplus (net supply quantities) below which household participation in the relevant market becomes infeasible.
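Because the interval in Eq. (8) is the one carried into the application, it may help to see the extra step it adds to the sampler. The Python sketch below draws a censor value between the largest latent draw for the censored observations and the smallest observed positive quantity. The uniform distribution is the one recommended in the text for the first alternative; applying it to Eq. (8) is an assumption of this sketch, as is the function name `draw_censor_value`.

```python
import numpy as np

def draw_censor_value(z_censored, y_uncensored, rng):
    """Draw n uniformly on [max latent censored value, min observed positive value]."""
    lower = np.max(z_censored)     # agent 'closest' to participation
    upper = np.min(y_uncensored)   # smallest quantity actually marketed
    if lower > upper:
        raise ValueError("latent draws must lie below the smallest observed quantity")
    return rng.uniform(lower, upper)

rng = np.random.default_rng(0)
z_c = np.array([-1.0, 0.2, 0.5])   # illustrative latent draws, censored households
y_u = np.array([0.9, 2.0, 3.0])    # illustrative observed sales, participants
n = draw_censor_value(z_c, y_u, rng)
print("censor value draw:", n)
```

Within a sampler, this draw would follow the latent-data step, and the next round of latent draws for the censored observations would be truncated at the new value of n rather than at zero.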
Household-level data on milk sales in the Ethiopian highlands are used to compare the conventional model (that is, the zero-censored regression) and the true model (the Tobit regression that allows for a non-zero censor value). In the highlands, significant transactions costs prohibit entry for many households and recorded data on milk-market participation are therefore censored. Identifying the levels of covariates that influence the entry decision is one relevant objective, but principal interest lies in characterising the minimum efficient scale of operations for households to participate in the market.

Domestic dairy production has potential to generate income and employment on a large scale in the peri-urban areas of sub-Saharan Africa.^2 However, growth in dairying by small-holder farmers in peri-urban areas has been limited by transactions costs

^2 For the purpose of present discussion, the term 'peri-urban' defines those locations in geographical proximity to a major urban area (such as Addis Ababa) from which fluid milk and other dairy products are feasibly supplied to urban markets. The term is defined by Staal (1995).