ELSEVIER
Available online at www.sciencedirect.com
Agricultural Economics 31 (2004) 97-106
www.elsevier.com/locate/agecon
A revised Tobit procedure for mitigating bias in the presence of non-zero censoring with an application to milk-market participation in the Ethiopian highlands
Garth Holloway (a, *), Charles Nicholson (b), Chris Delgado (c), Steve Staal (d), Simeon Ehui (e)
(a) Reader in Agricultural Economics and Statistics, Department of Agricultural and Food Economics, University of Reading, Earley Gate, Whiteknights Road, P.O. Box 237, Reading RG6 6AR, UK
(b) Senior Research Associate, Department of Applied Economics and Management, Cornell University, Ithaca, New York, USA
(c) Senior Research Fellow, International Food Policy Research Institute and International Livestock Research Institute, Washington, DC, USA
(d) Economist, Livestock Policy Analysis Program, International Livestock Research Institute, Nairobi, Kenya
(e) Economist, The World Bank, Washington, DC, USA
Received 10 October 2002; received in revised form 10 January 2003; accepted 11 December 2003
Abstract
Fixed transactions costs that prohibit exchange engender bias in supply analysis due to censoring of the sample observations. The associated bias in conventional regression procedures applied to censored data and the construction of robust methods for mitigating bias have been preoccupations of applied economists since Tobin [Econometrica 26 (1958) 24]. This literature assumes that the true point of censoring in the data is zero and, when this is not the case, imparts a bias to parameter estimates of the censored regression model. We conjecture that this bias can be significant; affirm this from experiments; and suggest techniques for mitigating this bias using Bayesian procedures. The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model; are easy to implement; work well in both small and large samples; and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread use of the zero-censored Tobit regression and we investigate their consequences using data on milk-market participation in the Ethiopian highlands.
© 2004 Elsevier B.V. All rights reserved.
JEL classification: C24; O12; Q01; Q12
Keywords: Revised Tobit procedure; Milk-market development
1. Introduction
Often, non-negligible fixed costs are associated with market transactions and for some economic agents these costs prohibit exchange. When fixed costs are prohibitive, conventional supply analysis contains bias due to censoring of the sample observations. The associated bias in conventional regression procedures and the construction of robust methods for mitigating such bias have been preoccupations of applied economists since Tobin (1958). However, bias resulting from incorrectly assuming that the true point of censoring in the sample is zero appears to have gone largely unnoticed. The objective of this paper is to study this second source of bias and propose procedures for mitigating it. Ignoring the true point of censoring imparts significant inaccuracy to estimates of censored regression parameters and we present three alternative techniques for mitigating this bias using Bayesian procedures. The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model. These procedures are straightforward to implement; work well in both small and large samples; and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread application of the Tobit regression in which the zero-censoring assumption is applied. The preferred and conventional procedures are compared and contrasted in an application to milk-market participation in the Ethiopian highlands.
* Corresponding author. Tel.: +44 118 378 6775; fax: +44 118 975 6467.
E-mail address: garth.holloway@reading.ac.uk (G. Holloway).
0169-5150/ see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.agecon.2003.12.001
2. Conventional, censored regression and three alternative procedures
Greene (1993, p. 691) lists a number of diverse situations in which the Tobit model has been applied, including household purchasing decisions, extramarital behaviour, labour supply and criminal activity. In agricultural economics, important classes of applications involve commodity supply decisions (Mundlak, 2002), disequilibrium models (Fair and Jaffee, 1972), production economics (Paris, 1992), and development economics (Goetz, 1992). In the latter category a key interest lies in expanding the density of market participation (Stiglitz, 1989). The agricultural economics literature is replete with examples in which the censored regression has been usefully adapted. Recent examples that have appeared in this Journal include Woldehanna et al. (2000), Angulo et al. (2001) and Kosarek et al. (2001), to mention a few. The basic model structure which this literature applies is

$$z_i = x_i'\beta + \varepsilon_i, \qquad (1)$$

where $z_i$ denotes a latent economic quantity of interest; $x_i \equiv (x_{i1}, x_{i2}, \ldots, x_{iK})'$ denotes a vector of characteristics associated with the latent $z_i$; $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_K)'$ depicts the relationship between the characteristics and the latent $z_i$; $\varepsilon_i$ denotes random error, which, we assume, is normally distributed with mean zero and variance $\sigma^2$, that is, $\varepsilon_i \sim N(0, \sigma^2)$; and, for each observation $i$, we observe

$$y_i = \max(z_i, 0). \qquad (2)$$
Eqs. (1) and (2) comprise the standard Tobit model in which the point of censoring, henceforth $n$, is assumed to be zero. The situation we are interested in is when Eq. (1) is applied but, instead of Eq. (2) governing the censoring of the data, the data are governed by the rule

$$y_i = z_i \ \text{if}\ z_i \ge n, \quad \text{and}\ y_i = 0 \ \text{otherwise}. \qquad (3)$$

We refer to Eqs. (1) and (2) as the conventional model and refer to Eqs. (1) and (3) as the true model. The two models are, of course, identical when $n$ equals zero and, although it may be possible to infer the exact point of censoring in rare situations, usually the value of $n$ will not be known a priori.
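To make the distinction between the conventional and the true model concrete, the sketch below simulates Eq. (1) and applies the censoring rule in Eq. (3). It is an illustration only: the function name `simulate_censored`, the single standard-normal covariate, the parameter values and the censor value of 0.5 are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np

def simulate_censored(n_obs, beta, sigma, censor, seed=0):
    """Simulate the latent model of Eq. (1) and censor it as in Eq. (3)."""
    rng = np.random.default_rng(seed)
    # Assumed design: a constant plus one standard-normal covariate.
    X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
    z = X @ np.asarray(beta) + rng.normal(scale=sigma, size=n_obs)  # latent z_i
    y = np.where(z >= censor, z, 0.0)  # y_i = z_i if z_i >= censor, else 0
    return X, z, y

# Hypothetical parameter values, for illustration only.
X, z, y = simulate_censored(500, beta=[1.0, 1.0], sigma=1.0, censor=0.5)
positive = y[y > 0]
print("share of censored observations:", np.mean(y == 0))
print("smallest positive observation:", positive.min())
```

With a non-zero censor value, no positive observation can fall below it, which is the feature the alternative procedures exploit when bounding $n$.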
When $n$ is random and unknown, three principal issues arise. The first issue is the magnitude of bias arising from the practice of incorrectly assuming that $n$ equals zero when, in fact, $n$ is greater than zero. The second issue is the derivation of procedures that may mitigate any bias, and the third issue is the gainful understanding of the economic implications of the bias.

Even when the censoring point is actually known, ordinary least squares applied to the censored data leads to biased and inefficient parameter estimates (Tobin, 1958). Classical procedures for correcting this bias rely on one of two approaches. A first approach is to correct, using iterative procedures, the bias in least-squares estimation. A second approach relies on maximum-likelihood techniques and local approximation, for example Newton-Raphson. As an alternative, following seminal work on the Bayesian censored regression (Chib, 1992), we employ a data-augmented, Gibbs-sampling algorithm to simulate draws for the parameters from their intractable joint posterior distribution. In Chib (1992) this approach is compared to more conventional procedures and is shown to lead to accurate estimates of regression parameters.

In the remainder of the paper we pursue the Bayesian approach to estimation. There are four main justifications. First, as demonstrated in Chib (1992), the non-informative Bayesian and sampling theory approaches lead to estimates that are very similar, both in terms of their locations and their scales. Second, the Bayesian approach is conceptually appealing and, we feel, somewhat simpler to implement. Third, in view of the paucity of applications in agricultural economics exploiting the Bayesian approach, our demonstration has the ancillary appeal of highlighting the power of an under-exploited technique in solving a problem with a considerable heritage in agriculture. Fourth, the development of alternatives to the traditional zero-censored regression relies on an idea embedded in the crucial step enabling application of the Gibbs sampler to the censored regression model.

Gibbs sampling (Gelfand and Smith, 1990) and data augmentation (Tanner and Wong, 1987) are parts of a broader set of techniques in Bayesian inference known as Markov-chain, Monte-Carlo (MCMC) methods.¹ Examples of their application to censored, discrete, and truncated-regression problems are Albert and Chib (1993), George and McCulloch (1993) and Dorfman (1996, 1997, 1998).
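The bias of ordinary least squares on censored data, noted above with reference to Tobin (1958), is easy to reproduce numerically. The following is a minimal sketch under assumed values (a true intercept of 0, a true slope of 1, unit error variance and censoring at zero as in Eq. (2)); it is not drawn from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 5000
x = rng.normal(size=n_obs)
X = np.column_stack([np.ones(n_obs), x])

# Latent data from Eq. (1) with assumed beta = (0, 1)' and sigma = 1.
z = X @ np.array([0.0, 1.0]) + rng.normal(size=n_obs)
y = np.maximum(z, 0.0)  # Eq. (2): censoring at zero

# Least squares on the (unobservable) latent data and on the censored data.
b_latent, *_ = np.linalg.lstsq(X, z, rcond=None)
b_censored, *_ = np.linalg.lstsq(X, y, rcond=None)

print("slope using latent data:  ", b_latent[1])
print("slope using censored data:", b_censored[1])
```

In this design roughly half of the sample is censored, and the slope estimated from the censored data is pulled markedly toward zero relative to the slope estimated from the latent data.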
Because the ideas underlying improvements to the traditional approach rely on an understanding of the Gibbs algorithm, it is useful to examine its application in general terms. Note that the censored regression framework in Eqs. (1) and (2) can, instead, be written

$$z = X\beta + \varepsilon, \qquad (4)$$

where $z \equiv (z_1, z_2, \ldots, z_N)'$; $X \equiv (x_1, x_2, \ldots, x_K)$, $x_1 \equiv (x_{11}, x_{21}, \ldots, x_{N1})'$, $x_2 \equiv (x_{12}, x_{22}, \ldots, x_{N2})'$, $\ldots$, $x_K \equiv (x_{1K}, x_{2K}, \ldots, x_{NK})'$; $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_K)'$; $\varepsilon \equiv (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)' \sim N(0_N, \sigma^2 I_N)$; and we observe $y_i = \max\{z_i, 0\}$.
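Ordering the data so that the first $N_1$ observations are the observed, positive, quantities and the remaining $N_2$ ($\equiv N - N_1$) observations correspond to the censored data, we write $y \equiv (\mathbf{y}_1', \mathbf{y}_2')'$, where $\mathbf{y}_1 \equiv (y_1, y_2, \ldots, y_{N_1})'$ are the observations associated with positive quantities and $\mathbf{y}_2 \equiv (y_{N_1+1}, y_{N_1+2}, \ldots, y_N)'$ are the censored data. Also, in $z \equiv (\mathbf{z}_1', \mathbf{z}_2')'$, $\mathbf{z}_1 \equiv (z_1, z_2, \ldots, z_{N_1})' = \mathbf{y}_1$ and $\mathbf{z}_2 \equiv (z_{N_1+1}, z_{N_1+2}, \ldots, z_N)'$, while $\mathbf{y}_2 = (0, 0, \ldots, 0)'$. The interchange $\mathbf{z}_1 = \mathbf{y}_1$ is purely for notational convenience. Simplification of the model is now made possible by working with the latent $\mathbf{z}_2$ rather than the observed $\mathbf{y}_2$.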
The essential recognition (Chib, 1992, p. 88) is that the data-augmented posterior distributions conditioned by the complete data ($z$), and the observed data ($y$), converge in distribution. The distribution conditioned by the observed data is difficult to work with because it involves censoring, but no censoring is involved in the complete-data formulation, making it easy to characterise in terms of its fully conditional component forms. Conditioned by the regression parameters, the latent, dependent variable has a normal distribution, truncated to be negative. Conditioned by the latent data and the error standard deviation, the regression parameters have a normal distribution; and, conditioned by the data and the regression parameters, the error standard deviation has an inverse gamma distribution.

¹ Readable introductions to Gibbs sampling, data augmentation and the Metropolis-Hastings method (of which Gibbs sampling is a special case) are Casella and George (1992) and Tanner (1993) and Chib and Greenberg (1995).

The MCMC approach to estimation samples sequentially from these three sets of fully conditional distributions and, in so doing, simulates draws from the marginal posterior densities of interest. The algorithm is implemented as follows:

Step 1: Select starting values for the regression coefficients and the error variance.
Step 2: For each of the observations in the censored part of the data draw a normal random variable truncated according to the appropriate censor value.
Step 3: Draw the regression coefficients from a multivariate normal distribution.
Step 4: Draw the error variance from an inverse gamma distribution.
Step 5: Repeat steps 1-4 for a 'burn-in' phase until convergence is achieved.
Step 6: Repeat steps 1-4 and collect the outputs of the respective draws.
(5)
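The steps in Eq. (5) can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes a flat prior on the coefficients and a prior proportional to $1/\sigma^2$ (the paper's Appendix A is not reproduced here), starts the chain once from least-squares values, and draws the truncated normals by inverse CDF. The names `draw_below` and `gibbs_tobit`, the chain lengths and the simulated data are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and inverse cdf

def draw_below(mean, sd, upper, rng):
    """One inverse-CDF draw from N(mean, sd^2) truncated to lie below `upper`."""
    p = rng.uniform() * _STD.cdf((upper - mean) / sd)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard inv_cdf's open domain (approximate in far tails)
    return mean + sd * _STD.inv_cdf(p)

def gibbs_tobit(X, y, censor=0.0, n_burn=500, n_keep=1000, seed=0):
    rng = np.random.default_rng(seed)
    n_obs, k = X.shape
    cens_idx = np.flatnonzero(y == 0)        # censor set c = {i | y_i = 0}
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    # Step 1: starting values, here least squares on the censored data.
    beta = XtX_inv @ X.T @ y
    sigma2 = float(np.var(y - X @ beta))
    z = y.astype(float)
    keep_beta, keep_sigma2 = [], []
    for it in range(n_burn + n_keep):        # Steps 5 and 6: burn-in, then collect
        sd = np.sqrt(sigma2)
        mu = X @ beta
        # Step 2: latent data for censored observations, truncated below the censor value.
        for i in cens_idx:
            z[i] = draw_below(mu[i], sd, censor, rng)
        # Step 3: beta | z, sigma2 ~ N((X'X)^-1 X'z, sigma2 (X'X)^-1).
        beta = XtX_inv @ X.T @ z + sd * chol @ rng.normal(size=k)
        # Step 4: sigma2 | z, beta ~ inverse gamma(N/2, SSE/2).
        resid = z - X @ beta
        sigma2 = (resid @ resid / 2.0) / rng.gamma(shape=n_obs / 2.0)
        if it >= n_burn:
            keep_beta.append(beta)
            keep_sigma2.append(sigma2)
    return np.array(keep_beta), np.array(keep_sigma2)

# Hypothetical data: intercept 0.5, slope 1, unit variance, censored at zero.
rng = np.random.default_rng(42)
n_obs = 300
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
z_true = X @ np.array([0.5, 1.0]) + rng.normal(size=n_obs)
y = np.where(z_true >= 0.0, z_true, 0.0)

draws_beta, draws_sigma2 = gibbs_tobit(X, y)
print("posterior means of beta:", draws_beta.mean(axis=0))
print("posterior mean of sigma2:", draws_sigma2.mean())
```

The retained draws can be summarised by means, standard deviations or histograms. The `censor` argument is the hook through which a non-zero censor value enters Step 2.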
Details of the distributions in question are contained in Appendix A. An important feature of this approach is that the outputs in the last step can be used to plot histograms, compute means and standard deviations, or estimate any desired posterior characteristics of interest (Gelfand and Smith, 1990).

An important step in the revised procedure involves specification of the sampling interval for the censor value, $n$. We develop three alternative approaches to constraining the choice of $n$. To do so, it is useful to denote the censored observations through a generic symbol, and we use $c \equiv \{i \mid y_i = 0\}$ to denote the censor set.
2.1. Alternative one
The first approach to the censored regression relies on the logic that the minimum of the observed supply values defines a maximum for the censor value. In particular, an upper bound on $n$ is the minimum of the strictly positive sample quantities, or, the minimum of the set $\{y_i, i \notin c\}$. By similar reasoning, because the observed net supplies can never be negative, a logical lower bound on $n$ is zero. In other words, logic constrains the feasible choice for $n$ to the closed interval

$$n \in [0, \min\{y_i, i \notin c\}]. \qquad (6)$$

This interval provides the basis for an estimation algorithm in which the true point of censoring is permitted to vary, but vary only within the range of values that lie below the minimum of the observed, positive quantities. Implementation requires three modifications to the algorithm in Eq. (5). First, a starting value for the true point of censoring, $n$, must be added in the first step. We recommend using the minimum of the uncensored observations as this value. Second, instead of zero, the draws for the latent data are now truncated to be less than the value $n$. Third, a draw for $n$ is appended as an additional step in the algorithm. In the absence of additional information, we recommend making this draw from a uniform distribution.
2.2. Alternative two
Two problems arise with the interval in Eq. (6). First, estimates of $n$, although improved from assuming $n = 0$, are likely to be quite imprecise in the event that the true censor value is large. Second, neither the upper nor the lower bound for $n$ in Eq. (6) is tied in any way to estimates of the Tobit regression. A second interval for estimating $n$ arises from considering probit estimation on the zero-one, discrete outcome data. The rationale for this approach is that no censoring is involved in probit regression. The typical probit regression estimates a linear relationship between a truncated-normal random variable and a linear combination of the covariates, including a constant term. The Bayesian approach to estimation is outlined in Albert and Chib (1993, Eqs. (3)-(6), p. 671) and is almost identical to the Tobit algorithm in Eq. (5). There are two differences. First, due to the well-known problem that the probit regression is identified only up to a scale-normalized transformation of the linear function (see, for example, Greene, 1993, Section 21.3.2, p. 642), a parameter restriction must be imposed on the latent-variable regression. The typical restriction is to peg the error variance at a specified value, usually assumed to be one. Second, latent quantities are estimated for both partitions of the data, with the set of latent quantities pertaining to non-participants constrained to be negative and the remaining quantities pertaining to the participating observations constrained to be positive. For later reference, let $\{v_i < 0, i \in c\}$ denote the latent quantities corresponding to the censored observations and let $\{v_i \ge 0, i \notin c\}$ denote those corresponding to the uncensored observations. Reconsidering both of these aspects of probit estimation leads ultimately to an improved range for the censor value, $n$.
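The sign-constrained latent draws of the conventional probit step described above can be sketched as follows. This illustrates only the standard step, with the separation forced at zero; it is not the modified procedure developed in this section. The function name `probit_latent`, the linear-index values and the error standard deviation of one are assumptions for the example.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and inverse cdf

def probit_latent(mean, sd, participates, rng):
    """Draw v ~ N(mean, sd^2), kept non-negative for participants and negative otherwise."""
    p0 = _STD.cdf((0.0 - mean) / sd)             # probability mass below zero
    u = rng.uniform()
    p = p0 + u * (1.0 - p0) if participates else u * p0
    p = min(max(p, 1e-12), 1.0 - 1e-12)          # guard inv_cdf's open domain
    return mean + sd * _STD.inv_cdf(p)

rng = np.random.default_rng(3)
y = np.array([0.0, 0.0, 2.3, 0.8, 1.7, 0.0, 4.1])          # hypothetical observed data
index = np.array([-0.4, -1.0, 2.0, 0.9, 1.5, -0.2, 3.8])   # hypothetical values of x_i'beta
sd = 1.0                                                   # conventional probit scale

v = np.array([probit_latent(m, sd, yi > 0, rng) for m, yi in zip(index, y)])
print("latent draws, non-participants:", v[y == 0])
print("latent draws, participants:    ", v[y > 0])
```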
If, instead of a forced separation at zero, the latent quantities could be transformed to produce a separation point endogenously, as part of the estimation itself, this separation point would provide a natural alternative to the bounds presented in Eq. (6). Using this fact, we derive estimates of the censor value by forcing the probit draws to mimic the actual observed data. However, in order to do so we require the latent data to possess the same error variance as the data in the Tobit regression. With this variance modification at hand, regressing the probit model through the origin, using the Tobit variance as its fixed scale, we derive a potentially improved range for the censor value, namely the interval

$$n \in [\max\{v_i, i \in c\}, \min\{y_i, i \notin c\}]. \qquad (7)$$

Comparing the intervals in Eqs. (6) and (7), the latter interval may provide an improvement because its lower bound is estimated from the given covariates. Conceivably, this range could result in a latent draw $\max\{v_i, i \in c\}$ that is negative, but this will only be the case when the estimate of the propensity to participate by the agent 'closest' to participation (that is, the largest latent value) is negative. In this case the interval defined by Eq. (7) provides less precise information than the one in Eq. (6). However, such an occurrence is unlikely to arise whenever fixed cost constrains participation by making positive latent quantities economically infeasible.
2.3. Alternative three
The link to the Tobit regression in Eq. (7) is an important one, but it is indirect. A third approach that provides a more direct linkage is to use the maximum values of the latent draws for the censored observations, in other words, draws from the Tobit regression itself. This approach is a natural extension of the logic developed for the use of the probit latent variables and consists of drawing values for $n$ from the interval

$$n \in [\max\{z_i, i \in c\}, \min\{y_i, i \notin c\}]. \qquad (8)$$

This approach links the interval for the censor value directly to the Tobit regression and, thus, eliminates the need for probit estimation.
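The draw under Eq. (8) differs from the one under Eq. (6) only in its lower bound, which is taken from the current latent draws for the censored observations. A minimal sketch follows, assuming a uniform draw on the interval (as recommended for the first alternative); the function name `draw_censor_alt3` and the numerical values are hypothetical.

```python
import numpy as np

def draw_censor_alt3(z, y, rng):
    """Draw the censor value uniformly on [max latent z_i over c, min positive y_i], Eq. (8)."""
    cens = y == 0               # censor set c = {i | y_i = 0}
    lower = z[cens].max()       # largest latent draw among censored observations
    upper = y[~cens].min()      # smallest uncensored observation
    return rng.uniform(lower, upper), (lower, upper)

rng = np.random.default_rng(0)
y = np.array([0.0, 0.0, 2.3, 0.8, 1.7, 0.0, 4.1])   # hypothetical observed data
z = np.array([-0.6, 0.3, 2.3, 0.8, 1.7, 0.5, 4.1])  # hypothetical current latent draws

n_draw, (lower, upper) = draw_censor_alt3(z, y, rng)
print("interval for the censor value:", (lower, upper))
print("one draw of the censor value:", n_draw)
```

For these illustrative values the interval runs from 0.5 to 0.8, which is narrower than the interval from 0 to 0.8 that Eq. (6) would give for the same data.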
2.4. Experimental evidence
With three alternatives to the conventional Tobit available, there appears to be considerable scope for deriving 'improvements' over the traditional model and a possibility for identifying a 'preferred procedure' from the pool of available alternatives. These questions are pursued in the context of some fairly comprehensive experiments using simulated data and a wide range of censor values. Space limits reporting the experiments and their results (details are available upon request), except to say that two features of the exercise are particularly noteworthy. First, some fairly definitive conclusions emerge from the experiments, including a consistent ranking among the three alternatives. Compared to the conventional Tobit regression, in which the censor value is assumed to be zero, the first alternative procedure (Eq. (6)) generates considerable improvements in estimation accuracy; compared to the first alternative, estimation accuracy is further enhanced by combining the probit and Tobit models (Eq. (7)); and a further improvement is evident when the draws for the censor value are restricted by the latent data generated solely from the Tobit regression (Eq. (8)). Second, these experiments suggest a clear candidate for comparison with the traditional model and raise considerable scope for empirical enquiry. Thus, the interval in Eq. (8) is the one applied in the empirical application that follows.
3. Empirical application
As noted earlier, assessing factors influencing market participation in developing countries is a common application of the conventional model (Goetz, 1992). Where data on marketable surplus (that quantity of food product not consumed by the household itself) are available, a standard entry-analysing procedure regresses marketable surplus on a set of relevant household characteristics. Because non-participation is often at issue, some data are censored and Tobit estimation is relevant. The estimated regression is capable of identifying the subset of covariates that impact the entry decision and, also, predicting the levels of these covariates that are required for entry. Our concern lies in the extent to which the bias arising in this situation leads to significant biases in policy recommendations; leads, in turn, to false inferences about reform; and leads, therefore, to incorrect prescriptions for economic policy. The bias arises from a basic analogy to the theory of the firm: when fixed costs are relevant, there exist finite, non-negligible, quantities of marketable surplus (net supply quantities) below which household participation in the relevant market becomes infeasible.

Household-level data on milk sales in the Ethiopian highlands are used to compare the conventional model (that is, the zero-censored regression) and the true model (the Tobit regression that allows for a non-zero censor value). In the highlands, significant transactions costs prohibit entry for many households and recorded data on milk-market participation are therefore censored. Identifying the levels of covariates that influence the entry decision is one relevant objective, but principal interest lies in characterising the minimum efficient scale of operations for households to participate in the market.

Domestic dairy production has potential to generate income and employment on a large scale in the peri-urban areas of sub-Saharan Africa.² However, growth in dairying by smallholder farmers in peri-urban areas has been limited by transactions costs
² For the purpose of present discussion, the term 'peri-urban' defines those locations in geographical proximity to a major urban area (such as Addis Ababa) from which fluid milk and other dairy products are feasibly supplied to urban markets. The term is defined by Staal (1995).