Environ Ecol Stat (2007) 14:411–431
DOI 10.1007/s10651-007-0028-x
A framework for predicting personal exposures to environmental hazards
James V. Zidek · Gavin Shaddick · Jean Meloche · Chris Chatfield · Rick White
Received: 1 September 2005 / Revised: 1 February 2007 / Published online: 6 September 2007
© Springer Science+Business Media, LLC 2007
Abstract
This paper presents a general framework for constructing a predictive distribution of the exposure to an environmental hazard sustained by a randomly selected member of a designated population. The individual's exposure is assumed to arise from random movement through the environment, resulting in a distribution of exposure that can be used for environmental risk analysis. A specialization of the general framework is that of predicting human exposure to air pollution, which can be used to develop models for such things as exposure to particulate matter; practical aspects of their construction are considered. These models can help answer questions such as what fraction of the population sustained 'high' levels of exposure for, say, 5 days in a row. The immediate implementation of the above framework takes the form of a computing platform referred to as pCNEM. This provides a facility for simulating exposures to airborne pollutants and is described in detail elsewhere. This paper considers some theoretical aspects underpinning probabilistic exposure models of this type, with the ideas illustrated in developing a model for predicting human exposure to PM10.
Keywords  Environmental epidemiology · Air pollution · Health effects · Personal exposure · pNEM · APEX · SHEDS
J. V. Zidek (✉) · R. White
Department of Statistics, University of British Columbia, 6356 Agriculture Rd, Vancouver, BC, Canada V6T 1Z2
e-mail: jim@stat.ubc.ca

G. Shaddick · C. Chatfield
Department of Mathematical Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK

J. Meloche
Avaya, 233 Mount Airy Road, Basking Ridge, NJ 07920-2336, USA
1 Introduction
This paper presents a framework for predicting human exposure to an environmental hazard. It begins with a general version and goes on to one specifically for air pollution. Finally, for completeness, it briefly describes pCNEM (Zidek et al. 2000), whose usage is discussed in a companion paper (Zidek et al. 2005). More detail is available in Zidek et al. (2003). The basic concepts described in the paper have been implemented in a number of models in a similar context, such as Ott et al. (1988) and MacIntosh et al. (1995).

The term environmental hazard is used to describe any agent thought to have a negative impact. Such hazards include, for example, waterborne and airborne mercury, ultraviolet radiation, nuclear radiation, radon and electromagnetic radiation, although the latter remains the subject of much controversy. All these agents have potential negative impacts on both human and non-human species. We focus primarily on human health and welfare impacts since they dominate environmental risk analysis. Moreover, the complexity of humans' interaction with the environment makes the use of such models essential if the deleterious effects of measurement error on impact assessment are to be avoided. However, in principle the general framework could be applied to a much wider class of applications.

These models have played important roles, for example, in formulating air quality criteria for air pollutants (Ozone 2006). In this context, estimates of personal exposures could be used in studies relating pollution to health outcomes, for example health impact analysis. The framework and predictive distributions built upon it have two important applications. First, they can quantify the effects of abatement strategies (e.g., regulations and mandatory surveillance) by running them before and after the hypothetical change. The importance of that application led the Environmental Protection Agency (EPA) in the US to develop a sequence of exposure forecasting models for carbon monoxide and ozone as special cases of the stochastic model pNEM (Law et al. 1997), which stands for 'probabilistic version of NEM', where NEM stands for 'National Ambient Air Quality Standards Exposure Model.' The latter, unlike pNEM, is deterministic, meaning that the output is the same on every simulation run.
The Stochastic Human Exposure and Dose Simulation Model (SHEDS-PM; Burke et al. 2001), a successor to pNEM developed to forecast PM2.5 exposure, is described in PM (2003). That report describes how SHEDS-PM can be used in setting standards and reviews a substantial body of related work. The ozone criterion document (Ozone 2006) proposes use of another US EPA product, the Air Pollutants Exposure Model (APEX), for setting ozone standards. As with pCNEM, which is the practical implementation of the framework described in this paper (and is an enhancement of pNEM), APEX evolved directly from pNEM. Although APEX differs from pCNEM (Zidek et al. 2005) in certain fundamental respects, they do have important conceptual elements in common.

A second, largely unrecognized application is to improve population exposure estimation and hence reduce measurement error. Furthermore, a predictive distribution like that provided by pCNEM, used in conjunction with an exposure-response function to compute the expected benefit, incorporates the prediction uncertainty. Shaddick et al. (2006) investigated the use of such predictive distributions in analysing the effect of short-term changes in PM10 in relation to mortality in senior London males living in the subregion of Bloomsbury. They investigated the differences between using ambient measurements and the outputs from the pCNEM simulator and found a marked increase in the relative risk (associated with a 10 µg m⁻³ increase in PM10) when using the latter. As noted by the anonymous referee, that is not altogether surprising since it is well known that ambient concentrations exceed personal exposures to ozone and hence the relative risk needs to adjust to that bias. However,
what is surprising perhaps is that the relative risk goes from being just barely significant to strongly significant, judging from the 95% predictive interval for that risk. Independently, Caldor et al. (2003), in other unpublished work, found a significant association between PM2.5 and daily cardiovascular mortality in eight selected counties in North Carolina, using a simplified version of SHEDS-PM and a different approach to incorporating the exposure estimates into the health model (than Shaddick et al. (2006), who used a Bayesian framework to incorporate the variability in the exposure estimates into the health model). However, the Caldor analysis does not discuss the advantages, or otherwise, that using SHEDS-PM brought to the health analysis.

Berbane et al. (2004) describe the general microenvironmental modeling approach. Moreover, they construct such a model for the application described in their paper, albeit one based on static estimated fractions of time spent in different MEs rather than dynamic resamplings of real-time activity patterns as in pCNEM. They find a modest improvement in the level of significance "in some cases." However, the small within-community variation in exposure estimated by their model meant that they did not find much improvement in individual cases.

Our framework recognizes that exposure to an environmental hazard is not determined by external (ambient) levels alone. In fact, exposure depends on the temporal trajectories of the population's members. That trajectory takes any given member through a sequence of microenvironments (MEs). Moreover, the choice of these MEs may be a reflection of the ambient conditions. For example, on a sunny day when UV radiation is strong, some may choose not to be exposed at all by selecting appropriate MEs, while others choose to be maximally exposed. That exposure will depend on location. Mercury, another environmental hazard, provides a second example where in particular exposure may reflect ambient conditions. It can be found in air, sea and land. Sources include emissions from power plants, since it is found in coal. Once airborne, it can be transported over long distances and may eventually end up on land through wet and dry deposition. From there it can be leached into lakes or taken up into the food chain. Levels of ambient mercury are thus determined by various environmental factors, of which some, like wind direction, are random. These ambient levels help determine exposure through such things as food consumption, fish or shellfish (which contain a very toxic form called methylmercury) being a primary source. Alternatively, exposure can be from breathing air containing mercury vapor, particularly in warm or poorly ventilated rooms, that comes from the evaporation of liquid mercury, which is found in many places like school laboratories. In any case, exposure will depend on the MEs visited, the time spent in them and the internal sources therein. Clearly, individual factors such as age determine the time-activity patterns that affect the levels of exposure to the hazard. For example, breast milk is a source of mercury in many MEs for infants. Some MEs may have their own internal sources of the hazard and others not. The general framework we introduce in this paper must allow for the installation of these sources. Finally, the random length of time spent in each ME will also help determine the average and cumulative level of exposure.

Models built on the framework in this paper can help answer questions such as: (i) What fraction of the population sustained 'high' levels of exposure?
(ii) What benefit will a mitigation strategy have on those under the age of 4?

The paper is organized as follows: Sect. 2 provides a general framework for exposure modeling and the theoretical aspects of sampling variability and uncertainty that are encountered. Section 3 goes on to develop a more specialized framework for predicting exposure to any given air pollutant, using an illustrative application to PM10 to help clarify the abstract concepts. A number of the practical difficulties in developing a model for a particular pollutant are discussed. Section 4 provides a discussion of the framework and ways in which it could be improved.
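To make the microenvironment decomposition sketched above concrete, the following toy calculation computes a time-weighted daily average exposure from the hours an individual spends in a handful of MEs. The ME names and concentration values are invented for illustration and are not taken from pCNEM or any of the models cited above.

```python
import random

# Hypothetical microenvironment (ME) concentrations in ug/m^3; these names
# and values are invented for illustration, not taken from pCNEM.
ME_CONCENTRATION = {"home": 15.0, "outdoors": 40.0, "office": 10.0, "transit": 55.0}

def simulate_day(rng):
    """Draw a random activity pattern: hours spent in each ME, summing to 24."""
    weights = [rng.random() for _ in ME_CONCENTRATION]
    total = sum(weights)
    return {me: 24.0 * w / total for me, w in zip(ME_CONCENTRATION, weights)}

def daily_average_exposure(hours_in_me):
    """Time-weighted average exposure over the day."""
    return sum(hours_in_me[me] * ME_CONCENTRATION[me] for me in hours_in_me) / 24.0

rng = random.Random(1)
day = simulate_day(rng)
print({me: round(h, 2) for me, h in day.items()}, round(daily_average_exposure(day), 1))
```

Repeating this over many simulated days, each with a resampled activity pattern, yields a distribution of exposure of the kind the framework is designed to predict.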
2 A general framework
This section presents the three components of a general framework for exposure assessment: (i) the underlying probabilistic structure that provides the inferential foundation; (ii) the 'building blocks' (structural elements); (iii) the links between these blocks (the stochastic structure). The generality of the framework ensures its broad applicability. Its stochastic foundation provides an essential basis for assessing predictive performance; for example, we can determine whether an exposure estimator is unbiased and find its standard predictive error.

The stochastic structure is based on the theoretical probability space (Ω, A, P). The individual components of this space are as follows:
[Ω]—the sample space of all underlying information relating to an individual's exposure to the hazard in question. Items sampled from this space are denoted by ω, a purely conceptual device labeling the sum of information about all factors associated with an individual's exposure.

[A]—the collection of subsets of Ω representing the outcomes of a sampling experiment. Membership of this class (C ∈ A) occurs when ω lies in C. Intuitively, P(C) is the fraction of all ωs in C if all elements of Ω have an equal probability of being sampled.

[P]—the population distribution of the ωs, which is unknown.

In reality, not all the information linked to ω can be observed. In practice, only attributes recognized as being relevant, in terms of there being a link to exposure which can be implemented within the simulation model, and indeed measurable, are considered. This leads in general to a random multidimensional array X = X(ω) of responses. X might be a vector representing exposures in successive time periods. If several hazards were being monitored over time, X would be a matrix. Such a matrix could even represent exposure to a single hazard through several distinct media, such as air and water, in successive time periods. The proposed setup is therefore general in nature.

2.1 Estimating population distributions

The population distribution P induces a population distribution for X, P_X, defined by P_X(D) = P(C), where C = {ω : X(ω) ∈ D}. Like P it is unknown and in practice needs to be estimated by sampling. Good sampling designs are vital to obtaining good estimates of population distributions. To characterize such designs, suppose ω̃ is selected at random from Ω. Sampling is 'representative' if the probability that ω̃ is in C equals P(C) for all C. In that case, the sampling plan is 'unbiased.' That notion extends in an obvious way to the sampling distribution of, say, Y = X(ω̃), as well as to the case where a sequence of ω̃s will be required to construct sensible estimates of the population distribution. These ideas also extend to sampling from subpopulations when the population is stratified, for example by age group. In this paper subpopulation sampling designs are assumed to be unbiased.

Modeling has been simplified by splitting ω into two groupings: those associated with internal factors (I), such as age and sex, and those with factors which are external (E) to the individual, such as the prevailing level of the hazard in question. Here we consider the theoretical aspects of using such groupings; a wider discussion of internal and external factors is presented in Sect. 3. Thus we assume that Ω = Ω_I × Ω_E, or in other words, that ω = (ω_I, ω_E) for each ω ∈ Ω, where ω_I ∈ Ω_I while ω_E ∈ Ω_E.
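The notion of representative sampling and the internal/external split just described can be illustrated numerically. In the sketch below, the internal factors ω_I reduce to an age group with a known conditional exceedance probability, and averaging those conditional probabilities over representatively sampled ω̃_I s recovers the population probability P_X(D). All groups and numbers are invented for illustration.

```python
import random

# Toy finite population: the internal factors omega_I reduce to an age group.
# Conditional exceedance probabilities P(X in D | omega_I) are assumed known;
# all numbers here are invented for illustration.
POPULATION = ["child"] * 200 + ["adult"] * 600 + ["senior"] * 200
P_EXCEED = {"child": 0.30, "adult": 0.10, "senior": 0.20}

# Population quantity: P_X(D) = E_I[ P(X in D | omega_I) ].
p_true = sum(P_EXCEED[g] for g in POPULATION) / len(POPULATION)

def estimate(n_samples, rng):
    """Representative (equal-probability) sampling of omega_I; averaging the
    known conditional probabilities gives an unbiased estimate of P_X(D)."""
    draws = [P_EXCEED[rng.choice(POPULATION)] for _ in range(n_samples)]
    return sum(draws) / n_samples

rng = random.Random(0)
est = estimate(20000, rng)
print(round(p_true, 3), round(est, 3))
```

A biased design, such as oversampling one age group without reweighting, would not recover p_true, which is why the unbiasedness assumption on the sampling plan matters.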
Assume that A, the collection of subsets of Ω representing the outcomes of a sampling experiment, includes all events of the form C_I × C_E, where C_I and C_E are events in Ω_I and Ω_E, respectively.

Given ω_I = ω̃_I, for any given ω̃_I, the (conditional) population distribution of X, the observable information, is P(X ∈ D | ω_I = ω̃_I) for all D. Using a standard result from probability theory gives

P_X(D) = E_I[P(X ∈ D | ω_I = ω̃_I)],

where E_I denotes the average over all ω̃_I in Ω_I with respect to the population distribution. This shows that P(X ∈ D | ω_I = ω̃_I) would, if known, be an unbiased estimator of P_X(D). When several successive ω̃_I s are sampled in an unbiased manner, we readily see that any weighted average of the conditional probabilities, if known, yields an unbiased estimator. Similar reasoning shows that the conditional probability distribution given ω_E = ω̃_E, if known, also yields an unbiased estimator of P_X(D).

As a generalization of the case of estimating P_X(D) considered above, consider any real-valued function of X, i.e., G(X). For example, suppose X = (X(t_1), ..., X(t_n)) for timepoints t_i, i = 1, ..., n, where X(t) is real-valued for all t. Interest may then focus on the number of timepoints at which a typical individual sustains a high level of exposure, say when X(t_i) > x_o for some specified x_o. Now

G(X) = Σ_{i=1}^{n} I{X(t_i) > x_o},

where in general I{C} denotes the indicator function of the set C and is 1 or 0 according as C is true or not. Note that we have the special case when G(X) = I{X ∈ D}, the 0–1 indicator function of D. The population averages E[G(X)] and E[G(X) | ω_E = ω̃_E] can then be expressed in terms of population and conditional population distributions, respectively. Their associated unbiased estimators are then readily deduced. Where available, samples will be drawn from a 'real' population, although there may be occasions where this is not feasible. In the latter case, it may be possible to use outputs from complex computer models (i.e., of an environmental process), which can be seen as providing a simulated population. Alternatively, the latter could be a small finite population of patterns of individual behavior combined with computer models to yield outcomes of randomized activity patterns, in which case ω_E will be fixed. Continuing with this example, note that the model includes the possibility of choosing different individuals at different timepoints. In other words, ω_I can index a composite individual made up of successively sampled individuals; this allows an estimate of the uncertainty associated with each of the estimates to be calculated, in the form of a standard error.

2.2 Standard errors of estimation

In order to use the model in statistical inference, a measure of the uncertainty associated with the exposure estimates must be obtained. Assume each of N individuals in a finite population is exposed to a random pollutant hazard level at n successive times to yield X = (X(t_1), ..., X(t_n)). To calculate the expected number, ν, of person-times among these N individuals and n times whose individual exposures exceed a specified level, x_o, let X_ik be the exposure of individual k on day i for all i and k. Then

ν = E[Σ_{i=1}^{n} Σ_{k=1}^{N} I{X_ik > x_o}] = Σ_{i=1}^{n} Σ_{k=1}^{N} p_ik,

where p_ik = E[I{X_ik > x_o}]. Note that if p_ik = p for all i and k, we obtain an expression involving the familiar formula for the expectation of the binomial distribution, namely ν = nNp.

To estimate ν within this sampling scheme, suppose a random individual K_i is sampled on day i, where π_k = P(K_i = k), k = 1, ..., N, depends on the sampling design used. Then

ν̂ = Σ_{i=1}^{n} I{X_{iK_i} > x_o} / π_{K_i}

is an unbiased estimator of ν. When, in particular, π_k = N⁻¹
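A numerical sketch may help fix ideas about the estimator ν̂. The code below generates a hypothetical exposure table X_ik, computes the person-time exceedance count ν directly, and then averages many replications of ν̂ under an arbitrarily chosen unequal-probability design to illustrate its unbiasedness. The population size, number of days, threshold and design are all invented for illustration.

```python
import random
from itertools import accumulate

rng = random.Random(42)
N, n, x_o = 50, 30, 25.0  # individuals, days, exposure threshold (illustrative)

# Hypothetical exposure table X[i][k]: exposure of individual k on day i.
X = [[rng.gauss(20.0, 5.0) for _ in range(N)] for _ in range(n)]

# Person-time exceedance count nu for this fixed population.
nu = sum(x > x_o for day in X for x in day)

# Unequal-probability design: pi_k proportional to k + 1 (an arbitrary choice).
total_weight = sum(range(1, N + 1))
pi = [(k + 1) / total_weight for k in range(N)]
cum = list(accumulate(pi))  # cumulative weights for fast sampling

def nu_hat():
    """One realization of the estimator: sample individual K_i on each day i
    with P(K_i = k) = pi_k and weight the exceedance indicator by 1/pi_k."""
    total = 0.0
    for i in range(n):
        k = rng.choices(range(N), cum_weights=cum)[0]
        total += (X[i][k] > x_o) / pi[k]
    return total

# Averaging many replications illustrates unbiasedness: the mean approaches nu.
mean_est = sum(nu_hat() for _ in range(20000)) / 20000
print(nu, round(mean_est, 1))
```

The inverse-probability weighting is what makes ν̂ unbiased for any design with π_k > 0; under equal-probability sampling (π_k = N⁻¹) each sampled exceedance simply contributes N to the total.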