Computation of the expected value and variance of the average annual yield for a stochastic simulation of rainwater tank clusters
John Mashford, Shiroma Maheepala, Luis Neumann and Esther Coultas
Commonwealth Scientific and Industrial Research Organization (CSIRO), Australia PO Box 56, Highett, Vic. 3190, Australia
Abstract

The problem of obtaining a detailed understanding of the behavior of a cluster (or a collection) of rainwater tanks is complex and can only be solved by simulation. If the collection of houses is very large, a tractable solution can only be obtained by stochastic simulation in which the parameters defining the houses and tanks are sampled from probability distributions. An important output from the rainwater tank simulation is the average annual yield of the cluster and it is of interest to know its expected value and variance for planning of urban water systems. This paper carries out a theoretical calculation of the expected value and variance of the average annual yield as functions of cluster size and presents an experimental confirmation of these results.
Keywords: rainwater tank, stochastic simulation, yield, expected value, variance
1 Introduction
The amount of water that can be supplied from a rainwater tank (i.e. the yield of a rainwater tank) depends on a number of properties of the house, the tank and the climate [1, 2, 3]. The relevant properties of the house can be modeled as the roof area connected to the rainwater tank, the depression storage (i.e. the retention storage of the roof, which depends on the type of roof material and the shape of the roof), the roof area loss factor (i.e. losses from the roof, which depend on the roof material) and the way in which water stored in the rainwater tank is used by the occupants of the house (i.e. the demand time series). The demand time series is a function of the occupancy status of the house and the type of end uses for which rainwater is used. For example, rainwater can be used for garden, toilet, laundry and hot water use. The principal relevant property of the rainwater tank is its volumetric capacity, while the properties of the environment relevant to the rainwater tank's evolution are the rainfall time series and the potential evapotranspiration (PET) time series, since the evaporation from the roof of the house and the tank depends on the temperature, wind, humidity and so on. A quantity of considerable interest to urban water management planners is the yield of a collection of tanks over a period of time [4].

The behavior of a tank can be represented by time series $\{V_t\}$ and $\{Y_t\}$, where $V_t$ is the volume of water in the tank at the end of time period $t$ and $Y_t$ is the yield from the tank during time period $t$. It has been shown that a daily time step is sufficient to accurately model a rainwater tank if there is no trickle supply to the tank from mains supply [3]. This study assumes that each rainwater tank is fitted with an appropriate valve which allows end uses of the tank to switch to mains supply when the tank has run out of water (i.e. there is no trickle supply from the mains). The behavior of the tank can be simulated by recursively solving the storage behavior equations for $V_t$ and $Y_t$, $t = 1, \ldots, P$, where $P$ is the number of time periods in the simulation [3]. In our simulation, the demand pattern is taken from a finite number, $N_{ds}$, of demand scenarios. The yield obtained in this way is a time series which is a function of the rainwater tank capacity $C$, the roof area $A$, the depression storage $\delta$, the roof area loss factor $L$ and the demand scenario number $d$. Thus:

$$Y_t = Y_t(C, A, \delta, L, d). \tag{1}$$

It is also implicitly a function of the rainfall time series and the PET time series, which we consider to be the same for all houses. It can be shown using induction that $Y_t$ is a nonlinear continuous function of its continuous parameters for all $t = 1, \ldots, P$. Let $N_{years}$ be the number of years over which the simulation is taken. The average annual yield for a house defined by parameters $C$, $A$, $\delta$, $L$ and $d$ is:

$$Y = Y(C, A, \delta, L, d) = \frac{1}{N_{years}} \sum_{t=1}^{P} Y_t(C, A, \delta, L, d). \tag{2}$$
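To make the recursion for V_t and Y_t and the average of Equation 2 concrete, the following is a minimal sketch of a daily tank simulation. The storage behavior equations of [3] are not reproduced in this paper, so the water balance used here (runoff equal to roof area times rainfall in excess of the depression storage, reduced by the loss factor; yield limited by the water available; excess spilled) is an assumption for illustration only, and PET is ignored. The names `simulate_tank` and `average_annual_yield` and all numerical values are hypothetical.

```python
def simulate_tank(rain_mm, demand_l, capacity_l, roof_area_m2,
                  depression_mm, loss_factor):
    """Return the daily yield series Y_t (litres) for one house.

    Assumed water balance (not taken from the paper): 1 mm of rain on
    1 m^2 of roof gives 1 litre of runoff before losses.
    """
    volume = 0.0  # V_0: the tank is assumed empty at the start
    yields = []
    for rain, demand in zip(rain_mm, demand_l):
        # Assumed runoff rule: rain above the depression storage,
        # reduced by the roof area loss factor.
        inflow = roof_area_m2 * max(rain - depression_mm, 0.0) * (1.0 - loss_factor)
        available = volume + inflow
        y = min(demand, available)                # Y_t cannot exceed the water available
        volume = min(available - y, capacity_l)   # V_t: anything above capacity spills
        yields.append(y)
    return yields


def average_annual_yield(yields, n_years):
    """Equation 2: the sum of Y_t over all time periods divided by N_years."""
    return sum(yields) / n_years


# Three illustrative days: one wet day followed by two dry days.
daily_yield = simulate_tank(rain_mm=[10.0, 0.0, 0.0], demand_l=[100.0] * 3,
                            capacity_l=150.0, roof_area_m2=50.0,
                            depression_mm=1.0, loss_factor=0.1)
print(daily_yield)
```

The `min` and `max` operations are what make the yield a nonlinear (but continuous) function of the parameters, which is the property used later in the paper.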
We will consider the problem of simulating a cluster of houses with rainwater tanks, where the parameters defining each house are chosen randomly according to probability distributions and the demand scenario for a house is chosen randomly from a number of possibilities associated with its occupancy status. In this paper, we will show that the expected value of the average annual yield is independent of the cluster size, while the variance of the average annual yield is a hyperbolic function of the cluster size (i.e. it is inversely proportional to the cluster size). This theoretical result will be confirmed by experimental computation. In Section 2, two deterministic examples motivating the development of the general formulation of the first result are presented. In Section 3, the first result of the paper, concerning the expected value of the average annual yield, is proved. In Section 4, the second result, concerning the variance of the average annual yield, is proved. In Section 5, the experimental confirmation of these results is presented, while Section 6 provides a conclusion to the paper.
2 Two motivating examples
Consider a cluster made up of $N$ identical copies of a house with parameters $C$, $A$, $\delta$, $L$ and $d$. The total yield of the cluster is:

$$\sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C, A, \delta, L, d) = \sum_{t=1}^{P} N\, Y_t(C, A, \delta, L, d). \tag{3}$$

Therefore the average annual yield (per house) is:

$$Y = \frac{1}{N N_{years}} \sum_{t=1}^{P} N\, Y_t(C, A, \delta, L, d) = \frac{1}{N_{years}} \sum_{t=1}^{P} Y_t(C, A, \delta, L, d), \tag{4}$$

which is independent of the number of houses in the cluster. Averaging is being taken in two ways. Firstly, the annual yield is being averaged over all houses in the cluster. Secondly, the average annual value is being computed by summing over all time periods in the simulation and then dividing by the number of years in the simulation.

Now, consider a slightly more complicated example. Suppose that we have a cluster made up of $N$ different houses defined by parameters $C_i$, $A_i$, $\delta_i$, $L_i$, $d_i$; $i = 1, \ldots, N$. Now, scale up the cluster $M$ times to form a cluster of $MN$ houses in which each house in the original cluster is duplicated $M$ times. The total yield of the cluster is:

$$\text{Total yield} = \sum_{t=1}^{P} \sum_{i=1}^{N} M\, Y_t(C_i, A_i, \delta_i, L_i, d_i). \tag{5}$$

Therefore the average annual yield is:

$$Y = \frac{1}{M N N_{years}} \sum_{t=1}^{P} \sum_{i=1}^{N} M\, Y_t(C_i, A_i, \delta_i, L_i, d_i) = \frac{1}{N N_{years}} \sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C_i, A_i, \delta_i, L_i, d_i), \tag{6}$$

which is the same as the average annual yield of the original cluster. Thus, scaling up a cluster has no effect on the average annual yield. This is true no matter how small the cluster is.
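The scaling argument of Equations 5 and 6 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the yield series are made-up numbers rather than simulation output, and `cluster_average_annual_yield` is a hypothetical helper name.

```python
import math


def cluster_average_annual_yield(house_yield_series, n_years):
    """Per-house average annual yield of a cluster, in the form of Equation 6."""
    n_houses = len(house_yield_series)
    total = sum(sum(series) for series in house_yield_series)
    return total / (n_houses * n_years)


# Hypothetical yield series (litres per time period) for N = 3 different houses.
original = [[100.0, 80.0, 0.0, 40.0],
            [60.0, 60.0, 20.0, 0.0],
            [150.0, 0.0, 0.0, 90.0]]

M = 5
# Each house of the original cluster is duplicated M times.
scaled = [series for series in original for _ in range(M)]

y_original = cluster_average_annual_yield(original, n_years=2)
y_scaled = cluster_average_annual_yield(scaled, n_years=2)
print(y_original, y_scaled)
```

The factor M appears once in the total yield and once in the number of houses, so it cancels regardless of the values in the series.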
3 The expected value of the yield as a function of cluster size
We now want to consider the case in which a cluster is generated by sampling the parameters from probability distributions. The yield for a house during any time period is a function of the parameters for the house. We want to work out the average value of the average annual yield over a number of runs in which the probability distributions are sampled.

We will first consider the case of one variable. Let $v$ be a real nonnegative random variable distributed according to a probability density $\rho : [0, \infty) \to [0, \infty)$. Then:

$$\Pr(v \in [a, b]) = \int_a^b \rho(x)\,dx, \tag{7}$$

for $0 \le a < b$. Let $f : [0, \infty) \to \mathbb{R}$ be a continuous function. Define, as usual [5], the expectation value of $f$ to be:

$$\langle f \rangle = \int_0^\infty f(x)\rho(x)\,dx. \tag{8}$$

A quick argument can be used to show that the mean of $f(v)$ over a large number of trials is given by the expectation value of $f$. Suppose that we carry out $n$ trials resulting in values $v_i$ of $v$. Then:

$$
\begin{aligned}
\text{mean} &= \frac{1}{n} \sum_{i=1}^{n} f(v_i) = \frac{1}{n} \sum_{j=0}^{\infty} \sum \{ f(v_i) : v_i \in I_j \} \\
&\approx \frac{1}{n} \sum_{j=0}^{\infty} f(a_j)\, n \int_{a_j}^{b_j} \rho(x)\,dx = \sum_{j=0}^{\infty} f(a_j) \int_{a_j}^{b_j} \rho(x)\,dx \\
&\approx \sum_{j=0}^{\infty} \int_{a_j}^{b_j} f(x)\rho(x)\,dx = \int_0^\infty f(x)\rho(x)\,dx = \langle f \rangle.
\end{aligned}
$$

In the above computation, $\{I_j = [a_j, b_j]\}$ is a fine partition of the interval $[0, \infty)$. The first approximation replaces the number of trials falling in $I_j$ by its expected value $n \int_{a_j}^{b_j} \rho(x)\,dx$, and the second uses the continuity of $f$. The approximations become exact in the limit as the partition becomes arbitrarily fine and the number of trials becomes infinite. If one has suitable closed form representations of the functions $f$ and $\rho$, then it may be possible to evaluate the integral expression for $\langle f \rangle$ analytically. In other cases, it may be necessary to evaluate the integral numerically, which may be as computationally expensive as evaluating the mean value by simulation [6, 7]. A similar argument holds for computing the long run mean value of a continuous function of more than one random variable.

Consider a housing cluster of size $N$ generated by random parameters $C_i$, $A_i$, $\delta_i$, $L_i$, $d_i$; $i = 1, \ldots, N$, which are sampled from distributions $\rho_1 : [0, \infty) \to [0, \infty)$, $\rho_2 : [0, \infty) \to [0, \infty)$, $\rho_3 : [0, \infty) \to [0, \infty)$, $\rho_4 : [0, \infty) \to [0, \infty)$ and $\rho_5 : \{1, \ldots, N_{ds}\} \to [0, 1]$ with:

$$\int_0^\infty \rho_j(x)\,dx = 1, \quad \forall j = 1, \ldots, 4, \tag{9}$$

and:

$$\sum_{k=1}^{N_{ds}} \rho_5(k) = 1. \tag{10}$$

The average annual yield for one run or trial is:

$$Y = Y(C_1, \ldots, C_N, A_1, \ldots, A_N, \delta_1, \ldots, \delta_N, L_1, \ldots, L_N, d_1, \ldots, d_N) = \frac{1}{N N_{years}} \sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C_i, A_i, \delta_i, L_i, d_i). \tag{11}$$

The expected average annual yield is given by:

$$
\begin{aligned}
\langle Y \rangle &= \sum_{d_1=1}^{N_{ds}} \cdots \sum_{d_N=1}^{N_{ds}} \rho_5(d_1) \cdots \rho_5(d_N) \int_0^\infty \cdots \int_0^\infty Y(C_1, \ldots, C_N, A_1, \ldots, A_N, \delta_1, \ldots, \delta_N, L_1, \ldots, L_N, d_1, \ldots, d_N) \\
&\qquad \times \rho_1(C_1) \cdots \rho_1(C_N)\, \rho_2(A_1) \cdots \rho_2(A_N)\, \rho_3(\delta_1) \cdots \rho_3(\delta_N)\, \rho_4(L_1) \cdots \rho_4(L_N)\; dC_1 \cdots dC_N\, dA_1 \cdots dA_N\, d\delta_1 \cdots d\delta_N\, dL_1 \cdots dL_N \\
&= \frac{1}{N N_{years}} \sum_{t=1}^{P} \sum_{i=1}^{N} \sum_{d_i=1}^{N_{ds}} \rho_5(d_i) \int_0^\infty \cdots \int_0^\infty Y_t(C_i, A_i, \delta_i, L_i, d_i)\, \rho_1(C_i) \rho_2(A_i) \rho_3(\delta_i) \rho_4(L_i)\; dC_i\, dA_i\, d\delta_i\, dL_i \\
&= \frac{1}{N N_{years}} \sum_{t=1}^{P} \sum_{i=1}^{N} \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_t(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL \\
&= \frac{1}{N_{years}} \sum_{t=1}^{P} \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_t(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL.
\end{aligned}
$$

The second equality holds because the term for house $i$ depends only on the parameters of house $i$, while the distributions of all the other houses integrate or sum to 1 by Equations 9 and 10. The third is a renaming of the variables of integration, after which the $N$ terms of the sum over $i$ are identical.

This is independent of the number of houses in the cluster. The same average annual yield is obtained by doing many runs with one house as by doing fewer runs with many houses, as long as the house and tank parameters are sampled from the same probability distributions. This implies that if the cluster size is varied while the number of runs is held fixed, the accuracy of the estimated average annual yield will depend on the cluster size. As the cluster size increases (with the same number of runs) the average annual yield will become less variable and will approach more closely the expected average annual yield. We will show evidence for this deduction in Section 5.

It is difficult to evaluate the integrals representing the expected average annual yield $\langle Y \rangle$ because the yield functions $Y_t$ are not given in closed form but can only be obtained by simulation. Thus, in practice $\langle Y \rangle$ must be obtained by simulation. The number of trials required for the accurate simulation of $\langle Y \rangle$ can vary depending on the nature of the probability distributions of the tank and house related variables and the nature of the yield functions $Y_t$.
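The statement that many runs with one house and fewer runs with many houses give the same mean can be checked numerically. The following is a minimal sketch and not the simulation used in this paper: `toy_annual_yield` is a hypothetical nonlinear stand-in for the simulated yield, and the uniform parameter ranges, the demand scenarios and their probabilities are all assumed for illustration.

```python
import random

DEMANDS = [40.0, 80.0, 120.0]    # hypothetical demand scenarios
DEMAND_PROBS = [0.3, 0.5, 0.2]   # hypothetical rho_5, sums to 1


def toy_annual_yield(capacity, roof_area, demand):
    """Hypothetical nonlinear, continuous stand-in for the simulated yield."""
    supply = 0.4 * roof_area * capacity / (capacity + 2.0)
    return min(demand, supply)


def sample_house(rng):
    """Draw the parameters of one house; the ranges are illustrative assumptions."""
    capacity = rng.uniform(1.0, 10.0)
    roof_area = rng.uniform(50.0, 300.0)
    demand = rng.choices(DEMANDS, weights=DEMAND_PROBS)[0]
    return capacity, roof_area, demand


def mean_over_runs(n_houses, n_runs, rng):
    """Average over n_runs runs of the per-house yield of an n_houses cluster."""
    run_means = []
    for _ in range(n_runs):
        total = sum(toy_annual_yield(*sample_house(rng)) for _ in range(n_houses))
        run_means.append(total / n_houses)
    return sum(run_means) / n_runs


rng = random.Random(0)
# Both experiments sample 20000 houses in total.
many_runs_one_house = mean_over_runs(n_houses=1, n_runs=20000, rng=rng)
few_runs_many_houses = mean_over_runs(n_houses=200, n_runs=100, rng=rng)
print(many_runs_one_house, few_runs_many_houses)
```

The two printed estimates should agree to within sampling error, since each is an average over the same total number of independently sampled houses.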
4 Variance of the yield as a function of cluster size
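While the expected (average annual) yield is independent of the cluster size, the variance of the yield is dependent on the cluster size. The standard deviation $\sigma$ of the yield is the square root of the variance $\sigma^2$, where $\sigma^2$ is the long run average of the square of the difference between the yield and the average yield. By the argument given above, this is given by:

$$\sigma^2 = \langle (Y - \langle Y \rangle)^2 \rangle. \tag{12}$$

Now:

$$\langle (Y - \langle Y \rangle)^2 \rangle = \langle Y^2 + \langle Y \rangle^2 - 2Y\langle Y \rangle \rangle = \langle Y^2 \rangle + \langle Y \rangle^2 - 2\langle Y \rangle \langle Y \rangle = \langle Y^2 \rangle - \langle Y \rangle^2. \tag{13}$$

Using Equation 11 we have:

$$
\begin{aligned}
\langle Y^2 \rangle &= \sum_{d_1=1}^{N_{ds}} \cdots \sum_{d_N=1}^{N_{ds}} \rho_5(d_1) \cdots \rho_5(d_N) \int_0^\infty \cdots \int_0^\infty \left( \frac{1}{N N_{years}} \sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C_i, A_i, \delta_i, L_i, d_i) \right)^2 \\
&\qquad \times \rho_1(C_1) \cdots \rho_1(C_N)\, \rho_2(A_1) \cdots \rho_2(A_N)\, \rho_3(\delta_1) \cdots \rho_3(\delta_N)\, \rho_4(L_1) \cdots \rho_4(L_N)\; dC_1 \cdots dC_N\, dA_1 \cdots dA_N\, d\delta_1 \cdots d\delta_N\, dL_1 \cdots dL_N \\
&= \left( \frac{1}{N N_{years}} \right)^2 \sum_{d_1=1}^{N_{ds}} \cdots \sum_{d_N=1}^{N_{ds}} \rho_5(d_1) \cdots \rho_5(d_N) \int_0^\infty \cdots \int_0^\infty \sum_{t,s=1}^{P} \sum_{i,j=1}^{N} Y_t(C_i, A_i, \delta_i, L_i, d_i)\, Y_s(C_j, A_j, \delta_j, L_j, d_j) \\
&\qquad \times \rho_1(C_1) \cdots \rho_1(C_N)\, \rho_2(A_1) \cdots \rho_2(A_N)\, \rho_3(\delta_1) \cdots \rho_3(\delta_N)\, \rho_4(L_1) \cdots \rho_4(L_N)\; dC_1 \cdots dC_N\, dA_1 \cdots dA_N\, d\delta_1 \cdots d\delta_N\, dL_1 \cdots dL_N \\
&= \left( \frac{1}{N N_{years}} \right)^2 \sum_{t,s=1}^{P} \Bigg( \sum_{i=1}^{N} \sum_{d_i=1}^{N_{ds}} \rho_5(d_i) \int_0^\infty \cdots \int_0^\infty Y_t(C_i, A_i, \delta_i, L_i, d_i)\, Y_s(C_i, A_i, \delta_i, L_i, d_i)\, \rho_1(C_i) \rho_2(A_i) \rho_3(\delta_i) \rho_4(L_i)\; dC_i\, dA_i\, d\delta_i\, dL_i \\
&\qquad + \sum_{\substack{i,j=1 \\ i \ne j}}^{N} \sum_{d_i=1}^{N_{ds}} \sum_{d_j=1}^{N_{ds}} \rho_5(d_i) \rho_5(d_j) \int_0^\infty \cdots \int_0^\infty Y_t(C_i, A_i, \delta_i, L_i, d_i)\, Y_s(C_j, A_j, \delta_j, L_j, d_j) \\
&\qquad\qquad \times \rho_1(C_i) \rho_1(C_j) \rho_2(A_i) \rho_2(A_j) \rho_3(\delta_i) \rho_3(\delta_j) \rho_4(L_i) \rho_4(L_j)\; dC_i\, dC_j\, dA_i\, dA_j\, d\delta_i\, d\delta_j\, dL_i\, dL_j \Bigg) \\
&= \left( \frac{1}{N N_{years}} \right)^2 \sum_{t,s=1}^{P} \Bigg( N \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_t(C, A, \delta, L, d)\, Y_s(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL \\
&\qquad + (N^2 - N) \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_t(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL \\
&\qquad\qquad \times \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_s(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL \Bigg).
\end{aligned}
$$

Here the double sum over houses has been split into the $N$ terms with $i = j$ and the $N^2 - N$ terms with $i \ne j$. For $i \ne j$ the parameters of houses $i$ and $j$ are sampled independently, so the integral factorizes into a product of two single-house integrals.

Thus the variance is given by:

$$
\begin{aligned}
\sigma^2 &= \left( \frac{1}{N N_{years}} \right)^2 \sum_{t,s=1}^{P} \left( N \gamma_1(t, s) + (N^2 - N)\, \gamma_2(t) \gamma_2(s) \right) - \langle Y \rangle^2 \\
&= \left( \frac{1}{N_{years}} \right)^2 \frac{1}{N} \left( \sum_{t,s=1}^{P} \gamma_1(t, s) - \left( \sum_{t=1}^{P} \gamma_2(t) \right)^2 \right),
\end{aligned} \tag{14}
$$

where the second line uses $\langle Y \rangle = \frac{1}{N_{years}} \sum_{t=1}^{P} \gamma_2(t)$ from Section 3, and where:

$$\gamma_1(t, s) = \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_t(C, A, \delta, L, d)\, Y_s(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL, \tag{15}$$

and:

$$\gamma_2(t) = \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty Y_t(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL. \tag{16}$$

If $f : [0, \infty)^4 \times \{1, \ldots, N_{ds}\} \to \mathbb{R}$ is a function, define $\langle f \rangle_1$ by:

$$\langle f \rangle_1 = \sum_{d=1}^{N_{ds}} \rho_5(d) \int_0^\infty \cdots \int_0^\infty f(C, A, \delta, L, d)\, \rho_1(C) \rho_2(A) \rho_3(\delta) \rho_4(L)\; dC\, dA\, d\delta\, dL. \tag{17}$$

Then:

$$\gamma_1(t, s) = \langle Y_t Y_s \rangle_1 \tag{18}$$

and:

$$\gamma_2(t) = \langle Y_t \rangle_1. \tag{19}$$

The variance is a hyperbolic function of $N$ (it is proportional to $1/N$) and:

$$\lim_{N \to \infty} \sigma^2 = 0. \tag{20}$$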
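Equation 14 implies that the product of the cluster size and the variance, N σ², does not depend on N. A minimal numerical check of this is sketched below. It does not use the tank simulation of this paper: `toy_annual_yield` is a hypothetical nonlinear stand-in for the average annual yield of one house, and the uniform roof area distribution, the demand cap and the number of runs are assumptions chosen for illustration.

```python
import random
import statistics


def toy_annual_yield(roof_area, demand=80.0):
    """Hypothetical nonlinear stand-in for the simulated yield of one house."""
    return min(demand, 0.4 * roof_area)


def cluster_yield(n_houses, rng):
    """Per-house average annual yield of one randomly generated cluster."""
    total = sum(toy_annual_yield(rng.uniform(50.0, 300.0)) for _ in range(n_houses))
    return total / n_houses


def variance_over_runs(n_houses, n_runs, rng):
    """Sample variance of the cluster yield across independent runs."""
    return statistics.variance([cluster_yield(n_houses, rng) for _ in range(n_runs)])


rng = random.Random(1)
# If the variance is proportional to 1/N, then N * variance is roughly constant.
scaled_variance = {n: n * variance_over_runs(n, 4000, rng) for n in (1, 4, 16)}
print(scaled_variance)
```

The three printed values should be equal up to sampling error. Each estimates the single-house variance, which is the quantity in the final bracket of Equation 14 divided by the square of N_years.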
5 Experimental confirmation
The yield $Y_t$ is a nonlinear function of $t$ and the system parameters which can only be computed by simulation. The results of computing the total yield over the period of simulation, projected onto the parameters of tank size and roof area, are shown in Figure 1. Simulation was carried out with the continuous system parameters sampled from truncated normal distributions as described in [8].

In order to confirm the calculations of Section 3 and Section 4 in numerical detail it would be necessary to numerically compute multiple integrals such as those of Equations 15 and 16. However, the essential correctness of the results can be seen by examination of Figure 2, which shows the result of carrying out 50 runs for a variety of values of the cluster size. The curve labeled “Variable” shows the average over the 50 runs of the average annual yield, and it is seen that this is approximately independent of the cluster size. The approximation becomes more accurate as the cluster size increases. This is because the number of houses sampled in the simulation is $RN$, where $R$ is the number of runs and $N$ is the cluster size, and as this number increases the average of the average annual yields tends more closely to the expected value, as described in Section 3. The curves labeled “Variable−1SD” and “Variable+1SD”, which show the standard deviation of the average annual yield over the 50 runs, have the form expected from the variance given by Equation 14, for which the standard deviation is proportional to $1/\sqrt{N}$. The curve labeled “Average” shows the result of computing the average annual yield for a house with parameters equal to the expected values of their respective probability distributions, which, since the yield function is nonlinear, is not equal to the expected value of the average annual yield (“Variable”) [9, 8].

Figure 1: Annual yield as a function of tank size and roof area (source: Neumann et al. [9]).
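The difference between the “Average” and “Variable” curves discussed above arises for any nonlinear yield function, and can be reproduced with a toy example. The sketch below is an illustration under stated assumptions rather than a reproduction of the experiment: `toy_annual_yield` is a hypothetical stand-in for the simulated yield, `truncated_normal` is a simple rejection sampler, and the mean, standard deviation and bounds of the roof area distribution are made up.

```python
import random


def toy_annual_yield(roof_area, demand=80.0):
    """Hypothetical nonlinear stand-in for the simulated yield of one house."""
    return min(demand, 0.4 * roof_area)


def truncated_normal(rng, mean, sd, low, high):
    """Rejection sampling from a normal distribution restricted to [low, high]."""
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x


rng = random.Random(2)
MEAN_AREA = 175.0  # the bounds are symmetric about the mean, so the truncated mean is 175
areas = [truncated_normal(rng, MEAN_AREA, 60.0, 50.0, 300.0) for _ in range(20000)]

# Analogue of the "Average" curve: yield of a house with mean parameters.
yield_at_mean = toy_annual_yield(MEAN_AREA)
# Analogue of the "Variable" curve: mean yield over sampled houses.
mean_of_yield = sum(toy_annual_yield(a) for a in areas) / len(areas)
print(yield_at_mean, mean_of_yield)
```

Because this toy yield is capped by demand, it is concave in the roof area, so the yield at the mean parameters overestimates the mean yield. The direction and size of the gap for the real simulation depend on the actual yield function and distributions.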