Journal of Archaeological Science (1997) 24, 347–354
Some Archaeological Applications of Kernel Density Estimates
M. J. Baxter and C. C. Beardah
Department of Mathematics, Statistics and Operational Research, The Nottingham Trent University, Nottingham NG11 8NS, U.K.
R. V. S. Wright
Prehistoric and Historical Archaeology, University of Sydney, NSW 2006, Australia
(Received 10 November 1995, manuscript accepted 11 March 1996)
Kernel density estimates, which at their simplest can be viewed as a smoothed form of histogram, have been widely studied in the statistical literature in recent years but used hardly at all within archaeology. They provide an effective method of data presentation for univariate and particularly bivariate data, and this is illustrated with a range of examples. The methodology can be used as an informal approach to spatial cluster analysis, and one example suggests that it is competitive with other approaches in this area. A reason for the lack of use of kernel density estimates by archaeologists may be the lack of accessible software. The analyses described here were undertaken in the MATLAB package using routines developed by the second author; these routines are available on request.
© 1997 Academic Press Limited
Keywords: KERNEL DENSITY ESTIMATES, BIVARIATE DATA, CONTOURING, SPATIAL CLUSTERING, MATLAB.
Introduction
Kernel density estimates (KDEs) at their simplest can be thought of as an alternative to the histogram. They typically provide a smoother representation of the data and, unlike the histogram, their appearance does not depend on a choice of starting point. In this sense KDEs alleviate problems with the histogram that have been perceived by some archaeologists (Whallon, 1987).
The smoothness of the KDE means that it is aesthetically more pleasing than the histogram. It also facilitates the presentation of several data sets in a single figure, and makes it easier to compare data sets. This has been argued and illustrated in Baxter & Beardah (1995b).

It might be argued that, with univariate data, the advantages of using a KDE as opposed to a histogram for data representation are not so great as to cause them to be preferred on a routine basis. For bivariate data the case for using KDEs is much stronger, and the purpose of this paper is to illustrate this by example. Two-dimensional histograms require large amounts of data, are unwieldy, may be difficult to interpret, and cannot easily be used as the basis for other methods of data representation such as contouring. This paper will illustrate how KDEs readily overcome these problems.

Although the possibility of using KDEs for archaeological data presentation is implicit in Orton’s (1988) comments on Whallon’s (1987) paper, we are not aware of any such uses outside our own work. An example of an application to bivariate data is given in Baxter & Beardah (1995a). This arose when one of us (MJB) wished to explore the potential of the methodology for representing results from a principal component analysis of archaeometric compositional data and asked the second author (CCB) if it was possible to do this in the MATLAB package. Subsequent collaboration, described in Beardah & Baxter (1995) and Baxter & Beardah (1995b), has led to the development of a set of MATLAB routines that include many of the approaches described in the recent book by Wand & Jones (1995). That book, the earlier text of Silverman (1986), and the paper by Bowman & Foster (1993) may be referred to for the technical developments that underpin the work described here.

The main ideas of kernel density estimation necessary for this paper are presented in the next section, with more technical detail and discussion of computational matters in the appendix. The main section of the paper illustrates applications of the methodology, and the concluding section summarizes what we think are its merits.
Kernel density estimation
Histograms are among the most common methods of data presentation in archaeology. Anyone who has drawn a histogram by hand will know that its appearance may be crucially affected both by the point at which the histogram is started (the origin) and by the width of the intervals used, or "binwidth". Good computer software packages will make automatic and sensible choices for the origin and binwidth, but it should be possible to vary these and this will affect the results obtained.

Let the origin of the histogram be $m_0$, with subsequent interval boundaries at $m_1$, $m_2$, etc., and assume that $(m_j - m_{j-1}) = c$ for some constant $c$ for $j = 1, 2, \ldots$ (i.e. intervals are of equal width). Let $\delta$ and $q$ be values such that $\delta$ is small and $q\delta = c$. It is then possible to imagine the construction of successive histograms with origins at $(m_0 + i\delta)$ for $i = 0, 1, \ldots, q-1$. If the $q$ histograms so obtained are averaged then an average shifted histogram (ASH) (Scott, 1992) is obtained. The appearance of the ASH will not be dependent on the choice of $m_0$. Its smoothness will depend on $c$, and increases as $c$ increases. The limiting form of the ASH, as $\delta \to 0$, is a kernel density estimate. An example is given in Baxter & Beardah (1995b).

Another way to think of KDEs is as follows. Given $n$ points $X_1, X_2, \ldots, X_n$ situated on a line, a KDE can be obtained by placing a "bump" at each point and then summing the height of each bump at each point on the X-axis. The shape of the bump is defined by a mathematical function, the kernel $K(x)$, that integrates to 1. The spread of the bump is determined by a window or bandwidth, $h$, that is analogous to the binwidth, $c$, of a histogram. The kernel is usually a symmetric probability density function.

The shape of the resulting KDE does not depend on a choice of origin and is relatively insensitive to the exact form of $K(x)$, which is taken to be a normal density function in the rest of the paper. The choice of $h$ is more critical and will be considered shortly.

We have presented two simple ways of conceptualising what a KDE is. Mathematically, the latter approach gives the KDE as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$

where $\hat{f}(x)$ is an estimate of the density underlying the data.

Large values of $h$ oversmooth, while small values undersmooth the data. A variety of approaches can be used to select $h$, including subjective choice, and it may often be sensible to look at KDEs for several values of $h$.

More objective or data-driven choices of $h$ can be made, and a wide range of methods has been proposed for this. These are described in detail in Wand & Jones (1995) and in summary form in Baxter & Beardah (1995b). An outline of a subset of these methods is given here.

The data can be thought of as a sample of $n$ from an underlying and unknown true density, $f(x)$. It is possible to define a measure of "closeness" between the KDE and the true density, leading to an estimate of $h$ that "maximizes" the closeness. If it is assumed that the true density is normal then it can be shown that an optimal choice of $h$ is

$$h = 1.06\, n^{-1/5}\, \hat{\sigma},$$

where $\hat{\sigma}$ is an estimate (possibly robust) of $\sigma$, the S.D. of the normal distribution. This is the normal scale rule and will typically oversmooth the data if the underlying density is not normal.

The estimate of $h$ depends, in general, on properties of the true density that are unknown, and in particular on a quantity that may be interpreted as the "roughness" of the density. A family of direct plug-in (DPI) estimates can be defined in which an estimate of $h$ can be obtained by "plugging in" an estimate of roughness into the equation that defines $h$. More details are given in the Appendix.

A related approach is the "solve the equation" (STE) method, in which an equation that relates $h$ to a function of the unknown density is defined. In essence, an initial estimate of $h$ leads to an estimate of the density, that in turn leads to a new value for $h$ and a new density estimate. The process continues until the estimate of $h$ converges. Wand & Jones (1995: 96) suggest that a suitable data analytic strategy is to look at several different estimates of $h$, but that if a single value is required DPI and STE estimates appear to be among the more suitable.

The prime purpose of the paper is to illustrate the use of bivariate KDEs and the generalization to these is relatively straightforward. By analogy with the previous discussion of univariate KDEs we may think in terms of $n$ points in a plane defined by coordinates $X_{(i)} = (X_i, Y_i)$, for $i = 1, 2, \ldots, n$. Locating a "bump" at each point corresponds in this case to centering a three-dimensional bump or "hill" at each point and then, at each point in the plane, summing the height of the bumps. The bump, or kernel, is taken in this paper to be a bivariate normal distribution.

For two variables, $X$ and $Y$, a bivariate normal distribution is defined by the means of $X$ and $Y$, taken to be zero; their S.D.; and their correlation, which determines the orientation of the bump. If this correlation is taken to be zero, as we do here, then smoothing will be in the direction of the coordinate axes and the degree of smoothing is determined by the S.D. One will often not lose much by taking the correlation to be zero, whereas smoothing equally in both directions, by using the same window-widths, is not generally to be recommended (Wand & Jones, 1995: 108).

The theory underlying the optimal choice of window-widths is not as well developed for the bivariate as for the univariate case. The examples in this paper use window-widths for the $X$ and $Y$ directions determined as for the univariate case, using either STE estimates or the normal scale estimates.
With the assumption of zero correlation, the representation of the bivariate KDE, $\hat{f}(x, y)$, is given by

$$\hat{f}(x, y) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h_1}\right) K\left(\frac{y - Y_i}{h_2}\right),$$

where $h_1$ and $h_2$ are the window-widths in the $X$ and $Y$ directions.

An attraction of using KDEs is that they can be used as a basis for producing contour plots of the data, and this leads to graphical representations of data of a kind that archaeologists should find familiar. The following discussion of how contouring can be used is based on the paper by Bowman & Foster (1993).

After a bivariate KDE has been obtained, each (two-dimensional) data point is associated with a density height, and these heights may be ranked from largest to smallest. The first 50% of the ranked observations, for example, may be used to define contours that enclose the densest 50% of the data. The level of contouring can be varied to contain any specified proportion of the data, and several contours can be superimposed on a plot, with the original data if this is helpful. Bowman & Foster (1993: 173) note that in some ways this provides a two-dimensional analogy to the one-dimensional boxplot, and also that the approach is useful for looking for modes or clusters in the data.

A further extension, noted in the same paper, occurs when the data points can be classified, by period or context for example. In this case a particular contour level such as 75% might be selected and then contours at this level drawn for each group separately, to reveal how similar or distinct they are. This will also be illustrated in the next section.
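The ranking procedure for inclusion-level contours can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB routines. The names `kde2d` and `inclusion_level` are hypothetical, the coordinates are synthetic, the window-widths are hand-picked, and the uncorrelated bivariate normal kernel is written as a product of two univariate normal kernels.

```python
import math
import random


def normal_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde2d(x, y, points, h1, h2):
    """Bivariate KDE with an uncorrelated normal kernel and separate
    window-widths h1 (X direction) and h2 (Y direction)."""
    n = len(points)
    total = sum(
        normal_kernel((x - xi) / h1) * normal_kernel((y - yi) / h2)
        for xi, yi in points
    )
    return total / (n * h1 * h2)


def inclusion_level(points, h1, h2, fraction):
    """Density height whose contour encloses the densest `fraction` of the
    data: rank the heights at the data points from largest to smallest and
    take the height of the last point inside the requested fraction."""
    heights = sorted(
        (kde2d(x, y, points, h1, h2) for x, y in points), reverse=True
    )
    k = max(1, math.ceil(fraction * len(heights)))
    return heights[k - 1]


# Synthetic coordinates with two loose concentrations, for illustration only.
rng = random.Random(0)
points = [(rng.gauss(3.0, 0.5), rng.gauss(6.0, 0.8)) for _ in range(60)]
points += [(rng.gauss(8.0, 0.7), rng.gauss(9.0, 0.5)) for _ in range(40)]

h1, h2 = 0.4, 0.5  # hand-picked window-widths, not STE or normal scale values
levels = {p: inclusion_level(points, h1, h2, p / 100.0) for p in (25, 50, 75, 100)}
```

The values in `levels` could then be passed to any contouring routine applied to `kde2d` evaluated on a grid. For classified data, one possible approach is to call `inclusion_level` on each group's points separately and draw one contour per group.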
Examples
There are many ways in which univariate KDEs might be used in archaeology, and several of these have been illustrated in our previous work. Data presentation for a single data set and comparison between the distributions of different data sets are obvious uses. It is worth remarking that the boxplot, another good way of looking at and comparing univariate data, does not work well with multimodal data. Bounded data, in the sense that certain values are impossible, and data affected by outliers can be handled using boundary kernels and adaptive estimates respectively, and this is discussed and illustrated in Beardah & Baxter (1995).

For practical purposes a distinction may be drawn between kernel density estimation as applied to simple, or simply transformed, variables, and as applied to composite variables such as those derived in principal component and other forms of multivariate analysis. This latter greatly extends the potential for the use of KDEs and is illustrated in Examples 1, 3 and 4.
Example 1
Principal component analysis is one of the more commonly used multivariate methods in archaeology and a detailed account and bibliography is given in Baxter (1994). Typically, data are standardized and an analysis results in new, linear combinations of the original variables, called principal components, that can be inspected for structure using plots (usually) based on the first two or three components. If there is structure in the data it will often show in the first component and it can be useful to examine this using a KDE.

The data used for the first example are 105 specimens of Roman waste glass, with a principal component analysis based on their chemical composition with respect to 11 oxides. The data are given, and extensively analysed, in Baxter (1994). The specimens come from two sites and the statistical analyses suggest that there are perhaps three clusters in the data that are related to, but do not exactly coincide with, the site classification.

As a first illustration of kernel density estimation Figure 1 shows two KDEs for the principal component scores, based on the normal scale estimate of $h$ and an STE estimate of $h$. The normal scale estimate oversmooths the data, as expected, and misses the central and smaller mode suggested by the STE approach.

The usual bivariate component plot can be represented by a KDE in various ways. Figure 2 shows a scatter plot of the scores on the first two components and Figure 3 shows a KDE using the STE estimate of $h$. Three main concentrations are evident. For this example inspection of the scatterplot has led one of us (Baxter, 1994) to the same conclusion, so that a KDE is not essential. In Examples 3 and 4 much larger data sets are used for which the scatterplot is a less useful tool.
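The effect of bandwidth on the number of visible modes can be explored with a small Python sketch. The scores below are made up and are not the glass data. The STE selector is not implemented, so a hand-picked smaller bandwidth (`h_small`) stands in for a less smooth choice, and the helper `count_modes` is a hypothetical name.

```python
import math
import statistics


def kde(x, data, h):
    """Univariate KDE with a normal kernel and bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def count_modes(data, h, lo, hi, steps=400):
    """Count strict local maxima of the KDE on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    f = [kde(x, data, h) for x in xs]
    return sum(1 for i in range(1, steps) if f[i - 1] < f[i] > f[i + 1])


# Synthetic "scores": two larger groups and a smaller central one.
scores = [-4.2, -4.1, -4.0, -3.9, -3.8, -0.1, 0.0, 0.1, 3.8, 3.9, 4.0, 4.1, 4.2]

# Normal scale rule, using the sample S.D. as the estimate of the S.D.
h_normal_scale = 1.06 * statistics.stdev(scores) * len(scores) ** (-0.2)
h_small = 0.5  # hand-picked stand-in for a less smooth, data-driven choice

modes_normal_scale = count_modes(scores, h_normal_scale, -8.0, 8.0)
modes_small = count_modes(scores, h_small, -8.0, 8.0)
```

With these made-up values the smaller bandwidth should resolve all three groups, while the normal scale bandwidth should smooth the small central group away. This mirrors the behaviour reported for Figure 1, although the real data and an STE bandwidth would be needed to reproduce that figure.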
Figure 1. Two univariate kernel density estimates for scores on the first principal component of an analysis of the chemical composition of 105 specimens of Romano-British waste glass. Horizontal axis: first component; vertical axis: relative frequency. Solid line: STE rule; dashed line: normal scale rule.
Example 2
An obvious use for bivariate KDEs is in the presentation and interpretation of spatial data in the form of coordinates of find spots, for example. To illustrate this an ethnoarchaeological data set, Binford’s (1978) Mask Site data, is used. The data are taken from appendix A of Blankholm (1991), who uses them to test a variety of approaches to intrasite spatial analysis. The data, as presented by Blankholm, consist of the spatial coordinates of five classes of find that might occur in the archaeological record, such as artefacts, large bones and bone splinters. We use the subset based on the coordinates of the locations of 276 bone splinters.

Figures 4 and 5 show analyses in which the normal scale rule and STE estimates have been used to determine window-widths separately for the two coordinate directions. Both analyses show 25, 50, 75 and 100% contours superimposed on the distribution of the bone splinters. Once again the normal scale analysis produces a smoother picture. There are clearly three main concentrations in the data, with the STE analysis suggesting a subdivision of one of these, in the bottom right of the graph, into two groups and a fifth group in the upper left of the figure.

It is instructive to compare our results with those obtained by a variety of methods in Blankholm (1991). His figure 9, using contouring at equal heights (rather than encompassing specified proportions of the data), is less revelatory of structure than our figures, while a k-means cluster analysis (his figure 17) suggests a three-cluster distribution. Contour maps or clustering arising from local density analysis (his figure 32) and nearest neighbour analysis (his figure 39) are also given. We think that our figures, and particularly that for the STE analysis, suggest structure as well as, or more clearly than, the analyses in Blankholm (1991).
Figure 2. Principal component plot for the first two components from an analysis of the chemical composition of 105 specimens of Romano-British waste glass. Horizontal axis: Component 1; vertical axis: Component 2.
Figure 3. A KDE estimate, based on an STE rule for the selection of $h$, for the data. Horizontal axes: Component 1 and Component 2; vertical axis: relative frequency.
Figure 4. A KDE of the Mask Site data using the normal scale rule. The contours are for 25, 50, 75 and 100% inclusion levels. Panel title: Normal scale rule. Horizontal axis: Component 1; vertical axis: Component 2.
Figure 5. As for Figure 4 but using an STE estimate. Panel title: STE rule. Horizontal axis: Component 1; vertical axis: Component 2.