JOURNAL OF CHEMOMETRICS, VOL. 8, 81-93 (1994)

SHORT COMMUNICATION

A GRAPHICAL TECHNIQUE FOR ASSESSING DIFFERENCES AMONG A SET OF RANKINGS

DAVID HIRST
Scottish Agricultural Statistics Service, Rowett Research Institute, Aberdeen, U.K.

AND

TORMOD NAES
Matforsk, Osloveien 1, N-1430 Aas, Norway

SUMMARY

A graphical method of assessing differences between sets of rankings based on cumulative ranks is developed. The method can be used to identify rankings that differ over all or just part of the range of objects ranked. The method is applied to an example of sensory evaluation of green peas in which ten assessors scored six attributes on each of 60 samples.

KEY WORDS: Sensory evaluation; Cumulative ranks; Assessor variation

1. INTRODUCTION

Suppose there are $n$ objects ranked from 1 to $n$ by $m$ different assessors. It is of interest how these assessors differ in the way they have ranked the objects. Assume throughout that there are no tied ranks. Statistics such as Kendall’s coefficient of concordance¹ can be used to get an overall measure of agreement among assessors, while pairwise rank correlations measure similarity between two assessors, but neither of these statistics gives any indication of how the relationship between assessors varies among the objects. For example, it is possible that the assessors agree exactly on the ranks of the ‘top ten’ objects but have no agreement among the others. Alternatively, they may agree on a division of the objects into two groups of high and low ranks but disagree on the ranks within those groups. In this communication a graphical technique based on cumulative ranks is developed for examining this kind of difference.

One area where this kind of technique is useful is in the treatment of data from sensory analysis of food products. This kind of analysis, preferably done by a trained sensory panel, is a very useful way to measure important quality characteristics of food products. Typically, a number of assessors, $m$, either rank or score (on a numerical axis) $n$ different products for a set of predetermined attributes. There are, however, a number of problems in this kind of analysis that require checking of the data before the final analysis.
For instance, assessors can misunderstand or confuse attributes, they can use the numerical axis differently or they may simply vary in their ability to detect differences between the products. Quick techniques such as the cumulative rank plots for revealing individual differences among the assessors are very useful since they allow problems to be detected and corrected at an early stage. An application of the plot in sensory analysis of green peas is discussed later.

0886-9383/94/010081-13
© 1994 by John Wiley & Sons, Ltd.
Received 18 March 1993
Accepted 3 July 1993

2. THE CUMULATIVE RANK PLOT

Initially assume that there is some known underlying ordering of the objects. Let $r_{ij}$ be the rank given by the $j$th assessor to the $i$th object according to this known order. Let its cumulative rank be $c_{ij}$, where

$$c_{ij} = \sum_{k=1}^{i} r_{kj} \qquad (1)$$

This is minimized for any $i = 1, \ldots, n$ if the assessor’s ranking agrees exactly with the underlying order, i.e. if $r_{ij} = i$. Therefore define the minimum cumulative rank as $\mathrm{min}_i$, where

$$\mathrm{min}_i = \sum_{k=1}^{i} k = \frac{i(i+1)}{2} \qquad (2)$$

If all the objects were given the same ranking, these rankings would clearly all equal $(n+1)/2$ and the cumulative rankings would be $\mathrm{equ}_i$, where $\mathrm{equ}_i = (n+1)i/2$. Now subtract this from $c_{ij}$ to get a polygonal graph $y_{ij}$ for the $j$th assessor defined by

$$y_{ij} = c_{ij} - \mathrm{equ}_i \qquad (3)$$

This final stage creates the distinctive U-shape of the plots. Also define the minimum possible graph $b_i$ as

$$b_i = \mathrm{min}_i - \mathrm{equ}_i \qquad (4)$$
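
The construction in equations (3) and (4) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `cumulative_rank_graph` is hypothetical, and it assumes that the ranks are supplied in the underlying order (entry $i$ is the rank the assessor gave to the object whose true rank is $i$) and that there are no ties.

```python
def cumulative_rank_graph(ranks):
    """Return (y, b): the assessor's graph and the baseline.

    ranks[i - 1] is the rank the assessor gave to the object whose
    underlying rank is i (assumed: a permutation of 1..n, no ties).
    """
    n = len(ranks)
    y, b = [], []
    c = 0  # running cumulative rank c_i
    for i, r in enumerate(ranks, start=1):
        c += r
        equ = (n + 1) * i / 2    # cumulative rank if every object got (n + 1) / 2
        mini = i * (i + 1) / 2   # smallest possible cumulative rank: 1 + 2 + ... + i
        y.append(c - equ)        # the assessor's graph
        b.append(mini - equ)     # the baseline, equal to i * (i - n) / 2
    return y, b


perfect, baseline = cumulative_rank_graph([1, 2, 3, 4])
reverse, _ = cumulative_rank_graph([4, 3, 2, 1])
print(perfect)   # [-1.5, -2.0, -1.5, 0.0]
print(reverse)   # [1.5, 2.0, 1.5, 0.0]
```

A ranking that matches the underlying order coincides with the baseline, while the reversed ranking traces the mirror image of the baseline in the line $y = 0$.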

3. PROPERTIES OF THE PLOTS

The basic idea of these plots is that if an assessor is in good agreement with the underlying order, then his graph will be close to the ‘baseline’ defined by the minimum graph. The following results are useful.

(a) The ‘area under the graph’, i.e. the area between an assessor’s graph and the baseline, is proportional to one minus the rank correlation (Spearman’s rho) between that assessor’s ranks and the underlying order. Hence the further from the baseline an assessor’s graph is, the worse is his agreement with the underlying order. For proof see the Appendix. This also implies that rankings which are positively correlated with the underlying order will lie largely within the area bounded by the U-shape, while rankings with a negative correlation will lie largely within its reflection in the line $y_{ij} = 0$. Hence it is very easy to distinguish between uncorrelated rankings and those with a high negative correlation.

(b) The plots are ‘self-scaling’ in that the size of the U-shape formed by the baseline depends only on the number of objects. All other graphs must remain above this line and below its ‘mirror image’ (which corresponds to the reverse of the underlying order). This makes it very easy to compare two different plots.

(c) The height of the graph (i.e. the difference between the graph and the baseline) at any point $x$ is a measure of the ability of the assessor to distinguish objects of true rank $x$ or lower from those of higher rank, i.e. is a measure of the ‘confusion’ at that point. See the Appendix for details.
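
Result (a) can be checked numerically. The sketch below is an illustration, not the proof given in the Appendix. It assumes that the ‘area’ is measured as the sum of the heights $y_i - b_i$ at $i = 1, \ldots, n$ (this equals the trapezoidal area with unit spacing, since the height is zero at both ends), and that Spearman’s rho is computed from the usual no-ties formula $1 - 6\sum d^2 / (n(n^2 - 1))$. Under these assumptions the area works out to $n(n^2 - 1)(1 - \rho)/12$. All function names are hypothetical.

```python
from itertools import permutations


def area_above_baseline(ranks):
    """Sum over i of (y_i - b_i) for ranks listed in the underlying order."""
    area = 0.0
    excess = 0  # c_i - min_i: cumulative rank in excess of the minimum
    for i, r in enumerate(ranks, start=1):
        excess += r - i
        area += excess
    return area


def spearman_rho(ranks):
    """Spearman's rho between the ranks and the order 1..n (no ties)."""
    n = len(ranks)
    d2 = sum((r - i) ** 2 for i, r in enumerate(ranks, start=1))
    return 1 - 6 * d2 / (n * (n * n - 1))


def max_discrepancy(n):
    """Largest gap between the area and n(n^2 - 1)(1 - rho)/12 over all rankings."""
    worst = 0.0
    for perm in permutations(range(1, n + 1)):
        predicted = n * (n * n - 1) * (1 - spearman_rho(perm)) / 12
        worst = max(worst, abs(area_above_baseline(perm) - predicted))
    return worst


print(max_discrepancy(6) < 1e-9)   # True
```

The check enumerates every ranking of six objects, so it is only practical for small $n$; it is meant to make the linear relationship concrete, not to replace the proof.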

4. ESTIMATING THE UNDERLYING ORDER

It is often the case that the underlying order is not known and therefore must be estimated. The simplest method, as recommended by Kendall, is to average the ranks over all assessors and to rank these mean ranks to get a ‘consensus’ ranking. This consensus has the property that of all possible rankings it maximizes the sum of correlations with the original rankings. Therefore it minimizes the mean area under the graphs. A disadvantage of this method is that it gives equal weight to each assessor and therefore the consensus is distorted by ‘abnormal’ assessors. In particular, if one or more assessors are negatively correlated with the true order, the consensus will make no sense.

A better alternative is to take the first eigenvector of $XX^{T}$, where $X$ is the $n \times m$ matrix of ranks, with columns representing assessors and rows objects, standardized by subtracting column means. (Note that the elements of this eigenvector are the scores of the first component of a principal component analysis of the data, where the assessors are regarded as variables.) The elements of this vector are then ranked. This has the following advantage: the eigenvector is equal to a weighted sum of the original rankings where the weights are proportional to the correlation with this sum. Hence negatively correlated assessors contribute in a sensible manner to the consensus, while uncorrelated assessors are downweighted. This is the method of estimating the underlying order used in all the examples in this paper.
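
The two consensus estimates can be compared on a small example. This is a sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, the first eigenvector of $XX^{T}$ is approximated by power iteration, and the sign of the eigenvector (which is arbitrary) is fixed by starting the iteration from the centred mean ranks, so that the result is positively related to the mean-rank consensus.

```python
import math


def ranks_of(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def mean_ranks(X):
    """Mean rank of each object (row) over the assessors (columns)."""
    return [sum(row) / len(row) for row in X]


def centre_columns(X):
    n, m = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(m)]
    return [[X[i][j] - means[j] for j in range(m)] for i in range(n)]


def eigenvector_consensus(X, iterations=200):
    """Rank the (approximate) first eigenvector of X X^T, X column-centred."""
    Xc = centre_columns(X)
    n, m = len(Xc), len(Xc[0])
    v = [sum(row) / m for row in Xc]  # assumed start: centred mean ranks
    for _ in range(iterations):
        w = [sum(Xc[i][j] * v[i] for i in range(n)) for j in range(m)]  # X^T v
        v = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]  # X (X^T v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return ranks_of(v)


# Rows are objects, columns are assessors; the fourth assessor ranks in reverse.
X = [[1, 1, 2, 5],
     [2, 2, 1, 4],
     [3, 3, 3, 3],
     [4, 5, 4, 2],
     [5, 4, 5, 1]]
print(mean_ranks(X))              # [2.25, 2.25, 3.0, 3.75, 3.75]
print(eigenvector_consensus(X))   # [1, 2, 3, 4, 5]
```

In this made-up data set the reversed assessor pulls the mean ranks into ties, whereas the eigenvector gives that assessor a negative weight and recovers the order shared by the other three.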

5. A SCALE FOR THE PLOT

Result (a) in Section 3 says that there is a linear relationship between the area under the graph and the correlation between any ranking and the consensus. It is therefore possible to draw a scale on the plot which can be used to give a visual estimate of this correlation. If graphs corresponding to different correlations, e.g. 0.2, 0.4, 0.6 and 0.8, are constructed, then the area under these graphs can be compared with the area under an assessor’s graph and an estimate of the correlation made.

Clearly any graph with the correct area can be used as a scale, but there is one in particular that has a useful property. This is the ‘expected graph’ and is constructed as follows. Call the objects $o_1, \ldots, o_n$, where the true rank of object $o_k$ is $k$. Assume that an assessor gives scores $s_1, \ldots, s_n$ to these objects, where $s_k \sim N(k, \sigma^2)$. Hence there is ‘equal confusion’ about each object. Let the scores be ranked to give rank $r_k$ to object $o_k$. It can now be shown (see Appendix) that the expected rank of $o_k$ is $E(r_k)$, where [...] for $2k < n$. There is a simple extension to larger $k$. These expected ranks can be used to create an ‘expected graph’ whose shape and area depend only on $\sigma$ and $n$. For any $\sigma$ it is easy to calculate this area and to relate it to the correlation. A plot of correlation against $\sigma$ for $n = 60$ is shown in Figure 1.
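
The expected ranks can also be computed numerically. The sketch below does not reproduce the closed form referred to in the text. It relies instead on the identity $r_k = 1 + \#\{j \neq k : s_j < s_k\}$, which gives $E(r_k) = 1 + \sum_{j \neq k} \Phi\big((k - j)/(\sigma\sqrt{2})\big)$ under the added assumption that the scores are independent (the text states only their marginal distributions). The conversion from area to correlation assumes the relation area $= n(n^2 - 1)(1 - \rho)/12$, with the area taken as the sum of the heights above the baseline. All function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_ranks(n, sigma):
    """E(r_k) for k = 1..n when s_k ~ N(k, sigma^2), assumed independent."""
    scale = sigma * math.sqrt(2.0)  # standard deviation of s_k - s_j
    return [
        1.0 + sum(normal_cdf((k - j) / scale) for j in range(1, n + 1) if j != k)
        for k in range(1, n + 1)
    ]


def expected_correlation(n, sigma):
    """Correlation matching the area between the expected graph and the baseline."""
    area, excess = 0.0, 0.0
    for k, e in enumerate(expected_ranks(n, sigma), start=1):
        excess += e - k   # height of the expected graph above the baseline
        area += excess
    return 1.0 - 12.0 * area / (n * (n * n - 1))


# Correlation as a function of the standard deviation, for n = 60.
for sigma in (1, 5, 20, 60):
    print(sigma, round(expected_correlation(60, sigma), 3))
```

Tabulating `expected_correlation(60, sigma)` over a grid of `sigma` values gives a curve of the kind described for Figure 1, under the assumptions stated above.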

Figure 1. Plot of standard deviation of scores against correlation with consensus for $n = 60$ (x-axis: standard deviation; y-axis: correlation with consensus)

Using this plot, expected graphs corresponding to any desired correlations can be constructed and so a scale can be drawn on the cumulative rank plots. The advantage of this technique is twofold: firstly, it gives an easy method of drawing lines corresponding to any correlation; secondly, the shape of the line corresponds to an interesting hypothesis, namely that there is equal confusion about all objects. Hence any marked divergence from this hypothesis can be seen.
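
Reading a standard deviation off the plot for a desired correlation can be mimicked numerically by bisection. This is a hedged sketch, not the procedure used in the paper: it assumes independent scores $s_k \sim N(k, \sigma^2)$, computes the expected Spearman correlation from the pairwise probabilities $P(s_j < s_k)$, and relies on that correlation decreasing as $\sigma$ increases. The function names and the search interval are hypothetical choices.

```python
import math


def expected_correlation(n, sigma):
    """Expected Spearman correlation when s_k ~ N(k, sigma^2), assumed independent."""
    total = 0.0  # sum over k of k * E(r_k)
    for k in range(1, n + 1):
        # P(s_j < s_k) = Phi((k - j) / (sigma * sqrt(2))) = 0.5 * (1 + erf((k - j) / (2 sigma)))
        e_k = 1.0 + sum(
            0.5 * (1.0 + math.erf((k - j) / (2.0 * sigma)))
            for j in range(1, n + 1) if j != k
        )
        total += k * e_k
    centre = n * (n + 1) ** 2 / 4.0      # n times the squared mean rank
    spread = n * (n * n - 1) / 12.0      # sum of squared deviations of 1..n
    return (total - centre) / spread


def sigma_for_correlation(target, n, lo=1e-3, hi=1e3, tol=1e-9):
    """Find sigma whose expected correlation equals target, by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_correlation(n, mid) > target:
            lo = mid   # too little confusion: increase sigma
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Standard deviations for the scale lines mentioned in the text, n = 60.
for rho in (0.2, 0.4, 0.6, 0.8):
    print(rho, sigma_for_correlation(rho, 60))
```

The target must lie between the correlations attained at the two ends of the search interval; for $n = 60$ the default interval covers the values 0.2 to 0.8 used here.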

6. EXAMPLE

Sixty batches of peas consisting of 27 varieties at different degrees of maturity were prepared and served to ten trained assessors in two replicates. The serving order was randomized. Six attributes were assessed on a continuous scale from 1 to 9. They were ‘pea-flavour’, ‘sweetness’, ‘fruitiness’, ‘off-flavour’, ‘mealiness’ and ‘hardness’. This technique is known as ‘descriptive sensory analysis’. This experiment is perhaps rather unusual in the number of samples assessed, but it is by no means unique and serves as a good example of the use of the cumulative rank plot. It is possible that effects such as carry-over and order of tasting will have influenced the results to some degree, but for the purposes of this example they will not be considered.

In this kind of experiment there are often very large differences between assessors in terms of the proportion and area of the scale used and in their sensitivity to the attributes. For example, there is no particularly good reason why all the assessors’ scores should be linear functions of some underlying scale, though this is often implicitly assumed. See Reference 2 for a discussion of some possible sources of variation between assessors. Often all that can reasonably be assumed is that an assessor’s scores are some unknown monotonic function of an underlying scale. Therefore in this example only the ranks of the scores, averaged over the replicates, are considered, although this is not the usual way of analysing this type of data and it is possible that some information will be lost. Nevertheless, the cumulative rank plots give considerable insight into the performance of the assessors.

The cumulative rank plots for all six attributes are given in Figure 2. Consider first ‘pea-flavour’ in Figure 2(a). It is clear that three assessors differ markedly from the others, six are in good agreement with the consensus and one lies in between. The six ‘good’ assessors all lie below the
0.8 correlation line. It is also clear that there is much better overall agreement for peas with low ranks, i.e. those with the lowest pea-flavour. This can be interpreted as a difference in sensitivity among the assessors. Six assessors can detect and assess pea-flavour over the whole range, while three can only detect differences between the peas with least flavour.

For ‘off-flavour’ (Figure 2(d)) there is very little agreement for samples with low rank, but much better agreement for those peas with a high off-flavour. Six assessors agree quite well for samples with ranks above [?]0, while the other four only agree on the top ten ranks. There is clearly a difference in sensitivity here as well, but also there may well be no difference in off-flavour for the lowest-ranking peas.

For ‘mealiness’ (Figure 2(e)) there is very good agreement over the whole range among nine out of the ten assessors, with their graphs all lying below the
0.8 correlation line. One assessor, on the other hand, does not agree at all with the consensus. This was investigated further by constructing a ‘replicate plot’. Here each assessor’s first replicate is regarded as the ‘consensus’ and the cumulative rank graph plotted for the second. All assessors are plotted on the same graph, though obviously the consensus is not the same in each case. The plot can then be used to assess the consistency of the assessors. It is clear that for the one odd assessor there is no significant correlation between replicates, whereas for the others the correlations are all greater than 0.4, indicating a reasonable degree of consistency.

The hardness plot is given in Figure 2(f). Here it is clear that all assessors agree very well on the rankings and this is supported by the replicate plot in Figure 3(b). There is also a suggestion in both of these plots that the assessors find it slightly easier to rank the harder samples.

For comparison, Table 1 gives the rank correlations of each assessor with the consensus ranking for each attribute and also the coefficient of concordance for each attribute. (This is a measure of overall agreement among the assessors. See Reference 1 for details.) This table is in agreement with the conclusions from the plots, though less detailed information can be obtained from it.

Finally, the plots have been used to investigate the relationship between the attributes for each assessor. The method used here is to construct
a ‘consensus attribute’ for each assessor by ranking the first eigenvector of $YY^{T}$, where $Y$ is the $60 \times 6$ matrix of samples by attributes. This is then used as the consensus and the cumulative rank plot constructed for the six attributes. These plots are shown in Figures 4(a) and 4(b) for two of the assessors. It is clear that for the first assessor the attributes are very highly correlated, with all six lines lying either