# Bivariate Dependence Orderings for Unordered Categorical Variables

Alessandra Giovagnoli¹, Johnny Marzialetti¹ and Henry Wynn²

¹ Department of Statistical Sciences, Via Belle Arti 41, Bologna 40126, Italy. alessandra.giovagnoli@unibo.it, marzialetti@stat.unibo.it

² London School of Economics and Political Science, Houghton Street, London WC2A 2AE, UK. h.wynn@lse.ac.uk

## 1 Introduction

Several statistical concepts (such as location, dispersion, concentration and dependence) can be studied via order and equivalence relations. The concept to be defined can be described by means of a partial ordering or a pre-ordering among the variables of interest, and the corresponding measures are then order-preserving functions. Bickel and Lehmann (1975) were among the first authors to introduce this approach to statistics. Many well-known properties of established statistical measures can be derived from the ordering representation.

Stochastic orderings (i.e. order relations among unidimensional or multidimensional random variables) and the corresponding order-preserving functions have a long history: an introduction is in Chapter 9 of the book by Ross (1995), and fundamental works are by Shaked and Shanthikumar (1994) and Müller and Stoyan (2002). They have been applied to general statistical theory (in particular testing), to reliability theory and, more recently, to risk and insurance. In this chapter it is not our intention to cover the basic material, but rather to describe and investigate concepts related to the association of random variables, as in Goodman and Kruskal (1979), namely the degree of dependence among the components of a multivariate variable. We stress that different concepts of association are possible: for example *interdependence* (all the variables have an exchangeable role) and *dependence* of one variable on the others. Furthermore, as pointed out in Chapter 11 of Bishop et al. (1975), a special case of association among variables is *inter-observer agreement*, which has important applications in several fields. In this case there is one characteristic of interest, observed on the same statistical units by different observers who may partly agree and partly disagree in their classifications or their scores; the multidimensional random variables to be compared from the agreement viewpoint express the judgements of the "raters" in vector form.

So far, studies of dependence and interdependence through orderings have focussed mainly on real random variables (see Joe (1997)). In the tradition of Italian statistics, Forcina and Giovagnoli (1987) and Giovagnoli (2002a,b) have defined some dependence orderings for bivariate *nominal* random variables, i.e. with values in unordered categories. In this chapter we restrict ourselves to variables of this type. We review the existing results and introduce further developments in Section 2. As for agreement, we are not aware of any theory of *agreement orderings* so far, apart from a brief hint in the already mentioned paper by Giovagnoli (2002b). A possible definition and some new results are the topic of Section 3. Section 4 points at directions for research. For reasons of simplicity and space, in this chapter we only look at nominal variables in two dimensions. An agreement ordering for a different type of multivariate variables, namely discrete or continuous ones, is the object of another paper by the same authors (Giovagnoli et al. (2006)).

We end this introduction with some terminology. An *equivalence* in a set $\mathcal{S}$ is a reflexive, symmetric and transitive relation. A *pre-order* $\precsim$ in $\mathcal{S}$ is a reflexive and transitive relation (anti-symmetry is not required). To every pre-order $\precsim$ there corresponds an equivalence relation $\bumpeq$: if $x, y \in \mathcal{S}$ with $x \precsim y$ and $y \precsim x$, then we say that $x \bumpeq y$.
All the one-to-one maps $\phi$ of $\mathcal{S}$ onto $\mathcal{S}$ such that $\phi(x) \bumpeq x$ for all $x \in \mathcal{S}$ form a set $G_I$, called the *invariance* set of $\precsim$. All the maps $\psi$ of $\mathcal{S}$ into $\mathcal{S}$ such that $x \precsim y$ implies $\psi(x) \precsim \psi(y)$ for all $x, y \in \mathcal{S}$ form the *equivariance* set $G_E$ of $\precsim$. All the maps $\varphi$ of $\mathcal{S}$ into $\mathcal{S}$ such that $\varphi(x) \precsim x$ for all $x \in \mathcal{S}$ form the *contraction* set $G_K$ of $\precsim$. As well as the contractions, we can define the *expansions*: all the maps $\tilde\varphi$ of $\mathcal{S}$ into $\mathcal{S}$ such that $x \precsim \tilde\varphi(x)$ for all $x \in \mathcal{S}$. Clearly $G_I \subset G_E$ (if $\phi \in G_I$ and $x \precsim y$, then $\phi(x) \bumpeq x \precsim y \bumpeq \phi(y)$) and $G_I \subset G_K$; $G_I$ is a group, whereas $G_E$ and $G_K$ are semigroups.

A function $f : \mathcal{S} \to \mathbb{R}$ is *order-preserving* if $x \precsim y$ implies $f(x) \le f(y)$. A trivial remark: if $f$ is order-preserving and $g : \mathbb{R} \to \mathbb{R}$ is non-decreasing, then $g \circ f$ is order-preserving too. Clearly, order-preserving functions must be invariant w.r.t. $G_I$, since $\phi(x) \bumpeq x$ gives both $f(\phi(x)) \le f(x)$ and $f(x) \le f(\phi(x))$.

## 2 Dependence orderings for two nominal variables

### 2.1 S-dependence and D-dependence of one variable on the other

Let $X$ and $Y$ be categorical variables with a finite number of nominal categories, which we denote by $x_1, \dots, x_r$ and $y_1, \dots, y_c$ respectively. The labels only identify the categories and imply no order among them, i.e. $x_1$ neither "comes before" nor "is less than" $x_2$, etc. We are interested in the joint (frequency or probability) distribution of $(X, Y)$, identified from now on by a table

$$
P_{r \times c} =
\begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1c} \\
p_{21} & p_{22} & \cdots & p_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
p_{r1} & p_{r2} & \cdots & p_{rc}
\end{pmatrix}
= (p_{ij}), \qquad i = 1, \dots, r;\ j = 1, \dots, c,
$$

where $p_{ij} \ge 0$ and $\sum_{ij} p_{ij} = 1$. An alternative description is by means of the conditional and marginal distributions, i.e. either $(P^*, \mathbf{p}_r)$, where $P^* = (p_{ij}/p_{i+})$ and $\mathbf{p}_r = (p_{1+}, \dots, p_{r+})^t$, or $(P^{**}, \mathbf{p}_c)$, where $P^{**} = (p_{ij}/p_{+j})$ and $\mathbf{p}_c = (p_{+1}, \dots, p_{+c})^t$; here $p_{i+} = \sum_j p_{ij}$ and $p_{+j} = \sum_i p_{ij}$.
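As an illustration of the two alternative descriptions of a table, here is a minimal NumPy sketch. NumPy, the function name `margins_and_conditionals` and the example table are our own hypothetical choices, not taken from the paper; a null row (column) of $P$ is mapped to a row (column) of zeros in $P^*$ ($P^{**}$), following the convention for null rows and columns adopted in the text.

```python
import numpy as np

def margins_and_conditionals(P):
    """Split a joint table P = (p_ij) into (P*, p_r) and (P**, p_c).

    P*  = (p_ij / p_i+): conditional distributions of Y given each x_i (rows).
    P** = (p_ij / p_+j): conditional distributions of X given each y_j (columns).
    A null row (column) of P is mapped to a row (column) of zeros.
    """
    P = np.asarray(P, dtype=float)
    p_r = P.sum(axis=1)                      # (p_1+, ..., p_r+)
    p_c = P.sum(axis=0)                      # (p_+1, ..., p_+c)
    P_star = np.divide(P, p_r[:, None], out=np.zeros_like(P),
                       where=p_r[:, None] > 0)
    P_2star = np.divide(P, p_c[None, :], out=np.zeros_like(P),
                        where=p_c[None, :] > 0)
    return P_star, p_r, P_2star, p_c

# Hypothetical 2 x 3 table, used only for illustration
P = np.array([[0.2, 0.1, 0.1],
              [0.0, 0.3, 0.3]])
P_star, p_r, P_2star, p_c = margins_and_conditionals(P)

# Either description recovers the joint table:
# P = diag(p_r) P*  and  P = P** diag(p_c)
assert np.allclose(np.diag(p_r) @ P_star, P)
assert np.allclose(P_2star @ np.diag(p_c), P)
```

Using `np.divide` with `out` and `where` avoids any division by a zero margin, so tables with null rows or columns need no special casing.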
It is sometimes useful to include tables with null rows and/or columns, in which case $P^*$ or $P^{**}$ is defined by setting all zeros in the corresponding row and/or column.

To describe the dependence of $Y$ on $X$, the following order relation $\le_S$ (which we call *S-dependence*) was defined in Forcina and Giovagnoli (1987).

**Definition 1.** Let $P$ and $Q$ be two bivariate tables. Then $Q \le_S P$ if and only if there exists a stochastic matrix $S$ such that $Q = S^t P$.

The relation $\le_S$ implies that the column margins of $P$ and $Q$ are equal, $\mathbf{p}_c = \mathbf{q}_c$, because the rows of $S$ sum to one: with $\mathbf{1}$ a vector of ones, $\mathbf{q}_c^t = \mathbf{1}^t S^t P = (S\mathbf{1})^t P = \mathbf{1}^t P = \mathbf{p}_c^t$.

**Proposition 1** (Forcina and Giovagnoli (1987)). $Q \le_S P$ is equivalent to the following two conditions holding simultaneously:

(i) $Q^* = \tilde S P^*$;

(ii) $\mathbf{p}_r = \tilde S^t \mathbf{q}_r$,

with $\tilde S$ another stochastic matrix.

Forcina and Giovagnoli (1987) showed that $\le_S$ satisfies some intuitive requirements for a dependence ordering.

We call *S-equivalence* the equivalence relation $\bumpeq_S$ defined by $\le_S$. Since permutation matrices are doubly stochastic, a permutation of the rows of $P$ gives an S-equivalent table; the converse is not true, namely not all S-equivalent tables can be obtained by permutation, as the following result shows.

**Proposition 2.**

(1) Row-aggregation, i.e. replacing one of two rows by their sum and the other by a row of zeros, leads to a distribution with less S-dependence.

(2) When two rows are proportional, both row-aggregation and row-splitting (namely the inverse operation to row-aggregation) yield an S-equivalent table.

*Proof.* The matrices

$$
S_1 = \begin{pmatrix} \begin{matrix} 1 & 0 \\ 1 & 0 \end{matrix} & \mathbf{0} \\ \mathbf{0}^t & I_{r-2} \end{pmatrix}
\qquad \text{and} \qquad
S_2 = \begin{pmatrix} \begin{matrix} \alpha & 1-\alpha \\ 1 & 0 \end{matrix} & \mathbf{0} \\ \mathbf{0}^t & I_{r-2} \end{pmatrix},
\qquad 0 \le \alpha \le 1,
$$

are stochastic. Pre-multiplication by $S_1^t$ gives aggregation of the first two rows. On the other hand, if the second row is zero, pre-multiplication by $S_2^t$ splits the first row into two proportional ones. If the second row of $P$ equals $k \ge 0$ times the first, aggregating them and then splitting with $\alpha = 1/(1+k)$ gives back $P$, so the aggregated table and $P$ are S-equivalent.

There is another type of dependence of $Y$ on $X$.

**Definition 2.** We define *D-dependence* as

$$
Q \le_D P \quad \overset{\text{def}}{\iff} \quad Q = PD \tag{1}
$$

with $D$ a *T-matrix*, i.e. a product of *T-transforms*, namely matrices of the form $T_\alpha = (1-\alpha) I + \alpha \Pi^{(2)}$, $0 \le \alpha \le 1$, where $\Pi^{(2)}$ is a permutation matrix that exchanges only two elements; such a $D$ is doubly stochastic.

This can be thought of as a model for errors in the $Y$ variable: since perfect dependence is obtained when to each $x$-category there corresponds precisely one $y$-category, $\alpha$ stands for the probability (frequency) of mistakenly exchanging two $y$-categories. This ordering is known in the literature as chain-majorization (see Marshall and Olkin (1979)). The equivalence relation $\bumpeq_D$ defined by $\le_D$ amounts to a permutation of the columns of $P$.

Clearly these two orderings can be combined.

**Definition 3.** Define *SD-dependence* as follows:

$$
Q \le_{SD} P \quad \overset{\text{def}}{\iff} \quad Q = S^t P D \tag{2}
$$

where $S$ is a stochastic matrix and $D$ a product of T-transforms.

The $\le_{SD}$ ordering was defined in Forcina and Giovagnoli (1987), who however did not carry out a proper investigation of its properties. Note that (2) can be written as $\operatorname{vec}(Q) = (S \otimes D)^t \operatorname{vec}(P)$, where $\operatorname{vec}$ stacks the rows of a matrix into a single vector. It means that there exists a bivariate distribution table $R$ such that $R \le_S P$ and $Q \le_D R$, and also that there exists $\tilde R$ such that $\tilde R \le_D P$ and $Q \le_S \tilde R$. Clearly both $\le_S$ and $\le_D$ are special cases of $\le_{SD}$. On the other hand, $\le_{SD}$ allows more comparisons; in particular, $P$ and $Q$ no longer need to have identical row or column margins.

Relative to maximal and minimal elements w.r.t. the ordering $\le_{SD}$, the following results hold true.

**Proposition 3.**

(1) All the tables expressing independence of $X$ and $Y$ (i.e. with proportional rows) and with a given marginal distribution of $Y$ are S-equivalent and are smaller w.r.t. $\le_S$ than all the other tables with the same $Y$-margin.

(2) All the tables giving exact dependence of $Y$ on $X$ (i.e. each row has a single nonzero entry, so that $Y$ is a function of $X$) with the same $Y$-margin are S-equivalent and are greater w.r.t. $\le_S$ than all the other tables with the same $Y$-margin.

The proof is easily obtained by the same techniques employed in Theorem 3 of Forcina and Giovagnoli (1987). Observe that Proposition 3 is not true for the ordering $\le_D$. However, the following corollary does follow from Proposition 3.

**Corollary 1.**

(1) The independence table with uniform margins $(1/rc)\,J_{r \times c}$, where $J$ stands for the matrix of all ones, is smaller w.r.t. $\le_{SD}$ than any other $r \times c$ joint probability table.

(2) All the tables giving exact dependence of $Y$ on $X$ are SD-equivalent and are greater w.r.t. $\le_{SD}$ than all the other $r \times c$ tables.

Let us now look at ways of transforming the order relation.

**Proposition 4.** The invariance group and contraction set of $\le_S$, $\le_D$ and $\le_{SD}$ are as follows, where the pair $(A, B)$ denotes two matrices of dimension $r \times r$ and $c \times c$ respectively, acting on $P$ by pre- and by post-multiplication respectively.

(1)

$$
\begin{aligned}
G_I(\le_S) &= \{(\Pi_1, I_c) : I_c \text{ the identity},\ \Pi_1 \text{ a permutation matrix}\},\\
G_I(\le_D) &= \{(I_r, \Pi_2) : I_r \text{ the identity},\ \Pi_2 \text{ a permutation matrix}\},\\
G_I(\le_{SD}) &= \{(\Pi_1, \Pi_2) : \Pi_1, \Pi_2 \text{ permutation matrices}\} = G_I(\le_S)\, G_I(\le_D).
\end{aligned}
$$

(2)

$$
\begin{aligned}
G_K(\le_S) &= \{(S^t, I_c) : I_c \text{ the identity},\ S \text{ stochastic}\},\\
G_K(\le_D) &= \{(I_r, D) : I_r \text{ the identity},\ D \text{ a T-matrix}\},\\
G_K(\le_{SD}) &= \{(S^t, D) : S \text{ stochastic},\ D \text{ a T-matrix}\} = G_K(\le_S)\, G_K(\le_D).
\end{aligned}
$$

Furthermore, it can be shown that

(3)

$$
\begin{aligned}
G_E(\le_S) &\;\;\; \{(S_1^t, S_2) : S_1 \text{ stochastic and of full rank},\ S_2 \text{ stochastic}\},\\
G_E(\le_D) &\;\;\; \{(S^t, D) : S \text{ stochastic},\ D \text{ a T-matrix}\},\\
G_E(\le_{SD}) &= G_K(\le_S)\, G_K(\le_D).
\end{aligned}
$$

Clearly, if we were interested in comparing bivariate tables as regards the dependence of $X$ on $Y$, we would consider the "transpose" order of $\le_{SD}$, namely

$$
Q \le_{tSD} P \quad \overset{\text{def}}{\iff} \quad Q = DPS, \qquad S \text{ stochastic},\ D \text{ a T-matrix}. \tag{3}
$$
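Several statements of this section can be checked numerically on a small example. The NumPy sketch below is our own illustration: the helper names (`s_tilde`, `aggregation_matrix`, `splitting_matrix`, `t_transform`, `t_matrix`) and the example tables are hypothetical, and a check on one example illustrates a statement without proving it. For Proposition 1 it uses one explicit stochastic $\tilde S$ that we derived ourselves, not necessarily the construction of Forcina and Giovagnoli (1987): with $D_p = \operatorname{diag}(\mathbf{p}_r)$ and $D_q = \operatorname{diag}(\mathbf{q}_r)$, and assuming $Q$ has no null rows, take $\tilde S = D_q^{-1} S^t D_p$, so that $\tilde S P^* = D_q^{-1} S^t P = D_q^{-1} Q = Q^*$ and $\tilde S^t \mathbf{q}_r = D_p S \mathbf{1} = \mathbf{p}_r$. The vec identity of Definition 3 assumes that vec stacks rows, which is what NumPy's default C-order `ravel` does; with column stacking the Kronecker factors swap, giving $(D \otimes S)^t$.

```python
import numpy as np

def is_stochastic(S):
    """Row-stochastic: nonnegative entries, every row summing to 1."""
    return bool(np.all(S >= 0) and np.allclose(S.sum(axis=1), 1.0))

def row_conditionals(P):
    """P* = (p_ij / p_i+); null rows of P stay rows of zeros."""
    p_r = P.sum(axis=1, keepdims=True)
    return np.divide(P, p_r, out=np.zeros_like(P), where=p_r > 0)

def s_tilde(P, Q, S):
    """Proposition 1, one valid choice (our derivation): S~ = D_q^{-1} S^t D_p,
    with D_p = diag(p_r), D_q = diag(q_r). Assumes Q has no null rows."""
    return np.diag(1.0 / Q.sum(axis=1)) @ S.T @ np.diag(P.sum(axis=1))

def aggregation_matrix(r):
    """S_1 of Proposition 2: S_1^t P merges the first two rows of P."""
    S1 = np.eye(r)
    S1[1, :2] = [1.0, 0.0]
    return S1

def splitting_matrix(r, alpha):
    """S_2 of Proposition 2: if the second row of P is null, S_2^t P splits
    the first row into alpha times it and (1 - alpha) times it."""
    S2 = np.eye(r)
    S2[0, :2] = [alpha, 1.0 - alpha]
    S2[1, :2] = [1.0, 0.0]
    return S2

def t_transform(c, i, j, alpha):
    """T_alpha = (1 - alpha) I + alpha Pi, where Pi swaps categories i and j."""
    Pi = np.eye(c)
    Pi[[i, j]] = Pi[[j, i]]
    return (1.0 - alpha) * np.eye(c) + alpha * Pi

def t_matrix(c, steps):
    """A T-matrix: the product of the T-transforms given as (i, j, alpha)."""
    D = np.eye(c)
    for i, j, alpha in steps:
        D = D @ t_transform(c, i, j, alpha)
    return D

# Hypothetical 3 x 3 table; its second row is half its first row.
P = np.array([[0.10, 0.20, 0.10],
              [0.05, 0.10, 0.05],
              [0.30, 0.00, 0.10]])
S = np.array([[0.5, 0.5, 0.0],
              [0.0, 1.0, 0.0],
              [0.2, 0.3, 0.5]])
D = t_matrix(3, [(0, 1, 0.3), (1, 2, 0.1)])

# Definition 1: Q = S^t P has the same Y-margin (column sums) as P.
Q = S.T @ P
assert is_stochastic(S) and np.allclose(Q.sum(axis=0), P.sum(axis=0))

# Proposition 1: conditions (i) and (ii) hold with a stochastic S~.
St = s_tilde(P, Q, S)
assert is_stochastic(St)
assert np.allclose(St @ row_conditionals(P), row_conditionals(Q))   # (i)
assert np.allclose(St.T @ Q.sum(axis=1), P.sum(axis=1))             # (ii)

# Proposition 2: aggregating the two proportional rows, then splitting with
# alpha = 1 / (1 + 1/2) = 2/3, recovers P, so the two tables are S-equivalent.
A = aggregation_matrix(3).T @ P
assert np.allclose(A[0], P[0] + P[1]) and np.allclose(A[1], 0.0)
assert np.allclose(splitting_matrix(3, 2.0 / 3.0).T @ A, P)

# Definition 2: D is doubly stochastic, so P D keeps the X-margin of P.
assert is_stochastic(D) and is_stochastic(D.T)
assert np.allclose((P @ D).sum(axis=1), P.sum(axis=1))

# Definition 3: vec(S^t P D) = (S kron D)^t vec(P), with vec stacking rows.
assert np.allclose(np.kron(S, D).T @ P.ravel(), (S.T @ P @ D).ravel())

# Proposition 3(1): S = 1 q^t is stochastic and S^t P = q p_c^t, i.e. the
# independence table with row margin q and the Y-margin of P is <=_S P.
q = np.array([0.2, 0.5, 0.3])
W = np.outer(np.ones(3), q)
assert is_stochastic(W) and np.allclose(W.T @ P, np.outer(q, P.sum(axis=0)))
```

Indices in the code are zero-based, so `aggregation_matrix` merges what the text calls rows 1 and 2, and `t_transform(c, 0, 1, alpha)` exchanges the categories $y_1$ and $y_2$.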
