Journal of Reliability and Statistical Studies; ISSN (Print): 0974-8024, (Online): 2229-5666
Vol. 9, Issue 1 (2016), pp. 69 ff.

EXACT DISTRIBUTION OF SQUARED WELSCH-KUH DISTANCE AND IDENTIFICATION OF INFLUENTIAL OBSERVATIONS

G.S. David Sam Jayakumar¹ and A. Sulthan²
Jamal Institute of Management, Tiruchirappalli, India
E-mail: ¹samjaya77@gmail.com, ²sulthan90@gmail.com

Received March 22, 2015
Modified March 30, 2016
Accepted April 27, 2016

Abstract
This paper proposes the exact distribution of the squared DFFITS, alias the squared Welsch-Kuh ($WK^2$) distance measure, used to evaluate influential observations in multiple linear regression analysis. The authors express $WK^2$ in terms of two independent F-ratios and derive its density function as a complicated series expression involving the Gauss hypergeometric function with two shape parameters $p$ and $n$. Moreover, the mean and variance of the distribution are derived in terms of the shape parameters, and the authors establish the control limit of $WK^2$. Similarly, the critical points of the squared Welsch-Kuh ($WK^2$) distance measure are computed at the 5% and 1% significance levels for different sample sizes and varying numbers of predictors. Finally, a numerical example shows the identification of the influential observations; the results obtained from the proposed approach are more scientific and systematic, and their exactness outperforms Welsch and Kuh's traditional approach.

Key Words: Squared Welsch-Kuh Distance Measures, Influential Observation, Series Expression Form, Gauss Hypergeometric Function, Mean, Variance, Critical Points.

AMS Classification: 62H10

1. Introduction and Related Work
Studentized residuals and residual plots were considered the main statistical devices for detecting potentially critical observations in the literature before the third quarter of the 20th century. Behnken and Draper (1972)
a2e clarified tat te estiated 2ariance of te residuals includes .ertinent inforation  beyond tat .ro2ided by .lots of residuals or studenti?ed residuals' Siilarly" tey discussed te 2ariances of residuals in se2eral ore co.licated desi+ns' @oa+lin and els (97) e5.ressed" .roection atri5 Cnon as te at atri5 tat contains tis inforation and to+eter it te studenti?ed residuals" .ro2ides a eans of identifyin+ e5ce.tional data .oints' AooC (977) as been te first to establis a si.le easure"  D i   tat incor.orates inforation fro te D-s.ace and E-s.ace used for assessin+ te influential obser2ations in re+ression odels' ,e .roble of outliers or influential data in te ulti.le or ulti2ariate linear re+ression settin+ as been torou+ly discussed it reference to .araetric re+ression odels by te .ioneers naely AooC (977)"   70 Journal of Reliability and Statistical Studies" June !0%" &ol' 9()   AooC and eisber+ (9!)" Belsey et al' (90) and Aatteree and @adi (9) res.ecti2ely' In non-.araetric re+ression odels" dia+nostic results are 6uite rare' on+ te" /ubanC (9$)" Sil2eran (9$)" ,oas (99)" and i (99%) studied residuals" le2era+es" and se2eral of AooC<s distance in sootin+ s.lines" and i and i (99 F !00) .ro.osed a ty.e of AooC<s distance in Cernel density estiation and in local .olynoial re+ression' ,e .rase Ginfluence easures< as +li.sed a +reat sur+e of researc interests' ,e de2elo.ents of different easures are in2esti+ated to identify te influential obser2ation fro te early criteria of AooC<s to te .resent and a definition about influence" ic a..ears ost suitable" is +i2en by Belsey et al' (90)' AooC<s statistical dia+nostic easure is a si.le" unifyin+ and +eneral a..roac for ud+in+ te local influence in statistical odels' s far as te influence easures are concerned in te literature" te .rocedures ere desi+ned to detect te influence of obser2ations 
on a specific regression result. However, Hadi (1992) proposed a diagnostic measure, called Hadi's influence function, to identify the overall potential influence. It possesses several desirable properties that many of the frequently used diagnostics do not generally possess, such as invariance to location and scale in the response variable and invariance to non-singular transformations of the explanatory variables. It is an additive function of measures of leverage and of residual error, and it is monotonically increasing in the leverage values and in the squared residuals. Recently, Díaz-García and González-Farías (2004) modified the classical Cook's distance with a generalized Mahalanobis distance in the context of multivariate elliptical linear regression models, and they also established the exact distribution for identification of outlier data points. Considering the above reviews, the authors propose the exact distribution of the squared Welsch-Kuh distance ($WK^2$) to exactly identify the influential data points; it is discussed in the subsequent sections.

2. Relationship between Squared Welsch-Kuh distance ($WK^2$) and F-ratios
The multiple linear regression model with random error is given by

$$Y = X\beta + e \tag{1}$$

where $Y_{(n\times 1)}$ is the vector of the dependent variable, $\beta_{(k\times 1)}$ is the vector of beta coefficients or partial regression coefficients, and $e_{(n\times 1)}$ is the residual vector, which follows the normal distribution $N(0, \sigma_e^2 I_n)$. From (1), statisticians concentrate on and give importance to error diagnostics such as outlier detection, identification of leverage points and evaluation of influential observations. Several error diagnostic techniques proposed by statisticians exist in the literature, but DFFITS is an interesting technique based on the simple fact that the impact of the $i$-th observation on the predicted value can be measured by scaling the change in prediction at $x_i$ when the $i$-th observation is omitted, i.e.

$$\frac{x_i\left(\hat\beta - \hat\beta_{(i)}\right)}{\sigma\sqrt{h_{ii}}} = \frac{\hat y_i - \hat y_{i(i)}}{\sigma\sqrt{h_{ii}}} \tag{2}$$

Welsch and Kuh (1977), Welsch and Peters (1978) and Belsley, Kuh and Welsch (1980) suggested using $\hat\sigma_{(i)}^2$ as an estimate of $\sigma^2$ and called (2) DFFITS. For simplicity, they refer to (2) as the Welsch-Kuh distance ($WK_i$),

$$WK_i = \frac{x_i\left(\hat\beta - \hat\beta_{(i)}\right)}{\hat\sigma_{(i)}\sqrt{h_{ii}}} = R_i\sqrt{\frac{h_{ii}}{1-h_{ii}}} \tag{3}$$

where $R_i$ is the absolute externally studentized residual, $n$ is the sample size, and $h_{ii}$ is the hat value of the $i$-th observation, i.e. the $i$-th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$. Welsch (1980) suggested $WK_i$ as a diagnostic tool and $2\sqrt{(p+1)/n}$ as a calibration point for observations. An observation whose $WK_i$ value exceeds this calibration point is treated as an influential observation, and it seems reasonable to nominate such points for special attention. The Welsch-Kuh distance measure can also be written in a squared alternative form as

$$WK_i^2 = R_i^2\,\frac{h_{ii}}{1-h_{ii}} \tag{4}$$

Though the measure itself is scientific, the criterion $2\sqrt{(p+1)/n}$ used to detect influential observations is not; the authors believe that it is based on a rule-of-thumb approach. To overcome this, the authors attempt to make the approach more scientific by fixing a meaningful criterion as the calibration point. To identify the exact influential observations, we propose the exact distribution for the squared Welsch-Kuh distance measure. For this, we utilize the relationship among the squared Welsch-Kuh distance ($WK_i^2$), the externally studentized residual ($R_i$) and the hat elements ($h_{ii}$). The terms $R_i$ and $h_{ii}$ are independent because the computation of $R_i$ involves the error term $e_i \sim N(0, \sigma_e^2)$, whereas the $h_{ii}$ values involve only the set of predictors through $H = X(X'X)^{-1}X'$. Therefore, from the property of least squares, if $E(eX) = 0$, then $R_i$ and $h_{ii}$ are also uncorrelated and independent. Under this assumption, the externally studentized residual $R_i$ exactly follows a t-distribution with $n-p-2$ degrees of freedom, and its squared form is given as

$$R_i^2 = \frac{e_i^2}{s_{(i)}^2\,(1-h_{ii})} \sim F_{(1,\,n-p-2)} \tag{5}$$

From (5), the squared externally studentized residual follows an F-distribution with $(1, n-p-2)$ degrees of freedom. Similarly, we identify the distribution of $h_{ii}$ based on the relationship proposed by Belsley et al. (1980), who have shown that
if the set of predictors follows a multivariate normal distribution with $(\mu_X, \Sigma_X)$, then

$$\frac{(n-p-1)\left(h_{ii} - 1/n\right)}{p\,(1-h_{ii})} \sim F_{(p,\,n-p-1)} \tag{6}$$

From (6), the ratio follows an F-distribution with $(p, n-p-1)$ degrees of freedom, and it can be written in an alternative form as

$$h_{ii} = \frac{\dfrac{1}{n} + \dfrac{p}{n-p-1}\,F_{i(p,\,n-p-1)}}{1 + \dfrac{p}{n-p-1}\,F_{i(p,\,n-p-1)}} \tag{6a}$$

In order to derive the exact distribution of the squared Welsch-Kuh distance, we substitute (5) and (6a) in (4), without loss of generality. Since (6a) implies $\frac{h_{ii}}{1-h_{ii}} = \frac{n}{n-1}\left(\frac{1}{n} + \frac{p}{n-p-1}F_{i(p,\,n-p-1)}\right)$, we get $WK_i^2$ in terms of the two independent F-ratios with $(1, n-p-2)$ and $(p, n-p-1)$ degrees of freedom respectively, and the relationship is given as

$$WK_i^2 = F_{i(1,\,n-p-2)}\,\frac{n}{n-1}\left(\frac{1}{n} + \frac{p}{n-p-1}\,F_{i(p,\,n-p-1)}\right) \tag{7}$$

$$WK_i^2 = (n-p-2)\,\frac{n}{n-1}\left(\frac{F_{i(1,\,n-p-2)}}{n-p-2}\right)\left(\frac{1}{n} + \frac{p}{n-p-1}\,F_{i(p,\,n-p-1)}\right) \tag{8}$$

From (8), it can be further simplified, and $WK_i^2$ is expressed in terms of two independent beta variables of kind-2, namely $\theta_{1i}$ and $\theta_{2i}$, by using the following facts:

$$\theta_{2i} = \frac{p}{n-p-1}\,F_{i(p,\,n-p-1)} \sim \beta\!\left(\frac{p}{2},\,\frac{n-p-1}{2}\right) \tag{9}$$

$$\theta_{1i} = \frac{1}{n-p-2}\,F_{i(1,\,n-p-2)} \sim \beta\!\left(\frac{1}{2},\,\frac{n-p-2}{2}\right) \tag{10}$$

Then, without loss of generality, (8) can be written as

$$WK_i^2 = \frac{n\,(n-p-2)}{n-1}\,\theta_{1i}\left(\frac{1}{n} + \theta_{2i}\right) \tag{11}$$

$$WK_i^2 = \frac{n-p-2}{n-1}\,\theta_{1i}\left(1 + n\,\theta_{2i}\right) \tag{12}$$

$$WK_i^2 = \alpha(p,n)\,\theta_{1i}\left(1 + n\,\theta_{2i}\right) \tag{13}$$

From (13), the authors have expressed the squared Welsch-Kuh distance measure in terms of $\theta_{1i} \sim \beta\!\left(\frac{1}{2}, \frac{n-p-2}{2}\right)$ and $\theta_{2i} \sim \beta\!\left(\frac{p}{2}, \frac{n-p-1}{2}\right)$, which follow the beta distribution of kind-2 with two shape parameters $p$ and $n$, and $\alpha(p,n) = (n-p-2)/(n-1)$ is a normalizing function which involves the shape parameters. Based on the identified relationship
fro (3)" te autors a2e deri2ed te distribution of te   /5act distribution of s6uared elsc-u distance K 73 s6uared elsc-u distance ic discussed in te ne5t section' 4. E5a+& Di*&"i)!&i$ 0 S3!a",d W,%*+'-K!' di*&a$+,   sin+ te tecni6ue of to-diensional Jacobian of transforation" te oint  .robability density function of te to beta 2ariables of Cind-! naely  i θ  and ! i θ    ere transfored into density function of ne rando 2ariables ! i WK   and i u ' It is +i2en as ( )  ( ) !! "" i i i i  f WK u f J  θ θ  =  (4) 8ro (4)" e Cno i  θ  and i ! θ   are inde.endent ten rerite (4) as ( )  ( ) ( ) !! " i i i i  f WK u f f J  θ θ  =  ($) sin+ te can+e of 2ariable tecni6ue" substitute ii  u = ! θ    in (3) e +et ( ) ! " iii WK n p n u θ α    = −     (%)   ,en .artially differentiate (%)" co.ute te Jacobian deterinant and rerite ($) as ( )  ( ) ( ) ( ) ( ) !!!! """ i ii i i ii i  f WK u f f WK u θ θ θ θ  ∂=∂  (7) ( )  ( ) ( ) !!!!!! " i iiii i i ii iii uWK  f WK u f f uWK  θ θ θ θ θ θ  ∂ ∂∂∂=∂ ∂∂∂  () 8ro ($)" e Cno tat  i θ    and ! i θ  are inde.endent and ten te density function of te oint distribution of  i θ   and  i θ   is +i2en as ( ) !!!!!!!!!! "()()!""!!!!  p n p n p pi i i ii i  f  p n p n p B B θ θ θ θ θ θ  − − − −   −− + − +− −       = + × +− − − −           (9) ere ! 0" i i θ θ  ≤ < ∞ " "0 n p  >  and ( ) ( ) ( ) ( ) !!!!!! """0 i iiiiii i iiii WK uWK n p n un p n u n p n uuWK  θ θ α θ θ  α  α  ∂ ∂∂∂ −= =∂ ∂∂∂  (!0) ,en substitutin+ (9) and (!0) in () in ters of te substitution of i u " e +et te
