# Outliers and Influential Observations


August 16, 2014

## 1 Motivation

Refer to the graphs presented in class to distinguish between outliers (observations with "large" residuals) and influential observations (observations that may or may not be outliers, but which influence a subset or all of the coefficients, fits, or variances in a "substantial" way).

## 2 Single-row Diagnostics

All tests consider what happens if an observation is dropped: what happens to the fit, the estimated coefficients, the t-ratios, etc.

Let the model with all observations be denoted as usual by

$$y = X\beta + \varepsilon$$

with the OLS estimator $b = (X'X)^{-1}X'y$.

Let $x_t'$ be the $t$th row of $X$. Denote the $t$th diagonal element of the projection matrix $P = X(X'X)^{-1}X'$ by $h_t$, and the vector $(X'X)^{-1}x_t$ (the $t$th column of $(X'X)^{-1}X'$) by $c_t$, whose $k$th element is $c_{kt}$. Note that $h_t$ can also be written as $x_t'(X'X)^{-1}x_t$.

Say the $t$th observation is dropped. Denote the corresponding dependent variable by $y[t]$, the $X$ matrix by $X[t]$, the residual vector by $e[t]$, etc.

The $t$th observation can be considered influential if its omission has a large impact on the parameter estimates, the fit of the model, etc. This is determined using some rules of thumb:

1. DFBETA: As shown below,

$$b_k - b_k[t] = \frac{c_{kt}\,e_t}{1 - h_t}$$

Proof: Without loss of generality, let the $t$th observation be placed last, i.e., write the data matrices in partitioned form as

$$X = \begin{bmatrix} X[t] \\ x_t' \end{bmatrix}, \qquad y = \begin{bmatrix} y[t] \\ y_t \end{bmatrix}$$

where $X$ is $(n \times K)$, $X[t]$ is $((n-1) \times K)$ and $x_t'$ is $(1 \times K)$; $y_t$ is a scalar and $y[t]$ is $((n-1) \times 1)$.
It follows that

$$X'X = X[t]'X[t] + x_t x_t' \quad\text{or}\quad X[t]'X[t] = X'X - x_t x_t'$$

$$X'y = X[t]'y[t] + x_t y_t \quad\text{or}\quad X[t]'y[t] = X'y - x_t y_t$$

For any nonsingular matrix $A$ and vector $c$ with $c'A^{-1}c \neq 1$,

$$(A - cc')^{-1} = A^{-1} + A^{-1}c\,(1 - c'A^{-1}c)^{-1}c'A^{-1}$$

Substitute $X'X$ for $A$ and $x_t$ for $c$:

$$(X[t]'X[t])^{-1} = (X'X)^{-1} + (X'X)^{-1}x_t\,\bigl(1 - x_t'(X'X)^{-1}x_t\bigr)^{-1}x_t'(X'X)^{-1}$$

Substituting $h_t = x_t'(X'X)^{-1}x_t$, a scalar,

$$(X[t]'X[t])^{-1} = (X'X)^{-1} + \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}}{1 - h_t}$$

Therefore

$$
\begin{aligned}
b[t] &= (X[t]'X[t])^{-1}X[t]'y[t] \\
&= \left[(X'X)^{-1} + \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}}{1 - h_t}\right](X'y - x_t y_t) \\
&= (X'X)^{-1}X'y - (X'X)^{-1}x_t y_t + \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}X'y}{1 - h_t} - \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}x_t y_t}{1 - h_t} \\
&= b - (X'X)^{-1}x_t y_t + \frac{(X'X)^{-1}x_t x_t' b}{1 - h_t} - \frac{(X'X)^{-1}x_t h_t y_t}{1 - h_t}
\end{aligned}
$$

so that

$$b - b[t] = \frac{(X'X)^{-1}x_t y_t (1 - h_t) - (X'X)^{-1}x_t x_t' b + (X'X)^{-1}x_t h_t y_t}{1 - h_t}$$

Recognizing that $h_t$ and $y_t$ are scalars, and that $x_t'b = \hat{y}_t$ so that $y_t - x_t'b = e_t$, after cancellation we get

$$b - b[t] = \frac{(X'X)^{-1}x_t\,(y_t - x_t'b)}{1 - h_t} = \frac{c_t\,e_t}{1 - h_t}$$

Focusing only on the $k$th coefficient, we get the expression above:

$$b_k - b_k[t] = \frac{c_{kt}\,e_t}{1 - h_t}$$

Some standardization is necessary to determine cut-offs. With $s[t]$ the estimated standard error of the regression when observation $t$ is dropped, and the sum taken over all $n$ observations,

$$\text{DFBETA}_k = \frac{b_k - b_k[t]}{s[t]\sqrt{\sum_{j=1}^{n} c_{kj}^2}}, \qquad \text{Cutoff}: \pm\frac{2}{\sqrt{n}}$$

2. DFFITS: It can be shown that

$$\hat{y}_t - \hat{y}_t[t] = x_t'\,(b - b[t]) = \frac{h_t\,e_t}{1 - h_t}$$

With standardization:

$$\text{DFFIT}_t = \frac{\hat{y}_t - \hat{y}_t[t]}{s[t]\sqrt{h_t}}, \qquad \text{Cutoff}: \pm\frac{2\sqrt{K}}{\sqrt{n}}$$

This is the impact of deleting the $t$th observation on the $t$th predicted value. One can analogously consider $\hat{y}_j - \hat{y}_j[t]$.

3. RSTUDENT:

$$\text{RSTUDENT} = \frac{e_t}{s[t]\sqrt{1 - h_t}}, \qquad \text{Cutoff}: \pm 2$$
4. COVRATIO:

$$\text{COVRATIO} = \frac{\left|\,s^2[t]\,(X[t]'X[t])^{-1}\right|}{\left|\,s^2\,(X'X)^{-1}\right|}$$

Cutoff: $< 1 - \dfrac{3K}{n}$ → "bad"; $> 1 + \dfrac{3K}{n}$ → "good".

## 3 Multiple-row Diagnostics

If there is a cluster of more than one outlier, single-row diagnostics will not be able to identify influential observations because of the masking effect, demonstrated in class.

Multiple-row diagnostics can. Let $m$ denote the subset of $m$ deleted observations. The measures defined above can be determined analogously:

$$\text{DFBETA} = \frac{b_k - b_k[m]}{\sqrt{\operatorname{Var}(b_k)}}$$

$$\text{MDFIT} = (b - b[m])'\,(X[m]'X[m])\,(b - b[m])$$

$$\text{VARRATIO} = \frac{\left|\,s^2[m]\,(X[m]'X[m])^{-1}\right|}{\left|\,s^2\,(X'X)^{-1}\right|}$$

This is, however, not practical, although there are packages that can consider every subset of 2, 3, 4, ... data points, and also methods to help identify $m$.

### 3.1 Partial Regression Plots

In a simple regression model (with one independent variable), influential observations, be they single or multiple, are easy to detect visually. But what about a multiple regression model? One easy and practical solution is to collapse a multiple regression model into a series of simple regressions using the FWL Theorem.

For example, say there are four explanatory variables (counting the constant):

$$y = \beta_1 + X_2\beta_2 + \dots + X_4\beta_4 + \varepsilon$$

To know whether there are observations influencing the estimated $b_2$:

1. Regress $y$ on a constant, $X_3$ and $X_4$, and obtain the residual $\hat{u}$.
2. Regress $X_2$ on a constant, $X_3$ and $X_4$, and obtain the residual $\hat{w}$.

By the FWL Theorem, we know that the regression of $\hat{u}$ on $\hat{w}$ yields the OLS slope coefficient for $X_2$. So a plot of $\hat{u}$ against $\hat{w}$ enables us to collapse a multi-dimensional problem into a two-dimensional one.

Visual inspection of such partial regression plots, along the lines presented earlier, for each of the key parameters of interest can identify influential observations, singly or as a cluster.
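The two auxiliary regressions behind a partial regression plot can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the course: the simulated data, the coefficient values, and the helper names `solve` and `ols` are all hypothetical, and a small Gauss-Jordan solver stands in for a linear algebra library so that the block runs without dependencies.

```python
import random

def solve(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def ols(X, y):
    """Return OLS coefficients and residuals for regressor rows X and response y."""
    K = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(K)] for i in range(K)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(K)]
    b = solve(XtX, Xty)
    resid = [yi - sum(xi * bi for xi, bi in zip(x, b)) for x, yi in zip(X, y)]
    return b, resid

# Simulated data: every number here is an assumption made for illustration.
random.seed(0)
n = 50
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.5 * v + random.gauss(0, 1) for v in x2]
x4 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * v2 - 1.0 * v3 + 0.5 * v4 + random.gauss(0, 1)
     for v2, v3, v4 in zip(x2, x3, x4)]

# Full regression of y on a constant, X2, X3 and X4.
b_full, _ = ols([[1.0, v2, v3, v4] for v2, v3, v4 in zip(x2, x3, x4)], y)

# Step 1: residual of y on (constant, X3, X4). Step 2: same for X2.
Z = [[1.0, v3, v4] for v3, v4 in zip(x3, x4)]
_, u_hat = ols(Z, y)
_, w_hat = ols(Z, x2)

# Slope of u_hat on w_hat; no constant is needed because both residual
# series have mean zero when the auxiliary regressions include a constant.
slope = sum(u * w for u, w in zip(u_hat, w_hat)) / sum(w * w for w in w_hat)

# The partial regression plot is a scatter of these (w_hat, u_hat) pairs.
points = list(zip(w_hat, u_hat))
print(slope, b_full[1])
```

The two printed numbers should agree up to floating-point error, which is the FWL result the plot relies on. Passing `points` to any plotting tool then gives the two-dimensional picture in which a single point or a cluster pulling the fitted slope can be inspected by eye.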
## 4 What to do

The point is that an influential observation, or a set of influential observations, need not be jettisoned. A cluster of influential observations could well be an indication of structural change, for example.
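Before deciding what to do with an observation, it has to be flagged. The following is a minimal sketch of the single-row diagnostics of Section 2, assuming a made-up ten-point data set whose last point has high leverage and lies far off the line traced by the others. The data, the helper names (`solve`, `ols`, `single_row_diagnostics`) and the dictionary keys are hypothetical. For transparency, $b[t]$ and $s[t]$ are obtained by actually refitting without observation $t$, which also lets the closed-form expression $c_t e_t / (1 - h_t)$ be checked against the brute-force difference.

```python
import math

def solve(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def ols(X, y):
    """Return OLS coefficients b, residuals e and the matrix X'X."""
    K = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(K)] for i in range(K)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(K)]
    b = solve(XtX, Xty)
    e = [yi - sum(xi * bi for xi, bi in zip(x, b)) for x, yi in zip(X, y)]
    return b, e, XtX

def single_row_diagnostics(X, y):
    n, K = len(X), len(X[0])
    b, e, XtX = ols(X, y)
    c = [solve(XtX, x_t) for x_t in X]             # c_t = (X'X)^{-1} x_t
    h = [sum(xk * ck for xk, ck in zip(x_t, c_t))  # h_t = x_t' c_t
         for x_t, c_t in zip(X, c)]
    sum_c2 = [sum(c_t[k] ** 2 for c_t in c) for k in range(K)]
    rows = []
    for t in range(n):
        # Brute-force refit with observation t dropped gives b[t] and s[t].
        b_del, e_del, _ = ols(X[:t] + X[t + 1:], y[:t] + y[t + 1:])
        s_del = math.sqrt(sum(v * v for v in e_del) / (n - 1 - K))
        formula = [c[t][k] * e[t] / (1 - h[t]) for k in range(K)]
        rows.append({
            "h": h[t],
            "refit_diff": [bk - bd for bk, bd in zip(b, b_del)],
            "formula_diff": formula,
            "dfbeta": [formula[k] / (s_del * math.sqrt(sum_c2[k]))
                       for k in range(K)],
            "dffits": h[t] * e[t] / (1 - h[t]) / (s_del * math.sqrt(h[t])),
            "rstudent": e[t] / (s_del * math.sqrt(1 - h[t])),
        })
    return rows

# Hypothetical data: constant plus one regressor; the last point is the odd one.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.6, 2.9, 3.6, 3.9, 4.6, 4.9, 5.6, 5.9, 6.6, 4.0]
X = [[1.0, float(v)] for v in xs]
n, K = len(X), len(X[0])

rows = single_row_diagnostics(X, y)
dffits_cut = 2 * math.sqrt(K / n)   # cutoff 2*sqrt(K)/sqrt(n) from Section 2
flagged = [t for t, r in enumerate(rows)
           if abs(r["rstudent"]) > 2 or abs(r["dffits"]) > dffits_cut]
print(flagged)
```

Refitting $n$ times is wasteful on large samples; it is used here only because it makes the definitions of $b[t]$ and $s[t]$ explicit. The flagged indices are candidates for closer inspection, not for automatic deletion.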
