Outliers and Influential Observations

Description
Econometric Methods
August 16, 2014

1 Motivation

Refer to the graphs presented in class to distinguish between outliers (observations with "large" residuals) and influential observations (observations that may or may not be outliers, but which influence a subset of, or all, coefficients, fits, or variances in a "substantial" way).

2 Single-row Diagnostics

All tests consider what happens if an observation is dropped: what happens to the fit, the estimated coefficients, the t-ratios, etc.

Let the model with all observations be denoted as usual as

\[ y = X\beta + \varepsilon \]

with OLS estimator $b = (X'X)^{-1}X'y$.

Denote the $t$th diagonal element of the projection matrix $P = X(X'X)^{-1}X'$ as $h_t$, and let $c_t = (X'X)^{-1}x_t$ (the $t$th column of $(X'X)^{-1}X'$), whose $k$th element is $c_{kt}$. Note that $h_t$ can also be written as $x_t'(X'X)^{-1}x_t$.

Say the $t$th observation is dropped. Denote the corresponding dependent-variable vector as $y[t]$, the $X$ matrix as $X[t]$, the residual vector as $e[t]$, etc.

The $t$th observation can be considered influential if its omission has a large impact on the parameter estimates, the fit of the model, etc. This is determined using some rules of thumb:

1. DFBETA: As shown below,

\[ b_k - b_k[t] = \frac{c_{kt}\, e_t}{1 - h_t} \]

Proof: Without loss of generality, let the $t$th observation be placed last, i.e., write the data matrices in partitioned form as

\[ X' = [\, X[t]' \;\; x_t \,]; \qquad y' = [\, y[t]' \;\; y_t \,] \]

where $X$ is $(n \times K)$, $X[t]$ is $((n-1) \times K)$ and $x_t'$ is $(1 \times K)$; $y_t$ is a scalar, and $y[t]$ is $((n-1) \times 1)$. Then

\[ X'X = X[t]'X[t] + x_t x_t', \quad \text{or} \quad X[t]'X[t] = X'X - x_t x_t' \]

\[ X'y = X[t]'y[t] + x_t y_t, \quad \text{or} \quad X[t]'y[t] = X'y - x_t y_t \]

For any invertible matrix $A$ and vector $c$,

\[ (A - cc')^{-1} = A^{-1} + A^{-1}c\,(1 - c'A^{-1}c)^{-1}c'A^{-1} \]

Substitute $X'X$ for $A$ and $x_t$ for $c$:

\[ (X[t]'X[t])^{-1} = (X'X)^{-1} + (X'X)^{-1}x_t\,(1 - x_t'(X'X)^{-1}x_t)^{-1}\,x_t'(X'X)^{-1} \]

Substituting $h_t = x_t'(X'X)^{-1}x_t$, a scalar,

\[ (X[t]'X[t])^{-1} = (X'X)^{-1} + \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}}{1 - h_t} \]

Therefore

\[ b[t] = (X[t]'X[t])^{-1}X[t]'y[t] = \left[ (X'X)^{-1} + \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}}{1 - h_t} \right](X'y - x_t y_t) \]

\[ = (X'X)^{-1}X'y - (X'X)^{-1}x_t y_t + \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}X'y}{1 - h_t} - \frac{(X'X)^{-1}x_t x_t'(X'X)^{-1}x_t y_t}{1 - h_t} \]

\[ = b - (X'X)^{-1}x_t y_t + \frac{(X'X)^{-1}x_t x_t' b}{1 - h_t} - \frac{(X'X)^{-1}x_t h_t y_t}{1 - h_t} \]

so that

\[ b - b[t] = \frac{(X'X)^{-1}x_t y_t (1 - h_t) - (X'X)^{-1}x_t x_t' b + (X'X)^{-1}x_t h_t y_t}{1 - h_t} \]

Recognizing that $h_t$ and $y_t$ are scalars, and that $x_t'b = \hat{y}_t$ so that $y_t - x_t'b = e_t$, after cancellation we get

\[ b - b[t] = \frac{(X'X)^{-1}x_t\,(y_t - x_t'b)}{1 - h_t} = \frac{c_t\, e_t}{1 - h_t} \]

Focusing only on the $k$th coefficient, we get the expression above:

\[ b_k - b_k[t] = \frac{c_{kt}\, e_t}{1 - h_t} \]

Some standardization is necessary to determine cut-offs:

\[ \text{DFBETA}_k = \frac{b_k - b_k[t]}{s[t]\sqrt{\sum_t c_{kt}^2}} \qquad \text{Cutoff: } \pm \frac{2}{\sqrt{n}} \]

2. DFFITS: It can be shown that

\[ \hat{y}_t - \hat{y}_t[t] = x_t'\,[\,b - b[t]\,] = \frac{h_t\, e_t}{1 - h_t} \]

With standardization:

\[ \text{DFFIT}_t = \frac{\hat{y}_t - \hat{y}_t[t]}{s[t]\sqrt{h_t}} \qquad \text{Cutoff: } \pm \frac{2\sqrt{K}}{\sqrt{n}} \]

This is the impact of deleting the $t$th observation on the $t$th predicted value. One can analogously consider $\hat{y}_j - \hat{y}_j[t]$.

3. RSTUDENT:

\[ \text{RSTUDENT} = \frac{e_t}{s[t]\sqrt{1 - h_t}} \qquad \text{Cutoff: } \pm 2 \]

4. COVRATIO:

\[ \text{COVRATIO} = \frac{\left| s^2[t]\,(X[t]'X[t])^{-1} \right|}{\left| s^2 (X'X)^{-1} \right|} \]

Cutoff: $< 1 - \frac{3K}{n} \rightarrow$ "bad"; $> 1 + \frac{3K}{n} \rightarrow$ "good".

3 Multiple-row Diagnostics

If there is a cluster of more than one outlier, single-row diagnostics will not be able to identify the influential observations because of the masking effect demonstrated in class. Multiple-row diagnostics can. Let $[m]$ denote a subset of $m$ deleted observations.

The measures defined above can be analogously determined:

\[ \text{DFBETA} = \frac{b_k - b_k[m]}{\sqrt{\operatorname{Var}(b_k)}} \]

\[ \text{MDFIT} = (b - b[m])'\,(X[m]'X[m])\,(b - b[m]) \]

\[ \text{VARRATIO} = \frac{\left| s^2[m]\,(X[m]'X[m])^{-1} \right|}{\left| s^2 (X'X)^{-1} \right|} \]

This is, however, not practical, although there are packages that can consider every combination of 2, 3, 4, ... data points, and also methods to help identify $[m]$.

3.1 Partial Regression Plots

In a simple regression model (with one independent variable), influential observations, be they single or multiple, are easy to detect visually. But what about a multiple regression model? One easy and practical solution is to collapse a multiple regression model into a series of simple regressions using the FWL Theorem.

For example, say there are four explanatory variables (counting the constant):

\[ y = \beta_1 + X_2\beta_2 + \dots + X_4\beta_4 + \varepsilon \]

To find out whether there are observations influencing the estimated $b_2$:

1. Regress $y$ on $X_3$ and $X_4$ (and the constant) and obtain the residual $\hat{u}$.
2. Regress $X_2$ on $X_3$ and $X_4$ (and the constant) and obtain the residual $\hat{w}$.

By the FWL Theorem, the regression of $\hat{u}$ on $\hat{w}$ yields the OLS slope coefficient for $X_2$. So a plot of $\hat{u}$ against $\hat{w}$ collapses a multi-dimensional problem into a two-dimensional one.

Visual inspection, along the lines presented earlier, of such partial regression plots for each of the key parameters of interest can identify influential observations, singly or as a cluster.
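The two-step partial regression recipe above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not part of the original notes: the data are simulated, the helper `residuals` and all variable names are hypothetical, and the constant is included among the partialled-out regressors so that the FWL result applies.

```python
import numpy as np

def residuals(target, regressors):
    """Residuals from an OLS regression of target on regressors."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return target - regressors @ coef

# Simulated data, purely illustrative (assumed, not from the notes).
rng = np.random.default_rng(0)
n = 50
x2, x3, x4 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x2 - 1.0 * x3 + 2.0 * x4 + rng.normal(size=n)

const = np.ones(n)
X_full = np.column_stack([const, x2, x3, x4])   # full model
X_other = np.column_stack([const, x3, x4])      # everything except X2

# Full multiple regression; b_full[1] is the coefficient on X2.
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Step 1: residual of y after partialling out the constant, X3 and X4.
u_hat = residuals(y, X_other)
# Step 2: residual of X2 after partialling out the constant, X3 and X4.
w_hat = residuals(x2, X_other)

# Slope of the simple regression of u_hat on w_hat (no intercept needed,
# since both residual series have mean zero by construction).
slope = (w_hat @ u_hat) / (w_hat @ w_hat)

print("b2 from full regression:   ", b_full[1])
print("slope of u_hat on w_hat:   ", slope)
```

The two printed numbers should agree up to floating-point error. A scatter plot of `u_hat` against `w_hat` (for instance with a plotting library of your choice) is then the partial regression plot for $b_2$.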
4 What to do

The point is that an influential observation, or a set of influential observations, is not necessarily to be jettisoned. A cluster of influential observations could well be an indication of structural change, for example.
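Before deciding what to do with a flagged observation, it can help to compute the single-row diagnostics of Section 2 directly. The sketch below, in Python with NumPy, is an illustration under stated assumptions rather than the notes' own implementation: the data are simulated, all variable names are hypothetical, and $s[t]$ is obtained by brute-force re-estimation without observation $t$. It also checks the closed-form DFBETA and DFFITS expressions against that brute-force deletion.

```python
import numpy as np

# Simulated data, purely illustrative (assumed, not from the notes).
rng = np.random.default_rng(1)
n, K = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                          # OLS estimator
e = y - X @ b                                  # residuals e_t
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # h_t = x_t'(X'X)^{-1} x_t
C = XtX_inv @ X.T                              # column t is c_t

# Closed form: b - b[t] = c_t e_t / (1 - h_t); row t holds the K changes.
dfbeta_formula = (C * (e / (1 - h))).T

# Brute force: drop observation t and re-estimate.
dfbeta_brute = np.empty((n, K))
fit_change = np.empty(n)                       # yhat_t - yhat_t[t]
s_t = np.empty(n)                              # s[t]
for t in range(n):
    keep = np.arange(n) != t
    b_t, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    dfbeta_brute[t] = b - b_t
    fit_change[t] = X[t] @ (b - b_t)
    resid_t = y[keep] - X[keep] @ b_t
    s_t[t] = np.sqrt(resid_t @ resid_t / (n - 1 - K))

# Standardized measures; sum_t c_kt^2 equals the kth diagonal of (X'X)^{-1}.
dfbetas = dfbeta_formula / (s_t[:, None] * np.sqrt(np.diag(XtX_inv)))
dffits = fit_change / (s_t * np.sqrt(h))
rstudent = e / (s_t * np.sqrt(1 - h))

# Rule-of-thumb cutoffs from Section 2.
flag_dfbetas = np.abs(dfbetas) > 2 / np.sqrt(n)
flag_dffits = np.abs(dffits) > 2 * np.sqrt(K) / np.sqrt(n)
flag_rstudent = np.abs(rstudent) > 2

print("max |formula - brute force|:", np.abs(dfbeta_formula - dfbeta_brute).max())
print("observations flagged by RSTUDENT:", np.flatnonzero(flag_rstudent))
```

The printed discrepancy should be at floating-point level, confirming the closed-form result. The boolean arrays only mark candidates for closer inspection; as noted above, a flagged observation is not automatically one to discard.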