Spatial data analysis 2

The present study evaluates the possibility of spatial heterogeneity in the effects of demographic and socio-economic variables on municipal-level crime rates. Geographically weighted regression (GWR) is used to explore spatial heterogeneity and confirms that place matters.
  • 1. Spatial Data Analysis 2/2 Johan Blomme | Leenstraat 11 | 8340 Damme
  • 2. Introduction. There is increased interest in understanding spatially varying processes to explain various social, political and economic outcomes. Using global and local statistics can lead to completely different insights into the relationship between area-level characteristics and outcomes. Two types of spatial analysis are especially relevant: – spatial autocorrelation: the application of local clustering analysis to establish significant local patterns and the use of spatial econometrics to account for spatial effects in regression analysis; – spatial heterogeneity: the application of geographically weighted regression analysis to explore the spatial variation in the relationships between area-level characteristics and various outcomes. In this guide, various techniques to perform global and local spatial regression analysis are explored. The examples used are for illustrative purposes only and are not intended to test the theoretical underpinnings that exist in the research field of the chosen cases.
  • 3. • In recent years, there has been a growing interest in adding a spatial perspective to the study of complex patterns of interrelated social, behavioral, economic and environmental phenomena. It is increasingly argued that spatial thinking and spatial analytical perspectives have an important role to play in uncovering answers that could prove helpful in addressing research and policy questions*. • Spatial analysis of data focuses on four methodological areas: – spatial econometrics; – geographically weighted regression; – multilevel models; – spatial pattern analysis. * It is worth noting that the term “spatial analysis” applies equally to the study of incident-level point patterns (e.g. crime hot spots) and to the study of aggregated counts or rates at the area level (e.g. census block groups, tracts or “neighborhoods”).
  • 4. 1. Spatial econometrics • Spatial econometrics accounts for spatial effects in regression analysis. If geography or place matters (and it frequently does), then things that are closer together geographically are also likely to be correlated in other ways. Therefore, assumptions about the independence of covariates and about the independence and distribution of error terms are violated in an OLS regression framework. • Let’s take the example of the analysis of crime data. The growth of spatial analysis of crime data enabled criminologists to move beyond simply mapping crime and demonstrating that crime does indeed cluster in space. An important issue became the question of why crime clusters in space. Spatial regression models were estimated to explain the observed patterns of spatial clusters. In addition to crime, many researchers began to use spatial regression models to demonstrate that many negative health outcomes, such as low birth weight, infant mortality and depression, cluster spatially. From these studies emerged a consistent set of explanatory variables that characterise “bad” neighborhoods (e.g. concentrated poverty, stability of residents, female-headed households, minority population), along with evidence of an aggregate “neighborhood” effect. For instance, concentrated poverty negatively impacts all residents of a community regardless of their own level of personal income.
  • 5. • That such places also cluster in space suggests that neighborhoods are not independent units of observation. There might be forces at work that make the level of crime in one neighborhood dependent upon the actions and activities occurring in other areas. That is, social processes might be at work that result in the diffusion of crime across space. • In trying to understand these patterns, spatial regression became the methodology of choice. As noted, spatial autocorrelation occurs when the values of variables sampled at nearby locations are not independent of each other. This lack of independence makes the use of OLS regression inappropriate. To address spatial autocorrelation, spatial lag and spatial error models became the most popular choices. • When the level of crime in one neighborhood is directly dependent upon the activities or social processes occurring in a neighboring area, one must apply a spatial lag (spatial dependence) model. Spatial error models are appropriate for modeling unobservable processes (e.g. norms or beliefs) that are shared among individuals residing in proximate places, or when the boundaries that separate “places” are arbitrary to the extent that two different places are actually very similar across various social, economic or demographic features.
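As a sketch, the two model families described on this slide are commonly written as follows. The notation is an assumption of this example and does not appear in the original: $y$ is the vector of outcomes (e.g. crime rates), $X$ the covariate matrix, $W$ a spatial weights matrix that defines which areas count as neighbors, $\rho$ and $\lambda$ spatial autoregressive parameters, and $\varepsilon$ an independent error term.

```latex
\begin{align*}
\text{Spatial lag model:}   \quad & y = \rho W y + X\beta + \varepsilon \\
\text{Spatial error model:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{align*}
```

With $\rho = 0$ or $\lambda = 0$, both forms reduce to the ordinary linear model $y = X\beta + \varepsilon$.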
  • 6. • By examining the statistically significant coefficient on the spatially lagged dependent variable or the spatial error term, specific explanations were offered regarding the forces driving the diffusion of the study object. • The choice between a lag and an error model has been, and continues to be, driven by goodness-of-fit tests rather than theory. • What causes spatial autocorrelation? • Feedback. For most social processes, individuals and households interact with each other and thereby influence each other. The influence of such an interaction is likely to be stronger for those who are in frequent contact. Residential proximity generally increases the frequency of contact. However, it is also possible to geographically “unbound” the autocorrelation matrix. For example, social similarity increases the probability of communication and social interaction. In this way, events in an area can be influenced more by events in non-adjacent but socially similar areas than in adjacent but socially dissimilar areas. One might model the diffusion of youth violence by considering social interactions that occur within schools. In such a case, neighborhoods would be linked if and only if they send students to the same school buildings. Studies that capture social networks and communication networks can provide an empirical validation of this approach (instead of using a geographically based matrix, the potential for activities in one area to influence other areas can be based on social distance between places).
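The school-based linkage described on this slide can be sketched as a weights matrix. This is a minimal illustration under stated assumptions: the area labels, school names and the helper `school_weights` are hypothetical, and row-standardisation is one common convention rather than something the slide prescribes.

```python
import numpy as np

# Hypothetical data: the schools each neighborhood sends students to.
schools_by_area = {
    "A": {"north_high"},
    "B": {"north_high", "east_high"},
    "C": {"east_high"},
    "D": {"south_high"},
}

def school_weights(schools_by_area):
    """Return area labels and a row-standardised weights matrix in which two
    areas are neighbors if and only if they share at least one school."""
    areas = sorted(schools_by_area)
    n = len(areas)
    W = np.zeros((n, n))
    for i, a in enumerate(areas):
        for j, b in enumerate(areas):
            if i != j and schools_by_area[a] & schools_by_area[b]:
                W[i, j] = 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    # Areas without any link keep an all-zero row instead of dividing by zero.
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return areas, W

areas, W = school_weights(schools_by_area)
```

Here A and C are not linked to each other even if they were adjacent on a map, while both are linked to B; D shares no school and has no neighbors.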
  • 7. • Grouping forces. Individuals and households with common characteristics are sometimes found clustered together by choice, or they are constrained to co-locate by the coercive operation of social, economic or political forces. When this type of constraint is responsible for spatial autocorrelation in a dependent variable, it may be possible to identify the variable or variables involved in the process and operationalize them on the right-hand side of the regression equation. Sometimes the spatial autocorrelation in the dependent variable (and the regression residuals) can be explained by autocorrelated covariates (independent variables), and standard regression approaches will work fine. If a causal variable cannot be identified, then the source of the autocorrelation will remain in the error term, necessitating what is referred to as a spatial error model. • Grouping responses. Individuals or households that share a common attribute or a set of common characteristics may respond similarly to external forces. Often there exist contextual forces that affect individuals and households in an area (e.g. geophysical conditions, cultural influences). A data analyst can deal with these contextual influences by declaring different “spatial regimes”. If not, spatial autocorrelation will remain in the regression error term, the result of an omitted variable in the specification, and spatial econometric approaches must again be considered.
  • 8. • Nuisance autocorrelation. This occurs when the underlying spatial process creates regions that are much larger than the units of observation chosen or available to the analyst. The choice of the proper level of aggregation when estimating neighborhood effects remains problematic. Data are typically aggregated to geographical areas which serve as the units of analysis (e.g. census tracts). The modifiable areal unit problem (MAUP) arises from the fact that units are usually arbitrarily defined in the sense that they can be aggregated or disaggregated to form units of different size. Innovative advances are being made that no longer define the geography of a community by boundaries drawn for administrative purposes (e.g. census tracts, zip codes) but instead capture the spatial dimension of social networks.
  • 9. • The challenges for future work are not those that pertain to the development of new mapping technologies or more sophisticated statistical methodologies: “That is, regardless of how sophisticated our methodologies become for the estimation of spatial models, the key will always be that the specification of these models be sound in terms of the measurement and definition of place and the manner in which areas are deemed “neighbors”. … Though the ability for a crime in a focal area to influence crime in another area might decay over distance, it is possible that there are other networks of social interactions (e.g. interactions that occur outside the neighborhood at work or school, participation in voluntary or religious organizations, …) that make events in one area extremely salient in the commission of future events in otherwise geographically distant areas” (Tita & Radil, 2010, p. 476).
  • 10. 2. Geographically weighted regression • Recently, techniques for the analysis of local spatial relationships have been developed. • In conventional regression, one parameter is estimated for the relationship between each independent variable and the dependent variable, and the relationship is assumed to be constant across the study area. The term “global” implies that all of the data are used to compute a single statistic or model, and that the relationships between variables in the model are stationary across the study area. The GWR approach extends this framework to estimate local rather than global parameters. Instead of calibrating a single regression equation, GWR generates a separate regression equation for each observation. Each equation is calibrated using a different weighting of the observations contained in the data. • In traditional OLS, all places have the same weight, as if all places shared the same location. In GWR, as we move over space, observations are weighted according to their proximity to a location.
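The idea of one weighted regression per observation can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: it assumes a Gaussian kernel with a fixed bandwidth, the function name `gwr_fit` is hypothetical, and the data are synthetic, with a slope that drifts from west to east.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Estimate one coefficient vector per observation by weighted least squares.

    coords: (n, 2) locations; X: predictors without intercept; y: (n,) outcome.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # add an intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # nearby points weigh more
        XtW = Xd.T * w                                   # equals Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic example (assumed data): the true slope rises with the x-coordinate.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x + rng.normal(scale=0.1, size=200)

local = gwr_fit(coords, x, y, bandwidth=1.5)        # column 1 holds local slopes
near_global = gwr_fit(coords, x, y, bandwidth=1e6)  # all weights are close to 1
```

With a very large bandwidth every observation receives nearly the same weight, so the local slopes collapse to the single OLS slope; with a small bandwidth they track the west-to-east drift.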
  • 11. • Two problems arise with GWR. First, if the subset of the full sample is too small, standard errors will be high. Second, if the subsample is too large, coefficients will be biased because they drift across space. If the process is spatially non-stationary, a regression with a large subsample will result in estimates that are spatial averages. To overcome these problems, GWR uses a weighted calibration. • Observations in close spatial proximity to region i have a larger influence on the estimation of the parameters for region i than those further away. That is why those observations have a larger weight in the sample than the observations from regions further away. This weighted calibration implies that the weighting of an observation is not constant but varies with i. Region j has a large weight in the estimation for region i if they are close to each other, and the weight of region j in the estimation for region m might be small if the regions are separated by a larger distance. Every single region i has a different weight matrix. • GWR is run in several steps. The first step is to decide how observations should be weighted. The two most widely applied weighting functions (kernel functions) are the Gaussian and the bi-square kernel. Using the Gaussian kernel, in which space is considered continuous, the weighting of data will decrease according to a Gaussian curve as the distance between i and j increases. Up to a certain bandwidth, the observations will have a weight of at least 0.5. The bi-square kernel implies the notion that space is discrete or discontinuous: beyond the bandwidth, the weights are set to zero.
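The two kernels named on this slide can be sketched as follows. The exact parameterisation is an assumption, since several variants are in use; the forms below are common ones, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    """Weights decay smoothly with distance and never reach exactly zero."""
    d = np.asarray(d, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Weights decay to zero at the bandwidth and stay zero beyond it."""
    d = np.asarray(d, dtype=float)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

distances = np.array([0.0, 50.0, 100.0, 150.0])   # e.g. km from region i
g = gaussian_kernel(distances, bandwidth=100.0)
b = bisquare_kernel(distances, bandwidth=100.0)
```

Under this Gaussian parameterisation the weight at a distance equal to the bandwidth is exp(-0.5), roughly 0.61, whereas the bi-square weight at and beyond the bandwidth is exactly zero.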
  • 12. • It is often stated that GWR results are relatively insensitive to the choice of the weighting function, but they are not insensitive to the choice of the bandwidth. As the density of regions in a dataset can vary, a single fixed bandwidth is often inadequate. For example, in a study of European regions a fixed bandwidth of 800 km is too small for the estimation of coefficients in Finland, because there are few regions and, accordingly, few data points in close proximity. The northernmost Finnish region would have only 3 neighbors. Such a small sample would result in large standard errors. Similarly, this bandwidth is too large for a place like Austria, where the density of regions is much higher. The region of Tirol would have 129 neighboring regions within a distance of 800 km. Such a large sample could result in serious drift bias. That is why an adaptive kernel is most appropriate: an optimal adaptive number of neighbors will be applied. An adaptive kernel means that a fixed proportion of all observations is included in the estimation, for example 20 percent of all regions. • An adaptive kernel is smaller in regions where the density of observations is high (as in Austrian regions) and larger in regions where the density is low (as in Finnish regions). While the advantage of an adaptive kernel is obvious for regions with a high density of observations, the coefficients of regions with a low density of observations are likely to be drift biased, as they are also influenced by observations from regions that are far away.
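An adaptive bandwidth of this kind can be sketched by taking, for each location, the distance to its k-th nearest neighbor, with k set to a fixed share of all observations. The function name `adaptive_bandwidths`, the 20 percent default and the synthetic point pattern (one dense cluster, one sparse area) are assumptions of this example.

```python
import numpy as np

def adaptive_bandwidths(coords, share=0.2):
    """Per-location bandwidth: distance to the k-th nearest neighbor,
    where k is a fixed share of all observations."""
    n = len(coords)
    k = max(1, int(round(share * n)))
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # full pairwise distance matrix
    # Column 0 of each sorted row is the zero distance to the location itself.
    return np.sort(dist, axis=1)[:, min(k, n - 1)]

# Assumed data: 80 tightly packed regions and 20 widely scattered ones.
rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.5, size=(80, 2))
sparse = rng.uniform(20.0, 60.0, size=(20, 2))
coords = np.vstack([dense, sparse])

bw = adaptive_bandwidths(coords, share=0.2)
```

The bandwidths come out small inside the dense cluster and large in the sparse area, which is the behavior the slide describes for Austrian versus Finnish regions.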
  • 13. • The bandwidth can be understood as the area of influence of each place. A small bandwidth means a small area of influence and thus a rapid distance decay, whereas a large bandwidth implies a larger area of influence and thus a smoother weighting scheme. In a regression context, a small bandwidth (less smoothing) produces estimates with large local variation, whereas large bandwidths (greater smoothing) produce estimates with little spatial variation (larger bandwidths will make local coefficient estimates similar to OLS global estimates). • There are two methods for the estimation of the optimal bandwidth: AICc (corrected Akaike Information Criterion) and CV (cross-validation). When comparing GWR models with different bandwidths, the model with the lowest AICc or CV score can be considered the most appropriate, and its radius (bandwidth) is taken as optimal.
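Bandwidth selection by cross-validation can be sketched as a simple grid search. This is an illustration under stated assumptions: a Gaussian kernel, a leave-one-out squared-error score, a hand-picked list of candidate bandwidths, hypothetical function names and synthetic data. The AICc criterion is not shown.

```python
import numpy as np

def cv_score(coords, X, y, bandwidth):
    """Leave-one-out CV: sum of squared errors when each observation is
    predicted from a local fit that excludes the observation itself."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    sse = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        w[i] = 0.0                                   # leave observation i out
        XtW = Xd.T * w
        beta = np.linalg.lstsq(XtW @ Xd, XtW @ y, rcond=None)[0]
        sse += (y[i] - Xd[i] @ beta) ** 2
    return sse

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate with the lowest CV score, plus all scores."""
    scores = [cv_score(coords, X, y, h) for h in candidates]
    return candidates[int(np.argmin(scores))], scores

# Assumed data: the true slope drifts across space, so a local model should win.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x + rng.normal(scale=0.1, size=200)

candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 1e6]   # the last one mimics a global model
best_h, scores = select_bandwidth(coords, x, y, candidates)
```

Excluding observation i from its own fit matters: without it, the score would always favor the smallest bandwidth, because each point would mostly predict itself.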
  • 14. • The output from GWR is a set of surfaces that can be mapped, with each surface depicting the spatial variation of a relationship. Standard global modeling techniques, such as OLS or spatial regression models, cannot detect nonstationarity, and thus their use may obscure regional or local variation in the relationships between predictors and the outcome variable. Public policy inferences based on results from global models in which nonstationarity is present but not detected may be quite poor in specific local/regional settings. • GWR analysis and interpretation are largely dependent on GWR maps. Such maps can be problematic if they illustrate the size of parameter estimates while failing to illustrate their relative significance. A method to address this issue is the mapping of GWR statistics by combining local parameter estimates and t-values on a single map. • It is important to note that GWR is an exploratory technique, and, as with ESDA and spatial econometric approaches, the insights gained from GWR can be utilized to improve model specification in global models. Limitations associated with GWR include the computationally demanding calculation of multiple regressions, multicollinearity and kernel bandwidth selection. It should also be taken into account that GWR studies that find that parameter coefficients vary across space have a tendency to focus on this result and do not always seek to explain the results with further analysis. It is important that GWR and ESDA methods are utilized to help improve model specification, and that efforts be made to find explanations.
  • 15. 3. Multilevel modeling • A methodological focus on multilevel or hierarchical modeling is relevant when assessing to what extent individual behaviors and demographic and health outcomes are influenced by an individual’s own characteristics, and by the attributes of the larger geographic area (neighborhood, village, district, state). • To some extent, nested data are inherently spatial. Statistical methods that incorporate neighborhood, city or regional effects are in essence considering the effects of places and spaces on their outcome(s) of interest. While traditional research has looked at de jure classifications of space (e.g. census tracts), it is increasingly acknowledged that legal and political boundaries frequently have little to do with actual lived spaces. Furthermore, many scientists are working in regions that do not have synonymous spatial categories: that is, neighborhoods and other administratively bounded areas may have different meanings in some of the non-industrialized and/or industrializing nations than they do in the developed world.
  • 16. 4. Spatial pattern analysis • A wide range of methods now exist for analyzing spatial clusters of point data, such as disease or crime events, in which the goal is to discover whether the observed events exhibit any systematic pattern, as opposed to being distributed at random within a study area. Recent applications of spatial pattern analysis include the use of local statistics of spatial association.
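One widely used local statistic of spatial association is the local Moran's I. It is chosen here as an example; the slide does not name a specific statistic. The sketch below assumes a row-standardised weights matrix `W` and a toy set of four areas along a line, and it omits the significance assessment, which is usually based on permutation.

```python
import numpy as np

def local_moran(values, W):
    """Local Moran's I for each area: positive where an area and its
    neighbors deviate from the mean in the same direction."""
    z = values - values.mean()
    m2 = (z ** 2).sum() / len(z)          # second moment of the deviations
    return z * (W @ z) / m2

# Assumed toy data: two high-value areas next to two low-value areas.
values = np.array([10.0, 9.0, 1.0, 0.0])
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])

I_local = local_moran(values, W)
```

The two end areas, whose only neighbor has a similar value, receive the largest statistics; the two middle areas sit on the boundary between the high and low groups and score lower.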
  • 17. What does the near future hold for spatial data analysis? • We can predict with some confidence that things will change rapidly, as the geospatial data and methodological development environment is dynamic. • It must be emphasized that the volume, sources and forms of geospatial data are growing rapidly. Data from wireless and sensor technologies and developments in data storage and handling (e.g. cloud computing, geospatial data warehouses, data mining techniques) will continue to change what, how and when we collect data on individuals and their environments. New data formats will be tagged with both a geographic location and a time stamp, providing unparalleled spatial and temporal precision.
  • 18. Global and Local Spatial Regression
  • 19. • Traditional regression analysis describes a modelled relationship between a dependent variable and a set of independent variables. When applied to spatial data, the regression analysis often assumes that the modelled relationship is stationary over space and produces a global model which is supposed to describe the relationship at every location in the study area. This would be misleading, however, if the relationships being modelled are intrinsically different across space. One of the spatial statistical methods that attempts to solve this problem and explain local variation in complex relationships is Geographically Weighted Regression (GWR). • In a global regression model, the dependent variable is often modelled as a linear combination of independent variables, with parameters assumed to be stationary over the whole area (i.e. the model returns one value for each parameter). GWR extends this framework by dropping the stationarity assumption: the parameters are assumed to be continuous functions of location. The result of the GWR analysis is a set of continuous localised parameter estimate surfaces, which describe the geography of the parameter space. These estimates are usually mapped or analysed statistically to examine the plausibility of the stationarity assumption of the traditional regression and different possible causes of nonstationarity. The definitive text on GWR is: Fotheringham, A.S., B