Classical statistics attaches a probability distribution to the residuals of a prediction equation. Spatial statistics modifies this situation by specifying a prediction function that has \(Y\) on both sides of the equation, which means that the value at location \(i\) is at least partially a function of the values of \(Y\) at nearby locations. This conceptualization captures the essence of spatial autocorrelation.
SAR, simultaneous autoregression:
If the error term \(\epsilon\) in the standard linear regression model \(Y=\pmb{X}\pmb{\beta}+\epsilon\) is spatially autocorrelated, then the model can be written as \(Y=\pmb{X}\pmb{\beta}+(\pmb{I}-\rho\pmb{W})^{-1}\epsilon\), and the spatial regression equation can be written as:
\(Y=\rho W_1 Y+X\beta+u,\quad u=\lambda W_2\epsilon+\mu,\quad \mu\sim N[0,\sigma^2 I]\)
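| Case | Model | Description |
| --- | --- | --- |
| \(\rho=0,\lambda=0\) | Ordinary linear regression | The model contains no spatial autocorrelation effect |
| \(\rho\neq0,\beta=0,\lambda=0\) | First-order spatial autoregressive model | The dependent variable is influenced by the dependent variable at neighboring locations |
| \(\rho\neq0,\beta\neq0,\lambda=0\) | Spatial lag model | Spatial Lag Model (SLM): the dependent variable depends on the explanatory variables of its own region and on the dependent variable in neighboring regions |
| \(\rho=0,\beta\neq0,\lambda\neq0\) | Spatial error model | Spatial Error Model (SEM): the dependent variable depends on the explanatory variables of its own region, and its error term is tied to the error terms of neighboring regions |
| \(\rho\neq0,\beta\neq0,\lambda\neq0\) | Spatial Durbin model | Spatial Durbin Model (SDM) |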
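To make the "\(Y\) on both sides" idea concrete, here is a minimal simulation sketch of the spatial lag case (\(\rho\neq0,\lambda=0\)). It solves the equation for \(Y\) through the reduced form \(Y=(I-\rho W)^{-1}(X\beta+\epsilon)\). The regular grid, the rook-contiguity weights, the row standardization, and the helper names `rook_weights` and `simulate_spatial_lag` are all assumptions chosen for illustration, not part of the original text.

```python
import numpy as np

def rook_weights(nrow, ncol):
    """Row-standardized rook-contiguity weights for an nrow x ncol grid (illustrative choice)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0
    # Each row is divided by its number of neighbors so that rows sum to 1.
    return W / W.sum(axis=1, keepdims=True)

def simulate_spatial_lag(W, X, beta, rho, sigma=1.0, seed=0):
    """Draw Y from the reduced form Y = (I - rho W)^{-1} (X beta + eps)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    A = np.eye(n) - rho * W
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(A, X @ beta + eps), eps

W = rook_weights(5, 5)
X = np.column_stack([np.ones(25), np.linspace(0.0, 1.0, 25)])
beta = np.array([1.0, 2.0])
rho = 0.5
Y, eps = simulate_spatial_lag(W, X, beta, rho)

# The simulated Y satisfies the structural form Y = rho W Y + X beta + eps.
residual = Y - rho * (W @ Y) - X @ beta
```

Because `W` is row-standardized and \(|\rho|<1\), the matrix \(I-\rho W\) is invertible, so the reduced form is well defined in this sketch.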
Model selection procedure: first run an OLS regression, then apply the LM-Lag and LM-Error diagnostics (both are Lagrange Multiplier tests); if:
The application case in this section also mentions discriminant analysis (Discriminant Function Analysis), so a brief review follows.
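| Method | Description | Assessment |
| --- | --- | --- |
| Distance discriminant | From data with known classes, compute the centroid (group mean) of each class. For any given sample, compute its distance to each class mean and assign it to the class with the smallest distance | Simple and practical; ignores prior probabilities and the cost of misclassification |
| Bayes discriminant | Compute the conditional probabilities \(P(n\mid x),\ n=1,2,3,\dots,k\) that the sample \(x\) belongs to each of the \(k\) populations, compare them, and assign the sample to the population with the highest probability (or the lowest misclassification probability) | Sample distributions often fail to satisfy the assumption that attributes are conditionally independent |
| Fisher discriminant | Given two populations A and B with \(n_1\) and \(n_2\) samples, each sample having \(p\) observed indicators, treat every sample as a point in \(p\)-dimensional space and build a linear discriminant function using the idea behind analysis of variance | As a supervised method, Fisher LDA retains as much class information as possible, with that information measured by a nonparametric index, the Fisher criterion; it may lose accuracy |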
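Two of the rules in the table can be sketched in a few lines. The code below is a minimal illustration, assuming Euclidean distance for the distance discriminant and the usual two-group Fisher direction \(w=S_w^{-1}(\bar{x}_A-\bar{x}_B)\) with a midpoint threshold; the toy data and the function names `distance_discriminant` and `fisher_direction` are made up for the example. The Bayes rule is not covered here.

```python
import numpy as np

def distance_discriminant(X, y, x_new):
    """Assign x_new to the class whose mean (centroid) is nearest in Euclidean distance."""
    labels = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in labels])
    distances = np.linalg.norm(centroids - x_new, axis=1)
    return labels[np.argmin(distances)]

def fisher_direction(XA, XB):
    """Two-group Fisher direction w = S_w^{-1} (mean_A - mean_B)."""
    mA, mB = XA.mean(axis=0), XB.mean(axis=0)
    # Pooled within-group scatter matrix.
    Sw = (XA - mA).T @ (XA - mA) + (XB - mB).T @ (XB - mB)
    return np.linalg.solve(Sw, mA - mB)

# Toy data: two groups, p = 2 indicators per sample (hypothetical values).
XA = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]])
XB = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 5.0]])
X = np.vstack([XA, XB])
y = np.array(["A"] * 4 + ["B"] * 4)
x_new = np.array([2.5, 2.0])

label = distance_discriminant(X, y, x_new)

w = fisher_direction(XA, XB)
# Threshold at the projection of the midpoint between the two group means.
threshold = 0.5 * (XA.mean(axis=0) + XB.mean(axis=0)) @ w
fisher_label = "A" if x_new @ w > threshold else "B"
```

Since \(w\) points from the mean of B toward the mean of A in the metric of \(S_w\), group A projects above the threshold, which is why the comparison uses `>` for A.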
As the preceding section showed, implementing spatial autoregressive models requires nonlinear regression techniques, but the error term is still assumed to follow the normal probability model.
Eigenvector spatial filtering furnishes a sound methodology for estimating non-normal probability models with georeferenced data containing non-zero spatial autocorrelation. This methodology accounts for spatial autocorrelation in random variables by incorporating heterogeneity into parameters in order to model non-homogeneous populations. It renders a mixture of distributions that can be used to model observed georeferenced data whose various characteristics differ from those that are consistent with a single, simple, underlying distribution with constant parameters across all observations. The aim of this technique is to capture spatial autocorrelation effects with a linear combination of spatial proxy variables – namely, eigenvectors – rather than to identify a global spatial autocorrelation parameter governing average direct pairwise correlations between selected observed values. As such, it utilizes the misspecification interpretation of spatial autocorrelation, which assumes that spatial autocorrelation is induced by missing exogenous variables, which themselves are spatially autocorrelated, and hence relates to heterogeneity.
Eigenvector spatial filtering conceptualizes spatial dependency as a common factor that is a linear combination of synthetic variates (the eigenvectors of the matrix \(\pmb{(I-11^T/n)C(I-11^T/n)}\)) summarizing distinct features of the neighbors' geographic configuration structure for a given georeferenced dataset.
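The synthetic variates can be computed directly from the connectivity matrix. The following is a minimal sketch, assuming a symmetric binary matrix \(C\) for a chain of six locations (a toy configuration) and a simple "positive eigenvalue" rule for picking candidate eigenvectors; the function name `esf_eigenvectors` and the numerical cutoff are illustrative assumptions, and practical applications usually add a further selection step.

```python
import numpy as np

def esf_eigenvectors(C):
    """Eigen-decompose M C M with M = I - 11^T/n; return eigenvalues (descending) and eigenvectors."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    MCM = M @ C @ M
    MCM = 0.5 * (MCM + MCM.T)  # remove tiny asymmetries caused by rounding
    vals, vecs = np.linalg.eigh(MCM)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Toy connectivity matrix: six locations in a chain, neighbors share an edge.
n = 6
C = np.zeros((n, n))
for i in range(n - 1):
    C[i, i + 1] = C[i + 1, i] = 1.0

vals, vecs = esf_eigenvectors(C)

# Candidate spatial proxy variables: eigenvectors with positive eigenvalues,
# which correspond to map patterns with positive spatial autocorrelation.
E = vecs[:, vals > 1e-10]
```

The columns of `E` are mutually orthogonal and sum to zero, so they could be added to a regression as extra covariates without being collinear with the intercept.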
This involves the generalized linear model (GLM), so a brief review follows.
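Exponential family (the Poisson, binomial, and geometric distributions, among others, all belong to it):

$$f(y \mid \theta,\phi)=\exp\left(\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right)$$

where \(\theta\) is the natural parameter, related to the mean; \(\phi\) is the dispersion parameter, related to the variance; and \(a(\phi)\), \(b(\theta)\), \(c(y,\phi)\) are known functions.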
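As a quick check of the exponential-family form, the sketch below writes the Poisson distribution with mean \(\mu\) using \(\theta=\log\mu\), \(b(\theta)=e^{\theta}\), \(a(\phi)=1\), and \(c(y,\phi)=-\log y!\), then compares it with the usual Poisson probability mass function. The function names and the value \(\mu=3.5\) are arbitrary choices for illustration.

```python
import math

def exp_family_density(y, theta, phi, a, b, c):
    """Evaluate exp((y * theta - b(theta)) / a(phi) + c(y, phi))."""
    return math.exp((y * theta - b(theta)) / a(phi) + c(y, phi))

def poisson_pmf(y, mu):
    """Poisson pmf in its usual form: mu^y * e^(-mu) / y!."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 3.5
theta = math.log(mu)  # natural parameter, a function of the mean

def a(phi):
    return 1.0  # the Poisson has no free dispersion

def b(theta_value):
    return math.exp(theta_value)  # b(theta) = e^theta

def c(y, phi):
    return -math.lgamma(y + 1)  # -log(y!)

# Pairs of (exponential-family value, direct pmf value) for y = 0..5.
pairs = [(exp_family_density(y, theta, 1.0, a, b, c), poisson_pmf(y, mu))
         for y in range(6)]
```

Each pair agrees up to floating-point error, which illustrates how a familiar distribution fits the general form once \(\theta\), \(a\), \(b\), and \(c\) are identified.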