Schalal

Chapter 2 Spatial Autocorrelation

Much of classical statistical theory assumes that observations are iid (independent and identically distributed).

Correlated samples theory: repeated measures → multivariate statistical theory: observations are paired, with the distribution of pairs being iid while the values within each pair are correlated.

(That is, one set of observations is regarded as unrelated to another set, but the variables within a single set of observations are regarded as correlated, hence the name autocorrelation.)

→ time series analysis → spatial series analysis (spatial autocorrelation)

2.1 Indices measuring spatial dependency

Moran coefficient (MC)

\[\rho=\frac{\sum_{i=1}^{n}{(x_i-\overline{x})(y_i-\overline{y})}/n}{\sqrt{\sum_{i=1}^{n}{(x_i-\overline{x})^2}/n}\sqrt{\sum_{i=1}^{n}{(y_i-\overline{y})^2}/n}}=\frac{\sum_{i=1}^{n}{(x_i-\overline{x})(y_i-\overline{y})}}{\sqrt{\sum_{i=1}^{n}{(x_i-\overline{x})^2}}\sqrt{\sum_{i=1}^{n}{(y_i-\overline{y})^2}}}\]

\[MC=\frac{\sum_{i=1}^{n}{\sum_{j=1}^{n}{c_{ij}(y_i-\overline{y})(y_j-\overline{y})}}/\sum_{i=1}^{n}{\sum_{j=1}^{n}{c_{ij}}}}{\sqrt{\sum_{i=1}^{n}{(y_i-\overline{y})^2}/n}\sqrt{\sum_{j=1}^{n}{(y_j-\overline{y})^2}/n}}=\frac{n}{\sum_{i=1}^{n}{\sum_{j=1}^{n}{c_{ij}}}}\frac{\sum_{i=1}^{n}{\sum_{j=1}^{n}{c_{ij}(y_i-\overline{y})(y_j-\overline{y})}}}{\sum_{i=1}^{n}{(y_i-\overline{y})^2}}\]

Here \(c_{ij}\) is the spatial weight linking locations \(i\) and \(j\).
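As a quick check on the MC formula, here is a minimal Python sketch using only the standard library. The function name `moran_coefficient`, the four-unit line layout, and the two value vectors are hypothetical illustrations, not taken from the book; binary adjacency weights are assumed.

```python
def moran_coefficient(y, c):
    """Moran coefficient for values y and a spatial weights matrix c (list of lists)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]                   # deviations from the global mean
    s0 = sum(sum(row) for row in c)               # sum of all weights c_ij
    cross = sum(c[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)                  # sum of squared deviations
    return (n / s0) * (cross / ss)

# Hypothetical data: four units on a line, neighbors share an edge (binary weights).
c = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

mc_trend = moran_coefficient([1.0, 2.0, 3.0, 4.0], c)   # similar neighbors: positive MC
mc_alt = moran_coefficient([1.0, 4.0, 1.0, 4.0], c)     # alternating values: negative MC
print(mc_trend, mc_alt)
```

The smooth trend gives a positive MC (similar values are adjacent), while the alternating pattern gives a negative one.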

Important properties of MC

Geary Ratio (GR)

Relationships between MC and GR

\[GR=\frac{n-1}{\sum_{i=1}^{n}{\sum_{j=1}^{n}{c_{ij}}}}\frac{\sum_{i=1}^{n}{(x_i-\overline{x})^2}(\sum_{j=1}^{n}{c_{ij}})}{\sum_{i=1}^{n}{(x_i-\overline{x})^2}}-\frac{n-1}{n}MC\]

(This form assumes a symmetric weights matrix, \(c_{ij}=c_{ji}\).)
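The relationship can be checked numerically. The sketch below computes GR directly from squared differences between neighbors and again from MC, using the general decomposition of \(\sum\sum c_{ij}(x_i-x_j)^2\) into row and column sums, which holds for any weights matrix. The function names and the toy data are hypothetical.

```python
def moran_and_geary(x, c):
    """Return (MC, GR) computed directly from their definitions."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    ss = sum(d * d for d in dev)
    s0 = sum(sum(row) for row in c)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    mc = (n / s0) * sum(c[i][j] * dev[i] * dev[j] for i, j in pairs) / ss
    gr = ((n - 1) / (2 * s0)) * sum(c[i][j] * (x[i] - x[j]) ** 2 for i, j in pairs) / ss
    return mc, gr

def geary_from_moran(x, c, mc):
    """Rebuild GR from MC: a weighted-variance term minus (n-1)/n times MC."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    ss = sum(d * d for d in dev)
    s0 = sum(sum(row) for row in c)
    row = [sum(c[i]) for i in range(n)]                         # row sums of c
    col = [sum(c[i][j] for i in range(n)) for j in range(n)]    # column sums of c
    first = ((n - 1) / (2 * s0)) * sum(dev[i] ** 2 * (row[i] + col[i]) for i in range(n)) / ss
    return first - ((n - 1) / n) * mc

# Hypothetical data: four units on a line with binary adjacency weights.
c = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [1.0, 2.0, 3.0, 4.0]

mc, gr = moran_and_geary(x, c)
gr_check = geary_from_moran(x, c, mc)
print(mc, gr, gr_check)
```

For a symmetric matrix the row and column sums coincide, so the first term collapses to a single sum over row totals.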

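This is consistent with the inverse relationship between the two coefficients noted above. GR is also more easily affected by outliers. If GR + MC ≈ 1, the data quality is good (no extreme outliers). Because GR varies over a wider range (it is less stable) and MC is applicable under any measurement scale, MC is the better index for measuring spatial autocorrelation.

Summary: comparing the two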

| Method | MC | GR |
|---|---|---|
| What does it measure? | Clustering of data across space | Dissimilarities between juxtaposed data points |
| What's the reference? | Global mean of data points | Differences between neighboring data points |
| Similar to | PPMCC | Variogram |
| Weakness | Have to know the mean | Not influenced by sample size and spatial weight: a less robust statistic |

Supplement: Getis-Ord G

2.2 Graphic portrayals: the Moran scatterplot and semi-variogram plot

The Moran scatterplot is a two-dimensional diagram using Cartesian coordinates to display pairs of values in a manner that summarizes the relationship between the observations comprising a univariate georeferenced dataset.

Methods:

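Note: the Moran scatterplot here is the so-called LISA (Local Indicators of Spatial Association) plot, only described differently. The textbook 《空间数据分析》 (Spatial Data Analysis) describes it as "depicting the observed variable \(x\) and its spatially lagged variable \(W_x\) (i.e., the weighted average of the observed values in the units surrounding that spatial unit)". The slope of the LISA plot is the unstandardized MC (it still has to be divided by the sum of the distance-matrix weights). The upper-right, lower-left, lower-right, and upper-left quadrants represent the high-high, low-low, high-low, and low-high cluster types, respectively.

The definition of the local Moran index is added here as a supplement.

Local spatial autocorrelation analysis is needed to find out whether observations cluster locally and which spatial units contribute more to the global spatial autocorrelation. The local Moran index is one commonly used measure.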
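To make the scatterplot construction concrete, the sketch below pairs each centered value with its spatial lag, labels the quadrant, and recovers MC from the slope. It assumes binary weights, so the lag is a sum over neighbors rather than an average; the helper name `moran_scatter` and the data are hypothetical.

```python
def moran_scatter(x, c):
    """Return centered values, their spatial lags, quadrant labels, and the slope."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                                   # centered values
    lag = [sum(c[i][j] * z[j] for j in range(n)) for i in range(n)]
    labels = []
    for zi, li in zip(z, lag):
        own = "high" if zi >= 0 else "low"                      # the unit itself
        nbr = "high" if li >= 0 else "low"                      # its neighbors
        labels.append(own + "-" + nbr)
    # Least-squares slope of lag on z; z has mean zero, so no intercept term is needed.
    slope = sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)
    return z, lag, labels, slope

# Hypothetical data: four units on a line with binary adjacency weights.
c = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [1.0, 2.0, 3.0, 4.0]

z, lag, labels, slope = moran_scatter(x, c)
s0 = sum(sum(row) for row in c)
mc_from_slope = (len(x) / s0) * slope     # rescale the slope by n / (sum of weights)
print(labels, slope, mc_from_slope)
```

With row-standardized weights the sum of weights equals n, so the slope would equal MC without any rescaling.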

\[MC_i=\frac{(x_i-\overline{x})}{S^2}\sum_{j=1}^{n}{c_{ij}(x_j-\overline{x})}=\frac{n(x_i-\overline{x})\sum_{j=1}^{n}{c_{ij}(x_j-\overline{x})}}{\sum_{i=1}^{n}{(x_i-\overline{x})^2}}=\frac{nz_i\sum_{j=1}^{n}{c_{ij}z_j}}{Z^TZ}\]

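where \(S^2=\frac{\sum_{i=1}^{n}{(x_i-\overline{x})^2}}{n}\), \(\overline{x}=\frac{1}{n}\sum_{i=1}^{n}{x_i}\), and \(z_i\) and \(z_j\) are the observations standardized by the standard deviation.

It follows that \(\sum_{i=1}^{n}{MC_i}=S_0MC\), where \(S_0=\sum_{i=1}^{n}{\sum_{j=1}^{n}{c_{ij}}}\).

(In addition, the join count statistic is another way to measure spatial autocorrelation, mainly for nominal-scale variables. The book gives a brief introduction to it, which is skipped here.)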
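The identity between the local and global indices is easy to verify on a toy example. The sketch below computes each \(MC_i\) from the definition and compares their sum with \(S_0\) times the global MC, where \(S_0\) is the sum of all weights. The function name and the data are hypothetical.

```python
def local_moran(x, c):
    """Local Moran index MC_i for each unit, following the definition above."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s2 = sum(d * d for d in dev) / n                 # variance with divisor n
    return [dev[i] / s2 * sum(c[i][j] * dev[j] for j in range(n)) for i in range(n)]

# Hypothetical data: four units on a line with binary adjacency weights.
c = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [1.0, 2.0, 3.0, 4.0]

mc_i = local_moran(x, c)

# Global MC computed independently, for comparison.
n = len(x)
mean = sum(x) / n
dev = [v - mean for v in x]
s0 = sum(sum(row) for row in c)
mc = (n / s0) * sum(c[i][j] * dev[i] * dev[j]
                    for i in range(n) for j in range(n)) / sum(d * d for d in dev)

print(mc_i, sum(mc_i), s0 * mc)
```

Units with large local values are the ones contributing most to the global coefficient.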

The semi-variogram scatterplot is a two-dimensional diagram using Cartesian coordinates of the first quadrant (i.e., all values are non-negative) to display pairs of values in a manner that summarizes the relationship between the variation for a univariate georeferenced variable and distance separating the georeferenced observations.

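This is the variogram plot: the horizontal axis is the standardized distance and the vertical axis is the semivariance. Its trend matches that of a scatterplot with topological lag (spatial lag) on the horizontal axis and GR on the vertical axis, because both are related to variance. It consists of three main parts:

(This will probably come up again later.)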
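For reference, a minimal sketch of how the points of a semi-variogram plot can be computed. It uses the classical estimator, half the mean squared difference over all pairs separated by a given distance, which is an assumption here because the notes do not state the formula. Locations are taken to lie on a line at integer coordinates, distances are left unstandardized, and all names and data are hypothetical.

```python
from collections import defaultdict

def empirical_semivariogram(coords, values):
    """Semivariance per separation distance for one-dimensional coordinates."""
    sums = defaultdict(float)     # sum of squared differences at each distance
    counts = defaultdict(int)     # number of pairs at each distance
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(coords[i] - coords[j])
            sums[h] += (values[i] - values[j]) ** 2
            counts[h] += 1
    return {h: sums[h] / (2 * counts[h]) for h in sorted(sums)}

# Hypothetical data: four locations on a line with a linear trend in the values.
gamma = empirical_semivariogram([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0])
print(gamma)
```

Plotting the distances against the semivariances gives the semi-variogram scatterplot; for real data the distances would usually be grouped into bins.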

2.3 Impacts of spatial autocorrelation

Variance Inflation

That is, variance inflation: the variables are correlated with one another (multicollinearity).

2.4 Testing for spatial autocorrelation in regression residuals

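Testing regression residuals for spatial autocorrelation: Cliff and Ord give different test statistics for different regression equations. (Skipped.)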

2.5 R Code for concept implementations
