forked from edzer/mstp
### Spatial modelling and spatial interpolation
\(
\newcommand{\Cor}{{\rm Corr}}
\)
* Simple ways of interpolation
* Simple statistical models for interpolation
* Geostatistical interpolation
* Deterministic models
* Combined approaches
### Taking a step back
Why do we need models?
* to understand _relations_ or _processes_
* to assess (predict, forecast, map) something we do
or did not measure _and cannot see_
* to assess the consequence of decisions (scenarios)
where we cannot measure
### A sample data set
```{r}
library(sp)
data(meuse.riv)
data(meuse)
coordinates(meuse) = c("x", "y")
meuse.sr = SpatialPolygons(list(Polygons(list(Polygon(meuse.riv)),"meuse.riv")))
spplot(meuse, "zinc", col.regions=bpy.colors(), main = "zinc, ppm",
cuts = c(100,200,400,700,1200,2000), key.space = "right",
sp.layout= list("sp.polygons", meuse.sr, fill = "lightblue")
)
```
### Thiessen "polygons", 1-NN
```{r}
library(gstat)
data(meuse)
coordinates(meuse) = ~x+y
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.th = idw(zinc~1, meuse, meuse.grid, nmax = 1)
spplot(meuse.th[1],
col.regions = bpy.colors(30), cuts = 29,
sp.layout = list("sp.points", meuse, col = 3, cex=.5),
main = "Zinc, 1-nearest neighbour")
```
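Setting `nmax = 1` makes each prediction equal to the value at the closest observation, which is what produces the Thiessen pattern. A minimal base-R sketch of that rule follows, using small made-up coordinates and values; the names `obs`, `new` and `nn1` are illustrative and not part of gstat.

```{r}
# toy observations: coordinates and attribute values (made up for illustration)
obs = data.frame(x = c(0, 4, 4), y = c(0, 0, 3), z = c(10, 20, 30))
# prediction locations
new = data.frame(x = c(1, 3.5, 4), y = c(0.5, 0.2, 2.5))

# 1-nearest-neighbour prediction: copy the value of the closest observation
nn1 = function(obs, new) {
  sapply(seq_len(nrow(new)), function(k) {
    d = sqrt((obs$x - new$x[k])^2 + (obs$y - new$y[k])^2)
    obs$z[which.min(d)]
  })
}
nn1(obs, new)
```

The predicted surface is piecewise constant, with jumps where the nearest observation changes.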
### Zinc concentration vs. distance to river
```{r}
library(sp)
data(meuse)
plot(log(zinc)~sqrt(dist),meuse)
abline(lm(log(zinc)~sqrt(dist),meuse))
```
### Zinc conc. vs. distance to river: map of linear trend
```{r}
meuse.grid$sqrtdist = sqrt(meuse.grid$dist)
spplot(meuse.grid["sqrtdist"], col.regions = bpy.colors())
```
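The map above shows the predictor, not the fitted values. As a sketch of how the fitted trend itself could be mapped, the regression of `log(zinc)` on `sqrt(dist)` can be predicted at every grid cell. The column name `trend` is an illustrative choice, and `meuse.grid` is reloaded as a plain data frame so that the block runs on its own.

```{r}
library(sp)
data(meuse)
data(meuse.grid)   # reloads the grid as a plain data.frame
# fit the trend on the observations, then predict at every grid cell
fit = lm(log(zinc) ~ sqrt(dist), meuse)
trend = predict(fit, newdata = meuse.grid)
meuse.grid$trend = trend
gridded(meuse.grid) = ~x+y
spplot(meuse.grid["trend"], col.regions = bpy.colors(),
  main = "log(zinc), linear trend in sqrt(dist)")
```

Because the trend depends on location only through `dist`, the map has the same spatial pattern as the predictor, on the log-concentration scale.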
### Inverse distance weighted interpolation
Uses a **weighted** average:
$$\hat{Z}(s_0) = \sum_{i=1}^n \lambda_i Z(s_i)$$
with $s_0 = \{x_0, y_0\}$,
or $s_0 = \{x_0, y_0, \mbox{depth}_0\}$
and weights inversely proportional to the power $p$ of the distance:
$$\lambda_i = \frac{|s_i-s_0|^{-p}}{\sum_{j=1}^n |s_j-s_0|^{-p}}$$
* power $p$: tuning parameter
* if for some $i$, $|s_i-s_0| = 0$, then $\lambda_i = 1$ and other
weights become zero
* $\Rightarrow$ **exact** interpolator
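The weights can be computed by hand for one prediction location. The following is a minimal base-R sketch with made-up data; `idw_weights` is a hypothetical helper written for this illustration, not a gstat function.

```{r}
# IDW weights for one prediction location s0, given observation coordinates
idw_weights = function(coords, s0, p = 2) {
  d = sqrt(rowSums(sweep(coords, 2, s0)^2))   # Euclidean distances |s_i - s0|
  if (any(d == 0))                            # s0 coincides with an observation:
    return(as.numeric(d == 0) / sum(d == 0))  # all weight goes to that observation
  w = d^(-p)
  w / sum(w)                                  # normalise so the weights sum to one
}

coords = cbind(x = c(1, 2, 4), y = c(0, 0, 0))
z = c(5, 7, 3)
lambda = idw_weights(coords, s0 = c(3, 0), p = 2)
lambda                                   # 1/9, 4/9, 4/9
sum(lambda * z)                          # IDW prediction at s0: 5
sum(idw_weights(coords, c(2, 0)) * z)    # at a data location: returns 7
```

The last line illustrates the exactness property: at an observation location the prediction reproduces the observed value.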
### Effect of power $p$
```{r}
library(sp)
set.seed(13531)
pts = data.frame(x = 1:10, y = rep(0,10), z = rnorm(10)*3 + 6)
coordinates(pts) = ~x+y
xpts = 0:1100 / 100
grd = data.frame(x = xpts, y = rep(0, length(xpts)))
gridded(grd) = ~x+y
library(gstat)
plot(z ~ x, as(pts, "data.frame"), xlab ='location', ylab = 'attribute value')
lines(xpts, idw(z~1, pts, grd, idp = 2)$var1.pred)
lines(xpts, idw(z~1, pts, grd, idp = 5)$var1.pred, col = 'darkgreen')
lines(xpts, idw(z~1, pts, grd, idp = .5)$var1.pred, col = 'red')
legend("bottomright", c("2", ".5", "5"), lty = 1,
col=c("black", "red", "darkgreen"), title = "inverse distance power")
```
### Simple statistical models for interpolation
```{r}
library(sp)
data(meuse)
plot(zinc~dist, meuse)
plot(log(zinc)~sqrt(dist), meuse)
```
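One rough way to put numbers on the two scatter plots is to fit a straight line to each and compare the fraction of variance explained. This is only a sketch: the two $R^2$ values refer to different response scales (`zinc` versus `log(zinc)`), so they indicate how linear each relation looks rather than giving a strict model comparison. The names `r2_raw` and `r2_trans` are illustrative.

```{r}
library(sp)
data(meuse)
# straight-line fits on the raw and on the transformed scale
r2_raw = summary(lm(zinc ~ dist, meuse))$r.squared
r2_trans = summary(lm(log(zinc) ~ sqrt(dist), meuse))$r.squared
c(raw = r2_raw, transformed = r2_trans)
```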
### Time series versus spatial data
Differences:
* spatial data live in 2 (or 3) dimensions
* there's no past and future
* there's no simple conditional independence (AR)
Correspondences:
* nearby observations are more alike (auto-correlation)
* we can form moving averages
* coordinate reference systems are a bit like time zones and DST
### What information do we have?
* We have measurements $Z(x)$, with $x$ two-dimensional
(location on the map)
* we have $x$ and $y$
* we may have land use data
* we may have soil type or geological data
* we may have remote sensing imagery
* we may have all kinds of relevant information, related
to processes that cause (or result from) $Z(x)$
* we have Google Maps
_We don't want to ignore anything important_
### Regression or correlation?
(Correlation between two different variables, no auto-correlation:)
```{r}
n = 500
x = rnorm(n)
y = rnorm(n)
R = matrix(c(1,0,0,1),2,2)
R
plot(cbind(x,y) %*% chol(R))
title("zero correlation")
R[2,1] = R[1,2] = 0.5
plot(cbind(x,y) %*% chol(R))
title("correlation 0.5")
R[2,1] = R[1,2] = 0.9
plot(cbind(x,y) %*% chol(R))
title("correlation 0.9")
R[2,1] = R[1,2] = 0.98
plot(cbind(x,y) %*% chol(R))
title("correlation 0.98")
```
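Why multiplying by `chol(R)` gives the intended correlation: `chol(R)` returns an upper triangular matrix $U$ with $U^T U = R$, and since the two columns of `cbind(x,y)` are independent standard normal, the rows of the product have covariance $U^T I U = R$. A small check, as a sketch with an arbitrary seed and a larger sample than above (the names `X` and `U` are illustrative):

```{r}
set.seed(1)            # arbitrary seed, only for reproducibility
n = 10000              # larger n than above, so the sample correlation is stable
X = cbind(rnorm(n), rnorm(n))
R = matrix(c(1, 0.9, 0.9, 1), 2, 2)
U = chol(R)            # upper triangular, t(U) %*% U equals R
all.equal(t(U) %*% U, R)
cor(X %*% U)[1, 2]     # close to 0.9, up to sampling error
```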
### Correlation vs. regression:
Correspondences:
* both are in the "classic statistics" book, and may involve hypothesis testing
* both deal with _two_ continuous variables
* both look at (first order) _linear_ relations
* when correlation is significant, the regression _slope_ is significant
Differences:
* Regression distinguishes $y$ from $x$: $y$ depends on $x$, not the reverse;
* the line $y=ax+b$ is _not_ equal to the line $x = cy + d$
* Correlation is symmetric: $\Cor(x,y)=\Cor(y,x)$
* Correlation coefficient is unitless and within $[-1,1]$,
regression coefficients have data units
* Regression is concerned with _prediction_ of $y$ from $x$.
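The asymmetry of regression and the symmetry of correlation can be seen in a small simulation. This is a sketch with made-up data and an arbitrary seed; `slope_yx` and `slope_xy` are illustrative names. The product of the two slopes equals the squared correlation, so the two lines only coincide when the correlation is $\pm 1$.

```{r}
set.seed(42)                         # arbitrary seed
x = rnorm(200)
y = 0.5 * x + rnorm(200)
slope_yx = coef(lm(y ~ x))[2]        # a in y = ax + b
slope_xy = coef(lm(x ~ y))[2]        # c in x = cy + d
# slopes of both lines expressed in the (x, y) plane: they differ
c(slope_yx, 1 / slope_xy)
# correlation is symmetric, and a * c equals the squared correlation
all.equal(cor(x, y), cor(y, x))
all.equal(unname(slope_yx * slope_xy), cor(x, y)^2)
```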
### The power of regression models for spatial prediction
... is hard to overestimate. Regression and correlation are the fork
and knife of statistics.
* linear models have endless application: polynomials, interactions, nested
effects, ANOVA/ANCOVA models, hypothesis testing, lack of fit testing, ...
* predictors can be transformed non-linearly
* linear models can be generalized: logistic regression, Poisson regression, ...,
to cope with non-normal data (0/1 data, counts, log-normal)
* many derived techniques solve one particular issue in regression, e.g.:
* ridge regression solves collinearity (extreme correlation among predictors)
* stepwise regression automatically selects "best" models among many candidates
* classification and regression trees
### Why is regression difficult in spatial problems?
Regression models assume independent observations. Spatial data are always
to some degree spatially correlated.
This does not mean we should discard regression, but rather that we should ask:
* to what extent does an outcome depend on independence?
* to what extent is regression _robust_ against a violated assumption
of independent observations?
* to what extent _is_ the assumption violated? (how strong is the
correlation?)
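A worked special case gives a feel for the size of the effect. Assume, purely for illustration, $n$ observations with common variance $\sigma^2$ and the same correlation $\rho$ between every pair (spatial correlation in practice decays with distance, so this is a simplification). The variance of the sample mean is then

$$\mathrm{Var}(\bar{Z}) = \frac{1}{n^2}\left(n\sigma^2 + n(n-1)\rho\sigma^2\right) = \frac{\sigma^2}{n}\left(1 + (n-1)\rho\right)$$

With $n = 100$ and $\rho = 0.1$ the factor $1 + (n-1)\rho$ equals $10.9$, so the standard error computed under independence, $\sigma/\sqrt{n}$, is too small by a factor of about $\sqrt{10.9} \approx 3.3$.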
### What is spatial correlation?
Waldo Tobler's first law of geography:
"Everything is related to everything else, but near things are more
related than distant things." [Tobler, 1970, p.236]
TOBLER, W. R. (1970). "A computer movie simulating urban growth in
the Detroit region". Economic Geography, 46(2): 234-240.
_But how then is "being related" expressed?_