si6.qmd

$$
\newcommand{\E}{{\rm E}}       % E expectation operator
\newcommand{\Var}{{\rm Var}}   % Var variance operator
\newcommand{\Cov}{{\rm Cov}}   % Cov covariance operator
\newcommand{\Cor}{{\rm Corr}}
$$

# Cokriging, cross validation, conditional simulation

## cokriging
Cokriging sets the multivariate equivalent of kriging, which is, in
terms of number of dependent variables, univariate. Kriging:
$$Z(s) = X(s)\beta + e(s)$$
Cokriging:
$$Z_1(s) = X_1(s)\beta_1 + e_1(s)$$
$$Z_2(s) = X_2(s)\beta_2 + e_2(s)$$
$$Z_k(s) = X_k(s)\beta_k + e_k(s)$$
with $V = \Cov(e_1,e_2,...,e_k)$


Cases where this is useful: multiple spatial correlated variables such as

* chemical properties (auto-analyzers!)
* sediment composition
* electromagnetic spectra (imagery/remote sensing)
* ecological data (abiotic factors; species abundances)
* (space-time data, with discrete time)

Two types of applications:

* undersampled case: secondary variables help prediction of a primary, because we have more samples of them (image?)
* equally sampled case: secondary variables don't help prediction much, but we are interested in _multivariate prediction_, i.e. prediction error covariances.

### Cokriging prediction
Cokriging prediction is not substantially different from kriging
prediction, it is just a lot of book-keeping.

How to set up $Z(s)$, $X$, $\beta$, $e(s)$, $x(s_0)$, $v$, $V$?

Multivariable prediction involves the joint prediction of multiple,
both spatially and cross-variable correlated variables.  Consider $m$
distinct variables, and let
$\{Z_i(s), X_i, \beta^i, e_i(s), x_i(s_0), v_i, V_i\}$ 
correspond to 
$\{Z(s), X, \beta, e(s), x(s_0), v, V\}$ of the $i$-th variable. Next, let
${\bf Z}(s) = (Z_1(s)',...,Z_m(s)')'$, 
${\bf B}=({\beta^1} ',...,{\beta^m} ')'$,
${\bf e}(s)=(e_1(s)',...,e_m(s)')'$,

$$
{\bf X} =
\left[
\begin{array}{cccc}
X_1 & 0 & ... & 0 \\\\
0 & X_2 & ... & 0 \\\\
\vdots & \vdots & \ddots & \vdots \\\\
0 & 0 & ... & X_m \\\\
\end{array}
\right], \
{\bf x}(s_0) =
\left[
\begin{array}{cccc}
x_1(s_0) & 0 & ... & 0 \\\\
0 & x_2(s_0) & ... & 0 \\\\
\vdots & \vdots & \ddots & \vdots \\\\
0 & 0 & ... & x_m(s_0) \\\\
\end{array}
\right]
$$

with $0$ conforming zero matrices, and

$${\bf v} =
\left[
\begin{array}{cccc}
v_{1,1} & v_{1,2} & ... & v_{1,m} \\\\
v_{2,1} & v_{2,2} & ... & v_{2,m} \\\\
\vdots & \vdots & \ddots & \vdots \\\\
v_{m,1} & v_{m,2} & ... & v_{m,m} \\\\
\end{array}
\right], \ \ 
{\bf V} =
\left[
\begin{array}{cccc}
V_{1,1} & V_{1,2} & ... & V_{1,m} \\\\
V_{2,1} & V_{2,2} & ... & V_{2,m} \\\\
\vdots & \vdots & \ddots & \vdots \\\\
V_{m,1} & V_{m,2} & ... & V_{m,m} \\\\
\end{array}
\right]
$$

where element $i$ of $v_{k,l}$ is $\Cov(Z_k(s_i), Z_l(s_0))$, and where
element $(i,j)$ of $V_{k,l}$ is $\Cov(Z_k(s_i),Z_l(s_j))$.

The multivariable prediction equations equal the previous UK equations and
when all matrices are substituted by their multivariable
forms (see also Ver Hoef and Cressie, Math.Geol., 1993), and when for 
$\sigma^2_0$, $\Sigma$ is substituted with $\Cov(Z_i(s_0),Z_j(s_0))$
in its $(i,j)$-th element. Note that the prediction variance
is now a prediction error covariance matrix.

### What is needed?

The main tool for estimating semivariances between different variables
is the _cross variogram_, defined for collocated data as
$$\gamma_{ij}(h) = \mbox{E}[(Z_i(s)-Z_i(s+h))(Z_j(s)-Z_j(s+h))]$$
and for non-collocated data as
$$\gamma_{ij}(h) = \mbox{E}[(Z_i(s)-m_i)(Z_j(s+h)-m_j)]$$
with $m_i$ and $m_j$ the means of the respective variables. Sample cross
variograms are the obvious sums over the available pairs or cross pairs,
as in one of
$$\hat{\gamma}_{jk}(\tilde{h})=\frac{1}{N_h}\sum_{i=1}^{N_h}(Z_j(s_i)-Z_j(s_i+h))(Z_k(s_i)-Z_k(s_i+h))$$
$$\hat{\gamma}_{jk}(\tilde{h})=\frac{1}{N_h}\sum_{i=1}^{N_h}(Z_j(s_i)-m_j)(Z_k(s_i+h)-m_k)$$

### Permissible cross covariance functions
Two classes of permissible cross covariance (semivariance) functions
are often used:

* intrinsic correlation (IC): 
$$\gamma_{jk}(h) = \alpha_{jk} \sqrt{\gamma_{jj}(h)\gamma_{kk}(h)}$$
parameters $\alpha_{jk}$ are correlation cofficients; very strict
* linear model of coregionalization (LMC): 
$$\gamma_{jk}(h) = \sum_{l=1}^p \gamma_{jk,p}(h)$$ (e.g., nugget + spherical model), and
$$\gamma_{jk,p}(h) = \alpha_{jk,p} \sqrt{\gamma_{jj,p}(h)\gamma_{kk,p}(h)}$$

### How to do this?
As multivariable analysis may involve numerous variables, we need to
start organising the available information. For that reason, we collect all the
observation data specifications in a `gstat` object, created by
the function `gstat`. This function does nothing else than ordering
(and actually, copying) information needed later in a single object.
Consider the following definitions of four heavy metals:

```{r}
library(sf)
library(sp)
library(gstat)
data(meuse, package = "sp")
meuse = st_as_sf(meuse, coords = c("x", "y"))
g <- gstat(id = "logCd", formula = log(cadmium)~1, data = meuse)
g <- gstat(g, "logCu", log(copper)~1, data = meuse)
g <- gstat(g, "logPb", log(lead)~1, data = meuse)
g <- gstat(g, "logZn", log(zinc)~1, data = meuse)
g 
vm <- variogram(g)
vm.fit <- fit.lmc(vm, g, vgm(1, "Sph", 800, 1))
plot(vm, vm.fit)
```

### Kriging predictions and errors -- how good are they?
Cross validation can be used to assess the quality of any interpolation,
including kriging. We split the data set in $n$ parts (folds). For each part, we

* leave out the observations of this fold
* use the observations of all other folds to predict the values at the
locations of this fold
* compare the predictions with the observations
This is called $n$-fold cross validation. If $n$ equals the number of
observation, it is called leave-one-out cross validation (LOOCV).

### Cross validation: what does it yield?

* residuals $r(s_i) = z(s_i) -\hat{z}(s_i)$ -- histograms, maps, summary statistics
* mean residual should be near zero
* mean square residual $\sum r(s_i)^2$ should be as small as possible

In case the interpolation method yields a prediction error we can
compute z-scores: $r(s_i)/\sigma(s_i)$

The z-score allows the validation of the kriging error, as the z-score
should have mean close to zero and variance close to 1. If the variance
of the z-score is larger (smaller) than 1, the kriging standard error is
underestimating (overestimating) the true interpolation error, on average.

### Kriging errors -- what do they mean?
Suppose legislation prescribes remediation in case zinc exceeds 500 ppm.
Where does the zinc level exceed 500 ppm?

* we can compare the map of the predictions with 500. However:
    * $\hat{z}(s_0)$ does not equal $z(s_0)$:
    * $\hat{z}(s_0)$ is more smooth than $z(s_0)$
    * $\hat{z}(s_0)$ is closer to the mean than $z(s_0)$
    * smoothing effect is stronger if spatial correlation is small or nugget effect is relatively large
* alternatively we can assume that the true (unknown) value follows
a probability distribution, with mean $\hat{z}(s_0)$ and standard error
$\sigma(s_0)$.
* this latter approach acknowledges that $\sigma(s_0)$ is useful as
a measure of interpolation accuracy

### Conditional probability

* we can use e.g. the normal distribution (on the log-scale?) to 
assess the conditional probability $\Pr(Z(s_0) > 500 | z(s_1),...,z(s_n))$
* the additional assumption underlying this is _multivariate
normality_: in addition to having stationary mean and covariance, the
field $Z$ is now assumed to follow a stationary, multivariate normal
distribution. This means that any single $Z(s_i)$ follows a normal
distribution, and any pair $Z(s_i), Z(s_j)$ follows a bivariate normal
distribution, with known variances and covariance.

How?
```{r}
v = variogram(log(zinc)~1, meuse)
v.fit = fit.variogram(v, vgm(1, "Sph", 900, 1))
data(meuse.grid, package = "sp")
library(stars)
meuse.grid = st_as_stars(meuse.grid)
out = krige(log(zinc)~1, meuse, meuse.grid, v.fit)
out$p500 = 1 - pnorm(log(500), out$var1.pred, sqrt(out$var1.var))
plot(out["p500"], col = sf.colors(), breaks = "equal")
```

### Indicator kriging

Another approach to estimating probabilities of exceedance is to consider the indicator function, which is 1 if a value exceeds the threshold and 0 otherwise:

```{r}
mean(meuse$zinc)
mean(meuse$zinc < 500)
```

```{r}
v = variogram(I(zinc > 500)~1, meuse)
v_I.fit = fit.variogram(v, vgm(.2, "Sph", 900, .02))
plot(v, v_I.fit)
out$I500 = krige(I(zinc > 500)~1, meuse, meuse.grid, v_I.fit)$var1.pred
out["I500"] # summarizes
plot(merge(out[c("p500", "I500")]), col = bpy.colors(), breaks = "equal")
```

This second approach:

* ignores the kriging variance
* generates probabilities outside the interval $[0, 1]$ (to be corrected?)
* ignores information whether observations are close to the threshold, or far away from it (they are reduced to 1/0 variable before interpolating)
* does not assume multivariate normality
* does not distinguish between _estimated_ probabilities and (true) probabilities
* lends itself to the interpolation of _categorical_ (nominal, ordinal) variables 
* cokriging is sometimes used for interpolating several indicator variables (multiple categories, or multiple thresholds)


### Conditional simulation

```{r}
v = variogram(log(zinc)~1, meuse)
v.fit = fit.variogram(v, vgm(1, "Sph", 900, 1))
out = krige(log(zinc)~1, meuse, meuse.grid, v.fit, nmax = 20, nsim = 9)
out = split(out)
out$kriging = krige(log(zinc)~1, meuse, meuse.grid, v.fit)$var1.pred
plot(merge(out), col = sf.colors(), breaks = "equal")
```

```{r}
sf = st_as_sf(out, as_points = TRUE)
v_kr = variogram(kriging~1, sf)
v_cs = variogram(sim1~1, sf)
plot(v_kr)
plot(v_cs)
plot(v)
```

* conditional simulation creates multiple _realisations_ of the field $Z(s)$ that 
    * follow the data points (pattern, reproduce observations like kriging)
    * have a variability equal to Z(s)
    * have a spatial correlation (variogram) equal to that of Z(s)
* as opposed to kriging, the resulting images are not smooth 
* this is useful e.g. if images are needed as input to subsequent processing / modelling, where the statistical properties of Z(s) need to be retained (e.g. simulating rainfall fields as input to rainfall-runoff models, to predict the likelihood of flooding / extreme water levels)