---
title: "Tutorial: Regional mapping of climate variables from point samples (Northeast China) -- using additional covariates"
author: "D G Rossiter"
date: "`r Sys.Date()`"
output:
   html_document:
     toc: TRUE
     toc_float: TRUE
     theme: "lumen"
     code_folding: show
     number_section: TRUE
     fig_height: 4
     fig_width: 4
     fig_align: 'center'
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=4, fig.height=4, fig.align='center',
                      fig.path='kgraph/use', warning=FALSE, message=FALSE)
writeLines(capture.output(sessionInfo()), "sessionInfo_zhClimateUseAdditionalCovariates.txt")
```

# Introduction

In tutorial `ZhClimate_SetupAdditionalCovariates.html` we developed several predictors (covaribles) that might have a relation to regional or local climate: a terrain index (Multi-resolution Valley-Bottom Flatness MRVBF), and population density within two radii around stations or prediction points (nominally 2.5' and 15').

We extracted the values of these at the stations (objects `zhne.m` and `zhne.df`, as a `SpatialPointsDataFrame` and `data.frame`, respectively) and over the grid (objects `dem.ne.m.sp` and `zhne.m`, as a `SpatialPixelsDataFrame` and `data.frame`, respectively).

These were saved in the R Data file `Zh_StationsDEM_covariates.RData`.

Packages used in this tutorial:

```{r}
library(ggplot2)
library(sp)
library(ranger)
library(caret)
library(rpart)
library(rpart.plot)
library(Cubist)
# library(mgcv)
# library(spgwr)
```

## Display functions

A function for map display:

```{r}
display.prediction.map <- function(.prediction.map.name,
                                   .plot.title, 
                                   .legend.title,
                                   .plot.limits=NULL,
                                   .palette="YlGnBu") {
    ggplot(data=zhne.m.df) +
        geom_point(aes_(x=quote(E), y=quote(N),
                        color=as.name(.prediction.map.name))) +
        xlab("E") + ylab("N") + coord_fixed() +
        geom_point(aes(x=E, y=N, color=t.avg),
                   data=zhne.df, shape=I(20)) +
        ggtitle(.plot.title) +
        scale_colour_distiller(name=.legend.title,
                               space="Lab",
                               palette=.palette,
                               limits=.plot.limits)
}
```


A function to display the difference between two maps:

```{r}
display.difference.map <- function(.diff.map.name,
                                   .diff.map.title,
                                   .legend.name,
                                   .palette="Spectral") {
    ggplot(data=zhne.m.df) +
        geom_point(aes_(x=quote(E), y=quote(N),
                            color=as.name(.diff.map.name))) +
        xlab("E") + ylab("N") + coord_fixed() +
        ggtitle(.diff.map.title) +
        scale_colour_distiller(name=.legend.name, 
                               space="Lab",
                               palette=.palette)
    }
```


# Dataset


```{r}
load("./Zh_StationsDEM_covariates.RData", verbose=TRUE)
names(zhne.df)
summary(zhne.df)
length(pred.fields <- c(9,16:20))
(pred.names <- names(zhne.df)[pred.fields])
(model.formula <- paste("t.avg ~", paste0(pred.names, collapse=" + ")))
length(pred.fields.base <- c(9,16:17))
(pred.names.base <- names(zhne.df)[pred.fields.base])
(model.formula.base <- paste("t.avg ~", paste0(pred.names.base, collapse=" + ")))
names(zhne.m)
summary(zhne.m)
```

Make a data frame version of the point dataset, with the projected coördinates as fields:

```{r}
zhne.m.df <- as(zhne.m, "data.frame")
names(zhne.m.df)
```

We see the list of predictors that could be used for modelling and mapping. In the `Spatial*` versions the coordinates are in the `@coordinates` slot, not in the `@data` slot.

# Linear modelling

## Relation among predictors

First, investigate the relation among the predictors:

```{r pairs,fig.height=10, fig.width=10}
cor(zhne.df[, pred.fields], use="pairwise.complete.obs")
pairs(zhne.df[, pred.fields], pch=20, cex=0.4)
```

The two population densities are closely related. E and N are correlated because of the shape of the study area.

Elevation and E are weakly correlated because more mountains towards the W. 

MRVBF is not related to anything else.

So there is definite colinearity of predictors.


## Model

Build an OLS linear model:

```{r ols}
summary(m <- lm(model.formula, data=zhne.df))
```

All predictors are significantly different from 0. $R^2$ is very high, `r round(summary(m)$adj.r.squared,3)`.

Is there colinearity?

```{r vif}
car::vif(m)
```

Not too strong colinearity.

Can we reduce the model?

```{r}
summary(m.step <- step(m, direction="backward", trace=1))
```

No.

Regression diagnostics:

```{r ols.final, fig.width=12, fig.height=4}
par(mfrow=c(1,3))
plot(m, which=c(1,2,6))
par(mfrow=c(1,1))
ix <- which(abs(residuals(m)) > 2)
zhne.df[ix, 2:3]
```

Not bad except seriously over-predicts at the extreme low temperture end of the model.
Also, a few very high leverage points with large Cook's distance.

Show the fit:

```{r lm.1to1, fig.width=8, fig.height=8}
summary(p.lm <- predict(m, newdata=zhne.df))
plot(p.lm, zhne.df$t.avg, asp=1, col=zhne.df$province.en, pch=20,
     main="T annual Avg.", ylab="Measured", xlab="OLS fit")
legend("topleft", levels(zhne.m$province.en), pch=20, col=1:9)
abline(0,1); grid()
```


## Mapping

Map:



```{r lm.map, eval=TRUE, fig.width=6, fig.height=6}
zhne.m.df$pred.lm <- predict(m, newdata=zhne.m.df)
summary(zhne.m.df$pred.lm)
display.prediction.map("pred.lm",
  "OLS prediction, all predictors",
  "t.avg")
```

The coldest-temperatures are over-predicted.


# Regression Trees

## Model

```{r rt, fig.width=6, fig.height=4}
m.rt <- rpart(model.formula, data=zhne.df, minsplit=2, cp=0.001)
plotcp(m.rt)
```

Variable importance: N >> E, pop15, elev.m  > pop2pt5 >> mrvbf

Very little or no improvement in cross-validation accuracy after about 0.003.

Prune:

```{r rt2,fig.width=10, fig.height=6}
m.rt.p <- prune(m.rt, cp=0.003)
x <- m.rt.p$variable.importance
data.frame(variableImportance = 100 * x / sum(x))
rpart.plot(m.rt.p, digits=3, type=4, extra=1)
```

The N  is the most important. Only N, E, elevation come in the model.


```{r plot.rt,fig.width=8, fig.height=8}
summary(p.rt <- predict(m.rt.p, newdata=zhne.df))
plot(p.rt, zhne.df$t.avg, asp=1, col=zhne.df$province.en, pch=20,
     main="T annual avg.", ylab="Measured", xlab="Regression Tree fit")
legend("topleft", levels(zhne.m$province.en), pch=20, col=1:9)
abline(0,1); grid()
```
 
This has no bias at the lower temperatures, but has a large spread at each prediction.

## Mapping

```{r rt.map, eval=TRUE, fig.width=6, fig.height=6}
zhne.m.df$pred.rt <- predict(m.rt.p, newdata=zhne.m.df)
summary(zhne.m.df$pred.rt)
display.prediction.map("pred.rt",
  "regression tree prediction, all predictors",
  "t.avg")
```

The obvious blocks separated by N and E.

# Random Forests

## Optimization

Determine the optimum parameters for a random forest model using `ranger`:

```{r ranger.tune}
dim(preds <- zhne.df[, pred.fields])
length(response <- zhne.df[, "t.avg"])
system.time(
  ranger.tune <- train(x = preds, y = response,  
                       method="ranger", 
                       tuneGrid = expand.grid(.mtry = 2:4, 
                                              .splitrule = "variance",
                                              .min.node.size = 1:10),  
                       trControl = trainControl(method = 'cv'))
)
```

View the training results:

```{r ranger.tune.results, fig.width=6, fig.height=4}
ix <- which.min(ranger.tune$result$RMSE)
ranger.tune$result[ix, c(1,3,4)]
ix <- which.max(ranger.tune$result$Rsquared)
ranger.tune$result[ix, c(1,3,5)]
ix <- which.min(ranger.tune$result$MAE)
ranger.tune$result[ix, c(1,3,6)]
plot.train(ranger.tune, metric="RMSE")
plot.train(ranger.tune, metric="Rsquared")
```

RMSE is lowest with 4 predictors and four-item nodes. R^2 highest with the same.

## Best model

Build a model with the optimal parameters:

```{r rf.full}
rf <- ranger(model.formula,
             data=zhne.df, importance="impurity", mtry=4, min.node.size=4,
             oob.error=TRUE, num.trees=1024)
print(rf)
str(rf, max.level=1)
round(rf$prediction.error/rf$num.samples, 4)
round(rf$r.squared,3)
```

A very good model.

Plot against 1:1 line:

```{r rf.full.plot,fig.width=8, fig.height=8}
plot(rf$predictions, zhne.df$t.avg, asp=1, col=zhne.df$province.en, pch=20,
     main="Annual T avg", ylab="Measured", xlab="Ranger RF fit")
legend("topleft", levels(zhne.m$province.en), pch=20, col=1:9)
abline(0,1); grid()
```

One very poor OOB error (泰山 I think); scattered in 内蒙古 but not bad.

Variable importance:

```{r}
sort(round(ranger::importance(rf)/sum(ranger::importance(rf)),3), decreasing=TRUE)
```

N >> elevation = population 15' >> E >> population 2.5'.

The mean out-of-bag error is `r round(rf$prediction.error/rf$num.samples,3)`, effectively zero (unbiased).

## Mapping


```{r rf.map, eval=TRUE, fig.width=6, fig.height=6}
zhne.m.df$pred.rf <- predict(rf, data=zhne.m.df)$predictions
summary(zhne.m.df$pred.rf)
display.prediction.map("pred.rf",
  "Annual average T, ranger prediction",
  "deg C")
```

The effect of the population grid is clear.

# Cubist

```{r cubist.tune, fig.width=6, fig.height=4}
system.time(
  cubist.tune <- train(x = preds, y = response, method="cubist",
                    tuneGrid = expand.grid(.committees = 1:12, .neighbors = 0:8),
                    trControl = trainControl(method = 'cv'))
)
plot(cubist.tune, metric="RMSE")
plot(cubist.tune, metric="Rsquared")
cubist.tune$bestTune$neighbors
```

Committees have a large effect; seven is best.
Adjustment by one and two neighbours is very bad; three is similar to none; 4 to 8 gives steady improvement but not much past 5.

Best model:

```{r cubist.model}
c.model <- cubist(x = preds, y = response, committees=7, neighbours=5)
summary(c.model)
```


```{r cubist1to1,fig.width=8, fig.height=8}
cubist.fits <- predict(c.model, newdata=preds,
                       neighbors=cubist.tune$bestTune$neighbors)
(rmse.cubist <- sqrt(mean((cubist.fits - zhne.df$t.avg)^2)))
plot(cubist.fits, zhne.df$t.avg, asp=1, col=zhne.df$province.en, pch=20,
     main="deg C", ylab="Measured", xlab="Cubist fit")
legend("topleft", levels(zhne.m$province.en), pch=20, col=1:9)
abline(0,1); grid()
```

This is a highly-successful model, even at the lower temperatures. The value of using linear regression at the nodes is clear.

Predict over the grid:

```{r cubist.pred, fig.width=6, fig.height=6}
summary(zhne.m.df$pred.cubist <- 
          predict(c.model, newdata=zhne.m.df,
                  neighbours=cubist.tune$bestTune$neighbors))
display.prediction.map("pred.cubist",
  "T Avg, Cubist prediction",
  "deg C")
```

# Difference with simple models

Do the additional predictors improve or degrade the model? Compare with the base model, which uses only the coordinates and elevation:

A matrix of the values of base predictors at each station:

```{r preds.base}
pred.fields.base
dim(preds.base <- zhne.df[, pred.fields.base])
```

## Linear model

```{r ols.pred}
summary(m.base <- lm(model.formula.base, data=zhne.df))
```

$R^2$ is still very high, `r round(summary(m.base)$adj.r.squared,3)`, compared to the full model `r round(summary(m)$adj.r.squared,3)`.

Is there colinearity?

```{r vif.base}
car::vif(m.base)
```

No.

Regression diagnostics:

```{r ols.base, fig.width=12, fig.height=4}
par(mfrow=c(1,3))
plot(m.base, which=c(1,2,6))
par(mfrow=c(1,1))
ix <- which(abs(residuals(m.base)) > 2)
zhne.df[ix, 2:3]
```

Quite similar problems to full model.

Show the fit:

```{r lm.base.1to1, fig.width=8, fig.height=8}
summary(p.lm.base <- predict(m.base, newdata=zhne.df))
plot(p.lm.base, zhne.df$t.avg, asp=1, col=zhne.df$province.en, pch=20,
     main="T annual Avg.", ylab="Measured", xlab="OLS fit (3 predictors)")
legend("topleft", levels(zhne.m.df$province.en), pch=20, col=1:9)
abline(0,1); grid()
```

```{r lm.base.map, eval=TRUE, fig.width=6, fig.height=6}
zhne.m.df$pred.lm.base <- predict(m.base, newdata=zhne.m.df)
summary(zhne.m.df$pred.lm.base)
display.prediction.map("pred.lm.base",
  "OLS prediction, 3 predictors",
  "t.avg")
```

Compute and display the difference with the 5-predictor map:

```{r}
summary(zhne.m.df$diff.lm.5.3 <- 
          zhne.m.df$pred.lm - zhne.m.df$pred.lm.base)
```

```{r display.ols.diff, fig.width=6, fig.height=6}
display.difference.map("diff.lm.5.3",
  "Difference OLS 5 and 3 predictors",
  "+/- deg C")
```

The effect of the population density is clear!

## Random forest

Tune and build a random forest model using just three predictors:

```{r}
system.time(
  ranger.tune.base <- train(x = preds.base, y = response, method="ranger", 
                       tuneGrid = expand.grid(.mtry = 1:3, 
                                              .splitrule = "variance",
                                              .min.node.size = 1:10), 
                       trControl = trainControl(method = 'cv'))
)
```


```{r ranger.tune.results.base, fig.width=6, fig.height=4}
ix <- which.min(ranger.tune.base$result$RMSE)
ranger.tune$result[ix, c(1,3,4)]
ix <- which.max(ranger.tune.base$result$Rsquared)
ranger.tune$result[ix, c(1,3,5)]
ix <- which.min(ranger.tune.base$result$MAE)
ranger.tune$result[ix, c(1,3,6)]
plot.train(ranger.tune.base, metric="RMSE")
plot.train(ranger.tune.base, metric="Rsquared")
```

Consistently two predictors and node size 2.

Build a model with the optimal parameters:

```{r rf.base}
rf.base <- ranger(model.formula.base,
             data=zhne.df, importance="impurity", mtry=2, min.node.size=2,
             oob.error=TRUE, num.trees=1024)
print(rf.base)
sort(round(ranger::importance(rf.base)/sum(ranger::importance(rf.base)),3), decreasing=TRUE)
round(rf.base$prediction.error/rf.base$num.samples,3)
round(rf$prediction.error/rf.base$num.samples,3)
round(rf.base$r.squared,3)
round(rf$r.squared,3)
```

This simpler model is very close to the complex model.

Northing >> elevation >> E.

Compute and display the difference with the 9-predictor map:

```{r}
zhne.m.df$pred.rf.base <- predict(rf.base, data=zhne.m.df)$predictions
summary(zhne.m.df$diff.rf.5.3 <- 
          zhne.m.df$pred.rf - zhne.m.df$pred.rf.base)
```

Almost no difference, nothing to plot.

```{r display.rf.diff, fig.width=6, fig.height=6, eval=TRUE}
display.difference.map("diff.rf.5.3",
  "Difference RF 5 and RF 3 predictors",
  "+/- deg C")
```

Quite some differences here, seems more due to terrain (mrvbf) than population density.

## Cubist

Tune and build a Cubist model using just three predictors:

```{r cubist.tune.base, fig.width=6, fig.height=4}
system.time(
  cubist.tune.base <- train(x = preds.base, y = response, method="cubist",
                    tuneGrid = expand.grid(.committees = 1:12, .neighbors = 0:8),
                    trControl = trainControl(method = 'cv'))
)
plot(cubist.tune.base, metric="RMSE")
plot(cubist.tune.base, metric="Rsquared")
cubist.tune.base$bestTune$neighbors
```

Committees improve considerably, up to 3. Two neighbours are as good as more.

Adjustment by one to four neighbours worsens the OOB fit; five to to eight improves it; after seven, eight gives little advantage.

Best model:

```{r cubist.model.base}
c.model.base <- cubist(x = preds.base, y = response, committees=3, neighbours=2)
summary(c.model.base)
```


```{r cubist1to1.base}
cubist.fits.base <- predict(c.model.base, newdata=preds.base,
                       neighbors=cubist.tune.base$bestTune$neighbors)
(rmse.cubist.base <- sqrt(mean((cubist.fits.base - zhne.df$t.avg)^2)))
print(rmse.cubist)
```

This is a slightly worse model than the full model.

Predict over the grid:

```{r cubist.pred.base, fig.width=6, fig.height=6}
summary(zhne.m.df$pred.cubist.base <- 
          predict(c.model.base, newdata=zhne.m.df,
                  neighbours=cubist.tune$bestTune$neighbors))
display.prediction.map("pred.cubist.base",
  "Annual T avg, Cubist prediction (3 predictors)",
  "deg C")
```

Compute and display the difference with the 9-predictor map:

```{r}
zhne.m.df$pred.cubist.base <- 
          predict(c.model.base, newdata=zhne.m.df,
                  neighbours=cubist.tune$bestTune$neighbors)
summary(zhne.m.df$diff.cubist.5.3 <- 
          zhne.m.df$pred.cubist - zhne.m.df$pred.cubist.base)
```


```{r display.cubist.diff, fig.width=6, fig.height=6}
display.difference.map("diff.cubist.5.3",
  "Difference Cubist 5 and 3 predictors",
  "+/- deg C")
```

Patchy small differences but an unusual "stripe" in the N.