Example uncertainty and sensitivity
Example_uncertainty_sensitivity.Rmd
Introduction
This document will describe the steps one can take to determine how uncertainty in input parameters propagates through the CENTURY model. It will also specify the procedures used to quantify sensitivity to those parameters.
We will start off by giving some necessary definitions. As stated in (Taylor 2022), ‘Error analysis is the study and evaluation of uncertainty in measurement’. On the other hand, we can define sensitivity analysis as ‘The study of how the uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input’ (Saltelli et al. 2004).
As can be seen, both topics may sound somehow related and even seemingly partially overlap.
We can use the Rcentury package to assess how uncertainties (aka errors) in a model’s parameters contribute to the overall uncertainty of its output. In general, for many models it is possible to derive approximate analytical expressions for n-error error propagation by using Taylor expansions, which involves calculating analytical n-order derivatives. However, this approach quickly becomes cumbersome and impractical when the number of parameters grows high and, especially, when the function to evaluate is very complex and non-linear like CENTURY. Therefore, to determine error propagation for the CENTURY model we turn to a much simpler Monte Carlo numerical approach, which consists of feeding the model with input parameters of interests which vary randomly at each model run following a given distribution of known mean and variance.
For the sake of simplicity we will demonstrate how to perform an uncertainty propagation analysis for soil texture (i.e. clay, silt and sand) only, keeping all the other parameters of the model fixed. In this case, at each model run we will draw random proportion values for each of the three parameters, calculate the CENTURY model with those values and then save the output. This will be done many times to derive reliable confidence intervals for the output.
Soil texture is given as a set of 3 proportions which add up to 1 always, and thus we need to specify a random distribution that accounts for that property. Consequently, we turn to Dirichlet distributions, which allow us to model probability distributions of outcomes for categorical events, such that the total probability adds up to one. For details, see Annex below.
Rcentury numerical simulations
For this example we will use the forest data from CENTURY example files. We will create a temporary directory and move there all necessary files (ASCII data files and executables). The simulation below will be carried out with the so-called file example, though any other site would do.
# Copy all files to temporary folder.
from_dir <- system.file("extdata/4.forest", package = "Rcentury")
to_dir <- tempdir()
invisible(file.copy(from_dir, to_dir, overwrite = TRUE, recursive = TRUE))
path_exe <- system.file("bin/windows", package = "Rcentury")
century_exe <- file.path(path_exe, "century_47.exe")
list100_exe <- file.path(path_exe, "list100_47.exe")
to_dir <- file.path(to_dir, "4.forest")
invisible(file.copy(century_exe, to_dir, overwrite = TRUE, recursive = TRUE))
invisible(file.copy(list100_exe, to_dir, overwrite = TRUE, recursive = TRUE))
We will also need package , available from CRAN, which will provide us with a random-number generator for the Dirichlet distribution. We will also load the Rcentury package.
# install.packages("extraDistr")
library(extraDistr)
#> Warning: package 'extraDistr' was built under R version 4.4.3
library(Rcentury)
We intend to model the propagation of uncertainty for, and sensitivity to, texture parameters. Those parameters are included in the site file, which is then read by during execution. Consequently, we use package functions and to read site data and write it back onto disk at each simulation step.
The output of the CENTURY model may include up to approximately 600 variables, in this example we will focus on total system carbon (variable ).
# Read site characteristics and extract sand, slit and clay values.
site <- site2 <- read_site(to_dir, "duke.100")
x <- site$`site and control`$df
sand <- x$value[x$field == "SAND"]
silt <- x$value[x$field == "SILT"]
clay <- x$value[x$field == "CLAY"]
alpha <- dirichlet_params(c(sand, silt, clay), c(NA, .01, NA))
# We need an ASCII text file with a list of variables to extract; in this case, only $totsysc$.
cat("totsysc", file = file.path(to_dir, "totsysc.txt"))
# Finally, the simulations.
nsimu <- 100
texture <- matrix(0, 3, nsimu)
df <- list()
for (i in 1:nsimu) {
# Random Dirichlet triplet.
texture_random <- rdirichlet(1, alpha)
texture[, i] <- texture_random
# Save <site>.100 file.
x <- site2$`site and control`$df
x$value[x$field == "SAND"] <- texture_random[1]
x$value[x$field == "SILT"] <- texture_random[2]
x$value[x$field == "CLAY"] <- texture_random[3]
site2$`site and control`$df <- x
write_site(site2, to_dir, "duke.100", overwrite = TRUE)
df[[i]] <- century_run(to_dir, "duke.sch", "res.bin", "res.lis", "totsysc.txt", verbose = FALSE)
invisible(file.remove(file.path(to_dir, c("res.bin", "res.lis"))))
}
Once we have performed all the simulations, we can proceed to determine how the uncertainty propagates through the model and what is its sensitivity to each of the three soil components.
Model uncertainty
x <- df[[1]]$time
z <- sapply(df, function(x) x$totsysc)
y <- apply(z, 1, mean)
ci_lo <- apply(z, 1, quantile, prob = 0.025)
ci_up <- apply(z, 1, quantile, prob = 0.975)
plot(x, y, type = "l", ylim = c(0, max(z)), xlab = "Time (years)", ylab = "Total system C")
polygon(c(x, rev(x)), c(ci_lo, rev(ci_up)), col = "skyblue", border = NA)
points(x, y, type = "l")
Model sensitivity
To determine model sensitivity to the three texture parameters we apply a
par(mfcol = c(2, 1))
nx <- length(x)
cc <- sapply(2:nx, function(i) sapply(1:3, function(j) cor(texture[j,], z[i,])^2))
plot(2:nx, cc[1, ], ylim = c(0, 1), xlab = "Time (years)", ylab = "R-squared",
col = "red", type = "l", lwd = 2)
points(2:nx, cc[2, ], col = "blue", type = "l", lwd = 2)
points(2:nx, cc[3, ], col = "green", type = "l", lwd = 2)
legend("right", col = c("red", "blue", "green"), lwd = 2, legend = c("Sand", "Silt", "Clay"))
cn <- sweep(cc, 2, apply(cc, 2, sum), FUN = "/")
plot(2:nx, cn[1, ], ylim = c(0, 1), xlab = "Time (years)", ylab = "Normalized R-squared",
col = "red", type = "l", lwd = 2)
points(2:nx, cn[2, ], col = "blue", type = "l", lwd = 2)
points(2:nx, cn[3, ], col = "green", type = "l", lwd = 2)
legend("topright", col = c("red", "blue", "green"), lwd = 2, legend = c("Sand", "Silt", "Clay"))
par(mfrow = c(1, 1))
Annex: How to draw multivariate proportions
The means of the proportions will be fixed in our simulations.
where:
and:
as well as:
The mean of the is, then:
and the expression for the variance is:
where:
We can use those two expressions to calculate the values when we are given four parameters for the or . The Rcentury package provides with a function, called , that can calculate the values when all the means and one single variance value are known.
For example, for and , , and , we can easily generate sequences of Dirichlet-distributed triplets.
alpha <- dirichlet_params(c(.28, .60, .12), c(NA, .01, NA))
x <- rdirichlet(100000, alpha)
par(mfrow = c(1, 3))
hist(x[, 1], xlab = "", main = "Sand", breaks = seq(0, 1, length = 50))
hist(x[, 2], xlab = "", main = "Silt", breaks = seq(0, 1, length = 50))
hist(x[, 3], xlab = "", main = "Clay", breaks = seq(0, 1, length = 50))