Skip to contents

Introduction

This document will describe the steps one can take to determine how uncertainty in input parameters propagates through the CENTURY model. It will also specify the procedures used to quantify sensitivity to those parameters.

We will start off by giving some necessary definitions. As stated in (Taylor 2022), ‘Error analysis is the study and evaluation of uncertainty in measurement’. On the other hand, we can define sensitivity analysis as ‘The study of how the uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input’ (Saltelli et al. 2004).

As can be seen, both topics may sound somehow related and even seemingly partially overlap.

We can use the Rcentury package to assess how uncertainties (aka errors) in a model’s parameters contribute to the overall uncertainty of its output. In general, for many models it is possible to derive approximate analytical expressions for n-error error propagation by using Taylor expansions, which involves calculating analytical n-order derivatives. However, this approach quickly becomes cumbersome and impractical when the number of parameters grows high and, especially, when the function to evaluate is very complex and non-linear like CENTURY. Therefore, to determine error propagation for the CENTURY model we turn to a much simpler Monte Carlo numerical approach, which consists of feeding the model with input parameters of interests which vary randomly at each model run following a given distribution of known mean and variance.

For the sake of simplicity we will demonstrate how to perform an uncertainty propagation analysis for soil texture (i.e. clay, silt and sand) only, keeping all the other parameters of the model fixed. In this case, at each model run we will draw random proportion values for each of the three parameters, calculate the CENTURY model with those values and then save the output. This will be done many times to derive reliable confidence intervals for the output.

Soil texture is given as a set of 3 proportions which add up to 1 always, and thus we need to specify a random distribution that accounts for that property. Consequently, we turn to Dirichlet distributions, which allow us to model probability distributions of outcomes for categorical events, such that the total probability adds up to one. For details, see Annex below.


Rcentury numerical simulations

For this example we will use the forest data from CENTURY example files. We will create a temporary directory and move there all necessary files (ASCII data files and executables). The simulation below will be carried out with the so-called 𝚍𝚞𝚔𝚎.𝟷𝟶𝟶\texttt{duke.100} file example, though any other site would do.

# Copy all files to temporary folder.
from_dir <- system.file("extdata/4.forest",  package = "Rcentury")
to_dir <- tempdir()
invisible(file.copy(from_dir, to_dir, overwrite = TRUE, recursive = TRUE))
path_exe <- system.file("bin/windows",  package = "Rcentury")
century_exe <- file.path(path_exe, "century_47.exe")
list100_exe <- file.path(path_exe, "list100_47.exe")
to_dir <- file.path(to_dir, "4.forest")
invisible(file.copy(century_exe, to_dir, overwrite = TRUE, recursive = TRUE))
invisible(file.copy(list100_exe, to_dir, overwrite = TRUE, recursive = TRUE))


We will also need package 𝚎𝚡𝚝𝚛𝚊𝙳𝚒𝚜𝚝𝚛\texttt{extraDistr}, available from CRAN, which will provide us with a random-number generator for the Dirichlet distribution. We will also load the Rcentury package.

# install.packages("extraDistr")
library(extraDistr)
#> Warning: package 'extraDistr' was built under R version 4.4.3
library(Rcentury)


We intend to model the propagation of uncertainty for, and sensitivity to, texture parameters. Those parameters are included in the 𝚍𝚞𝚔𝚎.𝟷𝟶𝟶\texttt{duke.100} site file, which is then read by 𝚌𝚎𝚗𝚝𝚞𝚛𝚢_𝟺𝟽.𝚎𝚡𝚎\texttt{century_47.exe} during execution. Consequently, we use package functions 𝚛𝚎𝚊𝚍_𝚜𝚒𝚝𝚎.𝚁\texttt{read_site.R} and 𝚠𝚛𝚒𝚝𝚎_𝚜𝚒𝚝𝚎.𝚁\texttt{write_site.R} to read site data and write it back onto disk at each simulation step.

The output of the CENTURY model may include up to approximately 600 variables, in this example we will focus on total system carbon (variable totsysctotsysc).

# Read site characteristics and extract sand, slit and clay values.
site <- site2 <- read_site(to_dir, "duke.100")
x <- site$`site and control`$df
sand <- x$value[x$field == "SAND"]
silt <- x$value[x$field == "SILT"]
clay <- x$value[x$field == "CLAY"]
alpha <- dirichlet_params(c(sand, silt, clay), c(NA, .01, NA))

# We need an ASCII text file with a list of variables to extract; in this case, only $totsysc$.
cat("totsysc", file = file.path(to_dir, "totsysc.txt"))

# Finally, the simulations.
nsimu <- 100
texture <- matrix(0, 3, nsimu)
df <- list()
for (i in 1:nsimu) {
  
  # Random Dirichlet triplet.
  texture_random <- rdirichlet(1, alpha)
  texture[, i] <- texture_random
  
  # Save <site>.100 file.
  x <- site2$`site and control`$df
  x$value[x$field == "SAND"] <- texture_random[1]
  x$value[x$field == "SILT"] <- texture_random[2]
  x$value[x$field == "CLAY"] <- texture_random[3]
  site2$`site and control`$df <- x
  write_site(site2, to_dir, "duke.100", overwrite = TRUE)
  
  df[[i]] <- century_run(to_dir, "duke.sch", "res.bin", "res.lis", "totsysc.txt", verbose = FALSE)
  invisible(file.remove(file.path(to_dir, c("res.bin", "res.lis"))))
}


Once we have performed all the simulations, we can proceed to determine how the uncertainty propagates through the model and what is its sensitivity to each of the three soil components.

Model uncertainty

x <- df[[1]]$time
z <- sapply(df, function(x) x$totsysc)
y <- apply(z, 1, mean)
ci_lo <- apply(z, 1, quantile, prob = 0.025)
ci_up <- apply(z, 1, quantile, prob = 0.975)
plot(x, y, type = "l", ylim = c(0, max(z)), xlab = "Time (years)", ylab = "Total system C")
polygon(c(x, rev(x)), c(ci_lo, rev(ci_up)), col = "skyblue", border = NA)
points(x, y, type = "l")


Model sensitivity

To determine model sensitivity to the three texture parameters we apply a

par(mfcol = c(2, 1))
nx <- length(x)
cc <- sapply(2:nx, function(i) sapply(1:3, function(j) cor(texture[j,], z[i,])^2))
plot(2:nx, cc[1, ], ylim = c(0, 1), xlab = "Time (years)", ylab = "R-squared",
     col = "red", type = "l", lwd = 2)
points(2:nx, cc[2, ], col = "blue", type = "l", lwd = 2)
points(2:nx, cc[3, ], col = "green", type = "l", lwd = 2)
legend("right", col = c("red", "blue", "green"), lwd = 2, legend = c("Sand", "Silt", "Clay"))

cn <- sweep(cc, 2, apply(cc, 2, sum), FUN = "/")
plot(2:nx, cn[1, ], ylim = c(0, 1), xlab = "Time (years)", ylab = "Normalized R-squared",
     col = "red", type = "l", lwd = 2)
points(2:nx, cn[2, ], col = "blue", type = "l", lwd = 2)
points(2:nx, cn[3, ], col = "green", type = "l", lwd = 2)
legend("topright", col = c("red", "blue", "green"), lwd = 2, legend = c("Sand", "Silt", "Clay"))
par(mfrow = c(1, 1))

Annex: How to draw multivariate proportions

The means of the proportions will be fixed in our simulations.

f(x1,,xK;α1,,αK)=1B(𝛂)i=1Kxiαi1f(x_1, \dots, x_K; \alpha_1, \dots, \alpha_K) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^K x_i^{\alpha_i - 1}

where:

B(𝛂)=i=1KΓ(αi)Γ(i=1Kαi)B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^K \alpha_i)}

and:

xi[0,1],for all i{1,,K}x_i \in [0, 1], \quad \text{for all } i \in \{1, \dots, K\}

as well as:

i=1Kxi=1\sum_{i=1}^K x_i = 1

The mean of the xix_i is, then:

E[xi]=αij=1KαjE[x_i] = \frac{\alpha_i}{\sum_{j=1}^K \alpha_j} and the expression for the variance is:

Var[xi]=αĩ(1αĩ)α01Var[x_i] = \frac{\tilde{\alpha_i}\cdot(1-\tilde{\alpha_i})}{\alpha_0-1}

where:

αĩ=αiα0\tilde{\alpha_i}=\frac{\alpha_i}{\alpha_0} We can use those two expressions to calculate the αj\alpha_j values when we are given four parameters for the E[xi]E[x_i] or E[Vi]E[V_i]. The Rcentury package provides with a function, called 𝚍𝚒𝚛𝚒𝚌𝚑𝚕𝚎𝚝_𝚙𝚊𝚛𝚊𝚖𝚜.𝚁\texttt{dirichlet_params.R}, that can calculate the αj\alpha_j values when all the means E[xi] for all i{1,,K}E[x_i]\text{ for all } i \in \{1, \dots, K\} and one single variance value Var[xj]Var[x_j] are known.

For example, for K=3K=3 and E[x1]=.35E[x_1]=.35, E[x2]=.12E[x_2]=.12, E[x3]=.53E[x_3]=.53 and Var[x2]=.17Var[x_2]=.17, we can easily generate sequences of Dirichlet-distributed triplets.

alpha <- dirichlet_params(c(.28, .60, .12), c(NA, .01, NA))
x <- rdirichlet(100000, alpha)
par(mfrow = c(1, 3))
hist(x[, 1], xlab = "", main = "Sand", breaks = seq(0, 1, length = 50))
hist(x[, 2], xlab = "", main = "Silt", breaks = seq(0, 1, length = 50))
hist(x[, 3], xlab = "", main = "Clay", breaks = seq(0, 1, length = 50))

par(mfrow = c(1,1))
Saltelli, Andrea, Stefano Tarantola, Francesca Campolongo, Marco Ratto, et al. 2004. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. Vol. 1. Wiley Online Library.
Taylor, John R. 2022. An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements. MIT Press.