# Chapter 7 Weather generation

**IMPORTANT NOTE**: In version 0.8.6, **meteoland** incorporated the possibility of generating stochastic weather series. Note, however, that the corresponding functions have been *deprecated* since **ver. 2.0.0**. We still keep this theoretical explanation in case new functions are developed in the future.

Stochastic weather generators are algorithms that produce series of synthetic daily weather data. The parameters of the model are conditioned on existing meteorological records to ensure the characteristics of input weather series emerge in the daily stochastic process. The weather generation approaches available in **meteoland** are intended to be used to generate daily series of the same length as the input. It can be understood as a bootstrap resampling algorithm that tries to preserve some properties of the original weather series. The approach implemented in **meteoland** can be applied to any spatial structure and it preserves the spatial correlation and multivariate covariance structure of weather series (because it works on area-averaged weather and the chosen resampled days are applied to all points/pixels).

Two modes of weather generation are offered:

- Unconditional weather generation is based on a first order Markov chain (MC) to simulate a series of precipitation states (dry/wet/extreme wet) and a K-nearest neighbor (\(k\)-NN) algorithm to select a pair of consecutive days with the same transition and similar weather for the first day.
- Conditional weather generation couples the former generation algorithm with a second algorithm to generate multiyear variation.

## 7.1 Unconditional weather generation algorithm

The algorithm is based on Apipattanavis et al. (2007) and combines a Markov Chain (MC) to generate the sequence of precipitation states with a *k*-nearest neighbor (*k*-NN) bootstrap resampler to generate multivariate and multisite weather variables. The MC is used to better represent wet and dry spell statistics while the *k*-NN bootstrap resampler preserves the covariance structure between weather variables and across space.

The weather generation approach is based on the common practice of first simulating precipitation occurence as a chain-dependent process. A three-state (dry / wet / extremely wet) Markov chain (MC) of order 1 is used to simulate an (area-averaged) precipitation state series. Nine transition probabilities are fit to the (area-averaged) input precipitation state series by month using maximum likelihood. A threshold of 0.3 mm (by default) is chosen to distinguish between dry and wet days, while the 80th percentile of precipitation (on wet days) is used as a threshold for extremely wet conditions. The fitted MC can be used to simulate (area-averaged) precipitation state series. A *k*-NN resampling algorithm of lag-1 is used
to generate the values for all the weather variables, including disaggregation to \(L\) locations. The *k*-NN bootstrap resampler algorithm follows six steps (description adapted from Steinschneider et al. 2013):

- Let \(\bar{\bf{x}}_{t-1}\) be a vector of (area-averaged) weather variables already simulated for day \(t - 1\). Also assume that the MC has simulated precipitation state series for days \(t-1\) and \(t\).
- Partition the historic record to find all pairs of consecutive days in a 7-day window (by default) centered on the day of the year of day \(t\) (i.e. if \(t\) is 15 January, then the window includes days from 12 to 18 January) that has the same sequence of (area-averaged) precipitation states simulated by the MC for days \(t-1\) and \(t\). Assume there are \(Q\) such pairs.
- Compute the distance between vector \(\bar{\bf{x}}_{t-1}\) and the vectors \(\bar{\bf{x}}^{1}_{q}\) corresponding to the first day of each \(q\) pair of consecutive days. The weighted Euclidean distance \(d_q\) between vector \(\bar{\bf{x}}_{t-1}\) and \(\bar{\bf{x}}^{1}_{q}\) is calculated as: \[\begin{equation} d_q = \sqrt{\sum_{i=1}^{r}{w_i \cdot (\bar{x}_{i,t-1} - \bar{x}^{1}_{i,q})^2}} \end{equation}\] where \(w_i\) is set to the inverse of the standard deviation of the \(i\)th weather variable. Here mean temperature and precipitation are used and we force a ten-fold weight for precipitation compared to temperature.
- Order the distances \(d_q\) from smallest to largests and select the \(k\) smallest distances, where \(k = \sqrt{Q}\). These corresponding \(k\) neighbors are assigned resapling weights \(K\) using a discrete kernel function: \[\begin{equation} K[j] = \frac{1/j}{\sum_{j=1}^{k}{1/j}} \end{equation}\] where \(j\) indexes the \(k\) neighbors.
- Sample one of the \(k\) neighbors and record the historic date associated with that selected neigbor. Then, use vectors of weather variables on the successive day to the recorded date for each of the \(L\) locations to simulate, multisite, weather for day \(t\).
- Repeat steps 1-5 for all days of the simulation.

To begin the resampling algorithm and generate initial (area-averaged) \(\bar{\bf{x}}_{1}\) values for all weather variables, data from a random day from the simulation starting month is selected from the historic record that is consistent with the first precipitation state simulated by the MC.

## 7.2 Conditional weather generation

The algorithm described in the previous section allows generating intra-annual (i.e. seasonal) weather variation because of monthly calibration of the MC and \(k\)-NN selection of pairs of days on the basis of their day of the year. However, year-to-year variation in annual precipitation or mean temperature will arise due to stochasticity only. Multi-year variation (i.e. low-frequency periodic variations or long-term trends) cannot be simulated. Two alternative approaches are offered in **meteoland** to condition stochastic weather generation on multi-year variation.

Annual precipitation can be conditioned by using a stationary auto-regressive (ARIMA) model annual precipitation and then using a \(k\)-nearest neighbor algorithm (similar to that of the previous section) to select years of the original weather series with annual precipitation similar to the simulated one and produce a bootstrap resample of weather data to train the MC-KNN algorithm (Steinschneider and Brown 2013). This option is recommended if low-frequency variation of annual precipitation is to be acounted for in long series.

Multi-year temperature or precipitation trends of the original series can be preserved by using a moving window. Each target year, a window around the target year is used to subset the original series. Annual precipitation to be simulated is then conditioned using a lognormal random trial of the precipitation corresponding to the selected years. Then, the same \(k\)-NN algorithm is used to produce a bootstrap resample of weather data from the years included in the moving window. This strategy is recommended to generate multiple stochastic series from (already corrected) climate change projections, as it preserves long-term trends.

### Bibliography

*Water Resources Research*49 (11): 7205–20. https://doi.org/10.1002/wrcr.20528.