# Lesson 2 – Dealing with Uncertainty in Contaminant Transport Models

In Units 3 and 4 we briefly discussed the issue of uncertainty, but we have not returned to it since (although we have looked at probabilistic simulations in several of the Examples and Exercises). This allowed us to focus on the details of simulating the various contaminant transport processes. Now that we have done so, it is critical that we revisit this topic.

This is because, as we noted at the beginning of the Course (Unit 3), due to the nature of the systems involved, dealing with uncertainty is especially important for environmental systems, and particularly so when trying to carry out contaminant transport modeling. In this Lesson we will reiterate the key points made in that earlier Unit, as well as discuss some additional topics that should be considered when modeling complex systems.

## Sources of Uncertainty in Contaminant Transport Models

In Unit 3, Lesson 8 we discussed some of the major sources of uncertainty in environmental models in general and contaminant transport models in particular. An appreciation of these sources is so important that it is worthwhile to reiterate them here.

Unlike engineered systems (for which we can often create prototypes and/or measure important controlling variables), for environmental systems, we almost always have a much poorer understanding of the system.  A major reason for this is that they often cannot be easily characterized (i.e., the relevant parameters cannot be easily measured). For example, if trying to predict contaminant transport through groundwater, it is not practical or feasible to completely characterize the subsurface environment and determine the relevant properties (instead, you may have only a handful of data points). This is complicated by the fact that the properties themselves (e.g., hydraulic conductivity, chemical environment) are spatially variable (we will discuss this point in more detail below). Moreover, relatively small differences in spatially variable environmental conditions (e.g., pH) can result in order of magnitude changes in parameters affecting mass transport (e.g., partition coefficients, solubilities).

Some parameters used to describe contaminant transport are quite difficult to measure at all. An example of this is the hydrodynamic dispersivity. Dispersivity is unusual in that, unlike a property such as porosity, it is not really meaningful to say that it has some value at a particular point in space. This is because its value is typically considered to be scale dependent (due to the concept of macrodispersion discussed in previous Units). Hence, it can be thought of as a property of the entire system. This, of course, can make it difficult to quantify (doing so may require a large-scale field experiment that may not be feasible or practical).

In some cases, extremely important variables required for predicting performance may be almost entirely unavailable and/or need to be estimated using very poor information.  For contaminant transport models, particularly for existing waste sites, the classic example of this is the source term (or contaminant inventory). There may be very limited information available regarding what was disposed and when.  This imposes a very large uncertainty on any predictive models.

In some cases, not just the parameters, but the processes themselves may be poorly understood. For example, if your model included a pond with many chemical constituents, and during the simulation the pond went dry due to evaporation, concentrations in the pond would become very high (and spatially variable over small distances), and hence the various chemical precipitation reactions taking place while the pond evaporates would be quite difficult to predict accurately. As a result, the behavior of such a system would be highly uncertain.

In addition to these issues, it is often not practical or feasible to carry out experiments or evaluate alternative designs for environmental systems you are trying to model. In many cases, it is simply not possible to build and test alternative designs for a proposed system (such as a mine): the system is simply too large to build and test realistic prototypes. The long time frames involved for some systems also make this impossible. For example, when disposing of radioactive waste, highly engineered waste packages are often used. Laboratory tests can be carried out on these for months or perhaps years to evaluate their performance. But their design life is typically thousands of years, and it is very difficult to design experiments to extrapolate performance over such time frames.

Models with long time frames have many other difficulties. For example, climatic factors (e.g., precipitation and temperature) will typically play an important role in environmental models. But predicting climate thousands (or even tens or hundreds) of years into the future is difficult. And, of course, even for very short duration models (months or years), the stochastic nature of weather adds uncertainty to many environmental models.

All of these factors result in very large uncertainties (in some cases, several orders of magnitude) in many of the parameters, processes and events associated with contaminant transport models.

## Implications of the Large Degree of Uncertainty in Contaminant Transport Models

So now that we have reiterated the large degree of uncertainty associated with contaminant transport modeling, what are the implications of this when trying to build complex, realistic models?

The first implication is that it is critical to represent this uncertainty quantitatively using Monte Carlo simulation.  As we discussed in Unit 3, Lesson 8, dealing with uncertainty in a simpler way by selecting single values for each parameter, and labeling the results as “best estimates” or perhaps “worst case estimates” is highly problematic. Defending “best estimate” approaches is often very difficult. In a confrontational environment (e.g., demonstrating that a particular facility will meet certain regulatory criteria), “best estimate” analyses will typically evolve into “worst case” analyses.  However, “worst case” analyses can be extremely misleading. Worst case analyses of a system are likely to be grossly conservative and therefore completely unrealistic (i.e., by definition, picking lots of unlikely “pessimistic” values will almost certainly have an extremely low probability of actually representing the future behavior of the system). And it is not possible in a deterministic simulation to quantify how conservative a “worst case” simulation actually is (i.e., define its probability). Using a highly improbable estimate to guide policy-making (e.g., “is the design safe?”) is likely to result in very poor decisions.
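The point about "worst case" analyses can be illustrated with a small numerical sketch. Everything in it is hypothetical: the toy model (a concentration computed as the product of three inputs), the parameter names, and the lognormal distributions are assumptions chosen only for illustration, and the inputs are assumed to be independent. The sketch compares a deterministic run with every input set to its 95th percentile against a Monte Carlo simulation of the same model.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_realizations = 100_000

# Hypothetical inputs: name -> (median, geometric standard deviation).
# These values are illustrative assumptions, not data from any real site.
params = {
    "inventory_kg": (100.0, 3.0),
    "release_fraction_per_yr": (1e-3, 3.0),
    "inverse_dilution_flow_yr_per_m3": (1e-4, 2.0),
}

def sample_lognormal(median, gsd, size):
    """Draw lognormal samples defined by a median and geometric std dev."""
    return rng.lognormal(mean=np.log(median), sigma=np.log(gsd), size=size)

def lognormal_quantile(median, gsd, z):
    """Value of the lognormal at standard normal score z."""
    return median * gsd ** z

# Monte Carlo: toy concentration is the product of the three inputs.
samples = {k: sample_lognormal(m, g, n_realizations) for k, (m, g) in params.items()}
conc = np.prod(list(samples.values()), axis=0)

# Deterministic "worst case": every input at its 95th percentile (z ~ 1.645).
z95 = 1.645
worst_case = np.prod([lognormal_quantile(m, g, z95) for m, g in params.values()])

# Fraction of realizations whose result is at least as bad as the worst case.
frac_exceeding = np.mean(conc >= worst_case)

# Probability that all independent inputs exceed their 95th percentile at once.
prob_all_pessimistic = 0.05 ** len(params)

print(f"Fraction of realizations >= worst case: {frac_exceeding:.4f}")
print(f"P(all inputs beyond 95th percentile):   {prob_all_pessimistic:.2e}")
```

Under these assumptions, each input is individually only "95% pessimistic", yet the combined result sits far out in the tail of the output distribution. Only the probabilistic run reveals how far, which is what a deterministic worst case cannot tell you.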

The second implication of the large degree of uncertainty associated with contaminant transport modeling, and one that is commonly overlooked, is that the complexity and detail that you include in your model should be consistent with the amount of uncertainty in the system. In particular, when building a model and deciding whether to add detail to a certain process, you should not just ask if you can (e.g., can you use a more detailed equation or more discretization?), but you should ask if you should (does the amount of uncertainty I have in this process justify a more detailed model?). Building a model that has details that cannot be justified is not just a waste of time and money, but more importantly can result in a model that is misleading. The extreme detail can often act to mask the fact that there are huge uncertainties in the model, making the model seem more “correct” than it really is.

This point is reflected in the quote by George Box at the beginning of this Unit. When building models of complex systems, it is important to understand that all models are approximations of reality.  Your goal is not to create the most detailed model that you can; rather, it is to create a model that is consistent with the level of uncertainty present, and that provides you with useful information.

## Key Issues When Specifying Uncertain Parameters for Contaminant Transport Models

To represent this uncertainty in your contaminant transport models, you will need to represent many of the parameters that we have been discussing in this Course (e.g., those defining the source term, flow rates, and transport parameters such as partition coefficients) as probability distributions.

Defining uncertain parameters and running probabilistic simulations was discussed in detail in Unit 11 and Unit 12 of the Basic Course (and prior to building models of complex uncertain systems you are strongly encouraged to refresh your memory on probabilistic simulation by reviewing these Units).

Below we discuss several key points that are particularly important to be aware of when specifying uncertain parameters for contaminant transport models.

• Correlation.  Many of the key mass transport parameters that we have discussed in previous Units (e.g., partition coefficients, solubilities) are not only likely to be highly uncertain (and hence should be specified using a probability distribution), but the distributions for these parameters may be correlated to the distributions of the same parameters for other species (i.e., the distribution for the partition coefficient for species X may be correlated to the distribution for the partition coefficient for species Y; moreover, the distribution for the partition coefficient for species X may be correlated to the distribution for the solubility for species X).
We first discussed the concept of correlations between distributions in Unit 11, Lesson 9 of the Basic Course. Correlations among parameters such as these exist because they are influenced by the same underlying parameters (e.g., pH). Ignoring correlations, particularly if they are very strong, can lead to physically unrealistic simulations. For example, if the solubilities of two contaminants were positively correlated (e.g., due to a pH dependence), it would be physically inconsistent for one contaminant’s solubility to be sampled from the high end of its possible range while the other’s was sampled from the low end of its possible range. That is, if one is high, the other one should also be high. Ignoring the correlation could result in a situation (one high, one low) that is physically impossible. Hence, when defining probability distributions, it is critical that the analyst determine whether correlations need to be represented.
One way to express correlations is to directly specify correlation coefficients between various model parameters (these vary from -1 to 1; a value close to -1 indicates a strong negative correlation, while a value close to 1 indicates a strong positive correlation). In practice, however, assessing and quantifying correlations in this manner can be difficult. A more practical way of representing correlations is to explicitly model the cause of the dependency. That is, the analyst adds detail to the model such that the underlying functional relationship causing the correlation is directly represented. For example, one might be uncertain regarding the solubility of two contaminants, while knowing that the solubilities tend to be correlated. If the solubilities were both strongly dependent on pH, and the main source of the uncertainty in the solubilities was actually due to uncertainty in the pH, the two solubilities could be explicitly correlated by defining each solubility as an equation that was a function of the pH (with the pH being the underlying uncertain parameter).
• Correctly representing stochastic parameters. Some parameters in a contaminant transport model may be stochastic. As described in Unit 12 of the Basic Course, stochastic parameters are inherently random in time; their temporal behavior cannot be predicted precisely, but can be described statistically. Examples of stochastic parameters include various environmental flow rates. How this stochastic behavior is represented (e.g., by resampling the parameter through time) is a function of the time frame of the stochastic behavior you want to represent and the time frame of the simulation.
For example, let’s assume that we are simulating the long-term behavior of a radioactive waste site over thousands of years.  There is a river at the site.  There is no assumed long-term trend in how the flow rate will change over time, but it is certainly variable from day to day, month to month and year to year. You would likely not need to represent the daily or monthly fluctuations in the flow rate of the river, and would only be interested in longer term fluctuations (since your timestep may be 10 years or more). However, you still might want to treat it as a stochastic parameter; just not one that is sampled every day or month. Care would need to be taken, however, in how you define the distribution for the flow rate.  In particular, it would be inappropriate to specify the same distribution you would use if you were modeling monthly fluctuations.  Rather, you would define a distribution that represented the variability in the long-term average flow rate over the time period in which you are resampling it. For example, if you have a 10 year timestep, you could sample a different value every timestep (in the extreme case, of course, you would never resample the flow rate and keep it constant for the entire simulation).  Note, however, that the shape (e.g., standard deviation) of your distribution would be a function of how often it was sampled. As a general rule, the longer the resampling period, the smaller your standard deviation should be. This reflects the fact that, assuming that there is no trend, the variability/uncertainty in the annual flow rate from one year to the next is less than the variability/uncertainty in the monthly flow rate from one month to the next, which in turn is less than the variability/uncertainty in the daily flow rate from one day to the next.
• Differentiating uncertainty from spatial variability. In most systems that you will simulate, there will be spatial variability in many parameters.  We will spend an entire Lesson discussing this later in the Unit. With regard to our discussion on uncertainty, however, it is important to point out a common error that can occur when representing parameters that have spatial variability. In particular, it is very common for parameters to be both uncertain and spatially variable.  As an example, the solubility of a contaminant may vary spatially.  We may also be uncertain about the value of the solubility at a particular location.
A common mistake in such a situation would be to try to define a single probability distribution for the solubility that “lumps together” both the spatial variability of the solubility and the uncertainty in the solubility at a particular location. One of the (several) problems with such an approach is that it can be very misleading, and can lead to incorrect interpretations of the simulation results.  For example, if the solubility had been studied extensively, such that the value at any particular location had only a small amount of uncertainty, but the variability in solubility spatially was large, lumping those together would result in a misleadingly large amount of uncertainty in simulation results.
The correct way to handle such a situation would be to disaggregate the problem (by explicitly modeling each spatial location separately) and then define different probability distributions for the solubility at each location (with each distribution representing only the uncertainty in the solubility for that location). We will discuss how spatial variability in parameters like solubility can be represented in Lesson 5.
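
The three points above can be sketched numerically. The following is a minimal illustration only, not a recommended implementation: the log-linear pH relationships, the flow distribution, the three "locations" and all numerical values are hypothetical assumptions. Part B also assumes that daily flows are independent from day to day. Real flow records are autocorrelated, so the reduction in standard deviation with averaging period would typically be slower than shown here.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 50_000

# Part A: correlation represented by modeling its cause (a shared uncertain pH).
def log10_solubilities(ph):
    """Hypothetical log-linear pH dependence for species X and Y, plus small
    independent noise. The coefficients are illustrative assumptions."""
    log_sx = -2.0 - 1.0 * (ph - 7.0) + rng.normal(0.0, 0.1, size=ph.shape)
    log_sy = -4.0 - 0.5 * (ph - 7.0) + rng.normal(0.0, 0.1, size=ph.shape)
    return log_sx, log_sy

ph = rng.normal(7.0, 0.5, size=n)  # the underlying uncertain parameter
log_sx, log_sy = log10_solubilities(ph)
corr_shared_ph = np.corrcoef(log_sx, log_sy)[0, 1]

# Ignoring the dependency: each solubility is driven by its own pH draw.
log_sx_ind, _ = log10_solubilities(rng.normal(7.0, 0.5, size=n))
_, log_sy_ind = log10_solubilities(rng.normal(7.0, 0.5, size=n))
corr_independent = np.corrcoef(log_sx_ind, log_sy_ind)[0, 1]

# Part B: variability of a stochastic flow rate versus averaging period.
n_years, days_per_year, days_per_month = 200, 360, 30  # simplified calendar
daily = rng.lognormal(mean=np.log(10.0), sigma=0.8, size=n_years * days_per_year)
monthly = daily.reshape(-1, days_per_month).mean(axis=1)
annual = daily.reshape(-1, days_per_year).mean(axis=1)
sd_daily, sd_monthly, sd_annual = daily.std(), monthly.std(), annual.std()

# Part C: lumping spatial variability together with local uncertainty.
location_medians = np.array([1e-5, 1e-3, 1e-1])  # hypothetical, three locations
local_sigma = 0.1  # small uncertainty at each location (log10 units)
per_location = np.log10(location_medians)[:, None] + rng.normal(
    0.0, local_sigma, size=(3, n)
)
sd_per_location = per_location.std(axis=1)  # uncertainty at each location
sd_lumped = per_location.ravel().std()      # single "lumped" distribution

print(f"A: correlation with shared pH = {corr_shared_ph:.2f}, "
      f"with independent sampling = {corr_independent:.2f}")
print(f"B: std dev daily = {sd_daily:.2f}, monthly = {sd_monthly:.2f}, "
      f"annual = {sd_annual:.2f}")
print(f"C: std dev per location = {sd_per_location.round(2)}, "
      f"lumped = {sd_lumped:.2f}")
```

In Part A the two solubilities are never assigned a correlation coefficient; the correlation emerges because both are functions of the same sampled pH. In Part C, the lumped distribution is much wider than the uncertainty at any single location, which is the misleading result described above.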