Available with Geostatistical Analyst license.
All interpolation models are prediction methods, and the ultimate goal is to produce a surface of predicted values at all locations between the measured locations. Frequently, you also need to know how precise and reliable the predictions are, so Geostatistical Analyst offers several different output surface types to help you interpret the prediction surface while keeping in mind the inherent variability of the predictions. The following sections describe the different output surface types, and the table at the bottom shows which surface types are available for each interpolation method.
All output maps assume that you have chosen the correct interpolation method and chosen correct interpolation parameters. In practice, if the data does not meet the assumptions of the interpolation method or the wrong parameters are supplied, these surfaces may not correctly represent the true values of the data.
Prediction surface
All interpolation methods except indicator and probability kriging can create prediction surfaces, and the prediction surface is the default output for those methods. This surface displays the predicted value of the data at all locations between the measured locations.
Prediction standard error surface
The prediction standard error surface is a map of the standard errors of the predicted values at each location. The standard error is the standard deviation of the estimated value at each location; the larger the standard error, the lower the precision of the predicted value. Standard errors are most often used to create intervals that are likely to contain the true value at each predicted location.
The 68-95-99.7 rule
If the data follows a multivariate normal distribution, you can apply a simple rule of thumb to create confidence intervals for the true value at each predicted location. The rule states that 68 percent (approximately two-thirds) of the true values will fall within one standard error of the predicted value, 95 percent will fall within two standard errors, and 99.7 percent (nearly all) will fall within three standard errors. For example, if a location receives a predicted value of 100 with a standard error of 5, you can be 68 percent confident that the true value is between 95 and 105. Similarly, you can be 95 percent confident that the true value is between 90 and 110, and you can be nearly certain (99.7 percent confident) that the true value is between 85 and 115. To create confidence intervals with other percentages, you can look up critical values in Z-tables, which are widely available in statistical textbooks and on the Internet.
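The interval arithmetic above, including the Z-table lookup for other percentages, can be sketched in Python. This is a minimal illustration that assumes normally distributed prediction errors; the function name `normal_interval` is hypothetical and is not part of Geostatistical Analyst. It uses `statistics.NormalDist` from the Python standard library in place of a printed Z-table.

```python
from statistics import NormalDist


def normal_interval(predicted, std_error, confidence=0.95):
    """Two-sided interval around a predicted value, assuming normality.

    `predicted` and `std_error` are the values read from the prediction
    and prediction standard error surfaces at one location.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Critical value: the same number a Z-table lookup would give.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std_error
    return predicted - half_width, predicted + half_width


# Example from the text: predicted value 100, standard error 5.
for level in (0.68, 0.95, 0.997):
    low, high = normal_interval(100.0, 5.0, level)
    print(f"{level:.1%} interval: {low:.1f} to {high:.1f}")
```

The exact critical value for 95 percent is about 1.96 rather than 2, so the computed interval is slightly narrower than the rule-of-thumb interval of 90 to 110.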
It is very difficult to verify that the data follows a multivariate normal distribution, and in practice, typically only univariate normality is verified. You can investigate univariate normality with the Normal QQ plot and Histogram exploratory tools.
If your data does not meet the assumptions for the 68-95-99.7 rule, or you are unsure whether your data meets the assumptions, a more conservative confidence interval can be created based on Chebyshev's inequality. This inequality states that for any distribution that has a finite mean and variance, at least (1 - 1/k^2)·100 percent of the true values will fall within k standard errors of the predicted value, where k > 1. Setting k = 2, this inequality states that at least 75 percent of the true values will fall within two standard errors of the predicted value. Similarly, with k = 3, at least 88.9 percent of the true values will fall within three standard errors of the predicted value. Other values of k can be used to create intervals with different percentages.
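The Chebyshev bound can be evaluated in both directions: the guaranteed coverage for a given k, or the k needed for a desired coverage. The sketch below is illustrative only; the function names are hypothetical and nothing here is a Geostatistical Analyst API.

```python
import math


def chebyshev_coverage(k):
    """Minimum fraction of true values within k standard errors."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1.0 - 1.0 / k**2


def chebyshev_k(coverage):
    """Smallest k whose Chebyshev bound reaches the requested coverage."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - coverage)


def chebyshev_interval(predicted, std_error, coverage):
    """Distribution-free interval containing at least `coverage` of true values."""
    k = chebyshev_k(coverage)
    return predicted - k * std_error, predicted + k * std_error


print(chebyshev_coverage(2))                   # 0.75
print(round(chebyshev_coverage(3), 3))         # 0.889
print(chebyshev_interval(100.0, 5.0, 0.75))    # (90.0, 110.0)
```

For a 95 percent guarantee, k is about 4.47, which is more than twice the width of the corresponding normal-theory interval. That extra width is the price of making no distributional assumption.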
How to interpret standard errors
Standard error values should be interpreted while keeping in mind the values and range of the input data. For example, if the input data values are all between 10,000 and 12,000, a standard error value of 100 would likely indicate high precision in the predictions because the standard error is much smaller than the values and the range of the input data. However, if the data values are between 50 and 200, the same standard error of 100 would indicate low precision because the variability of the predictions is as large as the values and range of the input data.
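One informal way to make this comparison concrete is to express the standard error as a fraction of the data range and of a typical data value. The helper below is a hypothetical sketch, not a Geostatistical Analyst diagnostic, and using the midpoint of the range as the "typical value" is an assumption made purely for illustration.

```python
def standard_error_ratios(std_error, data_min, data_max):
    """Return (std_error / range, std_error / midpoint) of the input data."""
    data_range = data_max - data_min
    midpoint = (data_min + data_max) / 2.0
    if data_range <= 0 or midpoint == 0:
        raise ValueError("data must span a positive range with a nonzero midpoint")
    return std_error / data_range, std_error / midpoint


# The two cases from the text, both with a standard error of 100.
for data_min, data_max in ((10_000, 12_000), (50, 200)):
    to_range, to_midpoint = standard_error_ratios(100.0, data_min, data_max)
    print(f"data {data_min}-{data_max}: "
          f"{to_range:.1%} of range, {to_midpoint:.1%} of midpoint")
```

In the first case the standard error is a small fraction of both the range and the values; in the second it is well over half of each, which matches the low-precision reading described above.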
Probability map
Probability maps are generally used when there is some critical value of interest, such as a national standard level of a pollutant. This critical value is called a threshold, and the output map will display the probability that this threshold value is exceeded or not exceeded. These maps are useful for seeing which areas are most or least likely to exceed or not exceed this critical threshold.
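To illustrate what an exceedance probability means at a single location, the sketch below computes it from a predicted value and its standard error. It assumes the prediction distribution is normal, which is an assumption of this example and not a statement of how Geostatistical Analyst computes probability maps internally; the function name is hypothetical.

```python
from statistics import NormalDist


def exceedance_probability(predicted, std_error, threshold):
    """P(true value > threshold) under an assumed normal prediction distribution."""
    distribution = NormalDist(mu=predicted, sigma=std_error)
    return 1.0 - distribution.cdf(threshold)


# Hypothetical location: predicted 100, standard error 5.
for threshold in (95.0, 100.0, 110.0):
    p = exceedance_probability(100.0, 5.0, threshold)
    print(f"threshold {threshold}: P(exceed) = {p:.3f}")
```

The probability of not exceeding the threshold is simply one minus this value.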
Quantile map
Quantile maps display a specified quantile of the prediction distribution at each location. They are generally used when preparing for worst- or best-case scenarios. For example, instead of creating a map of predicted values, you can create a map of the 0.95 quantile (the 95th percentile) of the predicted values. In this case, only about 5 percent of the true values are expected to exceed the value of the quantile surface. Similarly, if you create a 0.10 quantile map, only about 10 percent of the true values are expected to be less than the value of the quantile surface.
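The relationship between a predicted value, its standard error, and a quantile can be sketched for one location. This example assumes a normal prediction distribution, which is an assumption of the sketch; `prediction_quantile` is a hypothetical name, not a Geostatistical Analyst function.

```python
from statistics import NormalDist


def prediction_quantile(predicted, std_error, q):
    """Value below which a fraction q of the assumed normal distribution lies."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    return NormalDist(mu=predicted, sigma=std_error).inv_cdf(q)


# Hypothetical location: predicted 100, standard error 5.
worst_case = prediction_quantile(100.0, 5.0, 0.95)  # exceeded about 5% of the time
best_case = prediction_quantile(100.0, 5.0, 0.10)   # undercut about 10% of the time
print(f"0.95 quantile: {worst_case:.2f}")
print(f"0.10 quantile: {best_case:.2f}")
```

Because the quantile depends on the standard error as well as the prediction, two locations with the same predicted value can have very different quantile values.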
Standard errors of indicators surface
An indicator variable is a binary variable that only takes the values 0 and 1. Indicator, probability, and disjunctive kriging all compute probability maps by reclassifying the input data to 0 or 1 based on a threshold value; values less than the threshold are reclassified to 0, and values greater than the threshold are reclassified to 1. After interpolating the indicator variable, the prediction map calculates the expected value of the indicator variable, and this expected value can be interpreted as the probability that the indicator variable equals one (in other words, that the threshold value is exceeded). Thus, the standard errors of indicators output map is a surface of the standard errors of the expected value of the indicator variable; in other words, it is the standard error of the probability that the threshold value is exceeded.
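The reclassification step can be sketched as follows. The sample values are made up for illustration, the function name is hypothetical, and treating values exactly equal to the threshold as 0 is an assumption, since the text does not say how ties are handled. The plain average shown here only illustrates why the expected value of an indicator reads as a probability; it is not a kriging prediction.

```python
def to_indicator(values, threshold):
    """Reclassify measurements to 1 (above threshold) or 0 (otherwise)."""
    # Assumption: values equal to the threshold are reclassified to 0.
    return [1 if value > threshold else 0 for value in values]


measurements = [3.2, 7.5, 5.1, 9.0, 4.4]  # made-up sample values
indicators = to_indicator(measurements, threshold=5.0)
print(indicators)  # [0, 1, 1, 1, 0]

# The mean of a 0/1 variable is the fraction of ones, so it can be
# read as an estimate of the probability that the threshold is exceeded.
print(sum(indicators) / len(indicators))  # 0.6
```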
Because indicator variables cannot be normally distributed, you cannot use the 68-95-99.7 rule with indicator variables. To create confidence intervals for the probability that a threshold is exceeded, Chebyshev's inequality can be used.
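A Chebyshev interval for an exceedance probability can be sketched as below. Clipping the interval to the range 0 to 1 is an addition made in this example, on the grounds that a probability cannot fall outside that range; the function name and the sample values are hypothetical.

```python
def chebyshev_probability_interval(probability, std_error, k=2.0):
    """Interval of +/- k standard errors around a probability, clipped to [0, 1].

    By Chebyshev's inequality the unclipped interval has coverage of
    at least 1 - 1/k**2, for any k > 1.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1")
    lower = max(0.0, probability - k * std_error)
    upper = min(1.0, probability + k * std_error)
    coverage = 1.0 - 1.0 / k**2
    return lower, upper, coverage


# Hypothetical location: probability 0.9 with a standard error of 0.1.
lower, upper, coverage = chebyshev_probability_interval(0.9, 0.1, k=2.0)
print(f"at least {coverage:.0%} coverage: {lower:.2f} to {upper:.2f}")
```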
Condition number surface
The condition number surface is an optional output for local polynomial interpolation, and the surface is used to determine the stability of the predicted value at each prediction location. Condition numbers are difficult to interpret literally, but the larger the condition number, the larger the instability of the predictions. In this case, stability means the amount that the predicted value will change for a small change in the input data or small changes in the interpolation parameters. The rule of thumb for condition numbers in local polynomial interpolation is that for first-order polynomials, condition numbers should not exceed 10. For second-order polynomials, condition numbers should not exceed 100, and for third-order polynomials, the condition number should not exceed 1,000. Polynomial orders higher than three are generally not recommended.
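The rule of thumb can be applied to a set of condition numbers with a simple lookup. This is a sketch only: the names are hypothetical, the sample values are made up, and it operates on a plain list of numbers, not on a Geostatistical Analyst layer.

```python
# Rule-of-thumb upper limits from the text, keyed by polynomial order.
CONDITION_NUMBER_LIMITS = {1: 10.0, 2: 100.0, 3: 1000.0}


def is_stable(condition_number, polynomial_order):
    """True if the condition number does not exceed the limit for this order."""
    if polynomial_order not in CONDITION_NUMBER_LIMITS:
        raise ValueError("rule of thumb covers polynomial orders 1, 2, and 3 only")
    return condition_number <= CONDITION_NUMBER_LIMITS[polynomial_order]


def unstable_locations(condition_numbers, polynomial_order):
    """Indexes of locations whose condition number exceeds the limit."""
    return [
        index
        for index, value in enumerate(condition_numbers)
        if not is_stable(value, polynomial_order)
    ]


sample = [5.0, 12.0, 9.9, 30.0]  # made-up condition numbers
print(unstable_locations(sample, polynomial_order=1))  # [1, 3]
```

Locations flagged this way are candidates for closer inspection, because their predicted values may change noticeably with small changes in the data or parameters.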
Which output surface types are available for each interpolation method?
In the following table, the symbol indicates the available output surface types for each interpolation method.
| Interpolation method | Predictions | Prediction standard errors | Quantile maps | Probability maps | Standard errors of indicators | Condition number |
Empirical Bayesian kriging
Diffusion interpolation with barriers
Global polynomial interpolation
Kernel interpolation with barriers
Local polynomial interpolation
Radial basis functions
1 Requires assumption of multivariate normal distribution.
2 Requires assumption of pairwise bivariate normality.
3 Order of polynomial must be set to 1.
4 A spatial condition number threshold must be used.