Sampling design is a critical part of any study involving modeling and estimation based on data that is sampled from natural resources or other phenomena occurring in the landscape. Statistical considerations related to sampling are part of a larger scenario involving theoretical knowledge, previously detected behavior and patterns of the phenomenon, costs, accessibility to sample sites, politics, and so forth. Thus, the sampling design algorithm should be flexible enough to accommodate external considerations in the design.
ArcGIS currently offers the following methods for constructing sampling designs:
- Simple random sampling: Sites are generated independently, using the Create Random Points tool. A similar outcome can be obtained by using the Create Random Raster tool and a probability cutoff value (note that the ArcGIS Spatial Analyst version of the Create Random Raster tool draws values from a uniform distribution, whereas the Data Management toolbox's version of the tool supports several different distributions). The method is simple and flexible, but any single realization may include areas where samples are clustered and other areas that are devoid of samples.
- Stratified random sampling: The study area is split into strata and random samples are generated within each stratum. Strata can be adjusted based on prior knowledge of the phenomenon (for example, concentric circles can be made larger as the distance from a point source emission increases), providing some spatial structure to the sample.
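The two designs above can be sketched outside of ArcGIS in a few lines of Python. This is only an illustration of the idea, not the implementation used by the Create Random Points tool: the rectangular extent, the regular grid of strata, and the function names `simple_random_sample` and `stratified_random_sample` are assumptions made for the example.

```python
import random

def simple_random_sample(extent, n, rng):
    """Draw n independent, uniformly distributed sites inside a rectangular extent."""
    xmin, ymin, xmax, ymax = extent
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def stratified_random_sample(extent, rows, cols, n_per_stratum, rng):
    """Split the extent into rows x cols rectangular strata and sample each stratum."""
    xmin, ymin, xmax, ymax = extent
    dx = (xmax - xmin) / cols
    dy = (ymax - ymin) / rows
    sites = []
    for r in range(rows):
        for c in range(cols):
            # Each stratum is itself a small rectangle (assumed shape for this sketch).
            stratum = (xmin + c * dx, ymin + r * dy,
                       xmin + (c + 1) * dx, ymin + (r + 1) * dy)
            sites.extend(simple_random_sample(stratum, n_per_stratum, rng))
    return sites

rng = random.Random(42)               # fixed seed so the design is reproducible
extent = (0.0, 0.0, 100.0, 100.0)     # hypothetical study area
srs = simple_random_sample(extent, 16, rng)
strat = stratified_random_sample(extent, 4, 4, 1, rng)
```

Both designs contain 16 sites, but the stratified design places exactly one site in each of the 16 strata, whereas the simple random design may leave some of those cells empty and put several sites in others.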
Other types of designs can be generated relatively easily using simple scripts or models:
- Systematic random sampling: An initial sample site is picked at random, and all other sites are selected so that they are located according to some regular pattern (for example, on the vertices of equilateral triangles, squares, hexagons, and so forth). The method is simple and provides designs that are spatially well balanced (well distributed in space).
- Clustered random sampling: The location for a group of sites is selected at random, and sites within each group are then located relatively close to one another. This can be done by generating randomly placed centers using the Minimum Allowed Distance parameter of the Create Random Points tool and allocating additional samples within a specified distance from each center. This method is easy to implement in practice because many samples are collected from nearby locations (unlike a simple random sample pattern, where sample sites may occur anywhere in the study area).
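As a rough illustration of the kind of script meant here, the following Python sketch generates a systematic design on a square grid with a random origin, and a clustered design built from randomly placed centers. It does not call any ArcGIS tool: the function names, the rejection loop that imitates a minimum allowed distance between centers, and all parameter values are assumptions made for the example.

```python
import math
import random

def systematic_square_sample(extent, spacing, rng):
    """Square-grid design: only the grid origin is random, the rest is regular."""
    xmin, ymin, xmax, ymax = extent
    x0 = xmin + rng.uniform(0, spacing)
    y0 = ymin + rng.uniform(0, spacing)
    sites = []
    j = 0
    while y0 + j * spacing < ymax:
        i = 0
        while x0 + i * spacing < xmax:
            sites.append((x0 + i * spacing, y0 + j * spacing))
            i += 1
        j += 1
    return sites

def clustered_sample(extent, n_clusters, n_per_cluster, min_center_distance,
                     cluster_radius, rng, max_tries=10000):
    """Random centers kept at least min_center_distance apart, then sites near each."""
    xmin, ymin, xmax, ymax = extent
    centers = []
    tries = 0
    while len(centers) < n_clusters:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("Could not place all centers; relax min_center_distance.")
        cand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Reject candidates that fall too close to an already accepted center.
        if all(math.dist(cand, c) >= min_center_distance for c in centers):
            centers.append(cand)
    sites = []
    for cx, cy in centers:
        for _ in range(n_per_cluster):
            # The square root spreads sites uniformly over the disc area.
            r = cluster_radius * math.sqrt(rng.random())
            theta = rng.uniform(0, 2 * math.pi)
            # Sites near the edge of the extent may fall outside it; this sketch does not clip them.
            sites.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return centers, sites

rng = random.Random(1)
extent = (0.0, 0.0, 100.0, 100.0)     # hypothetical study area
grid = systematic_square_sample(extent, 20.0, rng)
centers, sites = clustered_sample(extent, 5, 4, 15.0, 5.0, rng)
```

A triangular or hexagonal pattern could be produced the same way by offsetting alternate rows of the grid.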
These methods do not easily account for variations in the probability of a site being selected (other than by splitting the study area into strata, which usually requires manual inspection of the study site and good knowledge of the process under study). Also, because sites are selected at random, not all of these methods guarantee that the sampling design will be spatially balanced (that is, that the samples will be spread across the entire population). To address these limitations, the Create Spatially Balanced Points tool is available in the Geostatistical Analyst toolbox. An explanation of how this tool works and the publications it is based on can be found in How Create Spatially Balanced Points works.
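To make the idea of a varying selection probability concrete, the following Python sketch applies a per-cell probability cutoff to a small raster of inclusion probabilities. It is not the algorithm used by the Create Spatially Balanced Points tool and does nothing to enforce spatial balance; the raster values and the function name `sample_by_inclusion_probability` are hypothetical.

```python
import random

def sample_by_inclusion_probability(prob_raster, rng):
    """Select each cell independently with the probability stored in that cell."""
    selected = []
    for r, row in enumerate(prob_raster):
        for c, p in enumerate(row):
            # A uniform draw in [0, 1) falls below p with probability p.
            if rng.random() < p:
                selected.append((r, c))
    return selected

# Hypothetical 4 x 4 inclusion-probability raster: probability increases toward the east.
prob_raster = [[0.0, 0.25, 0.5, 1.0] for _ in range(4)]
rng = random.Random(7)
selected = sample_by_inclusion_probability(prob_raster, rng)
```

Cells with probability 0 are never selected and cells with probability 1 always are. The expected number of selected cells equals the sum of the probabilities, but the realized sample size varies from run to run, and the selected cells can still cluster.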