Optimized Hot Spot Analysis executes the Hot Spot Analysis (Getis-Ord Gi*) tool using parameters derived from characteristics of your input data. Similar to the way that the automatic setting on a digital camera will use lighting and subject versus ground readings to determine an appropriate aperture, shutter speed, and focus, the Optimized Hot Spot Analysis tool interrogates your data to obtain the settings that will yield optimal hot spot results. If, for example, the Input Features dataset contains incident point data, the tool will aggregate the incidents into weighted features. Using the distribution of the weighted features, the tool will identify an appropriate scale of analysis. The statistical significance reported in the Output Features will be automatically adjusted for multiple testing and spatial dependence using the False Discovery Rate (FDR) correction method.
Each of the decisions the tool makes to give you the best results possible is reported as a message during tool execution, and the reasoning behind these decisions is documented below.
Just like your camera has a manual mode that allows you to override the automatic settings, the Hot Spot Analysis (Getis-Ord Gi*) tool gives you full control over all parameter options. Running the Optimized Hot Spot Analysis tool and noting the parameter settings it uses may help you refine the parameters you provide to the full control Hot Spot Analysis (Getis-Ord Gi*) tool.
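For instance, you can run the tool from Python with arcpy. The following is a minimal sketch in which the geodatabase and feature class names are hypothetical:

```python
import arcpy

# Hypothetical workspace and feature class names; substitute your own data.
arcpy.env.workspace = r"C:\Data\Incidents.gdb"

# Run Optimized Hot Spot Analysis on unweighted incident points, letting
# the tool aggregate incidents into fishnet polygon cells and derive its
# own scale of analysis.
arcpy.OptimizedHotSpotAnalysis_stats(
    "Incidents",                                # Input Features
    "Incidents_HotSpots",                       # Output Features
    "#",                                        # Analysis Field (none: incident data)
    "COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS")  # Incident Data Aggregation Method

# The settings the tool derived are reported in the geoprocessing messages.
print(arcpy.GetMessages())
```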
The workflow for the Optimized Hot Spot Analysis tool includes the following components. The calculations and algorithms used within each of these components are described below.
Initial data assessment
In this component, the Input Features and the optional Analysis Field, Bounding Polygons Defining Where Incidents Are Possible, and Polygons For Aggregating Incidents Into Points are scrutinized to ensure there are sufficient features and adequate variation in the values to be analyzed. If the tool encounters records with corrupt or missing geometry, or if an Analysis Field is specified and null values are present, the associated records will be listed as bad records and excluded from analysis.
The Optimized Hot Spot Analysis tool uses the Getis-Ord Gi* (pronounced Gee Eye Star) statistic and, like many statistical methods, its results are not reliable when there are fewer than 30 features. If you provide polygon Input Features, or point Input Features and an Analysis Field, you will need a minimum of 30 features to use this tool. The minimum number of Polygons For Aggregating Incidents Into Points is also 30. The feature layer representing Bounding Polygons Defining Where Incidents Are Possible may include one or more polygons.
The Gi* statistic also requires values to be associated with each feature it analyzes. When the Input Features you provide represent incident data (when you don't provide an Analysis Field), the tool will aggregate the incidents and the incident counts will serve as the values to be analyzed. After the aggregation process completes, there still must be a minimum of 30 features, so with incident data you will want to start with more than 30 features. The table below documents the minimum number of features for each Incident Data Aggregation Method:
| Minimum Number of Incidents | Aggregation Method | Minimum Number of Features After Aggregation |
|---|---|---|
| 60 | COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS, without specifying Bounding Polygons Defining Where Incidents Are Possible | 30 |
| 30 | COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS, when you do provide a feature class for the Bounding Polygons Defining Where Incidents Are Possible parameter | 30 |
| 30 | COUNT_INCIDENTS_WITHIN_AGGREGATION_POLYGONS | 30 |
| 60 | SNAP_NEARBY_INCIDENTS_TO_CREATE_WEIGHTED_POINTS | 30 |
The Gi* statistic was also designed for an Analysis Field containing a variety of values; it is not appropriate for binary data, for example. The Optimized Hot Spot Analysis tool checks the Analysis Field to make sure the values have at least some variation.
If you specify a path for the Density Surface, this component of the tool workflow will also check the raster analysis mask environment setting. If no raster analysis mask is set, it will construct a convex hull around the incident points to use for clipping the output Density Surface raster layer. The Density Surface parameter is only enabled when your Input Features are points and you have the ArcGIS Spatial Analyst extension. It is disabled for all but the SNAP_NEARBY_INCIDENTS_TO_CREATE_WEIGHTED_POINTS Incident Data Aggregation Method.
Locational outliers are features that are much farther away from neighboring features than the majority of features in the dataset. Think of an urban environment with large, densely populated cities in the center and smaller, less densely populated cities at the periphery. If you computed the average nearest neighbor distance for these cities, you would find the result smaller if you excluded the peripheral locational outliers and focused only on the cities near the urban center. This is an example of how locational outliers can strongly influence spatial statistics such as Average Nearest Neighbor. Because the Optimized Hot Spot Analysis tool uses the average and median nearest neighbor calculations both for aggregation and to identify an appropriate scale of analysis, the Initial Data Assessment component also identifies any locational outliers in the Input Features or Polygons For Aggregating Incidents Into Points and reports the number it encounters. To do this, the tool computes the distance from each feature to its closest noncoincident neighbor and evaluates the distribution of these distances. Features whose nearest neighbor distance is more than three standard deviations above the mean are considered locational outliers.
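A minimal Python sketch of this outlier test, assuming coincident points have already been collapsed to unique locations (as the Collect Events step does); it illustrates the idea rather than Esri's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def locational_outliers(xy, n_std=3.0):
    """Flag points whose nearest neighbor is unusually far away.

    xy: array of unique (x, y) point locations.
    Returns a boolean mask; True marks a locational outlier.
    """
    tree = cKDTree(xy)
    dist, _ = tree.query(xy, k=2)   # k=2: nearest neighbor other than self
    nn = dist[:, 1]                 # each point's nearest neighbor distance
    cutoff = nn.mean() + n_std * nn.std()
    return nn > cutoff
```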
Incident aggregation
For incident data the next component in the workflow aggregates your data. There are three possible approaches based on the Incident Data Aggregation Method you select. The algorithms for each of these approaches are described below.
- COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS:
- Collapse coincident points yielding a single point at each unique location in the dataset, using the same method employed by the Collect Events tool.
- Compute both the average and median nearest neighbor distances on all of the unique location points, excluding locational outliers. The average nearest neighbor distance (ANN) is computed by summing the distance to each feature's nearest neighbor and dividing by the number of features (N). The median nearest neighbor distance (MNN) is computed by sorting the nearest neighbor distances smallest to largest and selecting the distance that falls in the middle of the sorted list.
- Set the initial cell size (CS) to the larger of ANN and MNN.
- Adjust the cell size to account for coincident points: with Smaller = MIN(ANN, MNN) and Larger = MAX(ANN, MNN), compute Scalar = MAX(Larger / Smaller, 2). The adjusted cell size becomes CS * Scalar (see the code sketch after this list).
- Construct a fishnet polygon mesh using the adjusted cell size and overlay the mesh with the incident points.
- Count the incidents in each polygon cell.
- When you provide Bounding Polygons Defining Where Incidents Are Possible, all polygon cells within the bounding polygons are retained. When you do not provide Bounding Polygons Defining Where Incidents Are Possible, polygon cells with zero incidents are removed.
- If the aggregation process results in fewer than 30 polygon cells, or if the counts in all the polygon cells are identical, you will get a message indicating that the Input Features you provided are not appropriate for the selected Incident Data Aggregation Method; otherwise, the aggregation component for this method completes successfully.
- COUNT_INCIDENTS_WITHIN_AGGREGATION_POLYGONS:
- For this Incident Data Aggregation Method, a Polygons For Aggregating Incidents Into Points feature layer is required. These aggregation polygons overlay the incident points.
- Count the incidents within each polygon.
- Ensure there is sufficient variation in the incident counts for analysis. If the aggregation process results in all polygons having the same number of incidents, you will get a message indicating the data is not appropriate for the Incident Data Aggregation Method you selected.
- SNAP_NEARBY_INCIDENTS_TO_CREATE_WEIGHTED_POINTS:
- Collapse coincident points yielding a single point at each unique location in the dataset, using the same method employed by the Collect Events tool. Count the number of unique location features (UL).
- Compute both the average and the median nearest neighbor distances on all of the unique location points, excluding locational outliers. The average nearest neighbor distance (ANN) is computed by summing the distance to each feature's nearest neighbor and dividing by the number of features (N). The median nearest neighbor distance (MNN) is computed by sorting the nearest neighbor distances smallest to largest and selecting the distance that falls in the middle of the sorted list.
- Set the initial snap distance (SD) to the smaller of ANN and MNN.
- Adjust the snap distance to account for coincident points: Scalar = UL / N, where N is the number of features in the Input Features layer. The adjusted snap distance becomes SD * Scalar (see the code sketch after this list).
- Integrate the incident points in three passes: first using the adjusted snap distance times 0.10, then the adjusted snap distance times 0.25, and finally the fully adjusted snap distance. Performing the integration in three passes minimizes distortion of the original point locations.
- Collapse the snapped points yielding a single point at each location with a weight to indicate the number of incidents that were snapped together. This part of the aggregation process uses the Collect Events method.
- If the aggregation process results in fewer than 30 weighted points, or if the counts for all of the points are identical, you will get a message indicating that the Input Features you provided are not appropriate for the selected Incident Data Aggregation Method; otherwise, the aggregation component for this method completes successfully.
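To make the distance arithmetic in the fishnet and snapping methods concrete, here is a minimal Python sketch of both calculations. It is an illustration of the formulas above, not Esri's implementation, and for brevity it skips the locational outlier exclusion described earlier:

```python
import numpy as np
from scipy.spatial import cKDTree

def aggregation_distances(xy, n_input):
    """xy: unique point locations (UL of them); n_input: feature count (N)
    of the original Input Features layer."""
    tree = cKDTree(xy)
    dist, _ = tree.query(xy, k=2)        # nearest neighbor other than self
    nn = dist[:, 1]
    ann, mnn = nn.mean(), np.median(nn)  # ANN and MNN

    # Fishnet method: start from the larger distance, then scale up to
    # account for coincident points.
    cell_size = max(ann, mnn) * max(max(ann, mnn) / min(ann, mnn), 2.0)

    # Snapping method: start from the smaller distance, then scale by the
    # ratio of unique locations (UL) to total input features (N).
    snap_distance = min(ann, mnn) * (len(xy) / n_input)

    return cell_size, snap_distance
```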
Scale of analysis
This next component of the Optimized Hot Spot Analysis workflow is applied to weighted features, either because you provided Input Features with an Analysis Field or because the Incident Aggregation procedure created weights from incident counts. The next step is to identify an appropriate scale of analysis. The ideal scale of analysis is a distance that matches the scale of the question you are asking (if you are looking for hot spots of a disease outbreak and know that the mosquito vector has a range of 10 miles, for example, a 10-mile distance would be most appropriate). When no specific distance can be justified, the Optimized Hot Spot Analysis tool employs the strategies described below.
The first strategy tried is Incremental Spatial Autocorrelation. Whenever you see spatial clustering in the landscape, you are seeing evidence of underlying spatial processes at work. The Incremental Spatial Autocorrelation tool performs the Global Moran's I statistic for a series of increasing distances, measuring the intensity of spatial clustering for each distance. The intensity of clustering is determined by the z-score returned. Typically, as the distance increases, so does the z-score, indicating intensification of clustering. At some particular distance, however, the z-score generally peaks. Peaks reflect distances where the spatial processes promoting clustering are most pronounced. The Optimized Hot Spot Analysis tool looks for peak distances using Incremental Spatial Autocorrelation. If a peak distance is found, this distance becomes the scale for analysis. If multiple peak distances are found, the first peak distance is selected.
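A minimal sketch of the first-peak selection, assuming you already have the z-scores for a series of increasing distances (this is an illustration, not Esri's implementation):

```python
def first_peak_distance(distances, zscores):
    # A peak is a distance whose z-score exceeds both neighbors: clustering
    # intensifies up to that distance and falls off beyond it.
    for i in range(1, len(zscores) - 1):
        if zscores[i] > zscores[i - 1] and zscores[i] > zscores[i + 1]:
            return distances[i]
    return None  # no peak found; fall back to the K-neighbor strategy below
```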
When no peak distance is found, Optimized Hot Spot Analysis examines the spatial distribution of the features and computes the average distance that would yield K neighbors for each feature. K is computed as 0.05 * N, where N is the number of features in the Input Features layer. K will be adjusted so that it is never smaller than three or larger than 30. If the average distance that would yield K neighbors exceeds one standard distance, the scale of analysis will be set to one standard distance; otherwise, it will reflect the K neighbor average distance.
The Incremental Spatial Autocorrelation step can take a long time to finish for large, dense datasets. Consequently, when a feature with 500 or more neighbors is encountered, the incremental analysis is skipped, and the average distance that would yield 30 neighbors is computed and used for the scale of analysis.
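A minimal sketch of the K-neighbor fallback described above; `std_distance` is assumed to come from a standard distance calculation (for example, the Standard Distance tool), and this illustrates the logic rather than Esri's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def fallback_scale(xy, std_distance):
    n = len(xy)
    k = min(max(int(round(0.05 * n)), 3), 30)  # K = 0.05 * N, clamped to [3, 30]
    tree = cKDTree(xy)
    dist, _ = tree.query(xy, k=k + 1)          # k + 1 because self is included
    avg_k_distance = dist[:, -1].mean()        # mean distance to the Kth neighbor
    # Cap the scale of analysis at one standard distance.
    return min(avg_k_distance, std_distance)
```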
The distance reflecting the scale of analysis will be reported to the Results window and will be used to perform the hot spot analysis. If you provide a path for the Density Surface parameter, this optimal distance will also serve as the search radius for the Kernel Density tool. This distance corresponds to the Distance Band or Threshold Distance parameter used by the Hot Spot Analysis (Getis-Ord Gi*) tool.
Hot spot analysis
At this point in the Optimized Hot Spot Analysis workflow all of the checks and parameter settings have been made. The next step is to run the Getis-Ord Gi* statistic. Details about the mathematics for this statistic are outlined in How Hot Spot Analysis (Getis-Ord Gi*) works. Results from the Gi* statistic will be automatically corrected for multiple testing and spatial dependence using the False Discovery Rate (FDR) correction method. Messages to the Results window summarize the number of features identified as statistically significant hot or cold spots, after the FDR correction is applied.
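For intuition about the correction, here is a generic Benjamini-Hochberg style FDR sketch; the tool's exact procedure is documented in the Hot Spot Analysis help, so treat this as an illustration of the concept:

```python
import numpy as np

def fdr_significant(pvals, alpha=0.05):
    """Return a boolean mask of p-values that survive a Benjamini-Hochberg
    style FDR cutoff at the given alpha."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    # Compare the i-th smallest p-value against (i / n) * alpha.
    passed = p[order] <= (np.arange(1, n + 1) / n) * alpha
    significant = np.zeros(n, dtype=bool)
    if passed.any():
        cutoff = np.where(passed)[0].max()   # largest rank that passes
        significant[order[: cutoff + 1]] = True
    return significant
```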
Output
The last component of the Optimized Hot Spot Analysis tool is to create the Output Features and, if specified, the Density Surface raster layer. If the Input Features represent incident data requiring aggregation, the Output Features will reflect the aggregated weighted features (fishnet polygon cells, the aggregation polygons you provided for the Polygons For Aggregating Incidents Into Points parameter, or weighted points). Each feature will have a z-score, p-value, and Gi Bin result.
When specified, the Density Surface is created using the Kernel Density tool. The search radius for this tool is the same as the scale of analysis distance used for hot spot analysis. The default rendering is stretched values along a gray scale color ramp. If a raster analysis mask is specified in the environment settings, the output Density Surface will be clipped to the analysis mask. If the raster analysis mask isn't specified, the Density Surface will be clipped to a convex hull around the Input Features centroids.
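If you want to reproduce the density surface step manually, the following is a hedged arcpy sketch; the feature class, weight field, and search radius shown are hypothetical, and the Spatial Analyst extension is required:

```python
import arcpy
from arcpy.sa import KernelDensity

arcpy.CheckOutExtension("Spatial")

# Use the scale-of-analysis distance reported by Optimized Hot Spot
# Analysis as the search radius, as described above. "ICOUNT" stands in
# for whatever weight field your aggregated points carry.
density = KernelDensity("Incidents_Snapped", "ICOUNT", search_radius=2500)
density.save(r"C:\Data\Incidents.gdb\IncidentDensity")
```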
Additional resources
Getis, A. and J.K. Ord. 1992. "The Analysis of Spatial Association by Use of Distance Statistics" in Geographical Analysis 24(3).
Ord, J.K. and A. Getis. 1995. "Local Spatial Autocorrelation Statistics: Distributional Issues and an Application" in Geographical Analysis 27(4).
The spatial statistics resource page has short videos, tutorials, web seminars, articles and a variety of other materials to help you get started with spatial statistics.