How Find Similar works—Help | ArcGIS for Desktop

Find Similar is a tool used to score potential new sites against a known, well-performing site, called a master site.

Why do some stores do better than others? The old real estate axiom of "location, location, location" is usually the most important part of the answer. The Find Similar tool is based on the idea that the characteristics of a master site can be used to find similar sites elsewhere. The tool lets you score polygon data such as simple rings, drive times, and other forms of trade areas. It can also score potential sites (points) as the target layer; in that case, rings are created around each target site location.

The master site can be based on your best location or a typical location. You can select a master site based on a store with a particular product mix or one that has the highest rate of same-store sales. You must pick the master site against which Find Similar scores the other sites. You can choose your master site by selecting a point on the map, entering an address, or selecting a feature from either a point or polygon layer.

Some examples of scored sites are a database of points added as a layer using the Store Setup tools; a database of points geocoded by latitude/longitude or address and set up using the Analysis Layer Setup wizard; rings, drive times, or trade areas created in Business Analyst; or any polygons added as a layer on the map and set up using the Analysis Layer Setup wizard.

Although it isn't required, you should compare similar-sized areas around the master and scored sites. For example, if you're using a five-minute drive time around your master site, you should create and use a five-minute drive time around the other features in the target layer.

There are two approaches for running the Find Similar tool: the Conventional Find Similar method and the Principal Component Analysis (PCA) method.

Conventional Find Similar method

The Conventional Find Similar method ranks trade areas by comparing the values of up to five variables at the master site with the values at the scored sites. For each variable, you assign a +/- percentage that defines how far a site's value may deviate from the master site value and still count as a match. Sites are then assigned a score of 1–5 based on the number of variables that match the criteria you set.
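The scoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the function name `conventional_score`, the variable names, and all sample values are hypothetical, and the sketch simply returns the count of matching variables (the source does not say how a site with no matches is handled).

```python
def conventional_score(master, site, tolerances):
    """Count the variables whose site value lies within +/- pct of the master value.

    master, site: dicts mapping a variable name to a numeric value (hypothetical layout).
    tolerances: dict mapping a variable name to its +/- percentage.
    """
    if len(tolerances) > 5:
        raise ValueError("the conventional method compares up to five variables")
    score = 0
    for name, pct in tolerances.items():
        target = master[name]
        margin = abs(target) * pct / 100.0  # allowed deviation from the master value
        if abs(site[name] - target) <= margin:
            score += 1
    return score


# Hypothetical master site and two candidate trade areas.
master = {"population": 20000, "median_income": 55000, "households": 8000}
tolerances = {"population": 10, "median_income": 15, "households": 10}
sites = {
    "A": {"population": 21000, "median_income": 60000, "households": 8500},
    "B": {"population": 30000, "median_income": 52000, "households": 12000},
}

scores = {name: conventional_score(master, values, tolerances)
          for name, values in sites.items()}
print(scores)
```

In this made-up example, site A falls within the tolerance for every variable, while site B matches only on median income.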

Principal Component Analysis (PCA) method

The analysis can be run on points (stores) or polygons. When you run the analysis on a point feature class and select a radius around each point, data is attributed to each resulting circle, and the analysis runs on the data contained within those circles. If you choose a polygon layer, the data contained in each polygon is used to rank that polygon, and your output is based on the boundaries of those polygons.

To run your analysis with customer data, you must first append that data to a layer of trade areas and then run the analysis on that layer.

The Conventional Find Similar method compares the master site against the other features in the target layer based on variables you select. This method rests on a fundamental assumption: that you know which variables are important when ranking sites by similarity. That assumption, that you can precisely identify the relevant variables, might not hold in most cases. For instance, if block groups are used as the level of geography, deciding on the right variables is not easy; sometimes even setting the range to +/- 60 percent for the chosen variables does not find a similar site.

The PCA method removes the burden of variable selection while still providing a ranking of the sites according to the level of similarity. You may want to score similarity using a predefined set of variables you choose or use all the variables provided.

The figure below illustrates how the variables or neighbors can be selected, where K is the number of neighbors to be found.

The PCA algorithm treats the set of variables for each site as a vector. It then takes the vectors for all potential sites and the master site and performs PCA on them in the following sequence:

It builds a covariance matrix.
It finds the eigenvectors and eigenvalues of the covariance matrix.
Using the Kaiser criterion, it drops eigenvectors with eigenvalues less than 1.
The remaining eigenvectors form a subspace of the initial space.
All vectors are projected onto this subspace.
It rescales the projected data to the [0,1] interval.
It uses the L2 (Euclidean) distance to choose the K potential sites closest to the master site.

The resulting layer, which contains the K potential sites closest to the master site, is color-coded according to each site's L2 distance from the master site.
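The sequence above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function name `pca_find_similar` and the sample values are hypothetical, the covariance matrix is built from the raw (unstandardized) variables because the steps do not mention standardizing them first, and the fallback of keeping the largest component when no eigenvalue reaches 1 is an assumption added so the sketch always returns a result.

```python
import numpy as np


def pca_find_similar(master, candidates, k):
    """Return the indices and distances of the k candidates closest to the master site."""
    # Stack the master site (row 0) and the candidate sites into one matrix,
    # one row per site and one column per variable.
    data = np.vstack([master, candidates]).astype(float)
    centered = data - data.mean(axis=0)

    # Build the covariance matrix and find its eigenvalues and eigenvectors.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Kaiser criterion: drop eigenvectors with eigenvalues less than 1.
    keep = eigvals >= 1.0
    if not keep.any():
        keep[np.argmax(eigvals)] = True  # assumption: keep at least one component
    basis = eigvecs[:, keep]

    # Project every site vector onto the retained subspace.
    projected = centered @ basis

    # Rescale each retained component to the [0, 1] interval.
    low = projected.min(axis=0)
    span = projected.max(axis=0) - low
    span[span == 0] = 1.0  # avoid dividing by zero for a constant component
    scaled = (projected - low) / span

    # L2 (Euclidean) distance from the master site, then the k smallest.
    dist = np.linalg.norm(scaled[1:] - scaled[0], axis=1)
    order = np.argsort(dist)[:k]
    return order, dist[order]


# Hypothetical sites described by three variables each.
master_site = np.array([100.0, 50.0, 10.0])
candidate_sites = np.array([
    [300.0, 10.0, 40.0],
    [110.0, 48.0, 11.0],
    [500.0, 5.0, 80.0],
    [120.0, 45.0, 12.0],
])
nearest, distances = pca_find_similar(master_site, candidate_sites, k=2)
print(nearest, distances)
```

The returned distances are the kind of values a color-coded output layer could be based on, with smaller distances indicating sites more similar to the master site.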
