A moving target
The director of market research for a company that develops, designs, and manages resort-style retirement communities has been tasked with identifying candidate locations for a new facility. He knows he needs to be creative. While it may have been sufficient in the United States ten years ago to build sprawling developments in the warmest parts of the country, people approaching retirement today are not as willing to relocate. Many want to stay connected to friends and family, remain close to existing doctors, continue to work, enjoy local cultural and educational opportunities, and be surrounded by people of all ages. Consequently, as a first pass, he decides to look for locations projected to have large numbers of senior citizens but very few existing options for residential retirement. He will then narrow these locations by ranking how similar each candidate is to the company's current most successful resort retirement community.
His workflow is summarized below.
What data is needed?
He will first model supply versus demand for retirement housing opportunities.
For the demand component of his model, he needs a variable representing potential retirement community residents. Since a new facility will not be open for a couple of years, he obtains the projected 2019 age 55 and older population data, by ZIP Code, from Business Analyst.
The supply component of the model proves a bit more difficult. Retirement facilities range from private homes accommodating one or two people to whole villages housing as many as 100,000 residents. While he can easily get the number of businesses within each ZIP Code that are classified as retirement communities, retirement homes, independent living facilities, or senior citizen housing (SIC codes 805904, 805918, and 836114), his business data does not include information about the number of residents or the number of residential units associated with each facility.
He decides to use the number of employees associated with these facilities as a surrogate for retirement community size, at least until better information is available.
He also gets housing unit vacancy data from Business Analyst since locations with high vacancy rates generally reflect low demand for new housing.
Where is the demand for retirement housing highest in relation to supply?
While it may be tempting to calculate demand versus supply as a simple ratio (projected age 55+ population divided by an estimate of retirement community housing resources, for now the number of employees), this is problematic for at least three reasons:
- Division by zero: For ZIP Codes without any retirement community facilities, the denominator will be zero and the ratio will be undefined. If these ZIP Codes are removed from the analysis as a workaround, it will eliminate most of the ZIP Codes in the contiguous United States (see the map below) and will likely remove the very high-demand locations the director of market research is hoping to discover.
- Small numbers problem: Ratios built from small numerators and small denominators are unstable, and the extreme values they produce can also be a problem. Mortality rates are a classic example. Suppose you have a community with only two people and one dies from cancer. The cancer mortality rate for this community would be a very high 50 percent, an outlier that could throw off subsequent analyses. Similarly, a ZIP Code with very few senior citizens and very few retirement housing resources will produce unstable ratios.
- Boundaries: In addition, looking at a ZIP Code in isolation can be misleading. If a ZIP Code has lots of residential retirement opportunities and very few senior citizens, we might say it has more supply than demand. But what if it is surrounded by ZIP Codes with lots of senior citizens and no residential retirement opportunities? We will get a better picture of supply and demand if we evaluate each ZIP Code within the context of its neighboring ZIP Codes.
To address the division by zero and small numbers problems, the director of market research creates a level of service variable (L). The underlying assumption for the level of service variable is one of equity: if a ZIP Code contains 8 percent of the country's projected 55+ population, it should also contain 8 percent of the country's retirement community resources.
How does he construct this variable? It's easy. For each ZIP Code i, he subtracts a supply ratio from a demand ratio:
$$L_i = \frac{d_i}{D} - \frac{r_i}{R}$$
The demand ratio, $d_i / D$, is the projected age 55+ population in the ZIP Code ($d_i$) divided by the projected age 55+ population in all ZIP Codes ($D$). Because the denominator is a count of all people age 55 or older in all ZIP Codes, it will never be zero or small (unstable).
Similarly, the supply ratio, $r_i / R$, is the estimated number of retirement community employees in the ZIP Code ($r_i$) divided by the total number of retirement community employees in all ZIP Codes ($R$).
Here are some examples of how this plays out:
- Supply is equal to demand: When supply equals demand, L is zero. Suppose a ZIP Code contains 5 percent of the country's senior citizens and 5 percent of the country's retirement community employees. When you subtract the supply ratio from the demand ratio (5 - 5 = 0), the result is zero.
- Demand exceeds supply: When demand for retirement housing is larger than supply, L is a positive number. Suppose a ZIP Code contains 10 percent of the country's senior citizen population but only 2 percent of the country's retirement community employees. When you subtract the supply ratio from the demand ratio (10 - 2 = 8), the result is a positive number.
- Supply exceeds demand: When the supply of retirement housing opportunities is larger than demand, on the other hand, L is a negative number. For example, if a ZIP Code contains 3 percent of all senior citizens but 12 percent of all retirement community employees, the difference (3 - 12 = -9) is a negative number.
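A few lines of Python can confirm the sign logic in these three examples. This is a minimal sketch, and the function names are hypothetical. The three scenarios use the percentages above, so both totals are 100. The last call uses a made-up ZIP Code with 1,200 projected residents age 55+ and no retirement facility employees, together with the national totals reported later in this case study. It shows that L stays defined where a simple ratio would divide by zero.

```python
def level_of_service(d_i, D, r_i, R):
    """L = d_i/D - r_i/R: a ZIP Code's share of demand minus its share of supply.

    D and R are totals over all ZIP Codes, so neither denominator is ever zero,
    even for a ZIP Code with no retirement facilities (r_i == 0).
    """
    return d_i / D - r_i / R


def interpret(L):
    """Translate the sign of L into the three cases described above."""
    if L > 0:
        return "demand exceeds supply"
    if L < 0:
        return "supply exceeds demand"
    return "supply equals demand"


# The three scenarios above, expressed as percentages (both totals equal 100).
for d_pct, r_pct in [(5, 5), (10, 2), (3, 12)]:
    L = level_of_service(d_pct, 100, r_pct, 100)
    print(f"{d_pct}% demand, {r_pct}% supply: L = {L * 100:+.1f} points ({interpret(L)})")

# A hypothetical ZIP Code with no retirement community employees still gets a defined L.
L_zero_supply = level_of_service(1200, 95005265, 0, 428693)
print(interpret(L_zero_supply))
```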
To address the boundary issue, the director of market research uses Hot Spot Analysis on the level of service variable (L), which weighs the surplus or deficit in each ZIP Code against the surpluses and deficits of the surrounding ZIP Codes. A spatial cluster of large positive values that is not balanced by nearby negative values will be identified as a hot spot for demand. Similarly, a spatial cluster of negative values that is not balanced by nearby positive values will be identified as a cold spot for demand. The map below shows the results of this analysis.
Where are vacancy rates lowest? Which areas are projected to have the largest number of people age 55 and older?
Next, the director will take into account vacancy rates (2014) and the projected (2019) number of people, age 55 and older, across the country. The hot spot analysis maps for these variables are shown below.
Where are vacancies lowest and demand for retirement housing highest?
The director selects ZIP Codes within statistically significant hot or cold spot areas that meet all of these criteria:
- High demand and low supply of retirement housing opportunities
- Low housing unit vacancy rates
- Large projected age 55 and older populations
He finds that there are 898 ZIP Codes satisfying all three of these criteria. These become the candidate ZIP Codes for further analysis.
Which of the candidate ZIP Codes most resemble the current best performing retirement community?
To narrow the list of candidate ZIP Codes further, the director will rank them by how similar they are to one of the company's current best performing communities. It is actually one of the company's newest communities, but it has had the fastest lease-up time in company history. It also consistently maintains strong occupancy rates, and because its rents are among the highest in the portfolio, it has been very profitable.
The director will use tapestry variables to compare the characteristics of each candidate ZIP Code with the characteristics of the area within a 5-mile driving distance of the best performing retirement community.
What are tapestry variables? Business Analyst classifies United States residential neighborhoods into 68 unique segments based on their socioeconomic and demographic qualities such as age, income, home value, occupation, education, and consumer spending behaviors. Each of these 68 segments is a tapestry variable.
Each tapestry variable has a name, and the names for the top four tapestries associated with the best performing retirement community are In Style, Professional Pride, Comfortable Empty Nesters, and Bright Young Professionals. The table below summarizes the characteristics for each of these tapestry categories.
Characteristic | 5B In Style | 1B Professional Pride | 5A Comfortable Empty Nesters | 8C Bright Young Professionals |
---|---|---|---|---|
Median Income | $66,000 | $127,000 | $68,000 | $50,000 |
Median Age | 41 | 41 | 47 | 32 |
Education | College Degree | College Degree | College Degree | College Degree |
Family Structure | Married couples without kids | Married couples | Married couples | Married couples |
Occupation Types | Professional, management | Professional, management | Professional, management | Professional, services |
Primary Race/Ethnicity | White | White | White | White |
Consumer Behaviors | | | | |
In addition to comparing the proportion of people in each of the top four tapestries, the director decides to include several other variables as well. Population density will provide some information about urban spatial structure. Including the family annual growth rate will add information about neighborhood character. Finally, adding an unemployment rate variable will tap into economic stability.
With his final list of variables, he will rank all 898 candidate ZIP Codes by their similarity to the area surrounding the benchmark community, near Knoxville, Tennessee.
The results of this analysis indicate that the four most similar ZIP Codes are concentrated in two locations: Houston and Atlanta. The rankings, one through four, are shown on the right side of the map below.
What are these top ranking ZIP Codes like?
Houston options: Two ZIP Codes in the Houston area were identified as having high demand, low vacancies, a large projected age 55+ population, and characteristics similar to the benchmark community. The first one is located southeast of Houston, along Galveston Bay. It encompasses several cities, including Taylor Lake Village, El Lago, and Seabrook.
Taylor Lake Village prides itself on having the lowest crime rate in Harris County and a spirit of goodwill. It is ranked number five in Texas for livability. Seabrook is described as quaint, off the beaten path, and the area's best choice for saltwater living.
This ZIP Code is near museums, shopping malls, fine restaurants, hiking trails, and the Johnson Space Center. Close to both Houston and the beaches of Galveston, this location offers residents many options for employment, education, recreation, and entertainment.
The second Houston area ZIP Code is just northwest of the quiet and peaceful Memorial Villages.
This area has great access to downtown Houston and the Houston Medical Center. Located near a number of parks, it offers options for golf, hiking, and other outdoor recreation. This location also provides good access to shopping and fine dining.
Atlanta options: In the Atlanta area, two ZIP Codes were identified as having high demand, low vacancies, a large projected age 55+ population, and characteristics similar to the benchmark retirement community. The first of these is located between the cities of Roswell and Alpharetta, just north of Atlanta.
In 2009, Forbes magazine named Alpharetta the number one place in America to move. In 2012, it listed Alpharetta among America's friendliest towns. Both Alpharetta and Roswell appear on a list of the 50 safest cities in Georgia, and both also appear on a list of the top 10 places to live in Georgia. Home to the Verizon Amphitheater, shopping centers, beautiful parks, and excellent schools, Alpharetta is called "The Technology City of the South." Described as fun and vibrant, Roswell has received a lot of recognition, including being named one of the top places in the United States to retire.
The second Atlanta area ZIP Code is located very close to the city of Smyrna, just 10 miles north of Atlanta.
Smyrna is known as the Jonquil City because of the thousands of jonquil daffodils that flourish all around the city each spring. It appears on the list of Georgia's safest cities and is considered one of the best places to live in Georgia. With walking trails, parks, stores, restaurants, and a number of special events throughout the year (concerts, festivals, and holiday activities), there is plenty to do here.
Final steps
The analysis identified some great locations. The director will next research potential acquisition properties in each of the ZIP Codes identified, and will present these, along with a full cost-benefit analysis, to the CEO and General Counsel.
Workflow using ArcMap
Create the demand versus supply level of service variable.
- If you haven't done so already, download and unzip the data packages at the top of this case study.
- Open ArcMap. When it opens, use Catalog to navigate to the unzipped data and drag the layer package (ZipCodeAndTargetArea.lpk) into ArcMap. Once you have the data, you may complete the workflow either by using the model tools in the workflowmodels.tbx toolbox or by following the steps below, starting with the step that totals the projected 55+ population and the retirement community employees. If you elect to use the model tools, run them in the order shown below. Also, be sure to create a folder for your output data and to modify the output paths each time you run one of the model tools.
After you've run the final model tool, open the table associated with the top four ZIP Codes layer. Notice that the first record in the table is the best performing benchmark community, located near Knoxville, Tennessee. The next records in the table, in order from most to least similar, are the ranked candidates. Click each record in the table to select the candidates one by one, confirming that two are located near Houston and two are located near Atlanta. If you want additional insight into each component of this analysis, you can repeat the workflow using the steps below.
- The first step in the analysis is to determine the total number of projected age 55 and older people, and the total number of retirement community employees in all of the ZIP Codes. Right-click on the Zip Code Data layer and select Open Attribute Table so you can see the variables you have available.
- You will begin by constructing a demand versus supply level of service variable. Demand will be based on the Projected 55+ Population variable. Right-click the field header for that variable and select Statistics. A window opens showing basic statistics (mean, median, standard deviation, and so on) along with a histogram of the values.
- Copy the Sum (95005265) so you can use this value to calculate the demand ratios later in this workflow. (To copy the value, drag your cursor over it so it becomes highlighted. Right-click and select Copy.)
- You will create a new field for the demand ratio values. Find and open the Add Field tool and run it with the following parameter settings:
- Input Table: Zip Code Data
- Field Name: DemandRatio
- Field Type: FLOAT
- Calculate the demand ratio values to be the projected 55+ population for each ZIP Code, divided by the projected 55+ population for all ZIP Codes; to do this, find and open the Calculate Field tool and run it with the following parameter settings (you can paste the sum you copied above for the Expression):
- Input Table: Zip Code Data
- Field Name: DemandRatio
- Expression: [Proj55pPop]/ 95005265
- Your supply variable will be based on the number of retirement community employees (Number of Employees field). Right-click on the field header for that variable and select Statistics. Highlight and copy the value for the Sum (428693) so you can use it to calculate the supply ratios below.
- Create a new field for the supply ratio values using the Add Field tool and the following parameter settings:
- Input Table: Zip Code Data
- Field Name: SupplyRatio
- Field Type: FLOAT
- Calculate the supply ratio values to be the Number of Employees for each ZIP Code divided by the total number of employees for all ZIP Codes; to do this, use the Calculate Field tool with the following parameter settings:
- Input Table: Zip Code Data
- Field Name: SupplyRatio
- Expression: [EstNumEmp]/ 428693
- Create the level of service variable by subtracting the SupplyRatio from the DemandRatio. Similar to the steps above, begin by using Add Field to create a new field of type FLOAT called DemandVsSupply. Once the new field exists, use the Calculate Field tool to compute the difference: DemandRatio - SupplyRatio for all of the ZIP Codes.
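The Add Field and Calculate Field steps above amount to a few lines of arithmetic per record. The sketch below reproduces them in plain Python on four hypothetical ZIP Code records that reuse the Proj55pPop and EstNumEmp field names. It is not ArcGIS code. The totals are computed with sum() instead of being copied from the Statistics window. The sketch also makes one property of the level of service variable easy to check. Each ratio column sums to 1 across all ZIP Codes, so DemandVsSupply sums to 0, and surpluses in some ZIP Codes are exactly offset by deficits elsewhere.

```python
import math

# Hypothetical records mirroring the Zip Code Data fields used above.
zips = [
    {"ZIP": "A", "Proj55pPop": 5200, "EstNumEmp": 0},    # no retirement facilities
    {"ZIP": "B", "Proj55pPop": 800, "EstNumEmp": 120},
    {"ZIP": "C", "Proj55pPop": 12000, "EstNumEmp": 35},
    {"ZIP": "D", "Proj55pPop": 3000, "EstNumEmp": 45},
]


def add_level_of_service(records):
    """Add DemandRatio, SupplyRatio, and DemandVsSupply to every record."""
    # Equivalent of the Sum values read from the Statistics window.
    total_pop = sum(r["Proj55pPop"] for r in records)
    total_emp = sum(r["EstNumEmp"] for r in records)
    for r in records:
        r["DemandRatio"] = r["Proj55pPop"] / total_pop
        r["SupplyRatio"] = r["EstNumEmp"] / total_emp
        r["DemandVsSupply"] = r["DemandRatio"] - r["SupplyRatio"]
    return records


add_level_of_service(zips)
for r in zips:
    print(r["ZIP"], round(r["DemandVsSupply"], 4))

# Both ratio columns sum to 1, so the level of service values sum to (about) 0.
print(math.isclose(sum(r["DemandVsSupply"] for r in zips), 0.0, abs_tol=1e-12))
```

A ZIP Code with no retirement community employees, like record A, receives a level of service value equal to its demand share rather than an undefined ratio.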
This model encapsulates the steps above to create the demand versus supply level of service variable:
Create a hot spot map of demand for retirement housing opportunities.
- Find and open the Optimized Hot Spot Analysis tool.
- Set the parameters as follows and run the analysis.
- Input Features: Zip Code Data
- Output Features: the name of your output feature class such as DemandVsSupplyHotSpots
- Analysis Field: DemandVsSupply
- To make the results easier to see with so many ZIP Codes, do the following:
- Add the U.S. States layer from <your ArcGIS Installation Folder>\DesktopXX\TemplateData, in TemplateData.gdb>USA>States.
- Clear the fill color for the States layer by clicking the square symbol below the layer name in the Table of Contents and setting the fill color to No Color.
- Remove the ZIP Code outlines by right-clicking on the hot spot result layer and selecting Properties. Click the Symbology tab. Click the Symbol column header and select Properties for All Symbols.
- Set the outline color to No Color and click OK to close the Layer Properties dialog box. Uncheck all but the demand versus supply hot spot layer and the U.S. States layer.
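Optimized Hot Spot Analysis is built on the Getis-Ord Gi* statistic. The sketch below computes Gi* z-scores by hand for a toy example, showing how each feature's value is weighed together with its neighbors' values against the global mean. It only approximates what the tool does. The sketch assumes binary weights and a hand-specified neighbor list (eight hypothetical features in a row, each neighboring the features on either side). The tool, by contrast, derives its scale of analysis from the data and adjusts significance for multiple testing.

```python
import math


def gi_star(values, neighbors):
    """Getis-Ord Gi* z-scores using binary (0/1) weights.

    values: one attribute value per feature.
    neighbors: neighbors[i] holds the indexes of feature i's neighbors;
    Gi* always counts feature i as part of its own neighborhood.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        hood = set(neighbors[i]) | {i}
        k = len(hood)  # sum of w_ij; with 0/1 weights this also equals the sum of w_ij**2
        local_sum = sum(values[j] for j in hood)
        numerator = local_sum - mean * k
        denominator = s * math.sqrt((n * k - k ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical attribute values for eight features along a line.
values = [1, 2, 1, 2, 9, 10, 9, 1]
neighbors = [{j for j in (i - 1, i + 1) if 0 <= j < len(values)} for i in range(len(values))]
for i, z in enumerate(gi_star(values, neighbors)):
    print(i, round(z, 2))
```

The feature at index 5 scores highest because it and both of its neighbors hold large values. A single large value surrounded by small ones would score much lower.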
Create hot spot maps for vacancy rates and the projected age 55 and above population.
- Use the Optimized Hot Spot Analysis tool again. This time use the parameters below to create a hot spot map for 2014 housing unit vacancy rates:
- Input Features: Zip Code Data
- Output Features: the name of your output feature class, such as VacancyHotSpots
- Analysis Field: VacantRate14
- Move the U.S. States layer above the vacancy rate hot spot layer in the Table of Contents. Remove the ZIP Code outlines as you did above for the first hot spot map you created: right-click the hot spot result layer, select Properties, select the Symbology tab, click the Symbol column header, and select Properties for All Symbols. Set the outline color to No Color and click OK to exit the Layer Properties dialog box.
- Repeat the steps above to run Optimized Hot Spot Analysis on the projected age 55 and older population variable:
- Input Features: Zip Code Data
- Output Features: the name of your output feature class, such as Proj55pPopHotSpots
- Analysis Field: Proj55pPop
- Move the U.S. States layer above the hot spot maps in the Table of Contents and remove the ZIP Code outlines as you did before.
This model encapsulates the steps above to create all three hot spot maps:
Transfer the hot spot result fields to the ZIP Code features.
A hot spot is a statistically significant cluster of high values. Similarly, a cold spot is a statistically significant cluster of low values. The Optimized Hot Spot Analysis tool calculates a z-score and a bin field for each ZIP Code (for every feature in the Input Features). The bin value indicates whether the clustering is statistically significant and, if so, at what confidence level. The confidence level indicates how certain you can be that the observed clustering is real, that is, that something other than random spatial processes is creating and promoting the statistically significant clustering you see in your data.
Bin Value | Interpretation | Result |
---|---|---|
-3 | Statistically significant clustering of low values; highest confidence level (99 percent) | Cold spot |
-2 | Statistically significant clustering of low values; high confidence (95 percent) | Cold spot |
-1 | Statistically significant clustering of low values; lowest confidence (90 percent) | Cold spot |
0 | These features do not exhibit statistically significant clustering of high or low values | Not significant |
1 | Statistically significant clustering of high values; lowest confidence (90 percent) | Hot spot |
2 | Statistically significant clustering of high values; high confidence (95 percent) | Hot spot |
3 | Statistically significant clustering of high values; highest confidence (99 percent) | Hot spot |
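The bins in this table correspond to two-sided p-values of 0.10, 0.05, and 0.01. A minimal sketch of that mapping for a single z-score follows. It uses the standard normal distribution via math.erfc and, unlike Optimized Hot Spot Analysis, applies no False Discovery Rate correction, so it can flag more features as significant than the tool would.

```python
import math


def p_value(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def gi_bin(z):
    """Map a Gi* z-score to the -3..3 bins in the table above (no FDR correction)."""
    p = p_value(z)
    if p < 0.01:
        level = 3   # 99 percent confidence
    elif p < 0.05:
        level = 2   # 95 percent confidence
    elif p < 0.10:
        level = 1   # 90 percent confidence
    else:
        return 0    # not statistically significant
    return level if z > 0 else -level  # positive z: hot spot; negative z: cold spot


for z in (2.7, 2.0, 1.7, 0.5, -1.7, -2.0, -2.7):
    print(z, gi_bin(z))
```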
You will use the bin values from each of your hot spot maps above in subsequent analyses, so in the next steps you will append those values to the Zip Code Data layer.
You will use the Join Field tool to bring the bin field over and then the Alter Field tool to change the bin field name. Repeat the steps three times, once for each of the hot spot results layers.
- Transfer and rename the demand versus supply hot spot bin field:
- Join Field
- Input Table: Zip Code Data
- Input Join Field: OBJECTID
- Join Table: DemandVsSupplyHotSpots
- Output Join Field: SOURCE_ID
- Join Fields: Gi_Bin
- Alter Field
- Input Table: Zip Code Data
- Field Name: Gi_Bin
- New Field Name: DemandBin
- New Field Alias: Demand vs Supply Bin
- Transfer and rename the vacancy rate hot spot bin field:
- Join Field
- Input Table: Zip Code Data
- Input Join Field: OBJECTID
- Join Table: VacancyHotSpots
- Output Join Field: SOURCE_ID
- Join Fields: Gi_Bin
- Alter Field
- Input Table: Zip Code Data
- Field Name: Gi_Bin
- New Field Name: VacancyBin
- New Field Alias: Vacancy Bin
- Transfer and rename the projected age 55 plus population bin field:
- Join Field
- Input Table: Zip Code Data
- Input Join Field: OBJECTID
- Join Table: Proj55pPopHotSpots
- Output Join Field: SOURCE_ID
- Join Fields: Gi_Bin
- Alter Field
- Input Table: Zip Code Data
- Field Name: Gi_Bin
- New Field Name: pPop55Bin
- New Field Alias: Projected 55+ pop Bin
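Conceptually, each Join Field and Alter Field pair is a lookup keyed on the feature ID. The hot spot output's SOURCE_ID matches OBJECTID in Zip Code Data, which is why those are the join fields above. The plain-Python sketch below mimics that lookup on hypothetical in-memory rows (lists of dictionaries, not an ArcGIS API) and writes each bin directly under its new field name. Renaming Gi_Bin after each join, as the steps above do, keeps the three bin fields distinct, because every hot spot output names its bin field Gi_Bin.

```python
def join_bin(zip_rows, hotspot_rows, new_name):
    """Copy Gi_Bin from hot spot results onto ZIP Code rows under a new field name.

    Mimics Join Field (OBJECTID matched to SOURCE_ID) followed by Alter Field.
    Rows without a match get None, similar to a null value in a joined field.
    """
    bin_by_source = {h["SOURCE_ID"]: h["Gi_Bin"] for h in hotspot_rows}
    for row in zip_rows:
        row[new_name] = bin_by_source.get(row["OBJECTID"])
    return zip_rows


# Hypothetical rows: two ZIP Codes and their demand versus supply hot spot results.
zip_rows = [{"OBJECTID": 1}, {"OBJECTID": 2}]
demand_hotspots = [{"SOURCE_ID": 2, "Gi_Bin": 3}, {"SOURCE_ID": 1, "Gi_Bin": 0}]
join_bin(zip_rows, demand_hotspots, "DemandBin")
print(zip_rows)
```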
This model encapsulates the steps above to transfer the hot spot analysis results fields.
Select the ZIP Codes with low vacancy, high demand, and large projected age 55 plus populations.
Based on the hot spot bin fields you just transferred to the ZIP Code data layer, use the Select Layer By Attribute tool to identify the ZIP Codes that are inside statistically significant hot or cold spot areas, and also meet all of these criteria:
Criterion | Condition |
---|---|
High demand and low supply of retirement housing opportunities | DemandBin > 0 |
Low housing unit vacancy rates | VacancyBin < 0 |
Large projected age 55 and older populations | pPop55Bin > 0 |
- Find and open the Select Layer By Attribute tool.
- Run the tool with the following parameters:
- Layer Name or Table View: Zip Code Data
- Selection Type: NEW_SELECTION
- Expression: DemandBin > 0 AND VacancyBin < 0 AND pPop55Bin > 0
- Find and open the Copy Features tool and run it with the following parameters:
- Input Features: Zip Code Data
- Output Feature Class: the name of your output feature class, such as Candidates
- After the candidates layer is created, clear the selected ZIP Code features.
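The selection expression is a row-by-row test on the three bin fields. A plain-Python equivalent, run on hypothetical rows, is sketched below. Treating a missing bin (None) as 0 is an assumption. It mirrors how a null value fails a SQL comparison, so such a ZIP Code is never selected.

```python
def is_candidate(row):
    """DemandBin > 0 AND VacancyBin < 0 AND pPop55Bin > 0 (missing bins count as 0)."""
    demand = row.get("DemandBin") or 0
    vacancy = row.get("VacancyBin") or 0
    pop55 = row.get("pPop55Bin") or 0
    return demand > 0 and vacancy < 0 and pop55 > 0


# Hypothetical joined rows.
rows = [
    {"ZIP": "A", "DemandBin": 3, "VacancyBin": -2, "pPop55Bin": 1},    # meets all three criteria
    {"ZIP": "B", "DemandBin": 3, "VacancyBin": 0, "pPop55Bin": 2},     # vacancy not a cold spot
    {"ZIP": "C", "DemandBin": -1, "VacancyBin": -3, "pPop55Bin": 3},   # supply exceeds demand
    {"ZIP": "D", "DemandBin": 2, "VacancyBin": None, "pPop55Bin": 1},  # missing vacancy bin
]
candidates = [r["ZIP"] for r in rows if is_candidate(r)]
print(candidates)
```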
Narrow the candidate ZIP Codes down to only four best development options.
All of the candidate ZIP Codes fall within statistically significant clusters of high demand relative to supply, low vacancy rates, and large projected age 55+ populations. To find the best ZIP Codes to consider for future retirement community development, use the Similarity Search tool to rank these 898 candidates by how similar they are to the current best performing community.
For this workflow, we have already identified the top tapestry variables for a 5-mile driving distance surrounding the best performing benchmark community. We did this by using the Enrich Layer tool in ArcGIS Online to obtain the count values for all 68 tapestry variables within the area surrounding the benchmark community. Looking at each tapestry, we determined which ones had the largest counts. The top four tapestries were In Style, Professional Pride, Comfortable Empty Nesters, and Bright Young Professionals.
We created percentages for each of these tapestries by dividing the tapestry counts by the tapestry base variable. We also used the Enrich Layer tool to obtain population density, family annual growth rate, and unemployment data for the area surrounding the benchmark. Once we had this data for the benchmark community, we obtained the same data for the 898 candidate ZIP Codes. If you want to run through this analysis yourself, it is included in the ArcGIS Online workflow.
The 5-mile drive distance area, tapestry variables, and other key demographic data are in the Target Area Data layer included with the ZipCodeAndTargetArea.lpk layer package.
- Right-click the layer called Target Area Data and select Open Attribute Table. Notice the fields available for your analysis.
- If the table isn't already open, right-click the Zip Code Data layer and select Open Attribute Table. Notice that the same fields are available there.
- Find and open the Similarity Search tool so you can rank the candidate ZIP Codes by their similarity to the benchmark community (Target Area Data layer). To do this, you will run the tool with the following parameters:
- Input Features To Match: Target Area Data
- Candidate Features: Candidates
- Output Features: the name of your output feature class, such as Top4ZipCodes
- Most or Least Similar: MOST_SIMILAR
- Match Method: ATTRIBUTE_VALUES
- Number of Results: 4
- Attributes of Interest: POPDENS_CY; FAMGRW10CY; pSeg5B; pSeg1B; pSeg5A; pSeg8C; UNEMPRT_CY
- Open the table for the top four ZIP Codes result layer. Notice that the first record in the table is the best performing benchmark community, located near Knoxville, Tennessee. The next records in the table, in order from most to least similar, are the ranked candidates. Click each record to select the candidates one by one, confirming that two are located near Houston and two are located near Atlanta.
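With the ATTRIBUTE_VALUES match method, Similarity Search ranks candidates by the sum of squared standardized differences between each candidate's attribute values and the target's. The sketch below illustrates that idea in plain Python. Two parts are assumptions for illustration: the standardization details (population standard deviation computed over the target and candidates together) and the attribute values. Its scores should not be expected to match the tool's output exactly.

```python
import math


def rank_by_similarity(target, candidates, attributes, top_n=4):
    """Return (score, ZIP) pairs, most similar first.

    score = sum over attributes of ((candidate - target) / sd) ** 2, where sd is the
    standard deviation of that attribute across the target and all candidates.
    """
    pool = [target] + candidates
    sd = {}
    for a in attributes:
        vals = [f[a] for f in pool]
        mean = sum(vals) / len(vals)
        sd[a] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0  # avoid /0
    scored = [
        (sum(((c[a] - target[a]) / sd[a]) ** 2 for a in attributes), c["ZIP"])
        for c in candidates
    ]
    scored.sort()
    return scored[:top_n]


# Hypothetical values for three of the Attributes of Interest used above.
attrs = ["pSeg5B", "pSeg1B", "UNEMPRT_CY"]
target = {"ZIP": "benchmark", "pSeg5B": 18.0, "pSeg1B": 12.0, "UNEMPRT_CY": 3.5}
candidates = [
    {"ZIP": "C1", "pSeg5B": 17.5, "pSeg1B": 11.0, "UNEMPRT_CY": 3.8},
    {"ZIP": "C2", "pSeg5B": 2.0, "pSeg1B": 0.5, "UNEMPRT_CY": 7.9},
    {"ZIP": "C3", "pSeg5B": 15.0, "pSeg1B": 14.0, "UNEMPRT_CY": 4.1},
    {"ZIP": "C4", "pSeg5B": 9.0, "pSeg1B": 6.0, "UNEMPRT_CY": 5.0},
    {"ZIP": "C5", "pSeg5B": 18.0, "pSeg1B": 12.0, "UNEMPRT_CY": 3.5},
]
for score, zip_code in rank_by_similarity(target, candidates, attrs):
    print(zip_code, round(score, 3))
```

Standardizing puts attributes measured on different scales, such as tapestry percentages and an unemployment rate, on a comparable footing before the squared differences are added.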
The model below encapsulates the steps above to select the candidate ZIP Codes and identify the top four ZIP Codes based on similarity to the best performing community:
Your analysis identified some great locations. You would next research potential acquisition properties in each of the ZIP Codes identified and prepare a report that includes a full cost-benefit analysis for the stakeholders in your organization.
Workflow using ArcGIS Pro
Create the demand versus supply level of service variable.
- If you haven't done so already, download and unzip the data packages at the top of this case study. Open ArcGIS Pro and browse to the RetirementResortPKG.ppkx project package. If the Tasks pane isn't open or isn't populated, click the View tab and click Tasks.
- You can either follow the steps below or run the shared analysis tasks, top to bottom, from the Tasks pane. If you choose to run the analyses using the tasks, first create a folder for your output results. As you run each task, be sure to write the output to your own local folder. Double-click each task and fill out the parameters as instructed. Instructions for filling out the tool parameters appear at the top of the Tasks pane. Instructions for launching each task step appear at the bottom of the Tasks pane.
- The demand versus supply level of service variable has demand ratio and supply ratio components. To calculate these ratios, you will need to know the total number of projected age 55+ people and the total number of retirement facility employees in all ZIP Codes. Use the Summary Statistics tool to obtain these totals.
- Use the search box at the top of the Geoprocessing pane to find and open the Summary Statistics tool. Use the back arrow on the tool UI to return to the search box after you run a geoprocessing tool.
- Fill out the tool parameters as follows:
- Input Table: Zip Code Data
- Output Table: the name of your output table, such as FieldTotals
- Statistics Field(s):
- Projected 55+ population: SUM
- Number of Employees: SUM
- When the tool completes, you will see a new table called FieldTotals in your Contents. Right-click the table and select Open. The totals for the projected age 55 and above population (SUM_PROJ55PPOP) and number of employees (SUM_ESTNUMEMP) are fields in the table. Write these values down (95005265 and 428693) so you can use them to calculate the ratios below.
- Add a new field for the demand versus supply values. To do this, click the back arrow on the Geoprocessing pane to get access to the tool search box. Find the Add Field tool and use the parameters shown below to run it:
- Input Table: Zip Code Data
- Field Name: DemandVsSupply
- Field Type: Float
- Calculate the demand versus supply level of service values. Use the field totals you obtained above. Click the back arrow on the Geoprocessing pane to get access to the search box. Find and run the Calculate Field tool with the following parameters:
- Input Table: Zip Code Data
- Field Name: DemandVsSupply
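- Expression (Python): DemandVsSupply = (!Proj55pPop! / 95005265) - (!EstNumEmp! / 428693). The tool supplies the DemandVsSupply = portion; type only the part after the equals sign.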
Create hot spot maps.
You will create three hot spot maps. The first will show you locations with statistically significant clustering of high demand for retirement housing (the hot spots). The second will show you where the lowest vacancy rates cluster spatially (the statistically significant cold spots). The last map will show you where large age 55 and older populations are projected to be in the next few years (the statistically significant hot spots).
- Find and select the Optimized Hot Spot Analysis tool.
- Create a hot spot map of demand versus supply. Run Optimized Hot Spot Analysis with the following parameters:
- Input Features: Zip Code Data
- Output Features: the name of your output feature class, such as DemandVsSupplyHotSpots
- Analysis Field: DemandVsSupply
- To make the spatial patterns easier to see, turn on the States layer and remove the ZIP Code outlines:
- Check the States layer and move it above the hot spot layer in the Contents pane.
- Right-click the DemandVsSupplyHotSpots layer and select Symbology.
- Click the More drop-down menu, select Symbols, and click Format all symbols.
- Click the Properties tab and set the Outline color to No Color. Click Apply.
- Uncheck all but the hot spot and states layers.
- Close the Symbology pane by clicking the x in the upper right corner.
- Create a hot spot map of vacancy rates. Run Optimized Hot Spot Analysis again. This time, use the following parameters:
- Input Features: Zip Code Data
- Output Features: the name of your output feature class, such as VacancyHotSpots
- Analysis Field: VacantRate14
- Move the States layer to the top of the Contents pane and clear the ZIP Code outlines, as you did above.
- Create a hot spot map of projected age 55 and older populations. Run Optimized Hot Spot Analysis a third time. This time, use the following parameters:
- Input Features: Zip Code Data
- Output Features: the name of your output feature class, such as Proj55pPopHotSpots
- Analysis Field: Projected 55+ population
- Again, move the States layer to the top of the Contents pane and clear the ZIP Code outlines, as you did above.
Select candidate ZIP Codes based on the hot spot analysis result fields.
A hot spot is a statistically significant cluster of high values. Similarly, a cold spot is a statistically significant cluster of low values. The Optimized Hot Spot Analysis tool calculates a z-score and a bin field for each ZIP Code (for every feature in the Input Features). The bin value indicates whether the clustering is statistically significant and, if so, at what confidence level. The confidence level indicates how certain you can be that the observed clustering is real, that is, that something other than random spatial processes is creating and promoting the statistically significant clustering you see in your data.
Bin Value | Interpretation | Classification |
---|---|---|
-3 | Statistically significant clustering of low values; highest confidence (99 percent) | Cold spot |
-2 | Statistically significant clustering of low values; high confidence (95 percent) | Cold spot |
-1 | Statistically significant clustering of low values; lowest confidence (90 percent) | Cold spot |
0 | These features do not exhibit statistically significant clustering of high or low values | Not significant |
1 | Statistically significant clustering of high values; lowest confidence (90 percent) | Hot spot |
2 | Statistically significant clustering of high values; high confidence (95 percent) | Hot spot |
3 | Statistically significant clustering of high values; highest confidence (99 percent) | Hot spot |
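The bin table above is effectively a lookup from an integer to a classification and a confidence level. The small Python sketch below encodes that lookup so you can interpret Gi_Bin values outside the map, for example when checking exported attribute tables. The function name is hypothetical; it only restates the table.

```python
def interpret_gi_bin(gi_bin: int) -> str:
    """Translate a Gi_Bin value (-3 to 3) into the interpretation from the bin table."""
    confidence = {1: 90, 2: 95, 3: 99}  # |bin| -> confidence level in percent
    if gi_bin == 0:
        return "not significant"
    if abs(gi_bin) not in confidence:
        raise ValueError(f"Gi_Bin must be an integer from -3 to 3, got {gi_bin}")
    kind = "hot spot" if gi_bin > 0 else "cold spot"
    return f"{kind} ({confidence[abs(gi_bin)]}% confidence)"


for value in (-3, 0, 2):
    print(value, interpret_gi_bin(value))
```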
You will use the bin values from each hot spot map in the analyses that follow, so the next steps append those values to the Zip Code Data layer.
The Join Field tool copies the bin field from each hot spot result into the Zip Code Data layer, and the Alter Field tool renames it. You will repeat these steps three times, once for each hot spot result layer.
- Transfer and rename the demand versus supply hot spot bin field:
  - Join Field
    - Input Table: Zip Code Data
    - Input Join Field: OBJECTID
    - Join Table: DemandVsSupplyHotSpots
    - Output Join Field: SOURCE_ID
    - Join Fields: Gi_Bin
  - Alter Field
    - Input Table: Zip Code Data
    - Field Name: Gi_Bin
    - New Field Name: DemandBin
    - New Field Alias: Demand vs Supply Bin
- Transfer and rename the vacancy rate hot spot bin field:
  - Join Field
    - Input Table: Zip Code Data
    - Input Join Field: OBJECTID
    - Join Table: VacancyHotSpots
    - Output Join Field: SOURCE_ID
    - Join Fields: Gi_Bin
  - Alter Field
    - Input Table: Zip Code Data
    - Field Name: Gi_Bin
    - New Field Name: VacancyBin
    - New Field Alias: Vacancy Bin
- Transfer and rename the projected age 55 plus population bin field:
  - Join Field
    - Input Table: Zip Code Data
    - Input Join Field: OBJECTID
    - Join Table: Proj55pPopHotSpots
    - Output Join Field: SOURCE_ID
    - Join Fields: Gi_Bin
  - Alter Field
    - Input Table: Zip Code Data
    - Field Name: Gi_Bin
    - New Field Name: pPop55Bin
    - New Field Alias: Projected 55+ pop Bin
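Each Join Field plus Alter Field pair amounts to a keyed lookup: the hot spot output records which input feature it came from in SOURCE_ID, so its Gi_Bin can be matched back to the Zip Code Data record with the same OBJECTID and stored under a new name. The plain-Python sketch below shows that idea with features modeled as dictionaries; the function name is hypothetical, and leaving unmatched rows as None is an assumption meant to resemble a null after an attribute join.

```python
def join_and_rename_bin(zip_rows, hotspot_rows, new_name):
    """Copy Gi_Bin from hot spot rows onto ZIP Code rows (SOURCE_ID -> OBJECTID)
    and store it under new_name, mimicking Join Field followed by Alter Field."""
    bins_by_source = {row["SOURCE_ID"]: row["Gi_Bin"] for row in hotspot_rows}
    for row in zip_rows:
        # Rows with no matching hot spot feature receive None (a null value).
        row[new_name] = bins_by_source.get(row["OBJECTID"])
    return zip_rows


zip_data = [{"OBJECTID": 1}, {"OBJECTID": 2}, {"OBJECTID": 3}]
demand_hot_spots = [{"SOURCE_ID": 2, "Gi_Bin": 3}, {"SOURCE_ID": 1, "Gi_Bin": -1}]
print(join_and_rename_bin(zip_data, demand_hot_spots, "DemandBin"))
```

Running the function once per hot spot layer, with DemandBin, VacancyBin, and pPop55Bin as the new names, mirrors the three repetitions above.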
Select the ZIP Codes with low vacancy, high demand, and large projected age 55 and older populations.
Based on the hot spot bin fields you just transferred to the Zip Code Data layer, you will use the Select Layer By Attribute tool to identify the ZIP Codes that are inside statistically significant hot or cold spot areas and also meet all of these criteria:
Criterion | Condition |
---|---|
High demand and low supply of retirement housing opportunities | Demand vs. Supply Bin is Greater Than 0 |
Low housing unit vacancy rates | Vacancy Bin is Less Than 0 |
Large projected age 55 and older populations | Projected 55+ pop Bin is Greater Than 0 |
- In the Geoprocessing pane, search for and select the Select Layer By Attribute tool.
- Run the tool with the following parameters:
- Layer Name or Table View: Zip Code Data
- Selection Type: New selection
- Expression: Demand vs Supply Bin is Greater Than 0 And Vacancy Bin is Less Than 0 And Projected 55+ pop Bin is Greater Than 0
- In the Geoprocessing pane, search for and select the Copy Features tool and run it with the following parameters:
- Input Features: Zip Code Data
- Output Feature Class: the name of your output feature class, such as Candidates
- After the candidates layer is created, clear the selected ZIP Code features.
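The three criteria are combined with AND, so a ZIP Code must fall in a demand hot spot, a vacancy cold spot, and a projected 55+ population hot spot at the same time. Using the field names created with Alter Field, the selection should correspond to a where clause along the lines of `DemandBin > 0 AND VacancyBin < 0 AND pPop55Bin > 0`. The plain-Python sketch below applies the same test to ZIP Codes modeled as dictionaries; the helper names are hypothetical, and treating a null bin as 0 (not significant) is an assumption.

```python
def bin_value(row, field):
    """Return the bin for field, treating a missing or null bin as 0 (not significant)."""
    value = row.get(field)
    return 0 if value is None else value


def select_candidates(zip_rows):
    """Keep ZIP Codes that meet all three hot/cold spot criteria."""
    return [
        row for row in zip_rows
        if bin_value(row, "DemandBin") > 0       # hot spot: demand exceeds supply
        and bin_value(row, "VacancyBin") < 0     # cold spot: low vacancy rates
        and bin_value(row, "pPop55Bin") > 0      # hot spot: large projected 55+ population
    ]


zips = [
    {"ZIP": "A", "DemandBin": 3, "VacancyBin": -2, "pPop55Bin": 1},
    {"ZIP": "B", "DemandBin": 3, "VacancyBin": 0, "pPop55Bin": 2},
    {"ZIP": "C", "DemandBin": None, "VacancyBin": -3, "pPop55Bin": 3},
]
print([row["ZIP"] for row in select_candidates(zips)])
```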
Narrow the candidate ZIP Codes down to the four best development options.
All of the candidate ZIP Codes are associated with statistically significant high demand, low vacancy rates, and high projected age 55+ populations. To find the best ZIP Codes to consider for future retirement community development, use the Similarity Search tool to rank these 898 candidates by how similar they are to the current best performing community.
For this workflow, we have already identified the top tapestry segments for the area within a 5-mile driving distance of the best performing benchmark community. We did this by using the Enrich Layer tool in ArcGIS Online to obtain the count values for all 68 tapestry segments within that area and then picking the segments with the largest counts. The top four tapestries were In Style, Professional Pride, Comfortable Empty Nesters, and Bright Young Professionals.
We created percentages for each of these tapestries by dividing the tapestry counts by the tapestry base variable. We also used the Enrich Layer tool to obtain population density, family annual growth rate, and unemployment data for the area surrounding the benchmark, and then obtained the same data for the 898 candidate ZIP Codes. These steps are included in the ArcGIS Online workflow if you want to run through them yourself.
The 5-mile drive distance area, tapestry variables, and other key demographic data are in the Target Area Data layer included with the RetirementResortPKG.ppkx project package.
- Right-click the layer called Target Area Data in the Contents pane and select Attribute Table. Notice the fields available for your analysis.
- Right-click the Zip Code Data layer and select Attribute Table. Notice that the same fields are available there.
- In the Geoprocessing pane, search for and click the Similarity Search tool so you can rank the candidate ZIP Codes by their similarity to the benchmark community. To do this, you will run the tool with the following parameters:
- Input Features To Match: Target Area Data
- Candidate Features: Candidates
- Output Features: the name of your output feature class, such as Top4ZipCodes
- Most or Least Similar: Most similar
- Match Method: Attribute values
- Number of Results: 4
- Attributes of Interest: % Seg 1B; % Seg 5A; % Seg 5B; % Seg 8C; 2010-2014 Growth Rate: Families; 2014 Population Density; 2014 Unemployment Rate
- Turn off all but the States and top four ZIP Codes layers. Move the States layer to the top of the Contents pane, if necessary. Right-click the top four ZIP Codes layer and select the Zoom To Layer option.
- Open the table for the top four ZIP Codes layer. (You may need to drag and resize the table to fit nicely at the bottom of the map.) Notice that the first record in the table is the best performing benchmark community, located near Knoxville, Tennessee. The next records in the table, in order from most to least similar, are the ranked candidates. Click each record in the table to select the candidates one by one, confirming that two are located near Houston and two are located near Atlanta. If you want, try different Basemaps (or no Basemaps) to help you see the top four ZIP Code locations.
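Esri's documentation describes the Attribute values match method as ranking candidates by the sum of squared differences between standardized attribute values, so that attributes measured on very different scales (population density versus a percentage) contribute comparably. The plain-Python sketch below implements that idea; standardizing with the population standard deviation over the target plus all candidates is an assumption, so the tool's exact scores may differ. The function name is hypothetical.

```python
from statistics import mean, pstdev


def rank_by_similarity(target, candidates, attributes, n_results=4):
    """Rank candidates by the sum of squared differences between their standardized
    attribute values and the target's (smallest sum = most similar)."""
    pool = [target] + candidates
    stats = {}
    for attr in attributes:
        values = [feature[attr] for feature in pool]
        sd = pstdev(values)
        stats[attr] = (mean(values), sd if sd > 0 else 1.0)  # guard against zero spread

    def z(feature, attr):
        mu, sd = stats[attr]
        return (feature[attr] - mu) / sd

    scored = []
    for cand in candidates:
        ssd = sum((z(cand, a) - z(target, a)) ** 2 for a in attributes)
        scored.append((ssd, cand))
    scored.sort(key=lambda pair: pair[0])  # sort on the score only
    return [cand for _, cand in scored[:n_results]]


benchmark = {"id": "benchmark", "pSeg5B": 0.20, "density": 900.0}
zip_candidates = [
    {"id": "A", "pSeg5B": 0.05, "density": 4000.0},
    {"id": "B", "pSeg5B": 0.18, "density": 1000.0},
    {"id": "C", "pSeg5B": 0.22, "density": 2500.0},
]
print([c["id"] for c in rank_by_similarity(benchmark, zip_candidates, ["pSeg5B", "density"], 2)])
```

As in the tool's output table, you would list the target first and then the ranked candidates.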
Your analysis identified some great locations. You would next research potential acquisition properties in each of the ZIP Codes identified and prepare a report that includes a full cost-benefit analysis for the stakeholders in your organization.
Workflow using ArcGIS Online
Get the data for your analysis.
- Sign in to your ArcGIS Online account.
- Search for LocatingRetirementCommunity. When it displays, select Add layer to new map.
Two layers are added to the map. The first layer, called Target Communities, contains the current best performing retirement community near Knoxville, Tennessee. The second layer, called Candidates, contains the 898 ZIP Codes in the continental USA associated with statistically significant hot or cold spot areas for all of these criteria:
- High demand for retirement housing opportunities
- Low supply of retirement housing
- Low housing unit vacancy rates
- Large projected age 55 and older populations
Hot spot analysis to identify these candidate ZIP Codes was done using ArcMap. If you want to run through this analysis yourself, it is included in both the ArcMap and ArcGIS Pro workflows.
In the workflow below, you will use ArcGIS Online to create a 5-mile drive-distance area around the best performing community and obtain tapestry and demographic data for that area. You will then obtain the same data for the candidate ZIP Codes. Finally, you will use the Find Similar Locations tool to identify the four high demand, low vacancy, large projected age 55+ population ZIP Codes that are most similar to the area surrounding the best performing community.
Create a 5-mile drive-distance polygon around the best performing community.
- Click the Analysis button at the top of your map to open the Perform Analysis panel.
- Click Use Proximity and select the Create Drive-Time Areas tool.
- Specify that you want to calculate drive-time areas around the TargetCommunity layer. Change the Measure parameter to Driving Distance and specify 5 Miles. Provide a name for your results, such as Target Area. To see how many credits will be consumed by this analysis, click Show Credits. Click Run Analysis.
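Create Drive-Time Areas works on Esri's road network data, so you do not compute the area yourself. To illustrate the underlying idea: a drive-distance area covers the network locations whose shortest driving distance from the facility is within the cutoff, which is what Dijkstra's shortest-path algorithm finds. The road graph and mileages below are hypothetical, and each entry is a one-way segment.

```python
import heapq


def reachable_within(graph, start, max_miles):
    """Return {node: miles} for every node whose shortest network distance
    from start is at most max_miles. graph maps node -> [(neighbor, miles), ...]."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, miles in graph.get(node, []):
            new_dist = dist + miles
            if new_dist <= max_miles and new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return best


# Hypothetical road network around the benchmark community (distances in miles).
roads = {
    "community": [("A", 2.0), ("B", 4.5)],
    "A": [("B", 1.0), ("C", 3.5)],
    "B": [("D", 1.0)],
}
print(reachable_within(roads, "community", 5.0))
```

Intersection C is 5.5 miles away by road, so it falls outside the 5-mile area even though it is only one segment beyond A.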
Determine the top tapestry segments.
You will be looking for ZIP Codes that are similar to the area surrounding the best performing retirement community. You will take advantage of tapestry variables because they summarize so many aspects of a population, such as age, income, home value, occupation, education, and consumer spending behaviors. To identify the top tapestries within the 5-mile drive distance area, you will obtain and compare all 68 tapestry segments. You will also obtain the tapestry base variable so you can calculate percentages.
- Click the Analysis button at the top of your map.
- Select Data Enrichment. Click the Enrich Layer tool.
- Set the first parameter: the layer you want to enrich with new data is the target area you just created.
- Click the Select Variables button to bring up the data browser. Scroll right and select the Tapestry category. Click the Tapestry Households button.
- If you expand the first entry, you will see the Tapestry Household variables, Tapestry LifeMode variables, and Tapestry Urbanization variables. Check the Tapestry Household variables.
- Expand the second entry and click the Base for Tapestry Segmentation Households variable. You should have a total of 69 variables selected. Click Apply.
- Open the Data Browser again (click on the Select Variables button) to obtain population density, family annual growth rate, and unemployment rate data for the target area as well. In the Data Browser search window, type Population Density. Expand the second drop-down menu and select the current year population density variable. Click Back. In the search window, type Families: Annual Growth Rate. Select the rate covering the current and previous years. Click Back. Finally, scroll left in the Data Browser and click the Jobs category. Select the current year unemployment rate variable. You should now have a total of 72 variables. Click Apply.
- Provide a result name, such as Target Area Data, and click Show Credits to see how many credits this tool will consume. Click Run Analysis.
- Hover over the Target Area Data layer in the Contents pane to reveal the available layer operations. Click the Show Table button to open the layer table.
- Scroll right in the table to see the tapestry variables you obtained using the Enrich Layer tool. For each tapestry variable there is an associated count value. Determine the top four tapestries associated with the largest counts. For the 2015 Tapestry Household data, for example, these are the top four target area tapestry counts:
Tapestry Name | Count | Percentage (%) |
---|---|---|
In Style (5B) | 5802 | 20.0 |
Bright Young Professionals (8C) | 3925 | 13.6 |
Professional Pride (1B) | 3285 | 11.4 |
Comfortable Empty Nesters (5A) | 2810 | 9.7 |
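Picking the top four segments is a top-k selection over the 68 tapestry count fields. The short Python sketch below sorts segment counts in descending order and keeps the first four. The four counts come from the table above; "Hypothetical Segment" is made up so that the selection has something to discard.

```python
def top_tapestries(counts, n=4):
    """Return the n (segment, household count) pairs with the largest counts."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Target-area counts from the table; "Hypothetical Segment" is invented for illustration.
target_counts = {
    "Professional Pride (1B)": 3285,
    "Hypothetical Segment": 1200,
    "In Style (5B)": 5802,
    "Comfortable Empty Nesters (5A)": 2810,
    "Bright Young Professionals (8C)": 3925,
}

for name, count in top_tapestries(target_counts):
    print(f"{name}: {count}")
```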
Convert the top four target area tapestry counts to percentages.
Rather than counts, you will want to compare tapestry percentages between each candidate ZIP Code and the target area.
- Create a new field to hold each of the four top tapestry percentages. In the table, click Table Options and select Add Field. You will do this four times. The field type must be Double (it is easy to forget to change the Type from String to Double at this step). The names and aliases for each new field (using the 2015 variables as an example) are listed in the first table below.
- Scroll all the way to the right in the table to see the new fields you added. Click the gear symbol in a new field's heading and select Calculate.
- Create an expression that divides the count value by the base value, for example, In Style (5B) Tapestry Households divided by Tapestry Household Base. The tapestry variable names can be cryptic: if the variables appear in the Expression Builder as THHnn, hover over each field name to find the one you need for your expression. To create the percentages for the fields above, for example, you would use the expressions in the second table below.
- Close the Target Area Data table.
Field Name | Alias | Field Type |
---|---|---|
pSeg5B | % Seg 5B 2015 | Double |
pSeg8C | % Seg 8C 2015 | Double |
pSeg1B | % Seg 1B 2015 | Double |
pSeg5A | % Seg 5A 2015 | Double |
Field Name in the Table | Variable Name | Expression |
---|---|---|
2015 In Style (5B) | THH17 | THH17 / THHBASE |
2015 Bright Young Professionals (8C) | THH35 | THH35 / THHBASE |
2015 Professional Pride (1B) | THH02 | THH02 / THHBASE |
2015 Comfortable Empty Nesters (5A) | THH16 | THH16 / THHBASE |
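The Calculate Field expressions above can be sketched in plain Python with a record modeled as a dictionary. The mapping of new fields to THHnn variables comes from the tables above. Note that an expression such as THH17 / THHBASE yields a proportion (for example, 0.200) rather than a percentage (20.0); the optional `as_percent` flag, a hypothetical parameter, multiplies by 100 if you want values like those in the Percentage column. Whichever form you choose, use the same one for the target area and the candidates so they stay comparable. The sample record values are made up.

```python
# New field -> Tapestry count variable, taken from the tables above.
SHARE_FIELDS = {
    "pSeg5B": "THH17",  # In Style (5B)
    "pSeg8C": "THH35",  # Bright Young Professionals (8C)
    "pSeg1B": "THH02",  # Professional Pride (1B)
    "pSeg5A": "THH16",  # Comfortable Empty Nesters (5A)
}


def add_tapestry_shares(row, as_percent=False):
    """Add pSeg fields holding each count divided by THHBASE (times 100 if as_percent)."""
    base = row["THHBASE"]
    scale = 100.0 if as_percent else 1.0
    for field, variable in SHARE_FIELDS.items():
        row[field] = scale * row[variable] / base
    return row


# Hypothetical record; real values come from the Enrich Layer output.
record = {"THHBASE": 1000, "THH17": 200, "THH35": 100, "THH02": 50, "THH16": 25}
print(add_tapestry_shares(record))
```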
Obtain the same data for the candidate ZIP Codes.
- Click the Analysis button at the top of your map to open the Perform Analysis panel.
- Click Data Enrichment. Click Enrich Layer to open the tool.
- Set the first parameter to the Candidates layer. This is the layer you want to enrich with new data.
- Click the Select Variables button to bring up the data browser. Find and select the following eight variables:
  - The top four tapestry variables you found for the target area. You will also need to get the base variable. For the 2015 example above, these were the following:
    - 2015 Professional Pride (1B) Tapestry Households
    - 2015 Comfortable Empty Nesters (5A) Tapestry Households
    - 2015 In Style (5B) Tapestry Households
    - 2015 Bright Young Professionals (8C) Tapestry Households
    - 2015 Base for Tapestry Segmentation Households
  - The current year population density, family annual growth rate, and unemployment rate. For the 2015 example above, these were the following:
    - 2015 Population Density (Pop per Square Mile)
    - 2010-2015 Families: Annual Growth Rate
    - 2015 Unemployment Rate
- Provide a result layer name, such as Candidate Data, uncheck the Use current map extent box, click Show Credits to get an estimate of credit usage for this tool, and click Run Analysis.
Create the tapestry variable percentages for the candidate ZIP Codes.
- Hover over the Candidate Data layer to reveal the available layer operations and select Show Table.
- Create a new field to hold each of the four tapestry percentages. In the table, click Table Options and select Add Field. Do this four times, each time changing the field type to Double and setting the name and alias parameters appropriately. For the 2015 example above, the new fields would be the following:
- Scroll all the way to the right in the table to see the new fields you added. Click the gear symbol in the Tapestry Household Base field heading and select Sort Ascending. Notice that some of the base counts are zero; creating percentages from these records would cause a division by zero. Filter out these ZIP Codes with zero (or very small) household bases to exclude them from further analysis.
- To filter out the zero-base ZIP Codes, hover over the Candidate Data layer to reveal the available layer options and click the Filter button. Create a filter expression that keeps records where the Tapestry Household Base variable is greater than zero (for example, 2015 Base for Tapestry Segmentation Households is greater than 0).
- Click one of the new percentage fields you created and select Calculate.
- Create an expression that divides the count value by the base value, for example, In Style (5B) Tapestry Households divided by Tapestry Household Base. If the variables appear in the Expression Builder as THHnn, your expressions will look similar to this:
- Close the Candidate Data table.
Field Name | Alias | Field Type |
---|---|---|
pSeg5B | % Seg 5B 2015 | Double |
pSeg8C | % Seg 8C 2015 | Double |
pSeg1B | % Seg 1B 2015 | Double |
pSeg5A | % Seg 5A 2015 | Double |
Field Name in the Table | Variable Name | Expression |
---|---|---|
2015 In Style (5B) | THH17 | THH17 / THHBASE |
2015 Bright Young Professionals (8C) | THH35 | THH35 / THHBASE |
2015 Professional Pride (1B) | THH02 | THH02 / THHBASE |
2015 Comfortable Empty Nesters (5A) | THH16 | THH16 / THHBASE |
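The filter step matters because the share calculation divides by the household base: a ZIP Code with a base of zero raises a division error, and one with a tiny base produces unstable shares. The plain-Python sketch below drops such records before any shares are computed. The `min_base` threshold is a hypothetical parameter standing in for "very small"; with its default of 1 it matches the "greater than zero" filter for integer counts. Treating a missing base as 0 is an assumption.

```python
def drop_small_bases(rows, base_field="THHBASE", min_base=1):
    """Keep only rows whose Tapestry household base is at least min_base.
    Missing or null bases are treated as 0 and dropped."""
    return [row for row in rows if (row.get(base_field) or 0) >= min_base]


# Hypothetical candidate records.
candidates = [
    {"ZIP": "A", "THHBASE": 0},
    {"ZIP": "B", "THHBASE": 5},
    {"ZIP": "C", "THHBASE": 4200},
    {"ZIP": "D"},
]
print([row["ZIP"] for row in drop_small_bases(candidates)])             # greater than zero
print([row["ZIP"] for row in drop_small_bases(candidates, min_base=100)])  # stricter cutoff
```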
Rank the candidate ZIP Codes by their similarity to the target area.
- Hover over the Target Area Data layer to reveal the available layer operations. Click the Perform Analysis button.
- Click Find Locations and select the Find Similar Locations tool.
- The first parameter should be the Target Area Data layer. Skip the second parameter.
- Indicate you want to search for similar locations in the Candidate Data layer.
- Base similarity on the four tapestry percentage fields, population density, family growth rate, and unemployment rate by checking those variables.
- Indicate you are interested in the top four results.
- Select a name for the result, such as Top 4 Most Similar Candidates. Click Run Analysis.
- Hover over the result layer to reveal the available layer operations. Click the More Options button and select Zoom to.
- Hover over the result layer again and select Show Table. The first record in the table is the best performing benchmark community, located near Knoxville, Tennessee. The next records in the table, in order from most to least similar, are the ranked candidates. Click each candidate record in the table, one by one, to see where they are located on the map. If you used the tapestry and other variables that came with the LocatingRetirementCommunity feature service, two of the top ZIP Codes will be located near Houston and two will be located near Atlanta.
Your analysis identified some great locations. You would next research potential acquisition properties in each of the ZIP Codes identified and prepare a report that includes a full cost-benefit analysis for the stakeholders in your organization.
Photo Attribution
Photo | Attribution |
---|---|
Robinwood Retirement Community | Used with permission from Resort Lifestyle Communities |
Lakeside houses on Clear Lake southeast of Houston | This photo was taken by Mike Fisher and is licensed under CC BY 2.0 via Wikimedia Commons— https://commons.wikimedia.org/wiki/File:Lakeside_Houses_in_Nassau_Bay_TX.jpg |
Seabrook, Texas | "Seabrook-tx-kemah-bridge." Licensed under CC BY-SA 2.5 via Wikimedia Commons— |
Memorial Park | Public domain license, https://commons.wikimedia.org/wiki/File:MemorialParkHouston.JPG |
Alpharetta City Hall | Public domain, https://commons.wikimedia.org/wiki/File:Alpharetta,_Georgia_City_Hall.jpg |
Chattahoochee Nature Center near Roswell, Georgia | This photo was taken by WWALS Watershed Coalition and is licensed under CC BY 2.0 via Wikimedia Commons— https://commons.wikimedia.org/wiki/File:PaddlingUpperPondCNCApr6_2013.jpg |
Smyrna Village Market | Public domain, https://commons.wikimedia.org/wiki/File:Smyrna_Georgia_Market_Village.JPG |
Covered bridge, near Smyrna | This photo was taken by Maksim Sundukov and is licensed under CC BY-SA 4.0 via Wikimedia Commons— https://commons.wikimedia.org/wiki/File:Concord_Covered_Bridge_2.jpg |
This case study demonstrates a number of analytical methods that can be adapted to many different application areas, allowing you to answer a variety of questions.
Method | Generic Question | Examples |
---|---|---|
Hot Spot Analysis of feature attributes | Where do high and low values cluster together? | Where are the statistically significant clusters of senior citizens, poverty, unemployment, wealth, beer drinkers, lead levels, or college graduates? |
Attribute queries (Select Layer By Attribute) | Which features have the characteristics I'm interested in? | Which features have more than 50,000 people and median annual incomes larger than $50,000? Which hospitals have readmission rates larger than 10 percent? |
Similarity Search (Find Similar Locations) | Which features are most like my target feature? | Which stores are similar to my best performing store? Which crimes in the database are most like the current one I want to solve? |
Identify drive time or drive distance areas | What are the dimensions, based on time or distance, of the area surrounding a location? | What does a 5-mile or 5-minute walking, biking, or driving distance around the hospital look like? What can I visit within a 2 km walk from my hotel? |
In addition, you used data enrichment capabilities to get tapestry and demographic data. A wide variety of data, including demographics, consumer spending, occupation, and landscape data, can be obtained not only for common administrative boundaries (for example, Census tracts or ZIP Codes) but also for any point, line, or polygon geometry. You also used data manipulation and management tools, including Add Field, Calculate Field, Join Field, and Alter Field.
A number of resources are available to help you learn more about the analyses demonstrated in this case study:
Learn more about hot spot analysis
Spatial Data Mining I: Essentials of Cluster Analysis
Spatial Data Mining II: A Deep Dive Into Space-Time Analysis