Workflow using ArcGIS Desktop, ArcMap
Combine neighborhoods to incorporate missing data
- If you haven't done so already, download and unzip the data package provided at the top of this workflow.
- Double-click the LanguageData.mpk map package to open it.
- Open the table for the AllNeighborhoods layer by right-clicking the layer in the Table of Contents and selecting Open Attribute Table. Notice there are several zero values for the HasData Mother Tongue Variables (HasDataMT) field. Data has been suppressed for the sixty neighborhoods with a zero for this field.
- Select the neighborhoods that have data. To do this, use the Search window to locate and open the Select Layer By Attribute tool.
- Run the tool with the following parameters:
- Layer Name or Table View: AllNeighborhoods
- Selection type: NEW_SELECTION
- Expression: HasDataMT > 0
- Find and open the Copy Features tool. Run Copy Features with the following parameters, creating a feature class of neighborhoods with data:
- Input Features: AllNeighborhoods
- Output Feature Class: the name of your output feature class such as DataNeighborhoods
- Use the Select Layer By Attribute tool again, to select neighborhoods with no data:
- Layer Name or Table View: AllNeighborhoods
- Selection type: NEW_SELECTION
- Expression: HasDataMT = 0
- Use the Copy Features tool to create a feature class for the selected neighborhoods where data has been suppressed:
- Input Features: AllNeighborhoods
- Output Feature Class: the name of your output feature class such as NoDataNeighborhoods
- Clear the selected features by clicking the Clear Selected Features button on the toolbar. (Alternatively, run the Select Layer By Attribute tool again with the Selection type parameter set to CLEAR_SELECTION.)
- Open the table for the DataNeighborhoods layer and sort the neighborhoods, smallest to largest, based on area (double-click the SHAPE_Area field to sort the values).
- Create a new field to hold the number of random points to generate in each neighborhood. To do this, find and open the Add Field tool and run it with the following parameters:
- Input Table: DataNeighborhoods
- Field Name: NumPoints
- Field Type: LONG
- Calculate the number of points per neighborhood. To do this, find and open the Calculate Field tool and run it with the following parameters:
- Input Table: DataNeighborhoods
- Field Name: NumPoints
- Expression: [SHAPE_Area] / 100000
- Find and open the Create Random Points tool and run it with the following parameters:
- Output Location: the path to a working folder or file geodatabase such as Working.gdb
- Output Point Feature Class: the name of your output feature class such as NeighborhoodPoints
- Constraining Feature Class: DataNeighborhoods
- Field: NumPoints
- Minimum Allowed Distance, Linear unit: 100 Meters
- Run the Add Field tool with the following parameters:
- Input Table: DataNeighborhoods
- Field Name: CID
- Field Type: LONG
- Use Calculate Field to set the CID values to match the neighborhood object ID values:
- Input Table: DataNeighborhoods
- Field Name: CID
- Expression: [OBJECTID]
- Find and open the Create Thiessen Polygons tool and run it with the following parameters:
- Input Features: NeighborhoodPoints
- Output Feature Class: the name of your output feature class such as ThiessenPolys
- Output Fields: ALL
- Find and open the Snap tool. Run the tool using the following parameters:
- Input Features: ThiessenPolys
- Snap Environment:
- Features: NoDataNeighborhoods
- Type: EDGE
- Distance: 200 Meters
- Find and open the Dissolve tool. Run the tool using the following parameters (be sure to uncheck the Create multipart features parameter):
- Input Features: ThiessenPolys
- Output Feature Class: the name of your output feature class such as OverlayAreas
- Dissolve Field(s): CID
- Create multipart features: No
- Find and open the Intersect tool. Run it using the following parameters:
- Input Features: OverlayAreas; NoDataNeighborhoods
- Output Feature Class: the name of your output feature class such as Pieces
- Join Attributes: ALL
- Find and open the Merge tool and run it with the following parameters (remove all but the CID field using the X button):
- Input Datasets: Pieces; DataNeighborhoods
- Output Dataset: the name of your output feature class such as AllThePieces
- Field Map: CID
- Run the Dissolve tool using the following parameters:
- Input Features: AllThePieces
- Output Feature Class: the name of your output feature class such as DissolvedPieces
- Dissolve Field(s): CID
- Create multipart features: No
You will create two working layers. One will contain all the neighborhoods that have data, and the other will contain all of the neighborhoods with suppressed data.
You will now generate random points, lots of them, inside the neighborhoods with data. These become seeds for creating Thiessen polygons covering all of the neighborhoods in the study area. You will then dissolve the Thiessen polygons by neighborhood ID to create larger areas that you can use to carve the neighborhoods without data into pieces. These pieces will be distributed and aggregated among nearby neighborhoods that do have data. The steps are shown graphically below using only five neighborhood features for simplicity.
Begin by calculating the number of random points to create within each neighborhood that has data.
Notice that the smallest area is 584706.0338. If you divide all of the SHAPE_Area values by 100,000 and round to an integer value, the smallest neighborhood will have 6 points (584706.0338 / 100,000 is about 5.85, which rounds to 6). This should be good for your purposes. Your goal is to make sure all neighborhoods get at least a few points, but you don't want so many points that the Create Thiessen Polygons tool takes a very long time to finish.
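The arithmetic behind the NumPoints calculation can be checked outside ArcMap. The following is a minimal Python sketch, not part of the original workflow; the function name `points_per_neighborhood` and the `area_per_point` parameter are hypothetical, and it assumes the Calculate Field result is rounded to the nearest whole number when stored in the LONG field, as described above.

```python
def points_per_neighborhood(shape_area, area_per_point=100000):
    """Return the number of random points for a polygon with the given area.

    shape_area and area_per_point are assumed to be in the same squared
    map units as the SHAPE_Area field.
    """
    # Divide the area by 100,000 and round to the nearest integer.
    return int(round(shape_area / float(area_per_point)))


# The smallest neighborhood in the tutorial data.
smallest_area = 584706.0338
print(points_per_neighborhood(smallest_area))  # 6
```

Raising `area_per_point` lowers the number of seed points per neighborhood, which trades Thiessen polygon detail for faster processing.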
You will now create a new feature class of random points. You will use the NumPoints field you created above to indicate the number of random points to generate in each neighborhood polygon. You will set the minimum spacing between random points to be 100 meters. The idea here is to set the spacing to be the largest value possible that doesn't decrease the number of points created. For fun, run the Create Random Points tool several times using larger and larger values for the Minimum Allowed Distance, Linear unit parameter to see how large you can make this value before you begin to see messages indicating not all points could be created.
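To see why a large minimum distance can reduce the number of points created, consider this small Python sketch. It is only an illustration under simplifying assumptions: it places points in a rectangle rather than in a neighborhood polygon, it uses simple rejection sampling, and it is not a description of how the Create Random Points tool works internally. The function name and its parameters are hypothetical.

```python
import math
import random


def random_points_min_distance(n, width, height, min_dist,
                               max_tries=10000, seed=0):
    """Try to place n random points in a width x height rectangle so that
    no two points are closer than min_dist. Returns the points placed,
    which may be fewer than n if the spacing is too large."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        cx = rng.uniform(0, width)
        cy = rng.uniform(0, height)
        # Accept the candidate only if it is far enough from every
        # point that has already been accepted.
        if all(math.hypot(cx - x, cy - y) >= min_dist for x, y in points):
            points.append((cx, cy))
    return points


# A 100 x 100 rectangle cannot hold two points 200 units apart
# (its diagonal is only about 141), so only one point is placed.
print(len(random_points_min_distance(6, 100, 100, 200)))  # 1
```

With a small `min_dist` all requested points fit; as the value grows, the function starts returning fewer points than requested, which mirrors the messages the tool reports.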
Each random point is given a CID value equal to the object or feature ID of its associated neighborhood. In other words, if a point is generated inside the neighborhood whose OBJECTID is 13, that point will be assigned a CID value of 13. The CID values are carried through each step in the aggregation workflow below. In the final step, you will need the CID field to also be in the DataNeighborhoods layer. The next steps add a new CID field to the DataNeighborhoods layer and calculate it to be equal to the object ID, for use later in the workflow.
With that bit of housekeeping taken care of, you are ready to proceed by creating a Thiessen polygon for each random point.
The result is a feature class with many Thiessen polygons.
The Thiessen polygons will be used to carve up the NoDataNeighborhoods. To help reduce the number of sliver polygons created during the carving process, you will snap the Thiessen polygons to match the edges of the NoDataNeighborhoods.
Once the small Thiessen polygons have been snapped to the NoDataNeighborhoods, you will dissolve them into larger overlay areas using the CID field carried forward from the random points.
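Conceptually, a Thiessen polygon contains every location that is closer to its seed point than to any other seed, so dissolving the polygons by CID assigns each location to the neighborhood of its nearest random point. The Python sketch below illustrates that idea with a few hypothetical seed points; it ignores the snapping step and is not a substitute for the Create Thiessen Polygons and Dissolve tools.

```python
import math


def nearest_cid(location, seeds):
    """Return the CID of the seed point closest to location.

    location is an (x, y) tuple; seeds is a list of (x, y, cid) tuples.
    """
    x, y = location
    closest = min(seeds, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return closest[2]


# Hypothetical seeds: two points from neighborhood 13, one from 27.
seeds = [(0.0, 0.0, 13), (1.0, 0.0, 13), (10.0, 0.0, 27)]

print(nearest_cid((2.0, 0.0), seeds))  # 13
print(nearest_cid((9.0, 1.0), seeds))  # 27
```

Because several seeds share the same CID, the area assigned to a neighborhood is the union of the Thiessen polygons of all of its seed points, which is what the dissolved OverlayAreas represent.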
If you set the fill color for the OverlayAreas layer to clear and zoom in to some of the neighborhoods with suppressed data, you can see how these OverlayAreas will carve those NoDataNeighborhoods into pieces so they can be distributed among the neighborhoods that do have data.
You will use the Intersect tool to carve up the NoDataNeighborhoods.
You will merge the Pieces features with the DataNeighborhoods features, then use Dissolve to aggregate each no-data piece to a neighborhood that does have data. The number of neighborhoods after aggregation should match the number of polygons in your DataNeighborhoods feature class (328). If you have more polygons after you run the dissolve procedure, you will look for sliver polygons and eliminate them in order to get your final neighborhood geometry.
Remove sliver polygons
- Open the table for the DissolvedPieces feature class and determine the number of neighborhoods it contains (the neighborhood count is the number of records in the table). If the number does not match the number of neighborhoods in the DataNeighborhoods feature class (328), you have sliver polygons that need to be removed.
- Determine the area of the smallest neighborhood in DataNeighborhoods by opening the DataNeighborhoods table and double-clicking the SHAPE_Area field to sort the areas smallest to largest. Notice that the smallest area is 584706.0338. Any polygons smaller than this must be sliver polygons.
- Select the sliver polygons using the Select Layer By Attribute tool with the following parameters:
- Layer Name or Table View: DissolvedPieces
- Selection type: NEW_SELECTION
- Expression: SHAPE_Area < 584706
- Remove the slivers using the Eliminate tool with the following parameters:
- Input Layer: DissolvedPieces
- Output Feature Class: the name of your output feature class such as Neighborhoods
The resulting number of neighborhoods in the Neighborhoods feature class should now match the number of DataNeighborhoods (328). You will use this neighborhood geometry for the remaining analyses.
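The sliver test is a simple threshold check: any polygon smaller than the smallest real neighborhood is treated as a sliver. The Python sketch below shows that logic on a small dictionary of made-up areas keyed by object ID; the function name and the sample values are hypothetical and only the threshold (584706) comes from the workflow.

```python
def find_slivers(areas_by_id, min_valid_area):
    """Return the sorted IDs of polygons whose area is below the threshold."""
    return sorted(oid for oid, area in areas_by_id.items()
                  if area < min_valid_area)


# Hypothetical dissolved polygons: two real neighborhoods and two slivers.
areas = {1: 584706.0338, 2: 1250000.0, 3: 412.7, 4: 98.3}

slivers = find_slivers(areas, 584706)
print(slivers)                     # [3, 4]
print(len(areas) - len(slivers))   # 2 neighborhoods remain
```

In the workflow itself, the count that remains after eliminating slivers is the number to compare against the 328 polygons in DataNeighborhoods.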
Compute the Linguistic Diversity Index
You now have the final neighborhood geometry but still need to transfer all of the mother tongue language variables to it. Once you have the variables, you will compute the diversity index for each neighborhood.
- Find and open the Join Field tool and run it with the following parameters:
- Input Table: Neighborhoods
- Input Join Field: CID
- Join Table: DataNeighborhoods
- Output Join Field: CID
- Join Fields: NAME; NUMBER; ECYMTTOT; ECYMTENGL; ECYMTFREN; ECYMTITAL; ECYMTGERM; ECYMTPUNJ; ECYMTCANT; ECYMTSPAN; ECYMTARAB; ECYMTTAGA; ECYMTPORT; ECYMTPOLI; ECYMTMAND; ECYMTCHIO; ECYMTURDU; ECYMTVIET; ECYMTUKRA; ECYMTPERS; ECYMTRUSS; ECYMTDUTC; ECYMTKORE; ECYMTGREE; ECYMTTAMI; ECYMTGUJA; ECYMTROMA; ECYMTHIND; ECYMTHUNG; ECYMTCROA; ECYMTCREO; ECYMTSERB; ECYMTBENG; ECYMTJAPA; ECYMTTURK; ECYMTCZEC; ECYMTSOMA; ECYMTABOR; ECYMTOTH
- You will add a new field to hold the diversity index values. Find and open the Add Field tool and run it with the following parameters:
- Input Table: Neighborhoods
- Field Name: LDI
- Field Type: FLOAT
- Find and open the Calculate Field tool. Run Calculate Field with the following parameters (you can copy and paste the formula below into the tool Expression parameter):
- Input Table: Neighborhoods
- Field Name: LDI
- Expression: 1 - (([ECYMTENGL]/[ECYMTTOT])^2 + ([ECYMTFREN]/[ECYMTTOT])^2 + ([ECYMTITAL]/[ECYMTTOT])^2 + ([ECYMTGERM]/[ECYMTTOT])^2 + ([ECYMTPUNJ]/[ECYMTTOT])^2 + ([ECYMTCANT]/[ECYMTTOT])^2 + ([ECYMTSPAN]/[ECYMTTOT])^2 + ([ECYMTARAB]/[ECYMTTOT])^2 + ([ECYMTTAGA]/[ECYMTTOT])^2 + ([ECYMTPORT]/[ECYMTTOT])^2 + ([ECYMTPOLI]/[ECYMTTOT])^2 + ([ECYMTMAND]/[ECYMTTOT])^2 + ([ECYMTCHIO]/[ECYMTTOT])^2 + ([ECYMTURDU]/[ECYMTTOT])^2 + ([ECYMTVIET]/[ECYMTTOT])^2 + ([ECYMTUKRA]/[ECYMTTOT])^2 + ([ECYMTPERS]/[ECYMTTOT])^2 + ([ECYMTRUSS]/[ECYMTTOT])^2 + ([ECYMTDUTC]/[ECYMTTOT])^2 + ([ECYMTKORE]/[ECYMTTOT])^2 + ([ECYMTGREE]/[ECYMTTOT])^2 + ([ECYMTTAMI]/[ECYMTTOT])^2 + ([ECYMTGUJA]/[ECYMTTOT])^2 + ([ECYMTROMA]/[ECYMTTOT])^2 + ([ECYMTHIND]/[ECYMTTOT])^2 + ([ECYMTHUNG]/[ECYMTTOT])^2 + ([ECYMTCROA]/[ECYMTTOT])^2 + ([ECYMTCREO]/[ECYMTTOT])^2 + ([ECYMTSERB]/[ECYMTTOT])^2 + ([ECYMTBENG]/[ECYMTTOT])^2 + ([ECYMTJAPA]/[ECYMTTOT])^2 + ([ECYMTTURK]/[ECYMTTOT])^2 + ([ECYMTCZEC]/[ECYMTTOT])^2 + ([ECYMTSOMA]/[ECYMTTOT])^2 + ([ECYMTABOR]/[ECYMTTOT])^2 + ([ECYMTOTH]/[ECYMTTOT])^2)
- Open the Neighborhoods table and double-click on the LDI field to sort the values. Notice they range from 0.158 (little diversity) to 0.932 (high diversity).
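The LDI expression is one minus the sum of squared language shares, where each share is a language count divided by the total (ECYMTTOT). The Python sketch below restates that calculation and also builds the long Calculate Field expression from the list of field names, which can help avoid typing errors. The function names are hypothetical, the sample counts are made up, and returning None for a zero total is an assumption of this sketch rather than behavior of the Calculate Field tool.

```python
LANGUAGE_FIELDS = [
    "ECYMTENGL", "ECYMTFREN", "ECYMTITAL", "ECYMTGERM", "ECYMTPUNJ",
    "ECYMTCANT", "ECYMTSPAN", "ECYMTARAB", "ECYMTTAGA", "ECYMTPORT",
    "ECYMTPOLI", "ECYMTMAND", "ECYMTCHIO", "ECYMTURDU", "ECYMTVIET",
    "ECYMTUKRA", "ECYMTPERS", "ECYMTRUSS", "ECYMTDUTC", "ECYMTKORE",
    "ECYMTGREE", "ECYMTTAMI", "ECYMTGUJA", "ECYMTROMA", "ECYMTHIND",
    "ECYMTHUNG", "ECYMTCROA", "ECYMTCREO", "ECYMTSERB", "ECYMTBENG",
    "ECYMTJAPA", "ECYMTTURK", "ECYMTCZEC", "ECYMTSOMA", "ECYMTABOR",
    "ECYMTOTH",
]


def linguistic_diversity_index(counts, total):
    """Return 1 minus the sum of squared shares (count / total).

    Returns None when total is not positive, since the shares are
    undefined in that case (an assumption of this sketch).
    """
    if total <= 0:
        return None
    return 1.0 - sum((c / float(total)) ** 2 for c in counts)


def build_ldi_expression(fields, total_field="ECYMTTOT"):
    """Build the Calculate Field expression text from field names."""
    terms = ["([{0}]/[{1}])^2".format(f, total_field) for f in fields]
    return "1 - (" + " + ".join(terms) + ")"


# Made-up counts: one language only, then two and four equal groups.
print(linguistic_diversity_index([100], 100))              # 0.0
print(linguistic_diversity_index([50, 50], 100))           # 0.5
print(linguistic_diversity_index([25, 25, 25, 25], 100))   # 0.75

expression = build_ldi_expression(LANGUAGE_FIELDS)
print(expression[:60])
```

The index is 0 when everyone shares one mother tongue and approaches 1 as speakers are spread evenly across more languages, which is why the values in the Neighborhoods table fall between those extremes.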
Create a hot spot map of linguistic diversity
- Find and open the Optimized Hot Spot Analysis tool. Run Optimized Hot Spot Analysis with the following parameters:
- Input Features: Neighborhoods
- Output Features: the name of your output feature class such as LDIHotSpots
- Analysis Field: LDI
High diversity areas are shown in red; low diversity areas are shown in blue.
Other applications
This case study evaluated linguistic diversity, but similar steps could be used to examine diversity in other categorical or nominal variables such as race/ethnicity, land use, occupations, age categories, home value categories, crop varieties, and so forth.