Workflow using ArcGIS Pro
Combine neighborhoods to incorporate missing data
- If you haven't done so already, download and unzip the data package provided at the top of this workflow.
- Open ArcGIS Pro and browse to the LingualDiversity.ppkx project package.
- Once the project opens, right-click the AllNeighborhoods layer in the Contents pane and select Attribute Table. Notice that some records have a value of zero in the HasData Mother Tongue Variables (HasDataMT) field. Data has been suppressed for these sixty neighborhoods.
- Begin by searching for the Select Layer By Attribute tool in the Geoprocessing pane.
- Double-click to open the Select Layer By Attribute tool, and run it with the following parameters. Click the Add Clause button to create the Expression.
- Layer Name or Table View: AllNeighborhoods
- Selection type: New selection
- Expression: HasData Mother Tongue Variables is Greater Than 0
- Find and open the Copy Features tool. Run Copy Features with the following parameters, creating a feature class of neighborhoods with data:
- Input Features: AllNeighborhoods
- Output Feature Class: the name of your output feature class such as DataNeighborhoods
- Use the Select Layer By Attribute tool again to select neighborhoods with no data:
- Layer Name or Table View: AllNeighborhoods
- Selection type: New selection
- Expression: HasData Mother Tongue Variables is Equal to 0
- Use the Copy Features tool to create a feature class for the selected neighborhoods where data has been suppressed:
- Input Features: AllNeighborhoods
- Output Feature Class: the name of your output feature class such as NoDataNeighborhoods
- Clear the selected features by running Select Layer By Attribute and setting the Selection type to Clear the current selection, or by clicking on the Clear tool on the Map tab.
- Open the table for the DataNeighborhoods layer and sort the neighborhoods, smallest to largest, based on area (double-click the SHAPE_Area field to sort the values).
- Create a new field to hold the number of random points to create in each neighborhood. To do this, find and open the Add Field tool and run it with the following parameters:
- Input Table: DataNeighborhoods
- Field Name: NumPoints
- Field Type: Long (large integer)
- Calculate the number of points per neighborhood. To do this, find and open the Calculate Field tool and run it with the following parameters:
- Input Table: DataNeighborhoods
- Field Name: NumPoints
- Expression: NumPoints = !SHAPE_Area! / 100000 (type only !SHAPE_Area! / 100000 in the expression box; the dialog already shows NumPoints =)
- Find and open the Create Random Points tool and run it with the following parameters:
- Output Location: a working folder or file geodatabase
- Output Point Feature Class: the name of your output feature class such as NeighborhoodPoints
- Constraining Feature Class: DataNeighborhoods
- Number of Points: Field
- [value or field]: NumPoints
- Minimum Allowed: Linear unit
- Distance: 100 Meters
- Run the Add Field tool with the following parameters:
- Input Table: DataNeighborhoods
- Field Name: CID
- Field Type: Long (large integer)
- Use Calculate Field to set the CID values to match the neighborhood object ID values:
- Input Table: DataNeighborhoods
- Field Name: CID
- Expression: !OBJECTID!
- Find and open the Create Thiessen Polygons tool and run it with the following parameters:
- Input Features: NeighborhoodPoints
- Output Feature Class: the name of your output feature class such as ThiessenPolys
- Output Fields: All Fields
- Find and open the Snap tool. Run the tool using the following parameters:
- Input Features: ThiessenPolys
- Snap Environment:
- Features: NoDataNeighborhoods
- Type: EDGE
- Distance: 200 Meters
- Find and open the Dissolve tool. Run the tool with the following parameters (be sure to uncheck the Create multipart features parameter):
- Input Features: ThiessenPolys
- Output Feature Class: the name of your output feature class such as OverlayAreas
- Dissolve Field(s): CID
- Create multipart features: No
- Find and open the Intersect tool. Run it using the following parameters:
- Input Features: OverlayAreas; NoDataNeighborhoods
- Output Feature Class: the name of your output feature class such as Pieces
- Join Attributes: All attributes
- Find and open the Merge tool and run it with the following parameters (remove all but the CID field):
- Input Datasets: Pieces; DataNeighborhoods
- Output Dataset: the name of your output feature class such as AllThePieces
- Field Map: CID
- Merge Rule: First
- Run the Dissolve tool using the following parameters:
- Input Features: AllThePieces
- Output Feature Class: the name of your output feature class such as DissolvedPieces
- Dissolve Field(s): CID
- Create multipart features: No
You will create a layer for neighborhoods with data, and another for neighborhoods that do not have data.
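That split is a simple partition on the HasDataMT value: one selection keeps values greater than 0, the other keeps values equal to 0. The Python sketch below illustrates it; the row dictionaries are hypothetical stand-ins for the AllNeighborhoods attribute table, not data from the package, and the helper name split_by_has_data is made up for illustration.

```python
def split_by_has_data(rows, flag_field="HasDataMT"):
    """Split rows into (with data, suppressed) on a HasData-style flag field.

    Mirrors the two selections in the workflow:
    'greater than 0' versus 'equal to 0'.
    """
    with_data = [r for r in rows if r[flag_field] > 0]
    suppressed = [r for r in rows if r[flag_field] == 0]
    return with_data, suppressed


# Hypothetical rows standing in for the AllNeighborhoods attribute table.
rows = [
    {"OBJECTID": 1, "HasDataMT": 1},
    {"OBJECTID": 2, "HasDataMT": 0},
    {"OBJECTID": 3, "HasDataMT": 1},
    {"OBJECTID": 4, "HasDataMT": 0},
]

data_rows, no_data_rows = split_by_has_data(rows)
print([r["OBJECTID"] for r in data_rows])     # [1, 3]
print([r["OBJECTID"] for r in no_data_rows])  # [2, 4]
```

Assuming HasDataMT is never negative or null, every record lands in exactly one output, so with the full table the two feature classes should hold 328 and 60 neighborhoods.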
You will now generate random points (lots of them) inside the neighborhoods with data. These become seeds for creating Thiessen polygons that cover all of the neighborhoods in the study area. You will then dissolve the Thiessen polygons by neighborhood ID to create larger areas that you can use to carve the neighborhoods without data into pieces. These pieces will be distributed among, and aggregated into, nearby neighborhoods that do have data. The steps are shown graphically below using only five neighborhood features for simplicity.
Begin by calculating the number of random points to create within each neighborhood with data.
Notice that the smallest area is 584706.0338. If you divide all of the areas by 100,000 and round to the nearest integer, the smallest neighborhood gets 6 points (584,706 / 100,000 ≈ 5.85, which rounds to 6). This should be good for your purposes. Your goal is to make sure all neighborhoods get at least a few points, but you don't want so many points that the Create Thiessen Polygons tool takes forever to finish.
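The NumPoints arithmetic can be checked with a small Python helper. It assumes SHAPE_Area is in square meters and rounds to the nearest whole number as described above; the helper name num_points is hypothetical, and the sketch does not settle whether the Long field itself rounds or truncates the value.

```python
def num_points(shape_area, divisor=100_000):
    """Random points to place in a neighborhood: area / divisor, rounded."""
    # round() returns the nearest int (exact halves go to the even number).
    return round(shape_area / divisor)


smallest_area = 584706.0338  # smallest SHAPE_Area in DataNeighborhoods
print(num_points(smallest_area))  # 6
```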
You will now create a new feature class of random points. You will use the NumPoints field you created above to indicate the number of random points to generate in each neighborhood polygon. You will set the minimum spacing between random points to be 100 meters. The idea here is to set the spacing to be the largest value possible that doesn't decrease the number of points created. For fun, run the Create Random Points tool several times using larger and larger values for the Minimum Allowed Distance parameter to see how large you can make this value before you begin to see messages indicating not all points could be created.
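The minimum-distance constraint can be illustrated with simple rejection sampling: draw candidate points and keep one only if it is at least the minimum distance from every point already kept. This is a hypothetical sketch of the idea, not the algorithm Create Random Points uses, and it samples a rectangle rather than a real neighborhood polygon.

```python
import math
import random


def random_points_with_spacing(xmin, ymin, xmax, ymax, n, min_dist,
                               max_tries=10_000, seed=None):
    """Rejection sampling inside a rectangle.

    A candidate is kept only if it is at least min_dist from every point
    accepted so far. Returns up to n points; fewer if the spacing is too
    large to fit them within max_tries attempts.
    """
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points


# A 1000 m x 600 m box, roughly the size of the smallest neighborhood.
pts = random_points_with_spacing(0, 0, 1000, 600, n=6, min_dist=100, seed=1)
print(len(pts))  # 6
# The box diagonal is about 1166 m, so no two points can be 1200 m apart.
crowded = random_points_with_spacing(0, 0, 1000, 600, n=6, min_dist=1200, seed=1)
print(len(crowded))  # 1
```

Raising min_dist until the function returns fewer than n points mirrors the experiment suggested above.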
Each random point is given a CID value equal to the object or feature ID of its associated neighborhood. In other words, if a point is generated inside the neighborhood whose OBJECTID is 13, that point will be assigned a CID value of 13. The CID values are carried through each step in the aggregation workflow below. In the final step, you will need the CID field to also be in the DataNeighborhoods layer. The next steps add a new CID field to the DataNeighborhoods layer and calculate it to be equal to the object ID, for use later in the workflow.
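Create Random Points records the containing neighborhood's ID for you, but the underlying idea is a point-in-polygon test. The sketch below uses the standard even-odd (ray casting) rule; the two square neighborhoods and their OBJECTIDs 13 and 14 are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_cid(points, neighborhoods):
    """Return the OBJECTID of the neighborhood containing each point (or None)."""
    return [
        next((oid for oid, ring in neighborhoods.items()
              if point_in_polygon(x, y, ring)), None)
        for x, y in points
    ]


# Hypothetical neighborhoods keyed by OBJECTID.
neighborhoods = {
    13: [(0, 0), (10, 0), (10, 10), (0, 10)],
    14: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(assign_cid([(5, 5), (15, 5), (25, 5)], neighborhoods))  # [13, 14, None]
```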
With that bit of housekeeping taken care of, you are ready to proceed by creating a Thiessen polygon for each random point.
The result is a feature class with many Thiessen polygons.
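A Thiessen polygon is the set of locations closer to its seed point than to any other seed, so asking which polygon a location falls in is the same as asking which seed is nearest. The sketch below illustrates that property with hypothetical seed coordinates and CID values; it does not construct polygon geometry the way Create Thiessen Polygons does.

```python
import math


def thiessen_cid(location, seeds):
    """Return the CID of the seed nearest to location.

    seeds is a list of ((x, y), cid) pairs. Every location inside a Thiessen
    polygon is closest to that polygon's seed, so the nearest seed's CID is
    the CID the location inherits. Ties go to the first seed in the list.
    """
    _, cid = min(seeds, key=lambda seed: math.dist(location, seed[0]))
    return cid


# Hypothetical seeds: two from neighborhood 13, one from neighborhood 14.
seeds = [((0, 0), 13), ((10, 0), 13), ((30, 0), 14)]
print(thiessen_cid((19, 0), seeds))  # 13 (9 units from (10, 0), 11 from (30, 0))
print(thiessen_cid((21, 0), seeds))  # 14
```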
The Thiessen polygons will be used to carve up the NoDataNeighborhoods. To help reduce the number of sliver polygons created during the carving process, you will snap the Thiessen polygons to match the edges of the NoDataNeighborhoods.
Once the small Thiessen polygons have been snapped to the NoDataNeighborhoods, you will dissolve them into larger overlay areas using the CID field carried forward from the random points.
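Dissolving on CID collapses many Thiessen polygons into one record per neighborhood ID. The sketch below shows only the attribute side of that operation, summing hypothetical polygon areas by CID; the geometry union that Dissolve performs is not modeled.

```python
from collections import defaultdict


def dissolve_areas(polygons):
    """Group (cid, area) records by CID and sum their areas."""
    totals = defaultdict(float)
    for cid, area in polygons:
        totals[cid] += area
    return dict(totals)


# Hypothetical snapped Thiessen polygons as (CID, area) pairs.
thiessen = [(13, 120_000.0), (13, 95_000.0), (14, 210_000.0), (13, 60_000.0)]
print(dissolve_areas(thiessen))  # {13: 275000.0, 14: 210000.0}
```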
If you set the fill color for the OverlayAreas to clear (no fill) and zoom in to some of the neighborhoods with suppressed data, you can see how these OverlayAreas will carve the NoDataNeighborhoods into pieces that can be distributed among the neighborhoods that do have data.
You will use the Intersect tool to carve up the NoDataNeighborhoods.
You will Merge the Pieces polygons with the DataNeighborhoods polygons, then use Dissolve to aggregate each no-data piece to a neighborhood that does have data. The number of neighborhoods after aggregation should match the number of polygons in your DataNeighborhoods feature class (328). If you have more polygons after running Dissolve, the extras are sliver polygons; you will find and eliminate them to get your final neighborhood geometry.
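One way extra polygons appear: with Create multipart features set to No, Dissolve writes each contiguous region of a CID as its own feature, so a piece that is not connected to the rest of its neighborhood survives as a separate, usually small, polygon. The grid sketch below counts edge-connected (4-neighbor) regions per CID; the grid is a hypothetical raster stand-in for the merged pieces, not real output.

```python
from collections import Counter


def count_single_parts(grid):
    """Count edge-connected regions of equal CID in a 2-D grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    parts = Counter()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            cid = grid[r][c]
            parts[cid] += 1
            # Flood fill this region so its cells are not counted again.
            stack = [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == cid):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return dict(parts)


grid = [
    [13, 13, 14],
    [13, 14, 14],
    [14, 13, 13],
]
parts = count_single_parts(grid)
print(parts)                # {13: 2, 14: 2}
print(sum(parts.values()))  # 4 features, although there are only 2 CIDs
```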
Remove sliver polygons
- Open the table for the DissolvedPieces feature class and determine the number of neighborhoods it contains (the neighborhood count is the number of records in the table). If the number does not match the number of neighborhoods in the DataNeighborhoods feature class (328), you have sliver polygons that need to be removed.
- Determine the area of the smallest DataNeighborhoods polygon by opening the DataNeighborhoods attribute table and double-clicking the SHAPE_Area field to sort the areas from smallest to largest. Notice that the smallest area is 584706.0338. Any polygon smaller than this must be a sliver polygon.
- Select the sliver polygons using the Select Layer By Attribute tool with the following parameters:
- Layer Name or Table View: DissolvedPieces
- Selection Type: New selection
- Expression: SHAPE_Area is Less Than 584706
- Remove the slivers using the Eliminate tool with these parameters:
- Input Layer: DissolvedPieces
- Output Feature Class: the name of your output feature class such as Neighborhoods
- Eliminating polygon by border: No
The resulting number of neighborhoods in the Neighborhoods feature class should now match the number of DataNeighborhoods (328). You will use this neighborhood geometry for the remaining analyses.
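With Eliminating polygon by border set to No, Eliminate merges each selected polygon into the neighboring polygon with the largest area. The Python sketch below mimics that rule on hypothetical areas and a hand-written adjacency table; it ignores geometry and does not update adjacency after each merge, so treat it as an illustration of the select-and-merge logic only.

```python
def eliminate_slivers(areas, neighbors, threshold):
    """Merge every polygon smaller than threshold into its largest neighbor.

    areas: {feature_id: area}; neighbors: {feature_id: set of adjacent ids}.
    Returns the surviving {feature_id: area}.
    """
    result = dict(areas)
    slivers = {fid for fid, area in areas.items() if area < threshold}
    for fid in sorted(slivers):
        # Only non-sliver neighbors can absorb a sliver in this sketch.
        candidates = [n for n in neighbors.get(fid, ()) if n not in slivers]
        if not candidates:
            continue  # isolated sliver: left in place for manual review
        target = max(candidates, key=lambda n: result[n])
        result[target] += result.pop(fid)
    return result


# Hypothetical DissolvedPieces: 1 and 2 are neighborhoods, 3 is a sliver.
areas = {1: 900_000.0, 2: 700_000.0, 3: 1_200.0}
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(eliminate_slivers(areas, neighbors, threshold=584706))
# {1: 901200.0, 2: 700000.0}
```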
Compute the Linguistic Diversity Index
You now have the final neighborhood geometry but still need to transfer all of the mother tongue language variables to it. Once you have the variables, you will compute the diversity index for each neighborhood.
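Conceptually, this transfer is a one-to-one lookup on a shared key: for each target row, find the join-table row with the same key value and copy the requested fields across. The sketch below shows that lookup with hypothetical rows and values; the helper name join_field and the field values are illustrative only, and unmatched rows receive None as a stand-in for a null.

```python
def join_field(target_rows, target_key, join_rows, join_key, fields):
    """Copy fields from join_rows onto target_rows where the keys match.

    Target rows with no matching key get None for each joined field.
    """
    lookup = {row[join_key]: row for row in join_rows}
    for row in target_rows:
        match = lookup.get(row[target_key])
        for field in fields:
            row[field] = match[field] if match is not None else None
    return target_rows


# Hypothetical final neighborhoods and their source attributes.
neighborhoods = [{"CID": 13}, {"CID": 14}, {"CID": 99}]
data = [
    {"CID": 14, "NAME": "B", "ECYMTTOT": 5200},
    {"CID": 13, "NAME": "A", "ECYMTTOT": 4100},
]
join_field(neighborhoods, "CID", data, "CID", ["NAME", "ECYMTTOT"])
print(neighborhoods[0])  # {'CID': 13, 'NAME': 'A', 'ECYMTTOT': 4100}
```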
- Find and open the Join Field tool and run it using the following parameters:
- Input Table: Neighborhoods
- Input Join Field: CID
- Join Table: DataNeighborhoods
- Output Join Field: CID
- Join Fields: 2014 Household population for mother tongue; 2014 MT: Aboriginal languages; 2014 MT: Arabic; 2014 MT: Bengali; 2014 MT: Cantonese; 2014 MT: Croatian; 2014 MT: Czech; 2014 MT: Dutch; 2014 MT: English; 2014 MT: French; 2014 MT: German; 2014 MT: Greek; 2014 MT: Gujarati; 2014 MT: Hindi; 2014 MT: Hungarian; 2014 MT: Italian; 2014 MT: Japanese; 2014 MT: Korean; 2014 MT: Mandarin; 2014 MT: Other Languages; 2014 MT: Persian (Farsi) ; 2014 MT: Polish; 2014 MT: Portuguese; 2014 MT: Punjabi; 2014 MT: Romanian; 2014 MT: Russian; 2014 MT: Serbian; 2014 MT: Somali; 2014 MT: Spanish; 2014 MT: Tagalog (Pilipino, Filipino) ; 2014 MT: Tamil; 2014 MT: Turkish; 2014 MT: Ukrainian; 2014 MT: Urdu; 2014 MT: Vietnamese; NAME
- Find and open the Add Field tool to create a field to hold the Linguistic Diversity Index (LDI). Run Add Field with the following parameters:
- Input Table: Neighborhoods
- Field Name: LDI
- Field Type: Float
- Find and open the Calculate Field tool. Run Calculate Field with the following parameters (copy the formula below, everything after LDI =, and paste it into the expression box; the dialog already supplies the LDI = part):
- Input Table: Neighborhoods
- Field Name: LDI
- Expression: LDI = 1 - ((!ECYMTENGL!/!ECYMTTOT!)**2 + (!ECYMTFREN!/!ECYMTTOT!)**2 + (!ECYMTITAL!/!ECYMTTOT!)**2 + (!ECYMTGERM!/!ECYMTTOT!)**2 + (!ECYMTPUNJ!/!ECYMTTOT!)**2 + (!ECYMTCANT!/!ECYMTTOT!)**2 + (!ECYMTSPAN!/!ECYMTTOT!)**2 + (!ECYMTARAB!/!ECYMTTOT!)**2 + (!ECYMTTAGA!/!ECYMTTOT!)**2 + (!ECYMTPORT!/!ECYMTTOT!)**2 + (!ECYMTPOLI!/!ECYMTTOT!)**2 + (!ECYMTMAND!/!ECYMTTOT!)**2 + (!ECYMTCHIO!/!ECYMTTOT!)**2 + (!ECYMTURDU!/!ECYMTTOT!)**2 + (!ECYMTVIET!/!ECYMTTOT!)**2 + (!ECYMTUKRA!/!ECYMTTOT!)**2 + (!ECYMTPERS!/!ECYMTTOT!)**2 + (!ECYMTRUSS!/!ECYMTTOT!)**2 + (!ECYMTDUTC!/!ECYMTTOT!)**2 + (!ECYMTKORE!/!ECYMTTOT!)**2 + (!ECYMTGREE!/!ECYMTTOT!)**2 + (!ECYMTTAMI!/!ECYMTTOT!)**2 + (!ECYMTGUJA!/!ECYMTTOT!)**2 + (!ECYMTROMA!/!ECYMTTOT!)**2 + (!ECYMTHIND!/!ECYMTTOT!)**2 + (!ECYMTHUNG!/!ECYMTTOT!)**2 + (!ECYMTCROA!/!ECYMTTOT!)**2 + (!ECYMTCREO!/!ECYMTTOT!)**2 + (!ECYMTSERB!/!ECYMTTOT!)**2 + (!ECYMTBENG!/!ECYMTTOT!)**2 + (!ECYMTJAPA!/!ECYMTTOT!)**2 + (!ECYMTTURK!/!ECYMTTOT!)**2 + (!ECYMTCZEC!/!ECYMTTOT!)**2 + (!ECYMTSOMA!/!ECYMTTOT!)**2 + (!ECYMTABOR!/!ECYMTTOT!)**2 + (!ECYMTOTH!/!ECYMTTOT!)**2)
- Open the Neighborhoods table and double-click on the LDI field to sort the values. Notice they range from 0.158 (low diversity) to 0.932 (high diversity).
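The LDI expression is one minus the sum of squared language shares, the same form as the Gini-Simpson diversity index: it is 0 when everyone shares one mother tongue and rises toward 1 as speakers spread evenly over more languages (k equally common languages give 1 - 1/k). The helper below is a sketch with hypothetical counts; it takes the language counts and the ECYMTTOT-style total as plain numbers.

```python
def linguistic_diversity_index(language_counts, total):
    """1 - sum((count / total) ** 2), as in the LDI Calculate Field expression."""
    if total <= 0:
        raise ValueError("total must be positive")  # avoid dividing by zero
    return 1 - sum((count / total) ** 2 for count in language_counts)


print(linguistic_diversity_index([1000], 1000))                # 0.0 (one language)
print(linguistic_diversity_index([250, 250, 250, 250], 1000))  # 0.75
```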
Create a hot spot map of linguistic diversity
- Find and open the Optimized Hot Spot Analysis tool. Run Optimized Hot Spot Analysis with the following parameters:
- Input Features: Neighborhoods
- Output Features: the name of your output feature class such as LDIHotSpots
- Analysis Field: LDI
Hot spots (statistically significant clusters of high diversity) are shown in red; cold spots (clusters of low diversity) are shown in blue.
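Optimized Hot Spot Analysis is built on the Getis-Ord Gi* statistic; the tool also chooses an analysis scale and corrects for multiple testing automatically. The sketch below computes only the core Gi* z-score, assuming a fixed distance band with binary weights in which each feature counts itself as a neighbor. Coordinates, values, and the band width are hypothetical, so the numbers will not match the tool's output.

```python
import math


def gi_star(values, coords, distance_band):
    """Getis-Ord Gi* z-scores with binary fixed-distance weights.

    values: attribute per feature (e.g. LDI); coords: (x, y) per feature.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        # w_ij = 1 when feature j lies within the band of feature i (j == i included).
        w = [1.0 if math.dist(coords[i], coords[j]) <= distance_band else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w_sq = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        # Zero denominator (every feature a neighbor, or s == 0): undefined.
        z_scores.append(numerator / denominator if denominator else math.nan)
    return z_scores


# Hypothetical neighborhoods along a line: a high-LDI cluster and a low-LDI one.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
ldi = [0.90, 0.85, 0.90, 0.20, 0.25, 0.20]
z = gi_star(ldi, coords, distance_band=1.5)
print([round(v, 2) for v in z])
```

Positive z-scores mark clusters of high values (hot spots) and negative z-scores mark clusters of low values (cold spots), which is what the red and blue symbology conveys.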
Other applications
This case study evaluated linguistic diversity, but similar steps could be employed to look at diversity for other categorical (nominal) variables such as race/ethnicity, land use, occupations, age categories, home value categories, crop varieties, and so forth.