Celebrating linguistic diversity workflow

    Download the data package

    Workflow using ArcGIS Pro

    Note:

    The steps below are based on the 1.2 release of ArcGIS Pro, but they should work fine for later software releases as well. To follow the steps below, you may download and unzip the data in the data package provided, or improvise using your own data.

    Combine neighborhoods to incorporate missing data

    1. If you haven't done so already, download and unzip the data package provided at the top of this workflow.
    2. Open ArcGIS Pro and browse to the LingualDiversity.ppkx project package.
    3. Once the project opens, right-click on the AllNeighborhoods layer in the Contents pane and select Attribute Table. Notice there are several zero values for the HasData Mother Tongue Variables (HasDataMT) field. Data has been suppressed for the sixty neighborhoods with a zero for this field.
    4. You will create a layer for neighborhoods with data, and another for neighborhoods that do not have data.
    5. Begin by searching for the Select Layer By Attribute tool in the Geoprocessing pane.
    6. Double-click to open the Select Layer By Attribute tool, and run it with the following parameters. Click the Add Clause button to create the Expression.
      • Layer Name or Table View: AllNeighborhoods
      • Selection type: New selection
      • Expression: HasData Mother Tongue Variables is Greater Than 0
        Select Layer By Attribute tool parameters
    7. Find and open the Copy Features tool. Run Copy Features with the following parameters, creating a feature class of neighborhoods with data:
      • Input Features: AllNeighborhoods
      • Output Feature Class: the name of your output feature class such as DataNeighborhoods
        Copy Features tool parameters
    8. Use the Select Layer By Attribute tool again to select neighborhoods with no data:
      • Layer Name or Table View: AllNeighborhoods
      • Selection type: New selection
      • Expression: HasData Mother Tongue Variables is Equal to 0
    9. Use the Copy Features tool to create a feature class for the selected neighborhoods where data has been suppressed:
      • Input Features: AllNeighborhoods
      • Output Feature Class: the name of your output feature class such as NoDataNeighborhoods
    10. Clear the selected features by running Select Layer By Attribute and setting the Selection type to Clear the current selection, or by clicking on the Clear tool on the Map tab.
      Clear selection
    11. You will now generate random points -- lots of them -- inside the neighborhoods with data. These become seeds for creating Thiessen polygons covering all of the neighborhoods in the study area. You will then Dissolve the Thiessen polygons by neighborhood ID to create larger areas that you can use to carve up, into pieces, the neighborhoods without data. These pieces will be distributed and aggregated among nearby neighborhoods that do have data. The steps are shown graphically below using only five neighborhood features for simplicity.
      Creating final geometries using Thiessen polygons
      Note:

      When the neighborhoods are narrow and sinuous, as they are in Edmonton, you will need more random points. This ensures the aggregation process doesn't separate neighborhoods into disconnected pieces. You may still end up with some polygon slivers. You will find and remove slivers later in the workflow.

      Begin by calculating the number of random points to create within each neighborhood with data.
    12. Open the table for the DataNeighborhoods layer and sort the neighborhoods, smallest to largest, based on area (double-click the SHAPE_Area field to sort the values).
      Sort a field by double-clicking the field name
    13. Notice that the smallest area is 584706.0338. If you divide every area by 100,000 and round to an integer, the smallest neighborhood gets 6 points (584706.0338 / 100,000 is about 5.85, which rounds to 6). This should be sufficient for your purposes. Your goal is to make sure every neighborhood gets at least a few points, without creating so many that the Create Thiessen Polygons tool takes a very long time to finish.
    14. Create a new field to hold the number of random points to create in each neighborhood. To do this, find and open the Add Field tool and run it with the following parameters:
      • Input Table: DataNeighborhoods
      • Field Name: NumPoints
      • Field Type: Long (large integer)
        Add Field parameters
    15. Calculate the number of points per neighborhood. To do this, find and open the Calculate Field tool and run it with the following parameters:
      • Input Table: DataNeighborhoods
      • Field Name: NumPoints
      • Expression: NumPoints = !SHAPE_Area! / 100000
        Calculate Field parameters
    16. You will now create a new feature class of random points. You will use the NumPoints field you created above to indicate the number of random points to generate in each neighborhood polygon. You will set the minimum spacing between random points to be 100 meters. The idea here is to set the spacing to be the largest value possible that doesn't decrease the number of points created. For fun, run the Create Random Points tool several times using larger and larger values for the Minimum Allowed Distance parameter to see how large you can make this value before you begin to see messages indicating not all points could be created.
    17. Find and open the Create Random Points tool and run it with the following parameters:
      • Output Location: a working folder or file geodatabase
      • Output Point Feature Class: the name of your output feature class such as NeighborhoodPoints
      • Constraining Feature Class: DataNeighborhoods
      • Number of Points [value or field]: Field, NumPoints
      • Minimum Allowed Distance [value or field]: Linear unit, 100 Meters
        Create Random Points tool parameters
    18. Each random point is given a CID value equal to the object or feature ID of its associated neighborhood. In other words, if a point is generated inside the neighborhood whose OBJECTID is 13, that point will be assigned a CID value of 13. The CID values are carried through each step in the aggregation workflow below. In the final step, you will need the CID field to also be in the DataNeighborhoods layer. The next steps add a new CID field to the DataNeighborhoods layer and calculate it to be equal to the object ID, for use later in the workflow.
    19. Run the Add Field tool with the following parameters:
      • Input Table: DataNeighborhoods
      • Field Name: CID
      • Field Type: Long (large integer)
    20. Use Calculate Field to set the CID values to match the neighborhood object ID values:
      • Input Table: DataNeighborhoods
      • Field Name: CID
      • Expression: !OBJECTID!
    21. With that bit of housekeeping taken care of, you are ready to proceed by creating a Thiessen polygon for each random point.
    22. Find and open the Create Thiessen Polygons tool and run it with the following parameters:
      • Input Features: NeighborhoodPoints
      • Output Feature Class: the name of your output feature class such as ThiessenPolys
      • Output Fields: All Fields
        Create Thiessen Polygons tool parameters
    23. The result is a feature class with many Thiessen Polygons.
      Many tiny Thiessen polygons cover the study area

      The Thiessen polygons will be used to carve up the NoDataNeighborhoods. To help reduce the number of sliver polygons created during the carving process, you will snap the Thiessen polygons to match the edges of the NoDataNeighborhoods.

    24. Find and open the Snap tool. Run the tool using the following parameters:
      • Input Features: ThiessenPolys
      • Snap Environment:
        • Features: NoDataNeighborhoods
        • Type: EDGE
        • Distance: 200 Meters
        Snap tool parameters
    25. Note:

      The snap distance used here is twice the minimum distance between random points. Because the Thiessen polygons are generated from randomly located points, it is possible (though unlikely) that the steps to remove sliver polygons, given later in this workflow, will not result in the expected 328 neighborhoods. If that happens, increasing the snap distance (to 300, for example) should resolve it.

      Once the small Thiessen polygons have been snapped to the NoDataNeighborhoods, you will dissolve them into larger overlay areas using the CID field carried forward from the random points.
    26. Find and open the Dissolve tool. Run the tool with the following parameters (be sure to uncheck the Create multipart features parameter):
      • Input Features: ThiessenPolys
      • Output Feature Class: the name of your output feature class such as OverlayAreas
      • Dissolve Field(s): CID
      • Create multipart features: No
        Dissolve tool parameters
    27. If you set the fill color for the OverlayAreas to clear and zoom in to some of the neighborhoods with suppressed data, you can see how these OverlayAreas will carve the NoDataNeighborhoods into pieces so they can be distributed among the neighborhoods that do have data.
      The overlay areas carve the neighborhoods with no data into pieces
      You will use the Intersect tool to carve up the NoDataNeighborhoods.
    28. Find and open the Intersect tool. Run it using the following parameters:
      • Input Features: OverlayAreas; NoDataNeighborhoods
      • Output Feature Class: the name of your output feature class such as Pieces
      • Join Attributes: All attributes
        Intersect tool parameters
    29. You will Merge the Pieces polygons with the DataNeighborhoods polygons, then use Dissolve to aggregate each no-data piece to a neighborhood that does have data. The number of neighborhoods after aggregation should match the number of polygons in your DataNeighborhoods feature class (328). If you have more polygons after you run the dissolve procedure, you will look for sliver polygons and eliminate them in order to get your final neighborhood geometry.
    30. Find and open the Merge tool and run it with the following parameters (remove all but the CID field):
      • Input Datasets: Pieces; DataNeighborhoods
      • Output Dataset: the name of your output feature class such as AllThePieces
      • Field Map: CID
      • Merge Rule: First
        Merge tool parameters
    31. Run the Dissolve tool using the following parameters:
      • Input Features: AllThePieces
      • Output Feature Class: the name of your output feature class such as DissolvedPieces
      • Dissolve Field(s): CID
      • Create multipart features: No
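
The steps in this section can also be scripted. The following is a minimal sketch, not the workflow's official script. It assumes an ArcGIS Pro Python environment, that AllNeighborhoods is a feature class in the geodatabase passed as the hypothetical `workspace` argument, and that the field behind the HasData Mother Tongue Variables alias is named HasDataMT. The function names are illustrative, and the rounding is written out explicitly rather than left to the field type.

```python
def points_for_area(area, area_per_point=100000):
    """Number of random points for a polygon: area / 100,000, rounded."""
    return round(area / area_per_point)


def aggregate_neighborhoods(workspace):
    """Sketch of steps 5-31; needs arcpy, so it only runs inside ArcGIS Pro."""
    import arcpy  # imported here so points_for_area works without ArcGIS

    arcpy.env.workspace = workspace  # assumption: geodatabase holding the data
    arcpy.env.overwriteOutput = True
    mgmt, analysis = arcpy.management, arcpy.analysis

    # Steps 5-10: copy neighborhoods with and without data, then clear.
    mgmt.MakeFeatureLayer("AllNeighborhoods", "all_lyr")
    for where, out_fc in (("HasDataMT > 0", "DataNeighborhoods"),
                          ("HasDataMT = 0", "NoDataNeighborhoods")):
        mgmt.SelectLayerByAttribute("all_lyr", "NEW_SELECTION", where)
        mgmt.CopyFeatures("all_lyr", out_fc)
    mgmt.SelectLayerByAttribute("all_lyr", "CLEAR_SELECTION")

    # Steps 14-17: points per neighborhood, then the random points.
    mgmt.AddField("DataNeighborhoods", "NumPoints", "LONG")
    mgmt.CalculateField("DataNeighborhoods", "NumPoints",
                        "round(!SHAPE_Area! / 100000)", "PYTHON3")
    mgmt.CreateRandomPoints(workspace, "NeighborhoodPoints",
                            "DataNeighborhoods",
                            number_of_points_or_field="NumPoints",
                            minimum_allowed_distance="100 Meters")

    # Steps 19-20: CID on the neighborhoods matches CID on the points.
    mgmt.AddField("DataNeighborhoods", "CID", "LONG")
    mgmt.CalculateField("DataNeighborhoods", "CID", "!OBJECTID!", "PYTHON3")

    # Steps 22-26: Thiessen polygons, snapped in place, then dissolved.
    analysis.CreateThiessenPolygons("NeighborhoodPoints", "ThiessenPolys",
                                    "ALL")
    arcpy.edit.Snap("ThiessenPolys",
                    [["NoDataNeighborhoods", "EDGE", "200 Meters"]])
    mgmt.Dissolve("ThiessenPolys", "OverlayAreas", "CID",
                  multi_part="SINGLE_PART")

    # Step 28: carve up the neighborhoods without data.
    analysis.Intersect(["OverlayAreas", "NoDataNeighborhoods"], "Pieces",
                       "ALL")

    # Steps 30-31: merge keeping only CID, then dissolve on CID.
    cid_map = arcpy.FieldMap()
    cid_map.addInputField("Pieces", "CID")
    cid_map.addInputField("DataNeighborhoods", "CID")
    cid_map.mergeRule = "First"
    field_mappings = arcpy.FieldMappings()
    field_mappings.addFieldMap(cid_map)
    mgmt.Merge(["Pieces", "DataNeighborhoods"], "AllThePieces",
               field_mappings)
    mgmt.Dissolve("AllThePieces", "DissolvedPieces", "CID",
                  multi_part="SINGLE_PART")


print(points_for_area(584706.0338))  # smallest neighborhood -> 6
```

Because the points are random, each run produces different Thiessen polygons, so the sliver check in the next section still applies to scripted output.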

    Remove sliver polygons

    1. Open the table for the DissolvedPieces feature class and determine the number of neighborhoods it contains (the neighborhood count is the number of records in the table). If the number does not match the number of neighborhoods in the DataNeighborhoods feature class (328), you have sliver polygons that need to be removed.
    2. Note:

      If the number of neighborhoods in the DissolvedPieces feature class does match (it is 328), use the Copy Features tool to create your final geometry. Name the output appropriately, something like Neighborhoods, for example, then skip the remaining steps in this section of the workflow.

    3. Determine the area of the smallest polygon in DataNeighborhoods by opening the DataNeighborhoods attribute table and double-clicking on the SHAPE_Area field to sort the areas smallest to largest. Notice that the smallest area is 584706.0338. Any polygons smaller than this must be sliver polygons.
    4. Select the sliver polygons using the Select Layer By Attribute tool with the following parameters:
      • Layer Name or Table View: DissolvedPieces
      • Selection Type: New selection
      • Expression: SHAPE_Area is Less Than 584706
    5. Remove the slivers using the Eliminate tool with these parameters:
      • Input Layer: DissolvedPieces
      • Output Feature Class: the name of your output feature class such as Neighborhoods
      • Eliminating polygon by border: No
    6. The resulting number of neighborhoods in the Neighborhoods feature class should now match the number of DataNeighborhoods (328). You will use this neighborhood geometry for the remaining analyses.
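
As a rough scripted equivalent of the steps above, the sketch below counts the dissolved polygons and runs Eliminate only when the count is not 328. It assumes an ArcGIS Pro Python environment and a hypothetical `workspace` geodatabase containing DissolvedPieces; the function names are illustrative. Leaving "Eliminating polygon by border" unchecked corresponds to the `AREA` option, which merges each selected polygon into its neighbor with the largest area.

```python
def sliver_where_clause(min_area, area_field="SHAPE_Area"):
    """SQL expression selecting polygons smaller than the smallest real one."""
    return f"{area_field} < {min_area}"


def remove_slivers(workspace, expected_count=328, min_area=584706):
    """Create Neighborhoods from DissolvedPieces; returns the final count."""
    import arcpy  # imported here so the helper above works without ArcGIS

    arcpy.env.workspace = workspace  # assumption: geodatabase with the data
    mgmt = arcpy.management

    count = int(mgmt.GetCount("DissolvedPieces")[0])
    if count == expected_count:
        # No slivers: the dissolved polygons are the final geometry.
        mgmt.CopyFeatures("DissolvedPieces", "Neighborhoods")
        return count

    # Eliminate works on a layer that has the slivers selected.
    mgmt.MakeFeatureLayer("DissolvedPieces", "pieces_lyr")
    mgmt.SelectLayerByAttribute("pieces_lyr", "NEW_SELECTION",
                                sliver_where_clause(min_area))
    mgmt.Eliminate("pieces_lyr", "Neighborhoods", "AREA")
    return int(mgmt.GetCount("Neighborhoods")[0])


print(sliver_where_clause(584706))  # SHAPE_Area < 584706
```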

    Compute the Linguistic Diversity Index

    You now have the final neighborhood geometry but still need to transfer all of the mother tongue language variables to it. Once you have the variables, you will compute the diversity index for each neighborhood.

    1. Find and open the Join Field tool and run it using the following parameters:
      • Input Table: Neighborhoods
      • Input Join Field: NUMBER
      • Join Table: DataNeighborhoods
      • Output Join Field: NUMBER
      • Join Fields: 2014 Household population for mother tongue; 2014 MT: Aboriginal languages; 2014 MT: Arabic; 2014 MT: Bengali; 2014 MT: Cantonese; 2014 MT: Croatian; 2014 MT: Czech; 2014 MT: Dutch; 2014 MT: English; 2014 MT: French; 2014 MT: German; 2014 MT: Greek; 2014 MT: Gujarati; 2014 MT: Hindi; 2014 MT: Hungarian; 2014 MT: Italian; 2014 MT: Japanese; 2014 MT: Korean; 2014 MT: Mandarin; 2014 MT: Other Languages; 2014 MT: Persian (Farsi); 2014 MT: Polish; 2014 MT: Portuguese; 2014 MT: Punjabi; 2014 MT: Romanian; 2014 MT: Russian; 2014 MT: Serbian; 2014 MT: Somali; 2014 MT: Spanish; 2014 MT: Tagalog (Pilipino, Filipino); 2014 MT: Tamil; 2014 MT: Turkish; 2014 MT: Ukrainian; 2014 MT: Urdu; 2014 MT: Vietnamese; NAME
    2. Tip:
      If you click the expand icon next to the Join Fields parameter, you can check all of the fields on at once, and then just uncheck the few you don't need. Be sure to click on Add when your list is complete.
      Join Fields parameter tip
    3. Find and open the Add Field tool to create a field to hold the Linguistic Diversity Index (LDI). Run Add Field with the following parameters:
      • Input Table: Neighborhoods
      • Field Name: LDI
      • Field Type: Float
    4. Find and open the Calculate Field tool. Run Calculate Field with the following parameters (you should be able to copy and paste the formula below directly into the tool dialog):
      • Input Table: Neighborhoods
      • Field Name: LDI
      • Expression: LDI = 1 - ((!ECYMTENGL!/!ECYMTTOT!)**2 + (!ECYMTFREN!/!ECYMTTOT!)**2 + (!ECYMTITAL!/!ECYMTTOT!)**2 + (!ECYMTGERM!/!ECYMTTOT!)**2 + (!ECYMTPUNJ!/!ECYMTTOT!)**2 + (!ECYMTCANT!/!ECYMTTOT!)**2 + (!ECYMTSPAN!/!ECYMTTOT!)**2 + (!ECYMTARAB!/!ECYMTTOT!)**2 + (!ECYMTTAGA!/!ECYMTTOT!)**2 + (!ECYMTPORT!/!ECYMTTOT!)**2 + (!ECYMTPOLI!/!ECYMTTOT!)**2 + (!ECYMTMAND!/!ECYMTTOT!)**2 + (!ECYMTCHIO!/!ECYMTTOT!)**2 + (!ECYMTURDU!/!ECYMTTOT!)**2 + (!ECYMTVIET!/!ECYMTTOT!)**2 + (!ECYMTUKRA!/!ECYMTTOT!)**2 + (!ECYMTPERS!/!ECYMTTOT!)**2 + (!ECYMTRUSS!/!ECYMTTOT!)**2 + (!ECYMTDUTC!/!ECYMTTOT!)**2 + (!ECYMTKORE!/!ECYMTTOT!)**2 + (!ECYMTGREE!/!ECYMTTOT!)**2 + (!ECYMTTAMI!/!ECYMTTOT!)**2 + (!ECYMTGUJA!/!ECYMTTOT!)**2 + (!ECYMTROMA!/!ECYMTTOT!)**2 + (!ECYMTHIND!/!ECYMTTOT!)**2 + (!ECYMTHUNG!/!ECYMTTOT!)**2 + (!ECYMTCROA!/!ECYMTTOT!)**2 + (!ECYMTCREO!/!ECYMTTOT!)**2 + (!ECYMTSERB!/!ECYMTTOT!)**2 + (!ECYMTBENG!/!ECYMTTOT!)**2 + (!ECYMTJAPA!/!ECYMTTOT!)**2 + (!ECYMTTURK!/!ECYMTTOT!)**2 + (!ECYMTCZEC!/!ECYMTTOT!)**2 + (!ECYMTSOMA!/!ECYMTTOT!)**2 + (!ECYMTABOR!/!ECYMTTOT!)**2 + (!ECYMTOTH!/!ECYMTTOT!)**2)
    5. Open the Neighborhoods table and double-click on the LDI field to sort the values. Notice they range from 0.158 (low diversity) to 0.932 (high diversity).
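
The expression above is one minus the sum of squared language shares: each mother tongue count is divided by the total, squared, and the squares are added. The sketch below shows the same arithmetic in plain Python, plus a small helper that builds the Calculate Field expression from a list of field names instead of typing it by hand. Both function names are illustrative and not part of the data package. The `None` return for a zero total is an added safeguard, since the pasted expression would raise a division error for such a row.

```python
def ldi(counts, total):
    """Return 1 - sum((n_i / N)**2), or None when the total is zero."""
    if not total:
        return None
    return 1 - sum((n / total) ** 2 for n in counts)


def ldi_expression(language_fields, total_field="ECYMTTOT"):
    """Build the Calculate Field expression from a list of field names."""
    terms = " + ".join(f"(!{name}!/!{total_field}!)**2"
                       for name in language_fields)
    return f"1 - ({terms})"


print(ldi([50, 50], 100))  # two equally common languages -> 0.5
print(ldi([100], 100))     # everyone shares one language -> 0.0
print(ldi_expression(["ECYMTENGL", "ECYMTFREN"]))
# 1 - ((!ECYMTENGL!/!ECYMTTOT!)**2 + (!ECYMTFREN!/!ECYMTTOT!)**2)
```

Values near 0 mean nearly everyone reports the same mother tongue, and values approach 1 as the population is spread more evenly across many languages.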

    Create a hot spot map of linguistic diversity

    1. Find and open the Optimized Hot Spot Analysis tool. Run Optimized Hot Spot Analysis with the following parameters:
      • Input Features: Neighborhoods
      • Output Features: the name of your output feature class such as LDIHotSpots
      • Analysis Field: LDI
        Diversity hot and cold spots
    2. High diversity areas are shown in red; low diversity areas are shown in blue.
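
The hot spot step can be sketched in Python as well, assuming an ArcGIS Pro Python environment and a hypothetical `workspace` geodatabase containing Neighborhoods. The label table reflects how the tool's documentation describes the output Gi_Bin field (negative bins are cold spots, positive bins are hot spots, and 0 is not significant). Treat the field name and the helper names as assumptions to verify against your own output.

```python
# Gi_Bin values and their meaning, per the tool documentation.
GI_BIN_LABELS = {
    -3: "Cold spot, 99% confidence",
    -2: "Cold spot, 95% confidence",
    -1: "Cold spot, 90% confidence",
    0: "Not significant",
    1: "Hot spot, 90% confidence",
    2: "Hot spot, 95% confidence",
    3: "Hot spot, 99% confidence",
}


def describe_gi_bin(gi_bin):
    """Translate a Gi_Bin value into a readable label."""
    return GI_BIN_LABELS[int(gi_bin)]


def map_ldi_hot_spots(workspace, out_name="LDIHotSpots"):
    """Run Optimized Hot Spot Analysis on the LDI field."""
    import arcpy  # imported here so the helper above works without ArcGIS

    arcpy.env.workspace = workspace  # assumption: geodatabase with the data
    arcpy.stats.OptimizedHotSpotAnalysis("Neighborhoods", out_name, "LDI")
    return out_name


print(describe_gi_bin(3))   # Hot spot, 99% confidence
print(describe_gi_bin(-2))  # Cold spot, 95% confidence
```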

    Other applications

    This case study evaluated linguistic diversity but similar steps could be employed to look at diversity for other categorical/nominal variables such as race/ethnicity, land use, occupations, age categories, home value categories, crop varieties, and so forth.

    • Celebrating Linguistic Diversity - Analysis Overview
    • References and resources for learning more
