Managing elevation data, Part 2: Design and data management plan

Data storage
Data usage
Requirements
Data sources
Data management organization and services (products)
Metadata
Format optimization

The audience for this workflow is primarily the image data managers within an organization who are mandated to make elevation data accessible to a range of user communities. This workflow presumes the image data manager is using ArcGIS for Desktop to manage the data and ArcGIS for Server to distribute the data as one or more image services, but this workflow is also applicable for managing and distributing elevation data in only ArcGIS for Desktop.

This workflow addresses raster (cell-based) elevation data and 3D point data stored as LAS files. The LAS dataset and terrain dataset formats can also be used in this workflow.

The general design of the elevation data management workflow consists of the following steps. Each is discussed below.

Data storage (size, requirements, and locations).
Prepare data (may require preprocessing).
Create a mosaic dataset for each collection (source).
Create a mosaic dataset that combines the collections (master).
Create different mosaic datasets for visualization, analysis, user access, and to publish (referenced).

Data storage

Data storage is not discussed here, but it does require some planning depending on your requirements. For a design methodology that outlines a successful geographic information system, refer to Esri's System Design Strategies at wiki.gis.com.

One thing to mention is data organization. It is ideal if you can organize your data in folders grouped by their product. For example, keep the SRTM data in one folder and the NED 1/3 arcsec data in another folder. You will see in the steps in Part 3 how this helps when loading data, for quality assurance/quality control (QA/QC), and in longer-term maintenance.
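As a rough illustration of why per-product folders help, the sketch below groups raster paths by their parent folder so each group can be loaded and checked as one collection. The folder and file names are hypothetical, and the function is only an inventory helper, not part of any ArcGIS tool.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_product(paths):
    """Group raster file names by the name of their parent folder (the product)."""
    groups = defaultdict(list)
    for raw_path in paths:
        path = PurePosixPath(raw_path)
        groups[path.parent.name].append(path.name)
    # Sort the file names so the inventory is stable between runs.
    return {product: sorted(names) for product, names in groups.items()}


# Hypothetical file listing; real names depend on your data vendor.
files = [
    "elevation/SRTM/n45w123.tif",
    "elevation/NED_13/n46w123.img",
    "elevation/SRTM/n45w122.tif",
]
inventory = group_by_product(files)
for product, names in inventory.items():
    print(product, len(names), "files")
```

With one folder per product, each key in the inventory corresponds to one collection to load and review.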

Data usage

This workflow emphasizes three different modes of using elevation data. Most end users need to see visualizations of the topography to meet their needs, whereas a smaller percentage of users want to see the results of a topographic analysis. The smallest percentage of users, such as engineers, have a requirement to access the actual elevation values to do their analysis.

It is important to understand the differences and implement the proper usage mode for each of these users, since system efficiency and responsiveness can be greatly impacted. The different usage models that will be referenced repeatedly are viewer usage, analysis results usage, and data values usage.

Usage model 1—Viewer usage

Users need to view representations of the elevation data. Therefore, the data manager must create appropriate visualization products on the server and serve those views to the users. This is the largest, but least technically demanding, user group (in many cases the general public), which needs easy access to any number of relatively clear products based on elevation data. Examples include:

A hillshaded or shaded relief image—For inclusion in a topographic map or basemap
An image representing slope—For urban planning, landslide susceptibility, and so on
An image representing aspect—For agriculture, wildlife habitat delineation and management, climate modeling, and so on

Of course, users with ArcGIS who are given access to the elevation data can generate these products themselves in the application.

Usage model 2—Analysis results usage

Users define parameters and a region of interest for server-side analysis of the elevation data, then retrieve the results. This refers to a group of users who need access to a variety of analytical products that can be generated on the server. The results are typically maps or discrete features served to the user without transmitting the original source data. Examples include:

Viewshed calculations for visibility and line-of-sight analysis—For determining what can be seen from a location, for siting cell phone towers and microwave communication equipment, or for planning clear cuts
Uses in disaster management—For evacuation planning, flood mitigation, and a web-based overlay of current flood data for real-time decision making
Industrial planning—For wind profiles and visibility for wind farm locations, or hydroelectric dam design
Calculation of cartographic contours—For display on a map
Calculation of profiles along straight lines or line segments—For engineering pipeline routes and pressure calculations, road planning, or timber harvest road planning and extraction costs

For these applications, the user generally does not need the actual elevation values. By distributing only these products, the bandwidth requirements may be reduced because the data sizes will be smaller. For example, the raw elevation values are 32 bit, whereas a viewshed may be a 1 bit or 8 bit result.
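The size difference can be worked out directly. The sketch below compares uncompressed, single-band payloads for the same pixel grid at the bit depths mentioned above. The 1,000 by 1,000 pixel request is an assumed example, and real transfers will differ once compression is applied.

```python
def raster_bytes(columns, rows, bits_per_pixel):
    """Uncompressed size in bytes of a single-band raster."""
    total_bits = columns * rows * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed request size: 1,000 x 1,000 pixels.
elevation = raster_bytes(1000, 1000, 32)     # raw 32 bit elevation values
viewshed_8bit = raster_bytes(1000, 1000, 8)  # 8 bit analysis result
viewshed_1bit = raster_bytes(1000, 1000, 1)  # 1 bit visible/not visible mask

print(elevation, viewshed_8bit, viewshed_1bit)
print("32 bit vs 8 bit:", elevation // viewshed_8bit, "times larger")
print("32 bit vs 1 bit:", elevation // viewshed_1bit, "times larger")
```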

Usage model 3—Data values usage

Users need access to the elevation values. This refers to a user who needs the original elevation data in digital format to support numeric calculations (presumably in a client-side web or desktop application). Examples include:

As input in a secondary process—For the orthorectification of imagery
As input in their own data models or processes—For creating contours or completing water flow analysis for use in hydrographic design of irrigation systems or flood modeling

Of the three usage models, this is the most expensive in terms of time and resources on the server since it often requires a much larger amount of data to be accessed and transmitted.

Requirements

To provide the appropriate data and access to meet the usage models above, consider the requirements listed below. This is not intended to be a complete listing of requirements, but an introduction to some important requirements specific to elevation data. You must then decide whether these requirements apply in your organization and make appropriate decisions regarding their implementation.

Distribute an image, result, or data
Provide access internally, intranet, or Internet
Provide one or many elevation models or representations
Provide access to 1, 10, 100, or 1 million users
Provide access to the source data
Many data sources, managed as one
Limited access to source and management
Need to be able to easily update or replace content

Data download and export

It is important for image data managers and end users to understand that any resampling of elevation data changes the data. For example, if a dataset is viewed at a resolution other than full resolution, or in a different projection or pixel alignment, the data will be resampled. It is not uncommon to oversample an elevation dataset, for example, by zooming in to a scale of 1:1,000 on a viewshed created from a dataset with 5-meter post spacing. That scale is far larger than the data can support.
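To see how far a 1:1,000 display oversamples 5-meter data, the sketch below converts a map scale to the ground size of one screen pixel and compares it with the post spacing. The 96 dpi screen resolution is an assumption; other displays give different factors.

```python
METERS_PER_INCH = 0.0254


def display_pixel_size(scale_denominator, dpi=96):
    """Ground distance in meters covered by one screen pixel at a map scale."""
    return scale_denominator * METERS_PER_INCH / dpi


def oversampling_factor(post_spacing_m, scale_denominator, dpi=96):
    """Number of screen pixels drawn per source sample along one axis."""
    return post_spacing_m / display_pixel_size(scale_denominator, dpi)


pixel = display_pixel_size(1000)       # about 0.26 m per screen pixel
factor = oversampling_factor(5, 1000)  # roughly 19 screen pixels per sample
print(f"{pixel:.3f} m per pixel, oversampled about {factor:.1f}x")
```

A factor near 1 means the display resolution matches the data. Values well above 1 mean most displayed pixels are interpolated.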

For some applications, users may need access to the numeric values within a region of interest (data values usage). There are two methods for providing the numeric data values to a client: export or download.

Export refers to extracting data within a specific extent and spatial reference. Exporting may provide raw data values, a resampled version of them, or a processed version such as a hillshade or slope image. Download refers to transmitting the original (full resolution, nonresampled) data values, typically within a specified area. It should be clear that download can result in the transfer of a very large volume of data from server to client (especially if the data covers a very large area and contains a large number of datasets). Therefore, appropriate constraints must be implemented to ensure the user and system are prepared for the result—such as setting a limit on the maximum amount of data to transfer or designing a web application with a warning.
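One way to picture such a constraint is a size check before a download is allowed. The sketch below estimates the uncompressed size of a request from its extent and cell size and compares it with a limit. The function names, the 50 MB limit, and the example extents are all hypothetical. ArcGIS for Server has its own settings for limiting requests, and this sketch does not model them.

```python
import math


def estimate_download_bytes(width_m, height_m, cell_size_m, bits_per_pixel=32):
    """Uncompressed size of a full-resolution, single-band request."""
    columns = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return columns * rows * bits_per_pixel // 8


def check_download(width_m, height_m, cell_size_m, max_bytes, bits_per_pixel=32):
    """Return (allowed, estimated_bytes) for a requested area."""
    size = estimate_download_bytes(width_m, height_m, cell_size_m, bits_per_pixel)
    return size <= max_bytes, size


LIMIT = 50_000_000  # hypothetical 50 MB cap

small_ok, small_size = check_download(10_000, 10_000, 10, LIMIT)  # 10 km square
large_ok, large_size = check_download(50_000, 50_000, 10, LIMIT)  # 50 km square
print(small_ok, small_size)
print(large_ok, large_size)
```

A web application could use the second return value to warn the user before the transfer starts.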

Data sources

Here’s a list of the example data that will be used in this workflow. This data can range in bit depths, typically 16 bit or float, signed or unsigned.

This workflow assumes the data manager is using in-house, locally stored data.

Data	Description
GTOPO	GTOPO is a global elevation dataset with resolution of 30 arc seconds (approximately 1 km), available for download at http://www1.gsi.go.jp/geowww/globalmap-gsi/gtopo30/gtopo30.html.
SRTM	The Shuttle Radar Topography Mission (SRTM) is elevation data on a near-global scale, acquired from the Space Shuttle, to generate the most complete high-resolution digital topographic database of Earth. It is available at http://srtm.usgs.gov/index.php.
NED 10, NED 30	The National Elevation Dataset (NED) was created by the USGS for the USA. NED data is available nationally at resolutions of 1 arc-second (about 30 meters, NED 30) and 1/3 arc-second (about 10 meters, NED 10). See http://ned.usgs.gov/.
Lidar	Lidar data can come from a variety of sources. In this particular case, the data is provided from the Oregon Metro Regional Land Information System (RLIS) and can be used to provide both a DTM and DSM.
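The approximate ground distances quoted in the table can be reproduced with a short calculation. The sketch below uses a spherical approximation with the WGS84 equatorial radius, which is an assumption made for illustration. East-west spacing shrinks with the cosine of latitude, so the figures apply at the equator unless a latitude is given.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6378137.0


def arcseconds_to_meters(arcseconds, latitude_deg=0.0):
    """Approximate east-west ground distance of an angular spacing."""
    radians = math.radians(arcseconds / 3600.0)
    return WGS84_EQUATORIAL_RADIUS_M * radians * math.cos(math.radians(latitude_deg))


gtopo = arcseconds_to_meters(30)      # GTOPO, 30 arc seconds
ned_30 = arcseconds_to_meters(1)      # NED 30, 1 arc-second
ned_10 = arcseconds_to_meters(1 / 3)  # NED 10, 1/3 arc-second
print(f"{gtopo:.0f} m, {ned_30:.1f} m, {ned_10:.1f} m")
```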

Data management organization and services (products)

One key objective is to ensure that all data, regardless of extent, is managed and distributed as a single unit. The alternative (which often occurs over time, as an organization grows and individual projects are completed) is to manage data from different geographic areas as separate datasets. However, ArcGIS provides the capability to efficiently manage very large dataset collections, reducing creation and maintenance costs that previously resulted from data duplication and unnecessary management overhead.

Mosaic dataset organization

The mosaic dataset is the optimal data structure to manage, display, and publish your collection of elevation data, because it can manage all the different raster file formats and resolutions, and the files are maintained in their original format on disk. It also has many options for displaying and processing your elevation data, such as the dynamic mosaicking that allows the best resolution to display at appropriate scales and functions used to process the data to create multiple products without copying the source data.

Functions specific to elevation data are Hillshade, Shaded Relief, Aspect, and Slope.

It is advantageous to separate mosaic datasets into two types: those that are primarily used for management and those that provide other data representations (such as hillshade) and are published. This separation can aid in organization. You can manage your imagery within a mosaic dataset, but use another mosaic dataset to share or disseminate (publish) the contents.

Here's an overview of the different types of mosaic datasets and what purpose they may serve:

Source—Used for managing imagery. It generally contains a collection of similar imagery. You may use a number of these source mosaic datasets to manage different collections, such as SRTM and NED. These can be published directly or (more often) used as the source for other mosaic datasets.
Master (or derived)—Used to compile multiple sources into a single mosaic dataset. The source of a master mosaic dataset is generally one or more source mosaic datasets but can also include other images or services.
Referenced—A unique type of mosaic dataset, which is mainly used to share or publish the imagery. It is created using one mosaic dataset as input and does not allow the items in the table to be edited; therefore, keeping the inputs safe from alteration. It's often used to provide differently processed outputs of the source or master mosaic dataset inputs.

The source and master (derived) mosaic datasets are symbolic names used to help convey an understanding of the organizational structure of mosaic datasets, whereas a referenced mosaic dataset is a physically different form of a mosaic dataset.

Your elevation data can be stored in folders organized by you or the data vendor, but it will all be managed and distributed using one or more mosaic datasets and image services. The data contained within a source mosaic dataset is usually determined by having the same number of bands and bit depths. In this case, it's determined by bit depth and product; for example, lidar-derived data can be organized in one source mosaic dataset and the SRTM in another. This helps in keeping the data organized and allows data with different vertical units to be separated; for example, lidar-derived data is measured in feet and the SRTM is measured in meters. This also allows you to refine the footprints, if necessary, or control the NoData, which can be unique for each product.

The source mosaic datasets will be combined together using the master mosaic dataset. Some functions may be added to some of the sources to ensure the data represents the same information, such as the conversion from feet to meters or ellipsoidal to orthometric. (For most requirements, it is recommended that a mosaic dataset with orthometric ground height be created and maintained as the master service upon which others are built.)
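The two conversions mentioned above are simple arithmetic, sketched below for single values. In a mosaic dataset they would be applied with raster functions rather than with code like this. The sketch assumes the international foot (0.3048 m) and the usual relationship between ellipsoidal height h, geoid undulation N, and orthometric height H, which is H = h - N. Data recorded in US survey feet needs a slightly different factor.

```python
INTERNATIONAL_FOOT_M = 0.3048  # assumption: not US survey feet


def feet_to_meters(value_ft):
    """Convert a height in international feet to meters."""
    return value_ft * INTERNATIONAL_FOOT_M


def ellipsoidal_to_orthometric(ellipsoidal_height_m, geoid_undulation_m):
    """Orthometric height H = h - N, with all values in meters."""
    return ellipsoidal_height_m - geoid_undulation_m


lidar_height_m = feet_to_meters(1000.0)
# Hypothetical sample: h = 120.0 m where the geoid is 25.0 m below the ellipsoid.
ground_height_m = ellipsoidal_to_orthometric(120.0, -25.0)
print(lidar_height_m, ground_height_m)
```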

Final products and services

Various referenced mosaic datasets can be created from the master mosaic dataset to provide the following recommended elevation data services:

Orthometric ground height
Orthometric surface height—If surface elevation data (DSM) is available (which shows buildings, tree canopies, bridges, and so on)
Ellipsoidal ground height
For visualizations:
- Hillshade
- Shaded relief
- Slope
- Aspect (used for both visualization and analysis)

If more than one geoid correction is required by the user community, the data manager may want to publish geoids as image services, exposing appropriate options to the users.

Ocean considerations

In all cases, the data manager must decide how to represent the oceans. The proper choice depends on the applications the data must support. The options include the following:

Ocean is elevation with a 0 value
Ocean is NoData
Oceans are represented with bathymetric data

For most applications, it is acceptable to represent the sea with an elevation of 0. If the sea is defined as NoData, then orthorectification in any NoData areas fails. An easy method to fill in zero values is to add a very low-resolution, worldwide dummy image, in which all pixel values are 0, to the mosaic dataset. Then, wherever the values in the data (such as the SRTM) are NoData, the 0 value in the dummy image displays.
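The effect of the dummy image can be illustrated with a toy model of mosaicking, in which each output pixel takes the first value that is not NoData from a prioritized list of rasters. This is a conceptual sketch only. The sample values are invented, and NoData is modeled as None rather than as a numeric NoData value.

```python
NODATA = None


def mosaic_pixel(values_by_priority):
    """Return the first value that is not NoData, or NoData if none exists."""
    for value in values_by_priority:
        if value is not NODATA:
            return value
    return NODATA


# One row of pixels: SRTM has NoData over the sea, the dummy image is all zeros.
srtm_row = [12.0, NODATA, NODATA, 3.5]
dummy_row = [0.0, 0.0, 0.0, 0.0]

# SRTM has priority; the dummy image only shows through where SRTM is NoData.
filled_row = [mosaic_pixel(pair) for pair in zip(srtm_row, dummy_row)]
print(filled_row)
```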

If the data manager chooses to include bathymetric data, there will be negative elevation values in the oceans, which allows for visualizations of the ocean floor. This allows flexibility in terms of how multiple services may render (display) the data; one client application could show a blue fill for water where the elevation is less than 0 and in the same area, a different client application could render the subsurface elevation as shaded terrain.

Overviews

At a basic level, mosaic dataset overviews are like raster dataset pyramids. They are lower-resolution images created to increase display speed and reduce CPU usage since fewer rasters are examined to display the mosaicked image; however, they differ greatly because you can control many of the parameters used to create them. You can create them to cover only a specific area or only at specific resolutions. They are created to allow you to view all the rasters contained in the entire mosaic dataset, not just for each raster. Overviews generally begin where raster pyramids stop, but you can specify a base pixel size at which your overviews will be generated if you prefer not to use all the raster's pyramids.

The data manager should consider the best approach for overviews. Overviews can be created from the project data, but if appropriate lower-resolution datasets are available from alternative sources, such as GTOPO, ETOPO, and GMTED2010, it is recommended that they be used. The remainder of this workflow is based on this approach, to build a large-region image service comprised of multiple datasets at different spatial resolutions (so there is generally no requirement for overviews).

Learn more about overviews

Metadata

There are numerous properties that the data manager should verify regarding all elevation data. The data manager must review and decide which components are important to maintain as well as which metadata fields to expose to the data users. The metadata listed below is recommended for the purposes of both quality assurance and system configuration.

Metadata the data manager should verify includes:

Data source or owner.
Copyright.
Horizontal coordinate system (projection, datum, and units).
Vertical datum (specific model, noting if it is ellipsoidal or orthometric) and units (feet or meters).
Horizontal accuracy (typically measured as CE90 or CE95, but also may be reported as RMS error or RMSE).
Vertical accuracy (typically measured as LE90).
Resolution (the sample spacing stored in the data file, which is not the same as the horizontal accuracy of the data).
Elevation surface type (DEM versus DSM).
How is NoData defined in this dataset:
- Are there areas of NoData?
- If yes, is it represented by a single value?
- Is the NoData limited to the edges of the datasets or are there holes of NoData within the valid data?
- Some products have associated feature classes that define void regions. Look to see if these areas were filled with a value and if it is the NoData value.
Was the data sampled from another source?
Acquisition date for the raw data.
QA/QC completed on what date, by whom, and using what methods and/or standards.
Is the data releasable or restricted (data may be proprietary, classified, or free for public release)?

For unique metadata fields, it may be necessary to manually add these to the mosaic dataset's attribute table, such as the horizontal and vertical accuracies. This way, you can easily query the mosaic dataset for this information.

Format optimization

Format optimization is not always necessary, but it is necessary when the data is in a format that is not optimal or well supported. The following lists some recommendations supporting preprocessing of elevation data when building a single collection and publishing as a service.

Conversion of point or TIN data to a raster format. There are several geoprocessing tools to choose from depending on the source. See To Raster toolset or the 3D Analyst Conversion toolset.
LAS files, LAS dataset, and terrain datasets do not need to be converted to raster datasets. They are supported directly in the mosaic dataset.
The optimum format for elevation data is 32 bit floating-point values with LZ77 compression stored in a tiled TIFF format. LZ77 compression is lossless and fast, the data is automatically tiled within the file, and NoData values don't take up space in the raster format.
Esri's general recommendation is not to convert elevation data from its original format unless the file format is inefficient and the server's performance suffers. For example, if any of the following are true, the data should be converted before adding it to the mosaic dataset:
- If elevation data is stored in an inefficient file format, such as ASCII XYZ (inefficient to read).
- If elevation data files are large (number of rows or columns > 5000) and the data is not tiled or does not have pyramids and server performance is important.
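The conversion criteria above can be written as a small decision function, sketched here. The function name and the format label are hypothetical, the list of inefficient formats contains only the example named in the text, and the "server performance is important" condition is left to the caller to judge.

```python
INEFFICIENT_FORMATS = {"ASCII XYZ"}  # only the example named in the text
LARGE_DIMENSION = 5000


def needs_conversion(file_format, rows, columns, tiled, has_pyramids):
    """Return True if the file meets one of the conversion criteria."""
    if file_format.upper() in INEFFICIENT_FORMATS:
        return True
    is_large = rows > LARGE_DIMENSION or columns > LARGE_DIMENSION
    return is_large and (not tiled or not has_pyramids)


print(needs_conversion("ASCII XYZ", 1000, 1000, tiled=False, has_pyramids=False))
print(needs_conversion("TIFF", 12000, 12000, tiled=False, has_pyramids=True))
print(needs_conversion("TIFF", 12000, 12000, tiled=True, has_pyramids=True))
print(needs_conversion("TIFF", 3000, 3000, tiled=False, has_pyramids=False))
```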

In some cases, you will want to run a performance check before converting your data to determine if they are suitable or can be optimized with conversion. For example:

If the original is stored as JPEG 2000, be aware that time is required for decompression, which may have a significant impact on performance. For best performance, you may need to convert the data to tiled TIFF.
If the original data is in the Esri Grid format, it is also better to convert if scalability is important in a multiprocessing environment.

File conversion

If the files are converted from one format to another, with no changes needed in the bit depth or other properties of the dataset, use the Raster To Other Format (Multiple) tool. If you have to make changes to some of the properties, use the Copy Raster tool. When applying this tool to many datasets, right-click the tool and click Batch, or write a script to ingest multiple datasets. In either case, the environments must be set. You can do this at the application level if you'll be applying this to several tools, or on the tool level.

Access the Environments dialog box.
- From the main menu, click Geoprocessing > Environments.
- On the tool, click the Environments button.
Expand the Raster Storage section.
Check Build Pyramids.
1. Click the Pyramid Resampling Technique drop-down arrow and click BILINEAR.
2. Click the Pyramid Compression Type drop-down arrow and click LZ77.
Check Calculate Statistics.
1. Type a value for the x and y skip factors.
This value is derived by dividing the number of columns by 1,000 (for example, 10 for a raster with 10,000 columns).
Click the Compression drop-down arrow and click LZ77.
Verify that the Tile Size width and height are both 128.
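For scripted conversions, the same choices can be collected in one place. The sketch below derives the skip factor from the column count, as described above, and assembles setting strings. The dictionary keys and string layouts follow what the author of this sketch understands the arcpy.env raster storage properties to accept; treat them as assumptions and verify them against the arcpy documentation before assigning them. The pyramid compression type is deliberately left out of the pyramid string, so set it separately.

```python
def skip_factor(columns):
    """Skip factor derived by dividing the number of columns by 1,000."""
    return max(1, columns // 1000)


def raster_storage_settings(columns):
    """Assumed arcpy.env-style setting strings for the steps above."""
    skip = skip_factor(columns)
    return {
        "pyramid": "PYRAMIDS -1 BILINEAR",               # build all levels, bilinear
        "rasterStatistics": f"STATISTICS {skip} {skip}",  # x and y skip factors
        "compression": "LZ77",
        "tileSize": "128 128",
    }


settings = raster_storage_settings(columns=10_000)
for name, value in settings.items():
    # In an ArcGIS Python session each pair would be assigned to arcpy.env.
    print(name, "=", value)
```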

Projection

If the projection isn't defined for each dataset, use the Define Projection tool. For more information, see the blog post My image is in the wrong spot.

Build pyramids

If the data does not include pyramids and the tiles are large (with the number of rows or columns greater than 5,000), it is recommended that pyramids are created. To determine if a file has pyramids, right-click in the Catalog window or table of contents, click Properties, then look under Raster Information for Pyramids.

When building pyramids for multiple datasets, use the Build Pyramids And Statistics tool. Use the same environment settings as above.

Pyramids require some additional disk space and are written to separate files with an .ovr extension.

FWTools

Download FWTools from http://fwtools.maptools.org and execute the following commands:

gdal_translate.exe -of Gtiff -co "TILED=YES" -co "COMPRESS=LZW" Input.xxx Output.tif
gdaladdo.exe -r average -ro --config TILED YES --config COMPRESS_OVERVIEW LZW Output.tif 2 4 8 16

For 16 bit imagery, insert -co NBITS=12 before Input.xxx.

If the files created are greater than 4 GB, insert either -co "BIGTIFF=YES" or -co "BIGTIFF=IF_NEEDED" before Input.xxx.
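When many files need converting, the gdal_translate command line can be assembled in a script. The sketch below only builds the argument list and does not run GDAL. The function name and file names are hypothetical, and the options are the creation options shown in the command above plus the optional BIGTIFF setting.

```python
def build_translate_command(source, destination, bigtiff=None):
    """Build a gdal_translate argument list for a tiled, LZW-compressed GeoTIFF."""
    command = [
        "gdal_translate",
        "-of", "GTiff",
        "-co", "TILED=YES",
        "-co", "COMPRESS=LZW",
    ]
    if bigtiff:  # for example "YES" or "IF_NEEDED" for outputs over 4 GB
        command += ["-co", f"BIGTIFF={bigtiff}"]
    command += [source, destination]
    return command


# Hypothetical file names.
command = build_translate_command("Input.xxx", "Output.tif", bigtiff="IF_NEEDED")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run once GDAL is installed and on the path.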

Build statistics

Statistics are required for a raster dataset or mosaic dataset to perform some geoprocessing operations or certain tasks in ArcGIS for Desktop applications, such as applying a contrast stretch or classifying data. In this workflow, there is no need to build statistics for each data source, since none will be displayed or used on their own, nor are any functions or any of the products being created reliant on the statistics from the individual datasets. Statistics will be generated for the mosaic datasets for display purposes.

For more information about statistics, see Raster dataset statistics.