Raster data structures and storage models
Image and raster data is usually stored in its original form. You will rarely edit individual pixel values the way you might edit a feature in a vector dataset. Instead, you often process this data to create new forms that can be processed on the fly or saved as another version. These datasets, and collections of them, are often very large, so good management capabilities are critical, and ArcGIS Desktop is designed to provide them.
There are three methods to store image and raster data: as files in a file system, within a geodatabase, or managed from within the geodatabase but stored in a file system. This decision also involves determining whether to store all the data in a single dataset or in a catalog of potentially many datasets. If you choose to store the data in a file system, you are choosing to store raster datasets, whereas a geodatabase can store either raster datasets or mosaic datasets. A third geodatabase option is the raster catalog; this option is not discussed below because it has been superseded by the mosaic dataset, which has many more capabilities, uses, and functions.
Raster datasets
Most imagery and raster data (such as an orthoimage or DEM) is provided as a raster dataset. The term raster dataset refers to any raster model that is stored on disk or accessible as a single raster in cloud storage. It could also be stored in a geodatabase, but storing rasters inside a database is not recommended. A raster dataset is the most basic raster data storage model on which the others are built; mosaic datasets, for example, manage raster datasets. It's also the output from many geoprocessing tools that process raster data.
A raster dataset is any valid raster format organized into one or more bands. Each band consists of an array of pixels (cells), and each pixel has a value. A raster dataset has at least one band. ArcGIS Desktop supports more than 70 different file formats for raster datasets, including TIFF, JPEG 2000, Esri Grid, and MrSID.
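To make the band-and-pixel structure concrete, here is a minimal sketch in plain Python. The `RasterDataset` class and its field names are hypothetical illustrations of the model described above; they are not part of any ArcGIS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RasterDataset:
    """Illustrative model only: bands[b][row][col] holds one pixel value."""
    width: int
    height: int
    bands: List[List[List[int]]]

    def __post_init__(self) -> None:
        # A raster dataset has at least one band.
        if not self.bands:
            raise ValueError("a raster dataset has at least one band")
        # Every band is an array of pixels with the same dimensions.
        for band in self.bands:
            if len(band) != self.height or any(len(row) != self.width for row in band):
                raise ValueError("every band must be a height x width array")

    def pixel(self, band: int, row: int, col: int) -> int:
        """Return the value of one pixel (cell) in one band."""
        return self.bands[band][row][col]


# A 3 x 2 raster with three bands (for example red, green, and blue).
rgb = RasterDataset(
    width=3,
    height=2,
    bands=[
        [[10, 20, 30], [40, 50, 60]],
        [[11, 21, 31], [41, 51, 61]],
        [[12, 22, 32], [42, 52, 62]],
    ],
)
```

Real formats add a spatial reference, a pixel type, and NoData handling on top of this basic layout.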
Mosaic datasets
A mosaic dataset is a collection of raster datasets (images) stored as a catalog and viewed or accessed as a single mosaicked image or individual images (rasters). These collections can be extremely large in both total file size and number of datasets. The raster datasets in a mosaic dataset can remain in their native format on disk or exist in the geodatabase. The metadata can be managed within the raster's record as well as attributes in the attribute table. Storing metadata as attributes enables parameters such as sensor orientation data to be managed more easily and allows fast queries to enable selections.
The data in a mosaic dataset does not have to be adjoining or overlapping but can exist as unconnected, discontinuous datasets. For example, you can have images that completely cover an area, or you can have many strips of images that may not join together to form a continuous image (such as along pipelines).
The data can even be completely or partially overlapping but captured over different dates. The mosaic dataset is an ideal dataset for storing temporal data. You can query the mosaic dataset for the images you need based on time or date and use a mosaic method to display the mosaicked image according to a time or date attribute.
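The time-based query and mosaic method described above can be pictured with a small sketch. Everything in it (the `CatalogItem` record, its fields, the file paths, and the `query_by_date` and `latest_on_top` helpers) is hypothetical and only mimics the idea of a catalog whose rows point at source rasters; it is not how ArcGIS implements mosaic datasets.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class CatalogItem:
    path: str        # location of the source raster (hypothetical paths below)
    acquired: date   # acquisition date stored as an attribute


def query_by_date(items: List[CatalogItem], start: date, end: date) -> List[CatalogItem]:
    """Select the catalog items acquired within an inclusive date range."""
    return [item for item in items if start <= item.acquired <= end]


def latest_on_top(items: List[CatalogItem]) -> List[CatalogItem]:
    """Return items in draw order: oldest first, so the most recent image ends up on top."""
    return sorted(items, key=lambda item: item.acquired)


catalog = [
    CatalogItem("scenes/area_2019.tif", date(2019, 6, 1)),
    CatalogItem("scenes/area_2021.tif", date(2021, 6, 1)),
    CatalogItem("scenes/area_2020.tif", date(2020, 6, 1)),
]

selected = query_by_date(catalog, date(2020, 1, 1), date(2021, 12, 31))
draw_order = [item.path for item in latest_on_top(selected)]
```

Because the catalog only stores attributes and paths, the same overlapping images can be re-queried and re-ordered without touching the source rasters.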
Mosaic datasets are not limited to one particular type of raster data. You can add raster data from different sensor systems in different projections, resolutions, pixel depths, and numbers of bands. Overviews (like pyramids) can be generated for the entire data collection. This allows for faster viewing of the data and allows you to quickly serve these datasets. There are also many additional properties for viewing, including setting a mosaicking method, that make these datasets unique and functional in many situations. You can also query a mosaic dataset based on your spatial and nonspatial query constraints. The results of that query could be a set of images that you can process one by one, or it could be a dynamically generated mosaicked image.
In addition to raster data, you can store and manage lidar data in a mosaic dataset in the same way as raster datasets and even together with raster datasets. The lidar data can be stored in the file system as LAS files or LAS datasets, or in a geodatabase as a terrain dataset.
Compare raster data storage models
Storing the raster datasets individually is often the best method when the datasets are not adjacent to each other or are rarely used on the same project. Mosaicking your inputs together to form one large raster data file covering a single extent is appropriate for many applications, but a mosaic dataset may be preferable for several reasons:
- The extents of the raster datasets partially or fully overlap and you want the common areas to be preserved.
- The raster datasets represent a collection of observations of the same area at different times in a time series.
- You only want to display your study area, not the entire collection of images.
- You want to manage a collection of images as an integrated set but keep their individual states.
- You want to record and manage additional attribute columns that describe each image.
Compare the raster data storage models
Characteristic | Raster dataset | Mosaic dataset |
---|---|---|
Description | A single picture of an object or a seamless image covering a spatially continuous area. This may be a single original image or the result of many images merged (mosaicked) together. | A collection of raster datasets stored as a catalog that allows you to store, manage, view, and query collections of raster and lidar data. It is viewed as a mosaicked image, but you have access to each dataset as an item in the collection. |
Storage | As a file on disk or within a geodatabase. | Within a geodatabase, but can reference the source data stored as a file on disk. |
Map layers | One map layer. | One map layer. |
Homogeneous or heterogeneous data | Homogeneous data: a single format, data type, and file. | Heterogeneous data: multiple formats, data types, file sizes, and coordinate systems. |
Metadata | Stored once and applies to complete dataset. | Can be stored within the raster record and as attributes in the attribute table. |
Downsampled datasets | Overviews of the entire raster dataset. | Pyramids for each raster dataset, as well as overviews (such as a pyramid) for the entire collection. |
Geoprocessing and image analysis | | |
Pros | | |
Cons | File and personal geodatabase raster datasets are slower to update because the entire file has to be rewritten. | Overviews can take time to generate. |
Serving | Can be served directly as an image service. | Can be served directly as an image service. |
Recommendations | Use raster datasets when overlaps between mosaicked images do not need to be retained and for fast display of large quantities of raster data. | Use a mosaic dataset for managing and visualizing raster and lidar data. It's good for multidimensional data, querying, storing metadata, and overlapping data, and it provides a good hybrid solution. |
Raster data storage in the geodatabase
Store raster data in the geodatabase when you want to manage rasters, add behavior, and control the schema; when you want to manage a well-defined set of raster datasets as part of your DBMS; and when you require a single architecture for managing all your content. There are three main types of geodatabases: enterprise, personal, and file.
The enterprise geodatabase can support multiple operations within its DBMS. File geodatabases (like the personal geodatabases) are designed to be edited by a single user and do not support versioning. They reside in your file system directory; thus they do not require a password for access. The file geodatabases and enterprise geodatabases share the same basic storage schema.
Compare raster storage in file, enterprise, and personal geodatabases
Raster storage characteristic | File geodatabase | Enterprise geodatabase | Personal geodatabase |
---|---|---|---|
Size limit | 1 TB for each raster dataset | Unlimited; limit dependent on DBMS limits | 2 gigabytes (GB) per geodatabase (This is a table size limit, not a limit on the raster dataset size.) |
Raster dataset file format | File geodatabase raster dataset | Enterprise geodatabase raster dataset | ERDAS IMAGINE, JPEG, or JPEG 2000 |
Storage | Stored in the file system | Stored in an RDBMS | Stored in Microsoft Access |
Compression | LZ77, JPEG, JPEG 2000, or None | LZ77, JPEG, JPEG 2000, or None | LZ77, JPEG, JPEG 2000, or None |
Pyramids | Supports partial pyramiding | Supports partial pyramiding | Rebuilds entire pyramid |
Mosaicking | Allows you to append to a raster dataset when mosaicking | Allows you to append to a raster dataset when mosaicking | Rewrites a new dataset every time you mosaic to a raster dataset |
Updating | Allows incremental updating | Allows incremental updating | n/a |
Number of users | Single user and small workgroups; some readers and one writer | Multiuser; many users and many writers | Single user and small workgroups; some readers and one writer |
File geodatabases
The storage model of the file geodatabase is a hybrid of the enterprise geodatabase and the personal geodatabase, where managed raster data follows the storage model of the enterprise geodatabase and unmanaged raster data follows the storage model of the personal geodatabase. File geodatabases are also similar to personal geodatabases because they are designed to be edited by a single user and do not support versioning. They reside in your file system directory; thus, they do not require a password for access. File and enterprise geodatabases share the same basic storage schema.
A file geodatabase has several advantages over the use of a personal geodatabase. Like the enterprise geodatabase, the file geodatabase stores data in blocks. This provides more efficient access to data—especially during the mosaic operation. When mosaicking data in a file geodatabase, only overlapping blocks are updated. If an overlapping block does not exist, a new block is inserted. Partial blocks are padded with NoData pixels. In addition, the file geodatabase (and enterprise) storage model employs partial pyramid updates, which saves time. Also, the data structures of file and enterprise geodatabases are the same—fast copy technology is used to copy and paste data between them.
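The block-based mosaic behavior described above (update overlapping blocks, insert missing ones, pad partial blocks with NoData) can be sketched as follows. This is a simplified illustration under stated assumptions: the `NODATA` marker, the tiny `TILE` size, and the `write_pixel_window` helper are all hypothetical, and blocks are held in a plain dictionary rather than a geodatabase table.

```python
from typing import Dict, List, Set, Tuple

NODATA = -9999  # hypothetical NoData marker
TILE = 4        # tiny block size so the example stays readable

Block = List[List[int]]
BlockKey = Tuple[int, int]  # (block column, block row)


def write_pixel_window(blocks: Dict[BlockKey, Block], col0: int, row0: int,
                       values: List[List[int]]) -> Set[BlockKey]:
    """Write a 2D window of pixel values at (col0, row0).

    Only blocks that the window overlaps are touched. A block that does not
    exist yet is inserted and starts out fully padded with NoData.
    Returns the keys of the blocks that were touched.
    """
    touched: Set[BlockKey] = set()
    for d_row, row_values in enumerate(values):
        for d_col, value in enumerate(row_values):
            col, row = col0 + d_col, row0 + d_row
            key = (col // TILE, row // TILE)
            if key not in blocks:
                blocks[key] = [[NODATA] * TILE for _ in range(TILE)]
            blocks[key][row % TILE][col % TILE] = value
            touched.add(key)
    return touched


blocks: Dict[BlockKey, Block] = {}
# First input fills block (0, 0) exactly.
first = write_pixel_window(blocks, 0, 0, [[1] * 4 for _ in range(4)])
# Second input is 4 columns x 2 rows starting at column 2, so it spans two blocks.
second = write_pixel_window(blocks, 2, 0, [[2] * 4 for _ in range(2)])
```

After the second write, block (0, 0) has been updated in place, and block (1, 0) is a newly inserted partial block whose unwritten pixels remain NoData.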
File geodatabases also accept configuration keywords, but unlike with enterprise geodatabases, the configuration keywords have a standard predefined value. For more information about configuration keywords, see Configuration keywords for file geodatabases.
Enterprise geodatabases
Storing raster data within an enterprise geodatabase provides an enterprise level of functionality, such as security, multiuser access, and data sharing. There are three main reasons to store your raster data in an enterprise geodatabase:
- It will not be updated very regularly (such as every two or three years or longer).
- It will be accessed in read-only use cases (such as using it as basemap data under vector data).
- Hundreds of users (or more) will access it as a basemap.
Because of its storage structure, the raster data is said to be managed, or fully controlled, by the geodatabase. Enterprise geodatabases always store all the raster information (pixels, spatial reference, any associated table, and other metadata) for raster datasets and raster attributes within the associated relational database. This means that all input raster information is loaded into the database and can be thought of as a format conversion.
The enterprise geodatabase evenly tiles the bands into blocks of pixels according to a user-defined dimension (the default is 128 by 128). Tiling the raster band data enables efficient storage and retrieval of the raster data. The pyramid information is stored according to a declining resolution. The height of the pyramid is determined by the number of levels specified by the application or user.
The raster blocks table (the largest table and the one that stores the actual pixel information and pyramids) stores one row per block (tile) per band in a raster dataset and per pyramid level. For example, a three-band raster divided into 12 blocks with no pyramids built will have 36 rows in the BLK table—12 separate blocks for each of the bands. The column containing the pixel data for the block is a binary large object (BLOB).
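The row count in the example above can be reproduced with a short calculation. This sketch assumes a hypothetical 512 by 384 pixel raster (which yields 12 blocks at the default 128 by 128 tile size), and it assumes that each pyramid level halves the resolution and is tiled with the same tile size; the `block_table_rows` function is illustrative and is not an ArcGIS call.

```python
import math


def block_table_rows(width: int, height: int, bands: int,
                     tile: int = 128, pyramid_levels: int = 0) -> int:
    """Count rows: one per block, per band, per level (level 0 is full resolution)."""
    rows = 0
    w, h = width, height
    for _ in range(pyramid_levels + 1):
        blocks = math.ceil(w / tile) * math.ceil(h / tile)
        rows += blocks * bands
        # Assumption: the next pyramid level has half the resolution.
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return rows


# 512 x 384 pixels -> 4 x 3 = 12 blocks per band; three bands, no pyramids.
no_pyramids = block_table_rows(512, 384, bands=3)                      # 36
# Two pyramid levels add 4 blocks and 1 block per band respectively.
with_pyramids = block_table_rows(512, 384, bands=3, pyramid_levels=2)  # 51
```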
Enterprise geodatabases for Oracle, PostgreSQL, and SQL Server
Starting with ArcGIS Desktop 10.5, mosaic datasets created in Oracle, PostgreSQL, or SQL Server geodatabases are created with the new RASTER_STORAGE keyword called RASTERBLOB. The RASTERBLOB keyword implements an efficient transfer of the mosaic dataset catalog items to the DBMS.
Mosaic datasets created with RASTERBLOB cannot be opened with earlier versions of the software. If you want to create mosaic datasets that are backward compatible with earlier versions, you will need to alter the configuration keyword for RASTER_STORAGE to one of the following compatible keywords:
- BINARY for PostgreSQL and SQL Server
- BLOB for Oracle
Personal geodatabases
In a personal geodatabase, the raster dataset is converted to an IMAGINE (.img) file and stored inside an image database (IDB) folder. The IDB folder is located in the directory next to the personal geodatabase. When you delete a raster dataset, the raster in the IDB folder is permanently deleted.
When storing a mosaic dataset in a personal geodatabase, the mosaic dataset is a table that points to the stored raster datasets it contains. In a mosaic dataset, the raster datasets are stored as unmanaged; therefore, it contains the path to the location where the raster datasets are stored. Each row in the business table points to the stored raster dataset. The operations on a mosaic dataset do not affect the stored raster files; therefore, if you delete the raster datasets in a mosaic dataset, they will only be deleted from the mosaic dataset and not from the source data on disk.
When a raster dataset is stored as an attribute, where it is stored depends on whether it is managed: a managed raster is stored as an IMG file in the system-defined location, whereas an unmanaged raster remains where it is in the file system.
Compression, pyramids, and tile size
There are other storage structures to consider when storing and managing raster data, including compression, downsampled datasets (pyramids and overviews), and tile size.
Compression
There are two types of compression: lossless and lossy. Lossless compression means the values of pixels in the raster dataset are not changed, whereas lossy compression results in altered pixel values. The amount of compression depends on the type of pixel data; the more homogeneous the image, the higher the compression ratio. Data that will be used for analysis, not just display, should be stored using lossless compression. The primary benefit of compressing your data is that it requires less storage space; the amount of savings depends on the method of compression and the redundancy in the data. An added benefit is often markedly better performance, because less data has to be transferred. For example, when raster data is accessed over a low-bandwidth network, compression significantly reduces the amount of information to be transferred, which makes it practical to store large, seamless raster datasets and serve them quickly to a client for display.
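The relationship between homogeneity and compression ratio is easy to observe with a lossless codec. The sketch below uses Python's standard `zlib` module, whose DEFLATE algorithm combines LZ77 with Huffman coding; it stands in for the LZ77 option only as an illustration, and the codec ArcGIS uses may behave differently. The byte strings are synthetic stand-ins for one band of 8-bit pixels.

```python
import random
import zlib


def compression_ratio(pixels: bytes) -> float:
    """Uncompressed size divided by losslessly compressed size."""
    return len(pixels) / len(zlib.compress(pixels))


# A perfectly homogeneous band: every pixel has the same value.
homogeneous = bytes([42]) * 65536

# A band of pseudo-random values (seeded so the result is repeatable).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(65536))

homogeneous_ratio = compression_ratio(homogeneous)
noisy_ratio = compression_ratio(noisy)

# Lossless means the round trip returns exactly the original pixel values.
restored = zlib.decompress(zlib.compress(noisy))
```

The homogeneous band compresses far better than the noisy one, and in both cases decompression restores every pixel value exactly, which is why lossless compression is the safe choice for analysis data.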
Mosaic datasets also have compression. This is not for storage of the raster dataset being managed, but it is for the compression applied to the image it generates when displayed. This also aids in accessing data over the network by reducing the size of the file that is transferred.
Downsampled datasets
Downsampled datasets are rasters created from the original data for either raster datasets or mosaic datasets. They are generated to improve display speed and performance. When they are created for raster datasets, they are called pyramids, and when they are created for mosaic datasets, they are called overviews.
Pyramids versus overviews
Characteristic | Pyramids | Overviews |
---|---|---|
Created for: | Raster datasets | Mosaic datasets |
Format | Writes .ovr files—with a few exceptions. Reads pyramids stored externally as .ovr or .rrd or internally. | Writes as .tif files. |
Storage | In a single file that generally resides next to the source raster dataset using the same name. | By default, in a folder next to the geodatabase with the *.overviews extension, or internally for enterprise geodatabases. Storage location is customizable. |
Storage size | 2 to 10 percent (compared to original raster datasets) | |
Downsampling factor | 2 | 3 |
Extent | | |
Options when building | | |
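The effect of the downsampling factor can be illustrated by listing the dimensions of each reduced-resolution level. This is a rough sketch: the `level_sizes` function and its stopping rule (stop once the longer side is no larger than `min_size` pixels) are assumptions made for illustration, not the rule ArcGIS applies when it builds pyramids or overviews.

```python
import math
from typing import List, Tuple


def level_sizes(width: int, height: int, factor: int,
                min_size: int = 256) -> List[Tuple[int, int]]:
    """Return (width, height) for each downsampled level.

    Each level divides both dimensions by `factor`; levels are generated
    until the longer side is no larger than `min_size` (an assumed cutoff).
    """
    sizes = []
    w, h = width, height
    while max(w, h) > min_size:
        w, h = math.ceil(w / factor), math.ceil(h / factor)
        sizes.append((w, h))
    return sizes


pyramid_levels = level_sizes(8192, 8192, factor=2)   # factor listed for pyramids
overview_levels = level_sizes(8192, 8192, factor=3)  # factor listed for overviews
```

With these assumptions, a factor of 2 produces five levels for an 8192 by 8192 raster and a factor of 3 produces four, so a larger factor means fewer, coarser steps between levels.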
Tile size
In an enterprise geodatabase, raster data is stored in a structure where the data is tiled, indexed, pyramided, and most often compressed. Because of tiling, indexing, and pyramiding, each time the raster data is queried, only the tiles necessary to satisfy the extent and resolution of the query are returned instead of the whole dataset. The tile size controls the number of pixels you want to store in each database memory block. This is specified as a number of pixels in x and y. The default tile size is 128 by 128 pixels, and most applications do not warrant deviating from these default values. In an enterprise geodatabase, the tiles of raster data are compressed before storing them in the geodatabase.
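The reason tiling limits how much data a query returns can be shown with simple index arithmetic. The sketch below is illustrative only: `tiles_for_window` is a hypothetical helper that works in pixel coordinates at a single resolution and ignores pyramids, indexing, and compression.

```python
from typing import List, Tuple


def tiles_for_window(col_min: int, row_min: int, col_max: int, row_max: int,
                     tile: int = 128) -> List[Tuple[int, int]]:
    """Return the (tile column, tile row) indexes that an inclusive pixel window overlaps."""
    first_col, last_col = col_min // tile, col_max // tile
    first_row, last_row = row_min // tile, row_max // tile
    return [
        (c, r)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# A window covering columns 100-300 and rows 0-127 overlaps three tiles in one row.
needed = tiles_for_window(100, 0, 300, 127)
```

However large the full raster is, only these three tiles would need to be read to satisfy this window.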