When you create a dataset in a file geodatabase, you can choose a configuration keyword to customize how the data is stored. Each keyword optimizes storage for a particular type of data, slightly improving storage efficiency and performance. There are seven keywords available. These keywords cannot be customized.
In most cases, specify the DEFAULTS keyword when you create a feature class or raster in a file geodatabase. The exceptions are the following:
- If you store a large raster dataset that exceeds 1 TB in size, specify the MAX_FILE_SIZE_256TB keyword.
- If you store character data that uses a non-Latin alphabet, such as Chinese or Arabic, specify the TEXT_UTF16 keyword.
- If you store terrain datasets in the file geodatabase, specify the GEOMETRY_OUTOFLINE keyword.
- If you store terrain datasets that also contain large BLOB columns, specify the GEOMETRY_AND_BLOB_OUTOFLINE keyword.
- If you store a feature class that contains large BLOB columns that you will not access often, specify the BLOB_OUTOFLINE configuration keyword.
If you do not specify any configuration keyword, DEFAULTS is used.
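The guidance above can be read as a small decision procedure. The following is a minimal sketch, not part of any Esri API: the function name `choose_config_keyword`, its parameters, the order in which the rules are checked, and the treatment of 1 TB as 2**40 bytes are all assumptions made for illustration.

```python
TB = 1024 ** 4  # assumption: 1 TB counted as 2**40 bytes


def choose_config_keyword(expected_bytes=0, non_latin_text=False,
                          terrain=False, large_blobs=False):
    """Hypothetical helper: map dataset traits to a configuration keyword.

    The priority order is an assumption; only one keyword can be returned,
    so a dataset that matches several rules gets the first match.
    """
    if expected_bytes > 1 * TB:
        return "MAX_FILE_SIZE_256TB"          # large raster beyond 1 TB
    if non_latin_text:
        return "TEXT_UTF16"                   # for example Chinese or Arabic text
    if terrain and large_blobs:
        return "GEOMETRY_AND_BLOB_OUTOFLINE"  # terrain data plus large BLOB columns
    if terrain:
        return "GEOMETRY_OUTOFLINE"           # terrain data
    if large_blobs:
        return "BLOB_OUTOFLINE"               # large, rarely accessed BLOB columns
    return "DEFAULTS"                         # everything else


print(choose_config_keyword())
print(choose_config_keyword(expected_bytes=3 * TB))
print(choose_config_keyword(terrain=True, large_blobs=True))
```

In ArcPy, the returned string would typically be supplied through the `config_keyword` parameter of a tool such as Create Feature Class; check the tool reference for your release before relying on that.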
Keyword | How it affects data storage |
---|---|
DEFAULTS | Stores data up to 1 TB in size. Text is stored in UTF-8 format. |
TEXT_UTF16 | Stores data up to 1 TB in size. Text is stored in UTF-16 format. |
MAX_FILE_SIZE_4GB | Limits data size to 4 GB. Text is stored in UTF-8 format. |
MAX_FILE_SIZE_256TB | Stores data up to 256 TB in size. Text is stored in UTF-8 format. |
GEOMETRY_OUTOFLINE | Stores data up to 1 TB in size. Text is stored in UTF-8 format. Stores the geometry attribute in a file separate from the nonspatial attributes. |
BLOB_OUTOFLINE | Stores data up to 1 TB in size. Text is stored in UTF-8 format. Stores BLOB attributes in a file separate from the rest of the attributes. |
GEOMETRY_AND_BLOB_OUTOFLINE | Stores data up to 1 TB in size. Text is stored in UTF-8 format. Stores both geometry and BLOB attributes in files separate from the rest of the attributes. |
Text storage: UTF-8 vs. UTF-16
UTF-8 is the most efficient storage format if your text data is in English, another Western European language, or any other language that uses the Latin alphabet, such as Polish, Turkish, or Indonesian. UTF-8 stores each unaccented Latin character in just 1 byte and each accented character or character from another alphabet in a variable number of bytes, ranging from 2 to 4. Because the vast majority of characters in these languages take just 1 byte, UTF-8 results in lower storage requirements and improved performance for them.
UTF-16 is the most efficient storage format for text data in a non-Latin alphabet, such as Chinese, Japanese, Korean, Russian, Greek, or Arabic. For these languages, this format uses just 2 bytes for nearly every character. The UTF-8 representation of the same character uses 2 to 4 bytes (3 bytes for most Chinese, Japanese, and Korean characters), which can increase storage requirements and slightly slow performance for these languages. This method of storing text is available only with the TEXT_UTF16 keyword, which has a 1 TB size limit.
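To see the trade-off concretely, the sketch below measures the encoded size of a few sample strings using Python's built-in codecs. It illustrates the encodings themselves and makes no claim about the geodatabase's internal layout; the helper name `encoded_sizes` and the sample words are chosen here for illustration only.

```python
def encoded_sizes(text):
    """Return (UTF-8 bytes, UTF-16 bytes) for a string.

    "utf-16-le" is used so that no 2-byte byte-order mark is added
    to the UTF-16 count.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples: Latin, accented Latin, Cyrillic, and Chinese text.
samples = {
    "English": "road",
    "Polish": "droga główna",
    "Russian": "дорога",
    "Chinese": "道路",
}

for language, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{language:8} {len(text):2d} chars  "
          f"UTF-8: {utf8_bytes:2d} B  UTF-16: {utf16_bytes:2d} B")
```

The Latin-alphabet samples come out smaller in UTF-8, the Chinese sample comes out smaller in UTF-16, and the Cyrillic letters take 2 bytes in either encoding.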
MAX_FILE_SIZE_4GB
This keyword stores datasets that are smaller than 4 GB slightly more efficiently than the DEFAULTS keyword, although the savings are small: 1 byte per record, or about 1 MB per million records. For example, all the roads in California (2,092,079 records) occupy 312 MB with the DEFAULTS keyword and 310 MB with the MAX_FILE_SIZE_4GB keyword.
This keyword restricts a dataset to a maximum size of 4 GB, so specify it only if you know a feature class or raster dataset will never need to grow beyond this size.
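The savings figure and the size restriction are simple arithmetic, sketched below. The function names are hypothetical, and the use of binary units (1 MB = 2**20 bytes, 1 GB = 2**30 bytes) is an assumption; the source does not say which convention applies.

```python
MB = 1024 ** 2  # assumption: binary megabytes
GB = 1024 ** 3  # assumption: binary gigabytes


def estimated_savings_mb(record_count, bytes_saved_per_record=1):
    """Estimate the space saved by MAX_FILE_SIZE_4GB at 1 byte per record."""
    return record_count * bytes_saved_per_record / MB


def fits_4gb_limit(expected_bytes):
    """Check whether a dataset's expected size stays within 4 GB."""
    return expected_bytes <= 4 * GB


california_roads = 2_092_079  # record count from the example above

# About 2 MB, in line with the 312 MB versus 310 MB figures quoted above.
print(round(estimated_savings_mb(california_roads), 2))

# A 312 MB dataset is well inside the limit; a 5 GB one is not.
print(fits_4gb_limit(312 * MB), fits_4gb_limit(5 * GB))
```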
MAX_FILE_SIZE_256TB
Specifying the MAX_FILE_SIZE_256TB configuration keyword allows you to create a dataset that is up to 256 TB in size. You would normally only specify this keyword to store a large raster dataset.
In-line vs. out-of-line storage
Storing data inline means all the attributes are in the same file or virtual table in the file geodatabase. When you store an attribute out of line, it is stored in a separate file.
If all the data is stored inline, all of it is loaded into memory when you query or edit the feature class. Therefore, if the feature class contains attributes that use a lot of storage space, the data can take a long time to load and requires a larger memory buffer.
Geometry and BLOB attribute types have the potential to store large amounts of data. For example, if many of the features in the feature class contain thousands of vertices, you might want to store the geometry out of line. Or, if your attribute data is large (made up of several text columns or large BLOB columns), you might want to store your geometry out of line so that accessing the geometry does not automatically pull all the attribute information into memory. If you store geometry, BLOB attributes, or both out of line, they are loaded into memory only when the application requests them. For example, if you select features in ArcMap based on the BLOB value, the BLOB attributes will be loaded into memory.
If your feature class will contain large BLOB attributes, you can specify the BLOB_OUTOFLINE keyword when you create the feature class. Then, the BLOB attribute will only be loaded if you query it.
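The difference between the two storage modes can be pictured with a toy model. The sketch below is purely conceptual and does not reflect how a file geodatabase is implemented: the class `ToyFeatureClass`, its two dictionaries standing in for separate files, and the `side_reads` counter are all invented for illustration.

```python
class ToyFeatureClass:
    """Toy model: inline attributes live with the row, others in a side store."""

    def __init__(self, out_of_line=()):
        self.out_of_line = set(out_of_line)  # attribute names kept out of line
        self.main_file = {}                  # object ID -> inline attributes
        self.side_file = {}                  # object ID -> out-of-line attributes
        self.side_reads = 0                  # how often the side store was read

    def insert(self, oid, **attributes):
        """Split a row's attributes between the main and side stores."""
        self.main_file[oid] = {k: v for k, v in attributes.items()
                               if k not in self.out_of_line}
        self.side_file[oid] = {k: v for k, v in attributes.items()
                               if k in self.out_of_line}

    def read(self, oid, fields):
        """Return the requested fields, touching the side store only if needed."""
        row = self.main_file[oid]            # inline data is always loaded
        result = {}
        for name in fields:
            if name in row:
                result[name] = row[name]
            else:
                self.side_reads += 1         # out-of-line data loaded on demand
                result[name] = self.side_file[oid][name]
        return result


fc = ToyFeatureClass(out_of_line={"photo"})
fc.insert(1, name="Main St", photo=b"\x00" * 1_000_000)

fc.read(1, ["name"])    # the large BLOB is never touched
print(fc.side_reads)
fc.read(1, ["photo"])   # the BLOB is loaded only because it was requested
print(fc.side_reads)
```

Reading only `name` leaves the large `photo` value untouched, which mirrors the behavior described for BLOB_OUTOFLINE: the BLOB is fetched only when a query asks for it.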