Export Training Data For Deep Learning

Available with Spatial Analyst license.

Summary

Uses a remote sensing image to convert labeled vector or raster data into deep learning training datasets. The output is a folder of image chips and a folder of metadata files in the specified format.

Usage

This tool creates training datasets to support third-party deep learning applications, such as Google TensorFlow or Microsoft CNTK.
Deep learning class training samples are based on small subimages, called image chips, that contain the feature or class of interest.
Use your existing classification training sample data, or GIS feature class data such as a building footprint layer, to generate image chips containing the class sample from your source image. Image chips are often 256 pixel rows by 256 pixel columns, unless the training sample size is larger.

Syntax

ExportTrainingDataForDeepLearning (in_raster, out_folder, in_class_data, image_chip_format, {tile_size_x}, {tile_size_y}, {stride_x}, {stride_y}, {output_nofeature_tiles}, {metadata_format}, {start_index})

Parameter Explanation Data Type

in_raster

Input source imagery, typically multispectral imagery.

Examples of the type of input source imagery include multispectral satellite, drone, aerial or National Agriculture Imagery Program (NAIP) imagery.

Raster Dataset; Raster Layer

out_folder

Specify a folder where the output image chips and metadata will be stored.

Directory

in_class_data

Labeled data, either in vector or raster form.

Vector inputs should follow a training sample format as generated by the ArcGIS Desktop Image Classification toolbar.

Raster inputs should follow a classified raster format as generated by the Classify Raster tool.

Feature Dataset; Feature Layer; Raster Dataset; Raster Layer

image_chip_format

The raster format for the image chip outputs.

TIFF —TIFF format
PNG —PNG format
JPEG —JPEG format
MRF —MRF (Meta Raster Format)

String

tile_size_x

(Optional)

The size of the image chips in the X dimension, in pixels.

Long

tile_size_y

(Optional)

The size of the image chips in the Y dimension, in pixels.

Long

stride_x

(Optional)

The distance to move in the X direction when creating the next image chip.

When stride is equal to the tile size, there will be no overlap. When stride is equal to half of the tile size, there will be 50% overlap.

Long

stride_y

(Optional)

The distance to move in the Y direction when creating the next image chip.

When stride is equal to the tile size, there will be no overlap. When stride is equal to half of the tile size, there will be 50% overlap.

Long
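
The relationship between stride and tile size described above can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, not part of the tool; the helper names overlap_fraction and full_chip_count are hypothetical, and the count covers only whole chips along one axis because this page does not describe how the tool treats partial chips at the image edge.

```python
def overlap_fraction(tile_size, stride):
    """Fraction of a chip shared with the next chip along one axis."""
    if tile_size <= 0 or stride <= 0:
        raise ValueError("tile_size and stride must be positive")
    # A stride larger than the tile size leaves a gap, so clamp at zero.
    return max(0.0, (tile_size - stride) / tile_size)


def full_chip_count(image_size, tile_size, stride):
    """Number of whole chips that fit along one axis (assumption: edge
    chips that would extend past the image are not counted)."""
    if image_size < tile_size:
        return 0
    return (image_size - tile_size) // stride + 1


print(overlap_fraction(256, 256))       # stride equal to tile size
print(overlap_fraction(256, 128))       # stride equal to half the tile size
print(full_chip_count(1024, 256, 128))  # whole chips across 1024 pixels
```

A smaller stride yields more chips from the same image at the cost of more redundancy between neighboring chips.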

output_nofeature_tiles

(Optional)

Specifies whether image chips that do not overlap labeled data will be exported.

ALL_TILES —Export all the image chips, including those that do not overlap labeled data. This is the default.
ONLY_TILES_WITH_FEATURES —Export only the image chips that overlap the labeled data.

Boolean

metadata_format

(Optional)

The format of the output metadata labels. There are three options: KITTI rectangles, PASCAL VOC rectangles, and Classified Tiles (a class map).

If your input training sample data is a feature class layer, such as a building layer or a standard classification training sample file, use the KITTI or PASCAL VOC rectangles option. The output metadata is a .txt file or an .xml file containing the training sample data contained in the minimum bounding rectangle. The name of the metadata file matches the input source image name.

If your input training sample data is a class map, use the Classified Tiles option.

KITTI_rectangles —The metadata follows the same format as the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) Object Detection Evaluation dataset. The KITTI dataset is a vision benchmark suite. This is the default. The label files are plain text files. All values, both numerical and string, are separated by spaces, and each row corresponds to one object.
PASCAL_VOC_rectangles —The metadata follows the same format as the Pattern Analysis, Statistical Modeling and Computational Learning, Visual Object Classes (PASCAL VOC) dataset. The PASCAL VOC dataset is a standardized image dataset for object class recognition. The label files are XML files and contain information about the image name, class value, and bounding boxes.
Classified_Tiles —This option outputs one classified image chip per input image chip. No other metadata is written for each image chip. Only the statistics output has more information on the classes, such as class names, class values, and output statistics.

The table below describes the 15 values in the KITTI metadata format. Only 5 of the 15 values are used by the tool: the class value (column 1) and the minimum bounding rectangle, composed of four image coordinate locations (columns 5-8). The minimum bounding rectangle encompasses the training chip used in the deep learning classifier.


Columns	Name	Description
1	Class value	The class value of the object, listed in the stats.txt file.
2	Unused
3	Unused
4	Unused
5 - 8	Bbox	The two-dimensional bounding box of the object in the image, based on a 0-based image space coordinate index. The bounding box contains the four coordinates for the left, top, right, and bottom pixels.
9 - 11	Unused
12 - 14	Unused
15	Unused

For more information, see KITTI metadata format.
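
To illustrate how the used columns can be read back, here is a minimal sketch in plain Python that pulls the class value (column 1) and the bounding box (columns 5-8) out of one space-separated label row. The function name parse_kitti_line and the sample row are hypothetical; in particular, the zeros placed in the unused columns are an assumption and may differ from what the tool writes.

```python
def parse_kitti_line(line):
    """Return (class_value, (left, top, right, bottom)) from one label row."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 space-separated values")
    class_value = fields[0]                      # column 1
    left, top, right, bottom = (float(v) for v in fields[4:8])  # columns 5-8
    return class_value, (left, top, right, bottom)


# Hypothetical row: unused columns are filled with zeros for illustration.
sample = "1 0.0 0 0.0 31.85 101.52 256.00 256.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0"
class_value, bbox = parse_kitti_line(sample)
print(class_value, bbox)
```

The class value is kept as a string so that it can be matched against the entries in the statistics output without assuming a numeric type.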

An example of the PASCAL VOC metadata format is shown below.

<?xml version="1.0"?>
<layout>
    <image>000000000</image>
    <object>1</object>
    <part>
        <class>1</class>
        <bndbox>
            <xmin>31.85</xmin>
            <ymin>101.52</ymin>
            <xmax>256.00</xmax>
            <ymax>256.00</ymax>
        </bndbox>
    </part>
</layout>

For more information, see PASCAL Visual Object Classes.
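
A label file with the layout shown in the example above can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the function name read_voc_boxes is hypothetical, and it assumes that every label file uses the same element names as the example (layout, image, part, class, bndbox).

```python
import xml.etree.ElementTree as ET

VOC_SAMPLE = """<?xml version="1.0"?>
<layout>
    <image>000000000</image>
    <object>1</object>
    <part>
        <class>1</class>
        <bndbox>
            <xmin>31.85</xmin>
            <ymin>101.52</ymin>
            <xmax>256.00</xmax>
            <ymax>256.00</ymax>
        </bndbox>
    </part>
</layout>
"""


def read_voc_boxes(xml_text):
    """Return (image_name, [(class_value, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for part in root.iter("part"):
        bndbox = part.find("bndbox")
        if bndbox is None:
            continue  # skip parts without a bounding box
        coords = tuple(
            float(bndbox.findtext(tag))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((part.findtext("class"), coords))
    return root.findtext("image"), boxes


image_name, boxes = read_voc_boxes(VOC_SAMPLE)
print(image_name, boxes)
```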

String

start_index

(Optional)

Allows you to set the start index for the sequence of image chips. This lets you append more image chips to an existing sequence. The default value is 0.

Long
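
When appending to an existing sequence, the next start index can be derived from the chips already on disk. The sketch below is plain Python and rests on assumptions not stated on this page: that chips sit directly in one folder and that each file name is a zero-padded number, as the image name 000000000 in the PASCAL VOC example suggests. The function name next_start_index and the default extension are hypothetical.

```python
from pathlib import Path


def next_start_index(chip_folder, extension=".tif"):
    """Return one more than the highest numeric chip name, or 0 if none exist.

    Assumption: chip files are named with digits only, for example
    000000012.tif, and are stored directly in chip_folder.
    """
    indices = [
        int(path.stem)
        for path in Path(chip_folder).glob("*" + extension)
        if path.stem.isdigit()
    ]
    return max(indices) + 1 if indices else 0
```

The returned value could then be passed as start_index on a later run so that new chips continue the existing numbering.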

Code sample

ExportTrainingDataForDeepLearning example 1 (Python window)

This example creates training samples for deep learning.

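from arcpy.sa import *

# Export 256 x 256 chips with 50% overlap, keeping only chips that
# overlap labeled data, and write KITTI-style label files
ExportTrainingDataForDeepLearning("c:/test/image.tif", "c:/test/outfolder",
                                  "c:/test/training.shp", "TIFF", "256",
                                  "256", "128", "128",
                                  "ONLY_TILES_WITH_FEATURES",
                                  "KITTI_rectangles")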

ExportTrainingDataForDeepLearning example 2 (stand-alone script)

This example creates training samples for deep learning.

# Import system modules and check out ArcGIS Spatial Analyst extension license
import arcpy
arcpy.CheckOutExtension("Spatial")
from arcpy.sa import *

# Set local variables
in_raster = "c:/test/image.tif"
out_folder = "c:/test/outfolder"
in_training = "c:/test/training.shp"
image_chip_format = "TIFF"
tile_size_x = "256"
tile_size_y = "256"
# A stride of half the tile size gives 50% overlap between chips
stride_x = "128"
stride_y = "128"
# Export only the chips that overlap labeled data
output_nofeature_tiles = "ONLY_TILES_WITH_FEATURES"
metadata_format = "KITTI_rectangles"

# Execute
ExportTrainingDataForDeepLearning(in_raster, out_folder, in_training,
                                  image_chip_format, tile_size_x, tile_size_y,
                                  stride_x, stride_y, output_nofeature_tiles,
                                  metadata_format)

Environments

Extent

Licensing information

ArcGIS Desktop Basic: Requires Spatial Analyst
ArcGIS Desktop Standard: Requires Spatial Analyst
ArcGIS Desktop Advanced: Requires Spatial Analyst
