Working with Image Data in FFCV¶
Images can often be responsible for the majority of resources (storage and/or compute) consumed by computer vision datasets. FFCV offers a wide range of options to control the storage and retrieval of images, allowing the user to cater to the specific needs of each project and hardware configuration.
Note
This page covers the options and API for writing and reading image data with FFCV; for guidance on choosing among these options based on your task and system specifications, see The Bottleneck Doctor.
Writing image datasets¶
In most machine learning datasets, images are compressed with JPEG before being stored. While this scheme is very space-efficient, decoding JPEG images requires significant resources and is usually the bottleneck for loading speed. Given access to fast storage (RAM, SSD) in sufficient quantities, other alternatives might be preferable (see The Bottleneck Doctor for more details).
For the rest of this guide, we’ll assume you’ve already read
Writing a dataset to FFCV format, so you’re familiar with the
ffcv.fields.Field classes as well as
ffcv.writer.DatasetWriter.
Images are supported in FFCV via the ffcv.fields.RGBImageField class.
The first initialization parameter of the RGBImageField is
the write_mode argument, which specifies the format with which to write the
dataset, and can take the following values:
- jpg: All the images in the dataset will be stored in JPEG (compressed) format.

  Warning: JPEG is a lossy file format. The images read from the data loader might be slightly different from the ones passed to the DatasetWriter.

- raw: All images are stored uncompressed. This dramatically reduces CPU usage at loading time but requires more storage. Given enough RAM to cache the entirety of the dataset, this will usually yield the best performance.

- proportion: This will generate a hybrid dataset with a mix of JPEG and raw images. Each image will be compressed with probability compress_probability. This option is mostly useful for users who wish to achieve storage/speed trade-offs between jpg and raw.

- smart: This is similar to proportion, except that an image will be compressed if its raw representation has area (H x W) greater than smart_threshold. This option is suited for datasets with large variation in image sizes, as it ensures that a few large outliers do not significantly impact the total dataset size or loading speed.
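The four modes boil down to a per-image compression decision at write time. The following is an illustrative sketch of that decision rule, not FFCV's actual implementation; the parameter names simply mirror the options described above:

```python
import random

def should_compress(write_mode, height, width,
                    compress_probability=0.5, smart_threshold=None):
    """Illustrative per-image compression decision for each write_mode
    (a sketch for intuition, not FFCV's internal code)."""
    if write_mode == 'jpg':
        return True                                   # always JPEG
    if write_mode == 'raw':
        return False                                  # never compressed
    if write_mode == 'proportion':
        # Each image is compressed independently with this probability
        return random.random() < compress_probability
    if write_mode == 'smart':
        # Compress only images whose raw area exceeds the threshold
        return height * width > smart_threshold
    raise ValueError(f'unknown write_mode: {write_mode}')

# Under 'smart' with a 200,000-pixel threshold, a 512x512 image is
# compressed while a 100x100 image is stored raw:
print(should_compress('smart', 512, 512, smart_threshold=200_000))  # True
print(should_compress('smart', 100, 100, smart_threshold=200_000))  # False
```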
Next, RGBImageField supports a jpeg_quality argument, which selects the image quality for images that are JPEG-compressed (this applies to all write_mode values other than raw). Reducing the JPEG quality both shrinks the generated file and makes data loading faster.
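To see why raw storage can be expensive, a quick back-of-envelope calculation helps: an uncompressed 8-bit RGB image occupies H x W x 3 bytes, whereas JPEG at moderate quality often achieves roughly a 10x reduction (the exact ratio depends on image content and on jpeg_quality):

```python
def raw_size_bytes(height, width, channels=3):
    """Storage for an uncompressed 8-bit image: one byte per channel per pixel."""
    return height * width * channels

# A single 256x256 RGB image stored raw:
print(raw_size_bytes(256, 256))  # 196608 bytes (~192 KiB)

# Roughly 1.28 million such images (ImageNet-scale) stored raw:
print(f"{raw_size_bytes(256, 256) * 1_281_167 / 1e9:.0f} GB")  # 252 GB
```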
Datasets like ImageNet contain images of various sizes.
For many applications, storing full-sized images is unnecessary, and it may be
beneficial to resize the largest images.
The max_resolution argument in the initializer of
RGBImageField lets you pick a side-length threshold:
any image whose longer side exceeds it is resized down (while preserving its aspect ratio).
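As a rough sketch of the geometry involved (the rounding behavior here is an assumption for illustration, not FFCV's exact implementation), the target shape under a max_resolution threshold can be computed like this:

```python
def resized_shape(height, width, max_resolution):
    """Target (height, width) after capping the longer side at
    max_resolution while preserving aspect ratio (illustrative only)."""
    longest = max(height, width)
    if longest <= max_resolution:
        return height, width              # small enough: left untouched
    scale = max_resolution / longest      # shrink factor for both sides
    return round(height * scale), round(width * scale)

print(resized_shape(1024, 768, 256))  # (256, 192): longer side capped at 256
print(resized_shape(200, 100, 256))   # (200, 100): already small, unchanged
```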
The following code block provides an example of a
DatasetWriter for image data:
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

writer = DatasetWriter('my_file.beton', {
    # Roughly 25% of the images will be stored raw and the rest as JPEG
    'image': RGBImageField(
        write_mode='proportion',    # Randomly compress
        compress_probability=0.25,  # Compress a random 1/4 of the dataset
        max_resolution=256,         # Resize any image whose longer side exceeds 256
        jpeg_quality=50             # Use 50% quality when JPEG-compressing an image
    ),
    'label': IntField()
})
Decoding options¶
Unlike other fields, which offer a single Decoder for reading their data from the dataset file, images
currently offer the following options:
- SimpleRGBImageDecoder: This is the default decoder, used when no pipeline is passed to the Loader. It simply produces the entire image and forwards it to the next operations in the pipeline. Note that, as a result, all images in the dataset need to have the same resolution for this decoder to work, since they have to fit in the same batch.

- RandomResizedCropRGBImageDecoder: This decoder first takes a random section of the image and resizes it before populating the batch. It is intended to mimic the behavior of torchvision.transforms.RandomResizedCrop.

- CenterCropRGBImageDecoder: Similar to RandomResizedCropRGBImageDecoder, except that it mimics torchvision.transforms.CenterCrop.
from ffcv.loader import Loader
from ffcv.fields.decoders import (RandomResizedCropRGBImageDecoder,
                                  CenterCropRGBImageDecoder)

loader = Loader('my_file.beton',
    batch_size=15,
    num_workers=10,
    pipelines={
        'image': [RandomResizedCropRGBImageDecoder((224, 224))],
        'other_image_field': [CenterCropRGBImageDecoder((224, 224), 224/256)]
    }
)