
Making an FFCV dataloader

After writing an FFCV dataset, we are ready to start loading data (and training models)! We’ll continue using the same regression dataset as in the previous guide, and we’ll assume that the dataset has been written to /path/to/dataset.beton.

In order to load the dataset that we’ve written, we’ll need the ffcv.loader.Loader class (which will do most of the heavy lifting), and a set of decoders corresponding to the fields present in the dataset (so in our case, we will use the FloatDecoder and NDArrayDecoder classes):

from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import NDArrayDecoder, FloatDecoder

Our first step is instantiating the Loader class:

loader = Loader('/path/to/dataset.beton',
                batch_size=BATCH_SIZE,
                num_workers=NUM_WORKERS,
                order=ORDERING,
                pipelines=PIPELINES)

To create a loader, we need to specify a path to the FFCV dataset, a batch size, and a number of workers, as well as two less standard arguments, order and pipelines, which we discuss below:

Dataset ordering

The order option in the loader initialization is similar to PyTorch DataLoader’s shuffle option, with some additional options. This argument takes an enum provided by ffcv.loader.OrderOption:

from ffcv.loader import OrderOption

# Truly random shuffling (shuffle=True in PyTorch)
ORDERING = OrderOption.RANDOM

# Unshuffled (i.e., served in the order the dataset was written)
ORDERING = OrderOption.SEQUENTIAL

# Memory-efficient but not truly random loading
# Speeds up loading over RANDOM when the whole dataset does not fit in RAM!
ORDERING = OrderOption.QUASI_RANDOM

Note

The order options require different amounts of RAM, so you should choose one based on how much RAM is available, on a case-by-case basis (a rough sketch follows this list):

  • RANDOM requires the most RAM, since it has to cache the entire dataset in order to sample perfectly at random. If there is not enough RAM available, it will throw an exception.

  • QUASI_RANDOM requires much less RAM than RANDOM (but a bit more than SEQUENTIAL), since it caches only a subset of the samples at a time. Use it when the entire dataset cannot fit in RAM.

  • SEQUENTIAL requires the least RAM: it only keeps a few samples loaded ahead of time for upcoming training iterations.
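
As a rough illustration only (the helper below is not part of FFCV, the threshold is arbitrary, and the RAM check via os.sysconf is Linux-specific), one might pick an ordering by comparing the size of the .beton file against the machine’s physical RAM:

import os

from ffcv.loader import OrderOption

def pick_ordering(beton_path):
    # Hypothetical helper: use RANDOM only if the whole .beton file
    # comfortably fits in physical RAM, otherwise fall back to QUASI_RANDOM.
    dataset_bytes = os.path.getsize(beton_path)
    total_ram = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')  # Linux-only
    if dataset_bytes < 0.8 * total_ram:
        return OrderOption.RANDOM
    return OrderOption.QUASI_RANDOM

ORDERING = pick_ordering('/path/to/dataset.beton')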

Pipelines

The pipelines option in Loader tells the loader which fields to read, how to read them, and which operations to apply on top. Specifically, a pipeline is a key-value dictionary whose keys match the field names used when writing the dataset, and whose values are sequences of operations to perform. Each sequence must start with an ffcv.fields.decoders.Decoder object corresponding to that field, followed by a sequence of transforms. For example, the following pipeline reads each field and then converts it to a PyTorch tensor:

from ffcv.transforms import ToTensor

PIPELINES = {
  'covariate': [NDArrayDecoder(), ToTensor()],
  'label': [FloatDecoder(), ToTensor()]
}

This is already enough to start loading data, but pipelines are also our opportunity to apply fast pre-processing to the data through a series of transformations. Transforms are automatically compiled to machine code at runtime and, for GPU-intensive applications like training neural networks, typically add negligible overhead to training.

Note

In fact, declaring field pipelines is optional: for any field that exists in the dataset file without a corresponding pipeline specified in the pipelines dictionary, the Loader will default to the bare-bones pipeline above, i.e., a decoder followed by conversion to a PyTorch tensor. (You can force FFCV to explicitly not load a field by adding a corresponding None entry to the pipelines dictionary.)

If the entire pipelines argument is unspecified, this bare-bones pipeline will be applied to all fields.
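
As an example of the None mechanism mentioned above, and sticking with the field names from our running example, the following dictionary would load and decode covariate but skip label entirely:

PIPELINES = {
  'covariate': [NDArrayDecoder(), ToTensor()],
  'label': None  # present in the dataset file, but not loaded
}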

Transforms

There are three easy ways to specify transformations in a pipeline:

  • A set of standard transformations in the ffcv.transforms module. These include standard image data augmentations such as RandomHorizontalFlip and Cutout.

  • Any subclass of torch.nn.Module: FFCV automatically converts them into an operation.

  • Custom transformations: you can implement your own by subclassing ffcv.pipeline.operation.Operation, as discussed in the Making custom transforms guide (a rough sketch follows below).
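
As a rough sketch of the third option (the Scale transform below is purely illustrative, and the exact method signatures are best checked against the Making custom transforms guide), a custom operation implements generate_code, which returns the function applied to each batch, and declare_state_and_memory, which describes its output shape and memory needs:

from typing import Callable, Optional, Tuple

from ffcv.pipeline.allocation_query import AllocationQuery
from ffcv.pipeline.operation import Operation
from ffcv.pipeline.state import State

class Scale(Operation):
    """Illustrative transform: multiply every sample by a constant factor."""

    def __init__(self, factor: float = 2.0):
        super().__init__()
        self.factor = factor

    def generate_code(self) -> Callable:
        factor = self.factor

        def scale(x, dst):
            # x is the incoming batch, dst is the pre-allocated output buffer
            dst[:] = x * factor
            return dst

        return scale

    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        # Output has the same shape and dtype as the input, so keep the
        # pipeline state unchanged and request a same-sized buffer.
        return previous_state, AllocationQuery(previous_state.shape, previous_state.dtype)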

The following shows an example of a full pipeline for a vector field: it starts with the field decoder, NDArrayDecoder, followed by conversion to torch.Tensor and a custom transform, implemented as a torch.nn.Module, that adds Gaussian noise to each vector:

import torch as ch
from typing import List

from ffcv.fields.decoders import NDArrayDecoder
from ffcv.pipeline.operation import Operation
from ffcv.transforms import ToTensor

class AddGaussianNoise(ch.nn.Module):
    def __init__(self, scale=1):
        super().__init__()
        self.scale = scale

    def forward(self, x):
        # Add zero-mean Gaussian noise scaled by self.scale
        return x + ch.randn_like(x) * self.scale

pipeline: List[Operation] = [
    NDArrayDecoder(),
    ToTensor(),
    AddGaussianNoise(0.1)
]

As an example for a different field type, here is what a pipeline for an RGBImageField could look like:

import torchvision

from ffcv.fields.decoders import SimpleRGBImageDecoder
from ffcv.transforms import (Convert, RandomHorizontalFlip, RandomTranslate,
                             ToDevice, ToTorchImage)

image_pipeline: List[Operation] = [
    SimpleRGBImageDecoder(),
    RandomHorizontalFlip(),
    torchvision.transforms.ColorJitter(.4, .4, .4),
    RandomTranslate(padding=2),
    ToTensor(),
    ToDevice('cuda:0', non_blocking=True),
    ToTorchImage(),
    Convert(ch.float16),
    torchvision.transforms.Normalize(MEAN, STD),  # Normalize using image statistics
]

Putting it together

Returning to our running linear regression example, the final loader can be constructed as follows:

loader = Loader('/path/to/dataset.beton',
                batch_size=BATCH_SIZE,
                num_workers=NUM_WORKERS,
                order=OrderOption.RANDOM,
                pipelines={
                  'covariate': [NDArrayDecoder(), ToTensor(), AddGaussianNoise(0.1)],
                  'label': [FloatDecoder(), ToTensor()]
                })
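
Once constructed, the loader is iterated just like a PyTorch DataLoader, yielding one batch per field. Below is a minimal training-loop sketch, not prescribed by FFCV: the model, optimizer, and the INPUT_DIM / NUM_EPOCHS constants are placeholders, and we assume batches arrive as (covariate, label) pairs of float32 tensors.

import torch as ch

model = ch.nn.Linear(INPUT_DIM, 1)                # placeholder model
opt = ch.optim.SGD(model.parameters(), lr=1e-3)   # placeholder optimizer

for epoch in range(NUM_EPOCHS):
    for covariate, label in loader:
        # Assumes the covariate array was written as float32
        opt.zero_grad()
        loss = ch.nn.functional.mse_loss(model(covariate), label)
        loss.backward()
        opt.step()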

Other options

You can also specify the following additional options when constructing an ffcv.loader.Loader (a sketch combining several of them follows the list):

  • os_cache: If True, the OS automatically determines whether the dataset is held in memory or not, depending on the available RAM. If False, FFCV manages the caching itself, and the amount of RAM needed depends on the order option.

  • distributed: For training on multiple GPUs.

  • seed: Specify the random seed for batch ordering.

  • indices: Provide indices to load only a subset of the dataset.

  • custom_fields: For specifying decoders for fields with custom encoders.

  • drop_last: If True, drops the last non-full batch from each iteration.

  • batches_ahead: Set the number of batches prepared in advance. Increasing it absorbs variation in per-batch processing time so that the training loop does not stall waiting for batches; decreasing it reduces RAM usage.

  • recompile: Recompile the processing code every iteration. Useful if you have transforms whose behavior changes from epoch to epoch, for instance code that uses the input shape as a compile-time parameter. (If they only change their memory usage, e.g., because the resolution changes, this is not necessary.)
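
For illustration, several of these options might be combined as follows (the specific values are arbitrary, and PIPELINES is the dictionary defined earlier):

loader = Loader('/path/to/dataset.beton',
                batch_size=BATCH_SIZE,
                num_workers=NUM_WORKERS,
                order=OrderOption.QUASI_RANDOM,
                os_cache=False,                  # let FFCV manage caching itself
                seed=42,                         # fix the batch ordering
                indices=list(range(10_000)),     # only load the first 10,000 samples
                drop_last=True,                  # drop the final non-full batch
                batches_ahead=3,                 # prepare 3 batches in advance
                pipelines=PIPELINES)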

More information

For information on available transforms and the Loader class, see our API Reference.

For examples of constructing loaders and using them, see the tutorials Training CIFAR-10 in 36 seconds on a single A100 and Large-Scale Linear Regression.