FFCV

Welcome to FFCV’s documentation!

Install ffcv:

conda create -y -n ffcv python=3.9 cupy pkg-config compilers libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.3 numba -c pytorch -c conda-forge
conda activate ffcv
pip install ffcv

We also provide a Dockerfile that installs ffcv in a few steps.
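Once installed, converting an existing dataset into FFCV's format takes only a few lines. The sketch below uses ffcv's DatasetWriter; the output path, the field names, and my_dataset are placeholders — my_dataset can be any indexed dataset (anything with __len__ and __getitem__ returning an (image, label) pair, such as a torchvision dataset):

```python
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# 'cifar.beton' and the field names are placeholders; pick your own.
writer = DatasetWriter('cifar.beton', {
    'image': RGBImageField(max_resolution=32),  # images stored up to 32px
    'label': IntField(),
})

# my_dataset: any indexed (image, label) dataset, e.g. torchvision CIFAR-10.
writer.from_indexed_dataset(my_dataset)
```

The resulting .beton file is what FFCV's Loader reads at training time.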

Introduction

ffcv is a drop-in data loading system that dramatically increases data throughput in model training:

  • Train an ImageNet model on one GPU in 35 minutes (98¢/model on AWS)

  • Train a CIFAR-10 model on one GPU in 36 seconds (2¢/model on AWS)

  • Train a $YOUR_DATASET model $REALLY_FAST (for $WAY_LESS)

Keep your training algorithm the same; just replace the data loader! Look at these speedups:

[Figure: headline speedup chart]

With ffcv, we enable significantly faster training:

[Figure: performance scatterplot]

See ImageNet Benchmarks for further benchmark details.

See the Features section below for a more detailed glance at what FFCV can do.

Tutorials

We provide a walk-through of basic usage, a performance guide, complete examples (including advanced customizations), as well as detailed benchmarks on ImageNet.

Features

Computer vision or not, FFCV can help make training faster in a variety of resource-constrained settings! Our Performance Guide has a more detailed account of the ways in which FFCV can adapt to different performance bottlenecks.

  • Plug-and-play with any existing training code: Rather than changing aspects of model training itself, FFCV focuses on removing data bottlenecks, which turn out to be a problem everywhere from neural network training to linear regression. This means that:

    • FFCV can be introduced into any existing training code in just a few lines of code (e.g., just swapping out the data loader and optionally the augmentation pipeline);

    • you don’t have to change the model itself to make it faster (e.g., feel free to analyze models without CutMix, Dropout, momentum scheduling, etc.);

    • FFCV can speed up much more than just neural network training—in fact, the more data-bottlenecked the application (e.g., linear regression, bulk inference, etc.), the more FFCV will accelerate it!

    See our Getting started guide, Examples, and code examples to see how easy it is to get started!

  • Fast data processing without the pain: FFCV automatically handles data reading, pre-fetching, caching, and transfer between devices in an extremely efficient way, so that users don’t have to think about it.

  • Automatically fused-and-compiled data processing: By either using pre-written FFCV transformations or easily writing custom ones, users can take advantage of FFCV’s compilation and pipelining abilities, which will automatically fuse and compile simple Python augmentations to machine code using Numba, and schedule them asynchronously to avoid loading delays.

  • Load data fast from RAM, SSD, or networked disk: FFCV exposes user-friendly options that can be adjusted based on the resources available. For example, if a dataset fits into memory, FFCV can cache it at the OS level and ensure that multiple concurrent processes all get fast data access. Otherwise, FFCV can use fast process-level caching and will optimize data loading to minimize the underlying number of disk reads. See The Bottleneck Doctor guide for more information.

  • Training multiple models per GPU: Thanks to fully asynchronous thread-based data loading, you can now interleave training multiple models on the same GPU efficiently, without any data-loading overhead. See this guide for more info.

  • Dedicated tools for image handling: All the features above are equally applicable to all sorts of machine learning models, but FFCV also offers some vision-specific features, such as fast JPEG encoding and decoding, storing datasets as mixtures of raw and compressed images to trade off I/O overhead and compute overhead, etc. See the Working with images guide for more information.
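As a concrete illustration of the drop-in swap and the caching options described above, here is a hedged sketch of constructing an FFCV Loader. The file name 'train.beton' is a placeholder, and the exact decoders and transforms depend on how the dataset was written; this assumes an image/label dataset and a CUDA device:

```python
import torch
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.transforms import ToTensor, ToTorchImage, ToDevice, Squeeze

device = torch.device('cuda:0')

loader = Loader(
    'train.beton',             # placeholder path to an FFCV dataset file
    batch_size=512,
    num_workers=8,
    order=OrderOption.RANDOM,  # QUASI_RANDOM can reduce disk reads when
                               # the dataset does not fit in memory
    os_cache=True,             # let the OS page cache keep the data in RAM
    pipelines={
        'image': [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage(),
                  ToDevice(device)],
        'label': [IntDecoder(), ToTensor(), Squeeze(), ToDevice(device)],
    },
)

for images, labels in loader:
    ...  # your existing training step, unchanged
```

The rest of the training loop (model, optimizer, loss) stays exactly as it was; only the loader construction changes.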
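To illustrate the idea behind the fused-and-compiled pipelines mentioned above (this is a conceptual sketch in plain NumPy, not ffcv's actual Operation API, which JIT-compiles the fused loop with Numba): applying two augmentations as a single composed pass avoids materializing and re-reading an intermediate array between stages.

```python
import numpy as np

def brightness(x, delta=10):
    # naive pipeline stage 1: allocates an intermediate array
    return np.clip(x.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def invert(x):
    # naive pipeline stage 2: another full pass over the data
    return 255 - x

def fused(x, delta=10):
    # fused equivalent: one composed expression, no intermediate handoff
    return 255 - np.clip(x.astype(np.int16) + delta, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
assert np.array_equal(invert(brightness(img)), fused(img))
```

FFCV performs this kind of fusion automatically and compiles the result to machine code, so simple Python transforms run at native speed without the per-stage overhead.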

API Reference

Citation

If you use this library in your research, cite it as follows:

@misc{leclerc2022ffcv,
   author = {Guillaume Leclerc and Andrew Ilyas and Logan Engstrom and Sung Min Park and Hadi Salman and Aleksander Madry},
   title = {ffcv},
   year = {2022},
   howpublished = {\url{https://github.com/libffcv/ffcv/}},
   note = {commit xxxxxxx}
}

(Have you used the package and found it useful? Let us know!)

Contributors
