FFCV

ffcv.loader module

class ffcv.loader.Loader(fname: str, batch_size: int, num_workers: int = -1, os_cache: bool = True, order: Union[TraversalOrder, Literal[OrderOption.SEQUENTIAL, OrderOption.RANDOM]] = OrderOption.SEQUENTIAL, distributed: bool = False, seed: Optional[int] = None, indices: Optional[Sequence[int]] = None, pipelines: Mapping[str, Sequence[Union[Operation, torch.nn.Module]]] = {}, custom_fields: Mapping[str, Type[Field]] = {}, drop_last: bool = True, batches_ahead: int = 3, recompile: bool = False)[source]

FFCV loader class that can be used as a drop-in replacement for standard (e.g. PyTorch) data loaders.

Parameters:
  • fname (str) – Full path to the location of the dataset (.beton file format).

  • batch_size (int) – Batch size.

  • num_workers (int) – Number of workers used for data loading. If you only use JIT-compiled augmentations, consider setting this to the number of physical cores rather than the number of hardware threads, as such augmentations usually don’t benefit from hyper-threading.

  • os_cache (bool) – Leverages the operating system for caching. This is beneficial when there is enough memory to cache the dataset and/or when multiple processes on the same machine are training on the same dataset. See https://docs.ffcv.io/performance_guide.html for more information.

  • order (OrderOption) –

    Traversal order, one of: SEQUENTIAL, RANDOM, QUASI_RANDOM

    QUASI_RANDOM is a random order that tries to be as uniform as possible while minimizing the amount of data read from the disk. Note that it is mostly useful when os_cache=False. Currently unavailable in distributed mode.

  • distributed (bool) – For distributed training (multiple GPUs). Emulates the behavior of DistributedSampler from PyTorch.

  • seed (int) – Random seed for batch ordering.

  • indices (Sequence[int]) – Restricts loading to the subset of the dataset given by these indices.

  • pipelines (Mapping[str, Sequence[Union[Operation, torch.nn.Module]]]) – Dictionary defining, for each field, the sequence of decoders and transforms to apply (see the usage sketch after this parameter list). Fields with missing entries use the default pipeline, which consists of the default decoder and ToTensor(), and a field can be disabled by explicitly passing None as its pipeline.

  • custom_fields (Mapping[str, Field]) – Dictionary informing the loader of the Field types associated with fields that use a custom type.

  • drop_last (bool) – Drop the last non-full batch in each iteration.

  • batches_ahead (int) – Number of batches prepared in advance; balances latency and memory.

  • recompile (bool) – Recompile the data-loading code every iteration. This is necessary if the implementation of some augmentations is expected to change during training.
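
A minimal usage sketch (assuming a .beton file containing an 'image' RGB field and a 'label' integer field; the file path and field names below are illustrative):

    from ffcv.loader import Loader, OrderOption
    from ffcv.transforms import ToTensor
    from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder

    loader = Loader(
        '/path/to/dataset.beton',        # .beton file produced by the FFCV writer
        batch_size=128,
        num_workers=8,
        order=OrderOption.RANDOM,        # reshuffle every epoch
        pipelines={
            # decode images, then convert to torch tensors
            'image': [SimpleRGBImageDecoder(), ToTensor()],
            # decode integer labels, then convert to torch tensors
            'label': [IntDecoder(), ToTensor()],
        },
        drop_last=True,
    )

    for images, labels in loader:        # one batch per enabled field
        ...                              # training step

Fields omitted from pipelines fall back to the default decoder followed by ToTensor(); passing None for a field excludes it from the yielded batches.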

next_traversal_order()[source]
filter(field_name: str, condition: Callable[[Any], bool]) → Loader[source]
generate_function_call(pipeline_name, op_id, needs_indices)[source]
generate_stage_code(stage, stage_ix, functions)[source]
generate_code()[source]
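
A sketch of filter (assuming the condition is applied to each sample's value for the named field; the field name and threshold are illustrative), which returns a Loader restricted to the matching samples:

    # keep only samples whose 'label' value is below 10
    subset_loader = loader.filter('label', lambda label: label < 10)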
class ffcv.loader.OrderOption(value)[source]

An enumeration.

SEQUENTIAL = 1
RANDOM = 2
QUASI_RANDOM = 3
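
For example, to use QUASI_RANDOM ordering when the dataset does not fit in memory (a sketch; recall that QUASI_RANDOM is mostly useful with os_cache=False and is currently unavailable in distributed mode):

    loader = Loader(
        '/path/to/dataset.beton',
        batch_size=128,
        num_workers=8,
        os_cache=False,                    # QUASI_RANDOM is mostly useful without OS caching
        order=OrderOption.QUASI_RANDOM,    # near-uniform shuffling while minimizing disk reads
        seed=0,                            # fix the batch ordering across runs
    )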