ffcv.writer module

ffcv.writer.from_shard(shard, pipeline)[source]
ffcv.writer.count_samples_in_shard(shard, pipeline)[source]
ffcv.writer.handle_sample(sample, dest_ix, field_names, metadata, allocator, fields)[source]
ffcv.writer.worker_job_webdataset(input_queue, metadata_sm, metadata_type, fields, allocator, done_number, allocations_queue, pipeline)[source]
ffcv.writer.worker_job_indexed_dataset(input_queue, metadata_sm, metadata_type, fields, allocator, done_number, allocations_queue, dataset)[source]
class ffcv.writer.DatasetWriter(fname: str, fields: Mapping[str, Field], page_size: int = 8388608, num_workers: int = -1)[source]

Writes a given dataset into FFCV format (.beton). Supports indexable objects (e.g., PyTorch Datasets) and the webdataset format.

Parameters:

  • fname (str) – File name for the output dataset in FFCV format (.beton)

  • fields (Mapping[str, Field]) – Mapping from field names to Fields; the insertion order of this mapping determines the field order in the file

  • page_size (int) – Page size (in bytes) used internally for storage allocation

  • num_workers (int) – Number of worker processes to use; -1 (the default) uses all available cores
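A minimal construction sketch, assuming ffcv is installed. RGBImageField and IntField come from ffcv.fields; the output path, field names, and max_resolution value are illustrative choices, not required names.

```python
# Minimal sketch of constructing a DatasetWriter (assumes ffcv is
# installed; falls back gracefully when it is not).
try:
    from ffcv.writer import DatasetWriter
    from ffcv.fields import RGBImageField, IntField

    writer = DatasetWriter(
        '/tmp/example.beton',  # destination .beton file (illustrative path)
        {
            # The insertion order of this mapping fixes the field order
            # in the .beton file, so keep it consistent with your readers.
            'image': RGBImageField(max_resolution=256),
            'label': IntField(),
        },
        num_workers=4,  # -1 would use all available cores
    )
except ImportError:
    writer = None  # ffcv not available in this environment
```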

from_indexed_dataset(dataset, indices: Optional[List[int]] = None, chunksize=100, shuffle_indices: bool = False)[source]

Read samples from an indexable dataset. See https://docs.ffcv.io/writing_datasets.html#indexable-dataset for sample usage.

Parameters:

  • dataset (Indexable) – An indexable object that implements __getitem__ and __len__.

  • indices (List[int]) – Optional subset of indices to convert; defaults to the whole dataset.

  • chunksize (int) – Size of chunks processed by each worker during conversion.

  • shuffle_indices (bool) – Shuffle the order of samples before writing.
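An end-to-end sketch of these parameters, assuming ffcv and numpy are installed. ToyDataset is a hypothetical stand-in for any object implementing __getitem__ and __len__ (for example a torch.utils.data.Dataset), and the field names 'features'/'label' are illustrative.

```python
import numpy as np

class ToyDataset:
    """Indexable dataset returning (feature_vector, label) tuples."""
    def __init__(self, n=100):
        rng = np.random.default_rng(0)
        self.x = rng.standard_normal((n, 16)).astype(np.float32)
        self.y = rng.integers(0, 10, size=n)

    def __getitem__(self, ix):
        return self.x[ix], int(self.y[ix])

    def __len__(self):
        return len(self.x)

dataset = ToyDataset()

try:
    from ffcv.writer import DatasetWriter
    from ffcv.fields import NDArrayField, IntField

    writer = DatasetWriter('/tmp/toy.beton', {
        # One field per element of each sample tuple, in order.
        'features': NDArrayField(np.dtype('float32'), (16,)),
        'label': IntField(),
    })
    # Convert a shuffled 50-sample subset rather than the whole dataset.
    writer.from_indexed_dataset(dataset,
                                indices=list(range(50)),
                                shuffle_indices=True)
except ImportError:
    pass  # ffcv not installed; ToyDataset above still works standalone
```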

from_webdataset(shards: List[str], pipeline: Callable)[source]

Read samples from shards in the webdataset format. See https://docs.ffcv.io/writing_datasets.html#webdataset for sample usage.

Parameters:

  • shards (List[str]) – List of shard files that make up the dataset.

  • pipeline (Callable) – Callable invoked by each worker to decode samples from a shard, analogous to the pipelines used when loading a webdataset.
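A hedged sketch of such a pipeline, assuming ffcv and the webdataset package are installed and that the pipeline receives an object behaving like a webdataset.WebDataset built from one shard. The shard paths and the 'jpg'/'cls' keys are illustrative conventions, not fixed names.

```python
import os

def pipeline(dataset):
    """Decode each sample into an (image, label) tuple.

    Assumes `dataset` behaves like a webdataset.WebDataset; 'rgb8',
    'jpg', and 'cls' follow common webdataset conventions and should
    match how the shards were actually written.
    """
    return dataset.decode('rgb8').to_tuple('jpg', 'cls')

shards = ['shard-000000.tar', 'shard-000001.tar']  # illustrative paths

# Only attempt the conversion when the shard files actually exist.
if all(os.path.exists(s) for s in shards):
    from ffcv.writer import DatasetWriter
    from ffcv.fields import RGBImageField, IntField

    writer = DatasetWriter('wds.beton', {
        'image': RGBImageField(max_resolution=256),
        'label': IntField(),
    })
    writer.from_webdataset(shards, pipeline)
```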