core.data.dataset.dataset_collection

dataset_collection

Classes:

Name Description
DatasetCollection

A collection of datasets for different splits of a dataset.

DatasetCollection(datasets)

Bases: ABC, Generic[TBaseDataset]

A collection of datasets for different splits of a dataset.

This class aggregates datasets for the common splits used in machine learning projects: training, validation, and testing. It provides a convenient way to access and manipulate these datasets as a unified object. The class supports direct access to individual dataset contexts, iteration over all contexts, and collective operations on all contexts, such as downloading assets.
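The access patterns described above can be sketched with stand-in classes. This is a minimal illustration, not the library's implementation: `StubDataset` and `StubCollection` are hypothetical names, and exposing name-based access and iteration through `__getitem__` and `__iter__` is an assumption, since this page does not list those methods.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class StubDataset:
    """Hypothetical stand-in for a real dataset; only `name` is needed here."""

    name: str


class StubCollection:
    """Illustrative only: mimics the name-indexed storage described above."""

    def __init__(self, datasets: List[StubDataset]) -> None:
        # Same indexing scheme as the documented `datasets` attribute.
        self.datasets: Dict[str, StubDataset] = {d.name: d for d in datasets}

    def __getitem__(self, key: str) -> StubDataset:
        # Assumed access pattern: look a dataset up by its name.
        return self.datasets[key]

    def __iter__(self) -> Iterator[StubDataset]:
        # Assumed iteration pattern: yield datasets in insertion order.
        return iter(self.datasets.values())


collection = StubCollection(
    [StubDataset("train"), StubDataset("val"), StubDataset("test")]
)
train = collection["train"]
split_names = [dataset.name for dataset in collection]
```

Here `train` is the dataset registered under the name "train", and `split_names` lists the names in the order the datasets were passed in.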

Parameters:

Name Type Description Default

datasets

List[TDataset]

A list of datasets for different splits (train, val, test).

required

Methods:

Name Description
download_all

Downloads all assets and annotations for every dataset in the collection.

Attributes:

Name Type Description
datasets

A dictionary of datasets, indexed by their names.

dataset_path str | None

The path to the dataset directory.

datasets = {dataset.name: dataset for dataset in datasets} instance-attribute

A dictionary of datasets, indexed by their names.
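One consequence of indexing by name is worth illustrating: with a plain dict comprehension, two datasets that share a name collapse into a single entry, and the later one wins. The sketch below uses `SimpleNamespace` objects as hypothetical stand-ins for datasets, and the `version` field exists only for this illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins: any object with a `name` attribute works here.
splits = [
    SimpleNamespace(name="train", version="v1"),
    SimpleNamespace(name="val", version="v1"),
    SimpleNamespace(name="train", version="v2"),  # duplicate name
]

# Same comprehension shape as the documented attribute.
datasets = {dataset.name: dataset for dataset in splits}

keys = list(datasets)
kept_version = datasets["train"].version
```

After this runs, `keys` contains two entries rather than three, and `kept_version` is the value from the last dataset named "train". Giving each split a distinct name avoids silently dropping one.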

dataset_path = None instance-attribute

The path to the dataset directory.

download_all(images_destination_dir, annotations_destination_dir, use_id=True, skip_asset_listing=False)

Downloads all assets and annotations for every dataset in the collection.

For each dataset, this method:

1. Downloads the assets (images) to the corresponding image directory.
2. Downloads and builds the COCO annotation file for each dataset.

Parameters:

Name Type Description Default
images_destination_dir
str

The directory where images will be saved.

required
annotations_destination_dir
str

The directory where annotations will be saved.

required
use_id
Optional[bool]

Whether to use asset IDs in the file paths. If None, the internal logic of each dataset will handle it.

True
skip_asset_listing
bool

If True, skips listing the assets when downloading. Defaults to False.

False
Example

If you want to download assets and annotations for both the train and validation datasets, this method creates separate image and annotation directories for each dataset (e.g., train/images, train/annotations, val/images, val/annotations) under the specified destination directories.
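The two-step loop described for this method can be sketched offline as follows. Everything here is an assumption made for illustration: `StubDataset`, its `download_assets` and `download_annotations` methods, the placeholder file names, and the function `download_all_sketch` are hypothetical, and nothing is fetched from a server. In particular, the choice of one subdirectory per dataset name under each destination directory is a guess, so the real layout may differ.

```python
import os
import shutil
import tempfile
from typing import Dict, Optional


class StubDataset:
    """Hypothetical stand-in: writes empty placeholder files, downloads nothing."""

    def __init__(self, name: str) -> None:
        self.name = name

    def download_assets(self, destination_dir: str, use_id: Optional[bool] = True,
                        skip_asset_listing: bool = False) -> None:
        os.makedirs(destination_dir, exist_ok=True)
        with open(os.path.join(destination_dir, "placeholder.jpg"), "w"):
            pass

    def download_annotations(self, destination_dir: str,
                             use_id: Optional[bool] = True) -> None:
        os.makedirs(destination_dir, exist_ok=True)
        with open(os.path.join(destination_dir, "coco_annotations.json"), "w"):
            pass


def download_all_sketch(datasets: Dict[str, StubDataset],
                        images_destination_dir: str,
                        annotations_destination_dir: str,
                        use_id: Optional[bool] = True,
                        skip_asset_listing: bool = False) -> None:
    for name, dataset in datasets.items():
        # Step 1: assets go to a per-dataset image directory (assumed layout).
        dataset.download_assets(os.path.join(images_destination_dir, name),
                                use_id=use_id,
                                skip_asset_listing=skip_asset_listing)
        # Step 2: annotations go to a per-dataset annotation directory (assumed layout).
        dataset.download_annotations(os.path.join(annotations_destination_dir, name),
                                     use_id=use_id)


root = tempfile.mkdtemp()
datasets = {name: StubDataset(name) for name in ("train", "val")}
download_all_sketch(datasets,
                    os.path.join(root, "images"),
                    os.path.join(root, "annotations"))

# Collect the files that were created, relative to the temporary root.
created = sorted(
    os.path.relpath(os.path.join(dirpath, filename), root)
    for dirpath, _, filenames in os.walk(root)
    for filename in filenames
)
shutil.rmtree(root)  # clean up the temporary directory
```

With a real collection, the equivalent call would pass the same four documented arguments to `download_all`; the stub only shows that each dataset receives one image download and one annotation download.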