core.data.dataset.yolo_dataset

yolo_dataset

Classes:

Name Description
YoloDataset

A specialized dataset for handling YOLO-formatted annotations.

YoloDataset(name, dataset_version, assets=None, labelmap=None)

Bases: BaseDataset

A specialized dataset for handling YOLO-formatted annotations.

This class provides methods to download, process, and unzip YOLO annotations in batches, making it easier to handle large datasets for object detection tasks.

Parameters:

Name Type Description Default
name
str

The name of the dataset.

required
dataset_version
DatasetVersion

The version of the dataset to work with.

required
assets
Optional[MultiAsset]

Preloaded assets, if available.

None
labelmap
Optional[Dict[str, Label]]

Mapping of labels for the dataset.

None

Methods:

Name Description
download_annotations

Downloads YOLO annotations for the dataset in batches.

unzip

Extracts the contents of a ZIP file into the specified destination directory.

download_assets

Downloads all assets (e.g., images) associated with the dataset to the specified directory.

get_assets_batch

Retrieves a batch of assets from the dataset based on the specified limit and offset.

Attributes:

Name Type Description
name

The name of the dataset.

dataset_version

The version of the dataset from Picsellia.

assets

A preloaded collection of assets. If not provided, assets will be dynamically listed.

labelmap

Mapping of labels for the dataset.

images_dir str | None

The local directory where image assets are downloaded.

annotations_dir str | None

The local directory where annotation files are stored.

name = name instance-attribute

The name of the dataset.

dataset_version = dataset_version instance-attribute

The version of the dataset from Picsellia.

assets = assets instance-attribute

A preloaded collection of assets. If not provided, assets will be dynamically listed.

labelmap = get_labelmap(dataset_version=dataset_version) instance-attribute

Mapping of labels for the dataset.

images_dir = None instance-attribute

The local directory where image assets are downloaded.

annotations_dir = None instance-attribute

The local directory where annotation files are stored.

download_annotations(destination_dir, use_id=True)

Downloads YOLO annotations for the dataset in batches.

This method retrieves YOLO annotation files in batches, unzips them, and saves the contents to the specified directory.

Parameters:

Name Type Description Default
destination_dir
str

The directory where annotations will be saved.

required
use_id
Optional[bool]

If True, uses asset IDs to generate file paths. Defaults to True.

True
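
The batch-then-unzip flow described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `download_in_batches`, `export_batch`, `total_assets`, and `batch_size` are hypothetical names, and the source does not state the real batch size or how each archive is exported.

```python
from typing import Callable


def download_in_batches(
    total_assets: int,
    batch_size: int,
    export_batch: Callable[[int, int], str],
    unzip: Callable[[str, str], None],
    destination_dir: str,
) -> int:
    """Export annotations slice by slice and unzip each archive.

    export_batch(limit, offset) is a hypothetical callable that returns the
    path of a ZIP archive covering that slice of assets.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    batches = 0
    # Step through the dataset in fixed-size windows: offsets 0, n, 2n, ...
    for offset in range(0, total_assets, batch_size):
        zip_path = export_batch(batch_size, offset)
        # Each archive is extracted into the same destination directory.
        unzip(zip_path, destination_dir)
        batches += 1
    return batches
```

Passing the export and unzip steps as callables keeps the loop independent of any particular client, which makes the batching logic easy to check in isolation.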

unzip(zip_path, destination_path)

Extracts the contents of a ZIP file into the specified destination directory.

This method removes the original ZIP file after extraction and cleans up any empty directories.

Parameters:

Name Type Description Default
zip_path
str

The full path to the ZIP file.

required
destination_path
str

The directory where the contents will be extracted.

required
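
The extract, delete, and clean-up behaviour described for unzip can be approximated with the standard library alone. The function below is a sketch under that assumption; `unzip_and_clean` is a hypothetical name, and the real method may differ in how it decides which directories to prune.

```python
import os
import zipfile


def unzip_and_clean(zip_path: str, destination_path: str) -> None:
    """Extract a ZIP archive, delete it, then prune empty directories."""
    os.makedirs(destination_path, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(destination_path)

    # Remove the original archive once its contents are on disk.
    os.remove(zip_path)

    # Walk bottom-up so nested empty directories are removed before parents.
    for root, _dirs, _files in os.walk(destination_path, topdown=False):
        if root != destination_path and not os.listdir(root):
            os.rmdir(root)
```

The bottom-up walk matters: a directory that only contains empty subdirectories becomes empty itself once its children are removed.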

download_assets(destination_dir, use_id=True, skip_asset_listing=False)

Downloads all assets (e.g., images) associated with the dataset to the specified directory.

This method retrieves and downloads all the assets linked to the dataset version. If assets are preloaded, they are directly downloaded; otherwise, the method dynamically lists and downloads them from the dataset version.

Parameters:

Name Type Description Default
destination_dir
str

The directory where assets will be saved locally.

required
use_id
Optional[bool]

If True, uses asset IDs to generate file paths. Defaults to True.

True
skip_asset_listing
Optional[bool]

If True, skips listing assets after downloading. Defaults to False.

False
Side Effects:
  • Creates the destination_dir directory if it doesn't already exist.
  • Sets self.images_dir to destination_dir.
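
To make the preloaded-versus-listed choice and the two side effects concrete, here is a small stand-in. `DownloadSketch` and its `list_assets` callable are hypothetical, assets are represented as plain file names, and the "download" simply writes an empty file; none of this is the real YoloDataset code.

```python
import os
from typing import Callable, List, Optional


class DownloadSketch:
    """Hypothetical stand-in that mirrors the documented behaviour only."""

    def __init__(
        self,
        list_assets: Callable[[], List[str]],
        assets: Optional[List[str]] = None,
    ) -> None:
        self.assets = assets
        self._list_assets = list_assets
        self.images_dir: Optional[str] = None

    def download_assets(self, destination_dir: str) -> None:
        # Side effect: create the destination directory if it is missing.
        os.makedirs(destination_dir, exist_ok=True)

        # Use preloaded assets when present; otherwise list them on demand.
        names = self.assets if self.assets is not None else self._list_assets()
        for name in names:
            # Placeholder for the real download: write an empty file.
            with open(os.path.join(destination_dir, name), "wb"):
                pass

        # Side effect: remember where the images were saved.
        self.images_dir = destination_dir
```

After a call, `images_dir` can be read back to locate the downloaded files without passing the path around separately.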

get_assets_batch(limit, offset)

Retrieves a batch of assets from the dataset based on the specified limit and offset.

This method is useful for processing large datasets in smaller chunks.

Parameters:

Name Type Description Default
limit
int

The maximum number of assets to retrieve in the batch.

required
offset
int

The starting index for asset retrieval.

required

Returns:

Name Type Description
MultiAsset MultiAsset

A collection of assets retrieved from the dataset.
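
One way to process a large dataset in chunks is to advance the offset by the batch size until a short or empty batch comes back. The sketch below assumes the returned collection supports `len()` and that an offset past the end yields an empty batch; the real client may instead raise an error in that case. `FakeDataset` and `iter_asset_batches` are hypothetical names used only for illustration.

```python
from typing import Iterator, Sequence


class FakeDataset:
    """Hypothetical in-memory stand-in exposing get_assets_batch."""

    def __init__(self, items) -> None:
        self._items = list(items)

    def get_assets_batch(self, limit: int, offset: int) -> Sequence:
        return self._items[offset:offset + limit]


def iter_asset_batches(dataset, batch_size: int) -> Iterator[Sequence]:
    """Yield successive batches until the dataset is exhausted."""
    offset = 0
    while True:
        batch = dataset.get_assets_batch(limit=batch_size, offset=offset)
        if len(batch) == 0:
            return
        yield batch
        # A short batch means the end was reached; skip the extra request.
        if len(batch) < batch_size:
            return
        offset += batch_size
```

A caller would then write `for batch in iter_asset_batches(dataset, 100): ...` and handle each chunk independently.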