
core.data.dataset.base_dataset

base_dataset

Classes:

Name Description
BaseDataset

A base class to manage the context of a dataset, including metadata, paths, assets, and annotation management.

BaseDataset(name, dataset_version, assets=None, labelmap=None)

A base class to manage the context of a dataset, including metadata, paths, assets, and annotation management.

This class provides methods to handle dataset assets and annotations, ensuring compatibility with the Picsellia SDK. Subclasses should implement the download_annotations method to manage annotation-specific logic.

Parameters:

Name Type Description Default

name

str

The name of the dataset.

required

dataset_version

DatasetVersion

The version of the dataset as managed by Picsellia.

required

assets

Optional[MultiAsset]

A preloaded collection of assets. If not provided, assets will be listed dynamically as needed.

None

labelmap

Optional[Dict[str, Label]]

A preloaded mapping of labels. If not provided, the labelmap will be fetched from the DatasetVersion.

None

Methods:

Name Description
download_annotations

Abstract method to download annotations for the dataset.

download_assets

Downloads all assets (e.g., images) associated with the dataset to the specified directory.

get_assets_batch

Retrieves a batch of assets from the dataset based on the specified limit and offset.

Attributes:

Name Type Description
name

The name of the dataset.

dataset_version

The version of the dataset from Picsellia.

assets

A preloaded collection of assets. If not provided, assets will be dynamically listed.

labelmap

images_dir str | None

The local directory where image assets are downloaded.

annotations_dir str | None

The local directory where annotation files are stored.

name = name instance-attribute

The name of the dataset.

dataset_version = dataset_version instance-attribute

The version of the dataset from Picsellia.

assets = assets instance-attribute

A preloaded collection of assets. If not provided, assets will be dynamically listed.

labelmap = get_labelmap(dataset_version=dataset_version) instance-attribute

images_dir = None instance-attribute

The local directory where image assets are downloaded.

annotations_dir = None instance-attribute

The local directory where annotation files are stored.

download_annotations(destination_dir, use_id=True) abstractmethod

Abstract method to download annotations for the dataset.

Subclasses must implement this method to define how annotations are retrieved and stored locally.

Parameters:

Name Type Description Default
destination_dir
str

The directory where the annotations will be saved locally.

required
use_id
Optional[bool]

If True, uses asset IDs for file naming. Defaults to True.

True
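
The subclassing contract described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `BaseDatasetStandIn` is a hypothetical stand-in that mirrors only the documented constructor and abstract method, `JsonDataset` is a hypothetical subclass, assets are assumed to be plain dictionaries, and setting `annotations_dir` inside `download_annotations` is an assumption rather than documented behavior.

```python
import json
import os
from abc import ABC, abstractmethod


class BaseDatasetStandIn(ABC):
    """Hypothetical stand-in mirroring only the documented BaseDataset surface."""

    def __init__(self, name, dataset_version, assets=None, labelmap=None):
        self.name = name
        self.dataset_version = dataset_version
        self.assets = assets
        self.labelmap = labelmap or {}
        self.images_dir = None
        self.annotations_dir = None

    @abstractmethod
    def download_annotations(self, destination_dir, use_id=True):
        """Subclasses decide how annotations are retrieved and stored."""


class JsonDataset(BaseDatasetStandIn):
    """Hypothetical subclass that writes one JSON file per asset."""

    def download_annotations(self, destination_dir, use_id=True):
        os.makedirs(destination_dir, exist_ok=True)
        for asset in self.assets or []:
            # use_id=True names the file after the asset ID,
            # otherwise after the original filename without its extension.
            if use_id:
                stem = asset["id"]
            else:
                stem = os.path.splitext(asset["filename"])[0]
            path = os.path.join(destination_dir, f"{stem}.json")
            with open(path, "w", encoding="utf-8") as handle:
                json.dump(asset.get("annotations", []), handle)
        # Assumption: the subclass records where annotations were stored.
        self.annotations_dir = destination_dir
```

Because `download_annotations` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement the method can be constructed.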

download_assets(destination_dir, use_id=True, skip_asset_listing=False)

Downloads all assets (e.g., images) associated with the dataset to the specified directory.

This method retrieves and downloads all the assets linked to the dataset version. If assets are preloaded, they are directly downloaded; otherwise, the method dynamically lists and downloads them from the dataset version.

Parameters:

Name Type Description Default
destination_dir
str

The directory where assets will be saved locally.

required
use_id
Optional[bool]

If True, uses asset IDs to generate file paths. Defaults to True.

True
skip_asset_listing
Optional[bool]

If True, skips listing assets after downloading. Defaults to False.

False
Side Effects
  • Creates the destination_dir directory if it doesn't already exist.
  • Sets self.images_dir to destination_dir.
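
The preloaded-versus-listed behavior and the two side effects can be illustrated with a small sketch. Everything here is hypothetical: `AssetDownloaderSketch` is not the real class, `lister` stands in for listing assets from the dataset version, and assets are assumed to be dictionaries with `id`, `filename`, and `content` keys. The `skip_asset_listing` flag is left out because its exact semantics are not spelled out above.

```python
import os


class AssetDownloaderSketch:
    """Hypothetical stand-in illustrating the documented download flow."""

    def __init__(self, assets=None, lister=None):
        self.assets = assets
        self._lister = lister  # hypothetical callable replacing remote listing
        self.images_dir = None

    def download_assets(self, destination_dir, use_id=True):
        # Side effect 1: create the destination directory if needed.
        os.makedirs(destination_dir, exist_ok=True)

        # Use preloaded assets when available, otherwise list them on demand.
        if self.assets is None:
            self.assets = self._lister()

        for asset in self.assets:
            if use_id:
                # Keep the original extension, but name the file by asset ID.
                extension = os.path.splitext(asset["filename"])[1]
                filename = f"{asset['id']}{extension}"
            else:
                filename = asset["filename"]
            with open(os.path.join(destination_dir, filename), "wb") as handle:
                handle.write(asset["content"])

        # Side effect 2: remember where the images were stored.
        self.images_dir = destination_dir
```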

get_assets_batch(limit, offset)

Retrieves a batch of assets from the dataset based on the specified limit and offset.

This method is useful for processing large datasets in smaller chunks.

Parameters:

Name Type Description Default
limit
int

The maximum number of assets to retrieve in the batch.

required
offset
int

The starting index for asset retrieval.

required

Returns:

Name Type Description
MultiAsset MultiAsset

A collection of assets retrieved from the dataset.
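
Chunked processing with `limit` and `offset` might look like the sketch below. It rests on two assumptions that the documentation above does not confirm: `offset` counts individual assets, and an empty batch signals the end of the dataset (the real method may behave differently past the last asset, for example by raising). `iter_batches` and `ListBackedDataset` are hypothetical helpers written only for this illustration.

```python
def iter_batches(get_batch, limit):
    """Yield successive batches from a get_assets_batch-style callable."""
    offset = 0
    while True:
        batch = get_batch(limit, offset)
        if not batch:
            # Assumption: an empty batch means there is nothing left to read.
            break
        yield batch
        offset += limit


class ListBackedDataset:
    """Hypothetical in-memory stand-in exposing get_assets_batch(limit, offset)."""

    def __init__(self, items):
        self._items = list(items)

    def get_assets_batch(self, limit, offset):
        return self._items[offset:offset + limit]


dataset = ListBackedDataset(range(7))
batches = list(iter_batches(dataset.get_assets_batch, limit=3))
```

With seven items and a limit of three, the loop produces two full batches followed by a final batch holding the single remaining item.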