core.data.dataset.coco_dataset

coco_dataset

Classes:

Name Description
CocoDataset

A specialized dataset for managing COCO annotations, enabling downloading, batching, and merging of annotation files.

CocoDataset(name, dataset_version, assets=None, labelmap=None)

Bases: BaseDataset

A specialized dataset for managing COCO annotations, enabling downloading, batching, and merging of annotation files.

This class provides methods for downloading annotations in batches, merging them into a single COCO file, and loading the data for further processing.
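The typical sequence described above (download, merge, load) can be sketched as a small helper. This is a minimal sketch, not the library's own workflow: `prepare_coco_dataset` and the `images` / `annotations` sub-directory names are hypothetical, and the helper assumes that `dataset` is an already constructed `CocoDataset` (or any object exposing the same three methods) and that annotations must be downloaded before they can be loaded.

```python
import os
from typing import Any, Dict


def prepare_coco_dataset(dataset: Any, root_dir: str) -> Dict[str, Any]:
    """Download images and annotations, then return the loaded COCO data.

    `dataset` is expected to behave like a CocoDataset. The helper name and
    the "images" / "annotations" sub-directories are illustrative choices.
    """
    images_dir = os.path.join(root_dir, "images")
    annotations_dir = os.path.join(root_dir, "annotations")

    # Documented signatures: download_assets(destination_dir, use_id=True, ...)
    # and download_annotations(destination_dir, use_id=True).
    dataset.download_assets(destination_dir=images_dir, use_id=True)
    dataset.download_annotations(destination_dir=annotations_dir, use_id=True)

    # Assumption: the merged COCO file exists once annotations are downloaded.
    return dataset.load_coco_file_data()
```

Keeping images and annotations in separate directories mirrors the separate `images_dir` and `annotations_dir` attributes of the class.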

Parameters:

name (str, required): The name of the dataset.

dataset_version (DatasetVersion, required): The version of the dataset to work with.

assets (Optional[MultiAsset], default None): Preloaded assets, if available.

labelmap (Optional[Dict[str, Label]], default None): Mapping of labels for the dataset.

Methods:

Name Description
download_annotations

Download COCO annotations in batches, optionally merging them into a single file.

load_coco_file_data

Load COCO annotation data from the merged annotation file.

download_assets

Downloads all assets (e.g., images) associated with the dataset to the specified directory.

get_assets_batch

Retrieves a batch of assets from the dataset based on the specified limit and offset.

Attributes:

Name Type Description
coco_file_path str | None

The path to the merged COCO annotation file.

coco_data dict[str, Any] | None

The loaded COCO annotation data.

name

The name of the dataset.

dataset_version

The version of the dataset from Picsellia.

assets

A preloaded collection of assets. If not provided, assets will be dynamically listed.

labelmap

Mapping of labels for the dataset.

images_dir str | None

The local directory where image assets are downloaded.

annotations_dir str | None

The local directory where annotation files are stored.

coco_file_path = None instance-attribute

The path to the merged COCO annotation file.

coco_data = None instance-attribute

The loaded COCO annotation data.

name = name instance-attribute

The name of the dataset.

dataset_version = dataset_version instance-attribute

The version of the dataset from Picsellia.

assets = assets instance-attribute

A preloaded collection of assets. If not provided, assets will be dynamically listed.

labelmap = get_labelmap(dataset_version=dataset_version) instance-attribute

images_dir = None instance-attribute

The local directory where image assets are downloaded.

annotations_dir = None instance-attribute

The local directory where annotation files are stored.

download_annotations(destination_dir, use_id=True)

Download COCO annotations in batches, optionally merging them into a single file.

Parameters:

destination_dir (str, required): Directory to save the COCO annotation files.

use_id (Optional[bool], default True): Whether to use asset IDs in file paths.
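To illustrate what merging batched annotation files into a single COCO file can involve, here is a minimal sketch that works on already parsed COCO dictionaries. It is not the library's implementation: `merge_coco_batches` is a hypothetical helper, and it assumes that image and annotation IDs are already unique across batches and that each batch follows the standard COCO layout (`images`, `annotations`, `categories`).

```python
from typing import Any, Dict, List


def merge_coco_batches(batches: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge several COCO dictionaries into one.

    Assumes image and annotation IDs do not collide between batches.
    Categories are deduplicated by their "id" field.
    """
    merged: Dict[str, Any] = {"images": [], "annotations": [], "categories": []}
    seen_category_ids = set()

    for batch in batches:
        merged["images"].extend(batch.get("images", []))
        merged["annotations"].extend(batch.get("annotations", []))
        for category in batch.get("categories", []):
            # Every batch usually repeats the full category list; keep one copy.
            if category["id"] not in seen_category_ids:
                seen_category_ids.add(category["id"])
                merged["categories"].append(category)

    return merged
```

If IDs could collide between batches, they would need to be remapped before concatenation, which this sketch deliberately leaves out.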

load_coco_file_data()

Load COCO annotation data from the merged annotation file.

Returns:

dict[str, Any]: The COCO data loaded as a dictionary.
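Once the COCO data is loaded as a dictionary, a common next step is to group annotations by image. The sketch below assumes the standard COCO keys (`images` with `id` and `file_name`, `annotations` with `image_id`); `annotations_by_image` is a hypothetical helper and not part of `CocoDataset`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def annotations_by_image(coco_data: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Map each image file name to the list of its annotations."""
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for annotation in coco_data.get("annotations", []):
        grouped[annotation["image_id"]].append(annotation)

    # Images without annotations map to an empty list.
    return {
        image["file_name"]: grouped.get(image["id"], [])
        for image in coco_data.get("images", [])
    }
```

With a `CocoDataset` instance, the input would typically be the return value of `load_coco_file_data()`.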

download_assets(destination_dir, use_id=True, skip_asset_listing=False)

Downloads all assets (e.g., images) associated with the dataset to the specified directory.

This method retrieves and downloads all the assets linked to the dataset version. If assets are preloaded, they are directly downloaded; otherwise, the method dynamically lists and downloads them from the dataset version.

Parameters:

destination_dir (str, required): The directory where assets will be saved locally.

use_id (Optional[bool], default True): If True, uses asset IDs to generate file paths.

skip_asset_listing (Optional[bool], default False): If True, skips listing assets after downloading.

Side Effects:
  • Creates the destination_dir directory if it doesn't already exist.
  • Sets self.images_dir to destination_dir.
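The behaviour described for this method (use preloaded assets when present, otherwise list them, then record the target directory) can be sketched as follows. This is an illustration under assumptions, not the actual implementation: `download_assets_sketch` is a hypothetical function, and it assumes that the dataset version exposes `list_assets()` and that the asset collection exposes `download(target_path=..., use_id=...)`. The `skip_asset_listing` flag is left out because its exact semantics are not detailed here.

```python
import os
from typing import Any


def download_assets_sketch(dataset: Any, destination_dir: str, use_id: bool = True) -> None:
    """Illustrate the documented flow and side effects of download_assets."""
    # Side effect 1: create the destination directory if it is missing.
    os.makedirs(destination_dir, exist_ok=True)

    if dataset.assets is not None:
        # Preloaded assets are downloaded directly.
        assets = dataset.assets
    else:
        # Assumption: the dataset version can list its own assets.
        assets = dataset.dataset_version.list_assets()

    # Assumption: the asset collection offers a bulk download method.
    assets.download(target_path=destination_dir, use_id=use_id)

    # Side effect 2: remember where the images were stored.
    dataset.images_dir = destination_dir
```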

get_assets_batch(limit, offset)

Retrieves a batch of assets from the dataset based on the specified limit and offset.

This method is useful for processing large datasets in smaller chunks.

Parameters:

limit (int, required): The maximum number of assets to retrieve in the batch.

offset (int, required): The starting index for asset retrieval.

Returns:

MultiAsset: A collection of assets retrieved from the dataset.
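Processing a large dataset in chunks amounts to calling `get_assets_batch` with increasing offsets. The generator below is a minimal sketch: `iter_asset_batches` is a hypothetical helper, and it assumes the caller already knows the total number of assets (`total`), since how that count is obtained is not covered here.

```python
from typing import Any, Iterator


def iter_asset_batches(dataset: Any, total: int, limit: int) -> Iterator[Any]:
    """Yield successive batches of at most `limit` assets.

    `dataset` must expose get_assets_batch(limit, offset); `total` is the
    number of assets in the dataset, supplied by the caller.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")

    # Offsets advance in steps of `limit`: 0, limit, 2 * limit, ...
    for offset in range(0, total, limit):
        yield dataset.get_assets_batch(limit=limit, offset=offset)
```

The last batch may contain fewer than `limit` assets when `total` is not a multiple of `limit`.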