core.data.dataset.coco_dataset
coco_dataset
Classes:
Name | Description |
---|---|
CocoDataset | A specialized dataset for managing COCO annotations, enabling downloading, batching, and merging of annotation files. |
CocoDataset(name, dataset_version, assets=None, labelmap=None)
Bases: BaseDataset
A specialized dataset for managing COCO annotations, enabling downloading, batching, and merging of annotation files.
This class provides methods for downloading annotations in batches, merging them into a single COCO file, and loading the data for further processing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
dataset_version | DatasetVersion | The version of the dataset to work with. | required |
assets | Optional[MultiAsset] | Preloaded assets, if available. | None |
labelmap | Optional[Dict[str, Label]] | Mapping of labels for the dataset. | None |
Methods:
Name | Description |
---|---|
download_annotations | Download COCO annotations in batches, optionally merging them into a single file. |
load_coco_file_data | Load COCO annotation data from the merged annotation file. |
download_assets | Downloads all assets (e.g., images) associated with the dataset to the specified directory. |
get_assets_batch | Retrieves a batch of assets from the dataset based on the specified limit and offset. |
Attributes:
Name | Type | Description |
---|---|---|
coco_file_path | str \| None | The path to the merged COCO annotation file. |
coco_data | dict[str, Any] \| None | The loaded COCO annotation data. |
name | | The name of the dataset. |
dataset_version | | The version of the dataset from Picsellia. |
assets | | A preloaded collection of assets. If not provided, assets will be dynamically listed. |
labelmap | | |
images_dir | str \| None | The local directory where image assets are downloaded. |
annotations_dir | str \| None | The local directory where annotation files are stored. |
coco_file_path = None
instance-attribute
The path to the merged COCO annotation file.
coco_data = None
instance-attribute
The loaded COCO annotation data.
name = name
instance-attribute
The name of the dataset.
dataset_version = dataset_version
instance-attribute
The version of the dataset from Picsellia.
assets = assets
instance-attribute
A preloaded collection of assets. If not provided, assets will be dynamically listed.
labelmap = get_labelmap(dataset_version=dataset_version)
instance-attribute
images_dir = None
instance-attribute
The local directory where image assets are downloaded.
annotations_dir = None
instance-attribute
The local directory where annotation files are stored.
download_annotations(destination_dir, use_id=True)
Download COCO annotations in batches, optionally merging them into a single file.
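The class description mentions merging batched annotation files into a single COCO file. The sketch below illustrates one way such a merge could work on plain dictionaries. It is not the library's implementation: `merge_coco_batches` is a hypothetical helper, and it assumes that image and annotation ids are already unique across batches while category entries may repeat.

```python
from typing import Any


def merge_coco_batches(batches: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge several COCO dictionaries into one (hypothetical helper)."""
    merged: dict[str, Any] = {"images": [], "annotations": [], "categories": []}
    seen_category_ids: set[int] = set()
    for batch in batches:
        # Assumption: image and annotation ids do not collide across batches.
        merged["images"].extend(batch.get("images", []))
        merged["annotations"].extend(batch.get("annotations", []))
        for category in batch.get("categories", []):
            # Categories usually repeat in every batch, so keep each id once.
            if category["id"] not in seen_category_ids:
                seen_category_ids.add(category["id"])
                merged["categories"].append(category)
    return merged


batch_a = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]}],
    "categories": [{"id": 1, "name": "cat"}],
}
batch_b = {
    "images": [{"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"id": 11, "image_id": 2, "category_id": 1, "bbox": [1, 1, 4, 4]}],
    "categories": [{"id": 1, "name": "cat"}],
}

merged = merge_coco_batches([batch_a, batch_b])
print(len(merged["images"]), len(merged["annotations"]), len(merged["categories"]))
```

If ids can collide across batches, they would need to be remapped before merging; this sketch deliberately leaves that out.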
load_coco_file_data()
Load COCO annotation data from the merged annotation file.
Returns:
Type | Description |
---|---|
dict[str, Any] | The COCO data loaded as a dictionary. |
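For readers unfamiliar with the shape of the returned data, the following sketch reads a COCO JSON file into a dictionary using only the standard library. `read_coco_file` is a hypothetical stand-in rather than the method itself, and the explicit missing-file check is an assumption about reasonable behavior, not something this page documents.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def read_coco_file(coco_file_path: str) -> dict[str, Any]:
    """Read a COCO JSON file into a dictionary (hypothetical stand-in)."""
    path = Path(coco_file_path)
    if not path.is_file():
        # Assumption: fail early with a clear message when the file is missing.
        raise FileNotFoundError(f"COCO file not found: {path}")
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Write a tiny COCO file to a temporary directory, then read it back.
sample = {"images": [], "annotations": [], "categories": [{"id": 1, "name": "cat"}]}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "annotations.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    coco_data = read_coco_file(str(sample_path))

print(sorted(coco_data))
```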
download_assets(destination_dir, use_id=True, skip_asset_listing=False)
Downloads all assets (e.g., images) associated with the dataset to the specified directory.
This method retrieves and downloads all the assets linked to the dataset version. If assets are preloaded, they are directly downloaded; otherwise, the method dynamically lists and downloads them from the dataset version.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
destination_dir | str | The directory where assets will be saved locally. | required |
use_id | Optional[bool] | If True, uses asset IDs to generate file paths. Defaults to True. | True |
skip_asset_listing | Optional[bool] | If True, skips listing assets after downloading. Defaults to False. | False |
Side Effects
- Creates the destination_dir directory if it doesn't already exist.
- Sets self.images_dir to destination_dir.
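The two side effects listed above can be illustrated with a small stand-in object. This is only a sketch of the documented behavior, not the library's code: `ImagesDirHolder` and its `prepare` method are hypothetical names, and no assets are actually downloaded.

```python
import os
import tempfile


class ImagesDirHolder:
    """Hypothetical stand-in that mimics only the two documented side effects."""

    def __init__(self) -> None:
        self.images_dir = None

    def prepare(self, destination_dir: str) -> None:
        # exist_ok=True makes the call safe when the directory already exists.
        os.makedirs(destination_dir, exist_ok=True)
        # Remember where the images are stored locally.
        self.images_dir = destination_dir


holder = ImagesDirHolder()
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "dataset", "images")
    holder.prepare(target)
    holder.prepare(target)  # A second call does not raise.
    created = os.path.isdir(target)

print(created, holder.images_dir == target)
```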
get_assets_batch(limit, offset)
Retrieves a batch of assets from the dataset based on the specified limit and offset.
This method is useful for processing large datasets in smaller chunks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
limit | int | The maximum number of assets to retrieve in the batch. | required |
offset | int | The starting index for asset retrieval. | required |
Returns:
Type | Description |
---|---|
MultiAsset | A collection of assets retrieved from the dataset. |
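A typical way to use a limit/offset method is to loop until the source is exhausted. The sketch below shows that pattern against a fake in-memory source so it runs without Picsellia. `iter_batches` and `fake_get_assets_batch` are hypothetical names, and the stopping rule assumes that a batch shorter than the limit (or an empty one) marks the end of the data; the real method may signal exhaustion differently.

```python
from typing import Callable, Iterator

FAKE_ASSETS = [f"asset_{i}" for i in range(10)]


def fake_get_assets_batch(limit: int, offset: int) -> list[str]:
    """Stand-in for a limit/offset call; returns a plain list slice."""
    return FAKE_ASSETS[offset:offset + limit]


def iter_batches(
    get_batch: Callable[[int, int], list[str]], batch_size: int
) -> Iterator[list[str]]:
    """Yield successive batches until the source runs out."""
    offset = 0
    while True:
        batch = get_batch(batch_size, offset)
        if not batch:
            break
        yield batch
        # Assumption: a short batch means there is nothing left to fetch.
        if len(batch) < batch_size:
            break
        offset += batch_size


batches = list(iter_batches(fake_get_assets_batch, batch_size=4))
print([len(batch) for batch in batches])
```

Processing one batch at a time keeps memory use bounded by the batch size rather than by the size of the whole dataset.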