core.data.dataset.base_dataset
Classes:
Name | Description |
---|---|
BaseDataset | A base class to manage the context of a dataset, including metadata, paths, assets, and annotation management. |
BaseDataset(name, dataset_version, assets=None, labelmap=None)
A base class to manage the context of a dataset, including metadata, paths, assets, and annotation management.
This class provides methods to handle dataset assets and annotations, ensuring compatibility with the Picsellia SDK. Subclasses must implement the download_annotations method to provide annotation-specific logic.
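Because download_annotations is declared abstract, the base class is not meant to be used on its own. The sketch below illustrates that contract with a hypothetical stand-in: SketchBaseDataset, NoopDataset, and can_instantiate are illustrative names, not part of the library, and it assumes the real BaseDataset derives from abc.ABC (the reference shows the abstractmethod decorator but not the base class).

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchBaseDataset(ABC):
    # Hypothetical stand-in for BaseDataset: only the abstract contract is mirrored.
    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def download_annotations(self, destination_dir: str, use_id: Optional[bool] = True) -> None:
        ...


class NoopDataset(SketchBaseDataset):
    # A concrete subclass only needs to override the abstract method to be instantiable.
    def download_annotations(self, destination_dir: str, use_id: Optional[bool] = True) -> None:
        print(f"would download annotations for {self.name} into {destination_dir}")


def can_instantiate(cls, *args) -> bool:
    """Return True if cls(*args) succeeds, False if Python rejects an abstract class."""
    try:
        cls(*args)
        return True
    except TypeError:
        return False
```

With abc.ABC as the base, Python raises TypeError when instantiating a class that still has unimplemented abstract methods, so can_instantiate(SketchBaseDataset, "demo") is False while can_instantiate(NoopDataset, "demo") is True.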
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
dataset_version | DatasetVersion | The version of the dataset as managed by Picsellia. | required |
assets | Optional[MultiAsset] | A preloaded collection of assets. If not provided, assets are listed dynamically as needed. | None |
labelmap | Optional[Dict[str, Label]] | A preloaded mapping of labels. If not provided, the labelmap is fetched from the dataset version. | None |
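The documented fallback for labelmap (use the preloaded mapping if given, otherwise fetch it for the dataset version) can be sketched as a small resolver. This is an illustration under assumptions: resolve_labelmap and the fetch callback are hypothetical, and plain strings stand in for DatasetVersion and Label objects so the sketch runs without the Picsellia SDK.

```python
from typing import Callable, Dict, Optional


def resolve_labelmap(
    dataset_version: str,
    labelmap: Optional[Dict[str, str]],
    fetch: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    # Hypothetical helper: strings stand in for DatasetVersion and Label objects.
    if labelmap is not None:
        return labelmap  # a preloaded mapping is used as-is, without a fetch
    return fetch(dataset_version)  # otherwise fetch the mapping for this version
```

Passing a preloaded mapping avoids the fetch entirely, which is the main reason to supply labelmap when the mapping is already known.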
Methods:
Name | Description |
---|---|
download_annotations | Abstract method to download annotations for the dataset. |
download_assets | Downloads all assets (e.g., images) associated with the dataset to the specified directory. |
get_assets_batch | Retrieves a batch of assets from the dataset based on the specified limit and offset. |
Attributes:
Name | Type | Description |
---|---|---|
name | str | The name of the dataset. |
dataset_version | DatasetVersion | The version of the dataset from Picsellia. |
assets | Optional[MultiAsset] | A preloaded collection of assets. If not provided, assets are listed dynamically. |
labelmap | Dict[str, Label] | The mapping of labels for the dataset version. |
images_dir | Optional[str] | The local directory where image assets are downloaded. |
annotations_dir | Optional[str] | The local directory where annotation files are stored. |
name = name (instance attribute)
The name of the dataset.
dataset_version = dataset_version (instance attribute)
The version of the dataset from Picsellia.
assets = assets (instance attribute)
A preloaded collection of assets. If not provided, assets are listed dynamically.
labelmap = get_labelmap(dataset_version=dataset_version) (instance attribute)
The mapping of labels for the dataset version.
images_dir = None (instance attribute)
The local directory where image assets are downloaded.
annotations_dir = None (instance attribute)
The local directory where annotation files are stored.
download_annotations(destination_dir, use_id=True) (abstract method)
Abstract method to download annotations for the dataset.
Subclasses must implement this method to define how annotations are retrieved and stored locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
destination_dir | str | The directory where the annotations will be saved locally. | required |
use_id | Optional[bool] | If True, uses asset IDs for file naming. | True |
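A subclass typically has to decide how each annotation file is named from use_id. The sketch below shows one possible rule under stated assumptions: FakeAsset is a hypothetical stand-in for a Picsellia asset, and both the ".json" extension and the fallback to the original filename stem when use_id is False are illustrative choices, not documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass
class FakeAsset:
    # Hypothetical stand-in for a Picsellia asset: only an ID and a filename.
    id: str
    filename: str


def annotation_path(asset: FakeAsset, destination_dir: str, use_id: bool = True) -> str:
    # Name the file after the asset ID (use_id=True) or, as an assumption,
    # after the asset's original filename stem (use_id=False).
    stem = asset.id if use_id else os.path.splitext(asset.filename)[0]
    return os.path.join(destination_dir, f"{stem}.json")  # ".json" is an assumed format
```

Naming by ID avoids collisions when two assets share a filename, which is a plausible reason for use_id defaulting to True.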
download_assets(destination_dir, use_id=True, skip_asset_listing=False)
Downloads all assets (e.g., images) associated with the dataset to the specified directory.
This method retrieves and downloads all assets linked to the dataset version. If assets were preloaded, they are downloaded directly; otherwise, the method lists them dynamically from the dataset version and then downloads them.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
destination_dir | str | The directory where assets will be saved locally. | required |
use_id | Optional[bool] | If True, uses asset IDs to generate file paths. | True |
skip_asset_listing | Optional[bool] | If True, skips listing assets after downloading. | False |
Side Effects:
- Creates the destination_dir directory if it does not already exist.
- Sets self.images_dir to destination_dir.
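The documented flow (prefer preloaded assets, otherwise list them, create the directory, download, then record images_dir) can be sketched as follows. AssetDownloadSketch, list_assets, and fetch are hypothetical names; asset IDs are plain strings, the ".jpg" extension is assumed, use_id and skip_asset_listing are omitted, and the actual transfer is delegated to a caller-supplied function rather than any real SDK call.

```python
import os
from typing import Callable, List, Optional


class AssetDownloadSketch:
    """Hypothetical stand-in mimicking the documented behavior of download_assets."""

    def __init__(
        self,
        list_assets: Callable[[], List[str]],
        assets: Optional[List[str]] = None,
    ) -> None:
        self.list_assets = list_assets  # used only when no assets were preloaded
        self.assets = assets
        self.images_dir: Optional[str] = None

    def download_assets(
        self, destination_dir: str, fetch: Callable[[str, str], None]
    ) -> List[str]:
        os.makedirs(destination_dir, exist_ok=True)  # create the directory if missing
        # Prefer preloaded assets; otherwise list them dynamically.
        asset_ids = self.assets if self.assets is not None else self.list_assets()
        written = []
        for asset_id in asset_ids:
            path = os.path.join(destination_dir, f"{asset_id}.jpg")  # ".jpg" is assumed
            fetch(asset_id, path)  # the caller supplies the actual transfer
            written.append(path)
        self.images_dir = destination_dir  # documented side effect
        return written
```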
get_assets_batch(limit, offset)
Retrieves a batch of assets from the dataset based on the specified limit and offset.
This method is useful for processing large datasets in smaller chunks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
limit | int | The maximum number of assets to retrieve in the batch. | required |
offset | int | The starting index for asset retrieval. | required |
Returns:
Type | Description |
---|---|
MultiAsset | A collection of assets retrieved from the dataset. |
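To process a large dataset in chunks, get_assets_batch can be called with an offset that advances by limit until a short or empty batch comes back. The sketch below shows that paging loop; iter_batches is a hypothetical helper, and it assumes the returned collection supports len() and truthiness (plain lists stand in for MultiAsset here).

```python
from typing import Callable, Iterator, Sequence


def iter_batches(
    get_batch: Callable[[int, int], Sequence[str]],
    limit: int,
) -> Iterator[Sequence[str]]:
    # Walk the dataset with increasing offsets until a short or empty batch is returned.
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        batch = get_batch(limit, offset)
        if not batch:
            break  # nothing left at this offset
        yield batch
        if len(batch) < limit:
            break  # a short batch means the end of the dataset was reached
        offset += limit
```

With a dataset bound as dataset, the call would be iter_batches(dataset.get_assets_batch, 100); stopping on a short batch saves one extra request compared with waiting for an empty one.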