steps.base.dataset.preprocessor

preprocessor

Functions:

prepare_classification_datasets: Prepares a classification dataset by organizing image files into category-based subdirectories.

prepare_classification_datasets(dataset_collection, destination_dir)

Prepares a classification dataset by organizing image files into category-based subdirectories.

This function processes a dataset collection by sorting its images into directories named after their class labels (categories). The result is the folder-per-class layout expected when training classification models: within each split, every category has its own subdirectory containing only the images of that category.

Parameters:

dataset_collection (DatasetCollection, required): The dataset collection to prepare, which includes the images and their corresponding class labels.

destination_dir (str, required): The destination directory where the prepared dataset will be saved, with one category-based subdirectory per class.

Returns:

DatasetCollection: A dataset collection with images organized into subdirectories, each named after the corresponding class label.
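The sorting step described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper `sort_images_by_label` and its `labels` mapping (image file name to category name) are hypothetical stand-ins, since this page does not show how a `DatasetCollection` exposes its labels.

```python
import shutil
from pathlib import Path


def sort_images_by_label(split_dir, labels):
    """Move each image in split_dir into a subdirectory named after its label.

    `labels` is a hypothetical mapping of image file name -> category name.
    Returns the list of new image paths.
    """
    split_dir = Path(split_dir)
    moved = []
    for image_name, category in labels.items():
        source = split_dir / image_name
        if not source.is_file():
            continue  # skip labels whose image is not present in this split
        category_dir = split_dir / category
        category_dir.mkdir(parents=True, exist_ok=True)
        target = category_dir / image_name
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved
```

Calling this helper once per split (`train`, `val`, `test`) would produce one subdirectory per category inside each split.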

Examples:

Before Preparation:

dataset/
├── train/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── image3.jpg
├── val/
│   ├── image4.jpg
│   ├── image5.jpg
│   └── image6.jpg
└── test/
    ├── image7.jpg
    ├── image8.jpg
    └── image9.jpg

After Preparation:

dataset/
├── train/
│   ├── category1/
│   │   ├── image1.jpg
│   │   └── image3.jpg
│   └── category2/
│       └── image2.jpg
├── val/
│   ├── category1/
│   │   └── image4.jpg
│   └── category2/
│       ├── image5.jpg
│       └── image6.jpg
└── test/
    ├── category1/
    │   └── image7.jpg
    └── category2/
        ├── image8.jpg
        └── image9.jpg
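
One way to sanity-check a prepared directory is to count the images in each category folder. The sketch below assumes the split/category layout shown in the example; `count_images_per_category` is a hypothetical helper and is not part of the documented module.

```python
from pathlib import Path


def count_images_per_category(dataset_dir):
    """Return {split: {category: file_count}} for a prepared dataset directory."""
    counts = {}
    for split_dir in sorted(Path(dataset_dir).iterdir()):
        if not split_dir.is_dir():
            continue  # ignore stray files next to the split folders
        counts[split_dir.name] = {
            category_dir.name: sum(1 for f in category_dir.iterdir() if f.is_file())
            for category_dir in sorted(split_dir.iterdir())
            if category_dir.is_dir()
        }
    return counts
```

For the "After Preparation" tree, this would report two images in `train/category1`, one in `train/category2`, one in `val/category1`, two in `val/category2`, one in `test/category1`, and two in `test/category2`.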