Dataset Version Creation Pipeline
This guide explains how to create, customize, test, and deploy a dataset processing pipeline using the pxl-pipeline CLI with the dataset_version_creation template.
These pipelines are typically used to modify images and annotations — for example, applying augmentations or filtering classes.
1. Initialize your pipeline
pxl-pipeline init my_custom_pipeline --type processing --template dataset_version_creation
This generates a pipeline folder with standard files. See project structure for details.
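For orientation, the files referenced throughout this guide sit in a layout roughly like this (the exact contents depend on your CLI version):

my_custom_pipeline/
├── steps.py             # core processing logic
├── local_pipeline.py    # entry point used by pxl-pipeline test
├── utils/
│   └── parameters.py    # custom parameter classes
└── pyproject.toml       # dependencies managed by uv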
2. Customize your pipeline logic
steps.py
The process_images() function defines the core logic. It takes input images and COCO annotations, applies transformations, and writes the output to new directories.
from picsellia_cv_engine import step

@step()
def process_images(input_images_dir: str, input_coco: dict, output_images_dir: str, output_coco: dict, parameters: dict):
    # Modify images and annotations here
    ...
    return output_coco
You can split your logic into multiple steps if needed.
Input/output contract
Each dataset processing step uses these I/O conventions:
- input_images_dir – Folder with input images
- input_coco – COCO annotation dict for the input dataset
- parameters – Dict of pipeline parameters (see Working with parameters)
- output_images_dir – Empty folder where processed images must be saved
- output_coco – Empty dict where modified annotations must be written
💡 You must fill both output_images_dir and output_coco. They are automatically uploaded by the CLI after the step completes.
Image processing example
Save processed images like this:
processed_img.save(os.path.join(output_images_dir, image_filename))
Update output_coco with metadata:
output_coco["images"].append({
"id": new_id,
"file_name": image_filename,
"width": processed_img.width,
"height": processed_img.height,
})
Be sure to also update the "annotations" field.
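For instance, if your transformation does not change image geometry, annotations can be carried over with only their image IDs remapped. A minimal sketch, assuming a hypothetical id_mapping dict from old to new image IDs:

for ann in input_coco["annotations"]:
    new_ann = dict(ann)
    new_ann["image_id"] = id_mapping[ann["image_id"]]  # id_mapping is hypothetical: old id -> new id
    output_coco["annotations"].append(new_ann)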
✔️ Checklist:
- Process and save all images to output_images_dir
- Append image metadata to output_coco["images"]
- Copy and adapt annotations to output_coco["annotations"]
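Putting the checklist together, here is a minimal sketch of a complete step. It assumes Pillow is installed (e.g. uv add pillow --project my_custom_pipeline) and that the transformation (an optional blur) leaves image geometry unchanged; the blur parameter is the one defined in the next section:

import os

from PIL import Image, ImageFilter  # assumes Pillow was added with uv

from picsellia_cv_engine import step

@step()
def process_images(input_images_dir: str, input_coco: dict, output_images_dir: str, output_coco: dict, parameters: dict):
    # Categories are not modified, so copy them through.
    output_coco["categories"] = input_coco.get("categories", [])
    output_coco["images"] = []
    output_coco["annotations"] = []

    for image_info in input_coco["images"]:
        img = Image.open(os.path.join(input_images_dir, image_info["file_name"]))
        if parameters.get("blur", False):
            img = img.filter(ImageFilter.GaussianBlur(radius=2))
        # Save the processed image into the (initially empty) output folder.
        img.save(os.path.join(output_images_dir, image_info["file_name"]))
        # Geometry is unchanged, so image metadata can be copied as-is.
        output_coco["images"].append(dict(image_info))

    # A blur moves no coordinates, so annotations carry over unchanged too.
    output_coco["annotations"] = [dict(ann) for ann in input_coco["annotations"]]
    return output_coco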
3. Define pipeline parameters
Parameters can be passed through the pipeline’s context. If you need custom ones, define them in utils/parameters.py using a class that inherits from Parameters:
class ProcessingParameters(Parameters):
    def __init__(self, log_data):
        super().__init__(log_data)
        # extract_parameter resolves "blur" from the run's parameters, enforcing the type and default
        self.blur = self.extract_parameter(["blur"], expected_type=bool, default=False)
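Additional parameters follow the same pattern. For example, a hypothetical numeric blur_radius with a default value:

self.blur_radius = self.extract_parameter(["blur_radius"], expected_type=float, default=2.0)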
See Working with pipeline parameters for more.
4. Manage dependencies with uv
To add Python packages, use:
uv add opencv-python --project my_custom_pipeline
uv add git+https://github.com/picselliahq/picsellia-cv-engine.git --project my_custom_pipeline
Dependencies are declared in pyproject.toml. You don’t need to activate or install manually — see dependency management with uv.
5. Test your pipeline locally
Run your test with:
pxl-pipeline test my_custom_pipeline
This will:
- Prompt for the input dataset and output name
- Run the pipeline via local_pipeline.py
- Save everything under runs/runX/ (see How runs work)
To reuse the same folder and avoid re-downloading assets, use:
pxl-pipeline test my_custom_pipeline --reuse-dir
6. Deploy your pipeline
pxl-pipeline deploy my_custom_pipeline
This will:
- Build and push the Docker image
- Register the pipeline in Picsellia under Processings → Dataset → Private

See deployment lifecycle.
Make sure you’re logged in to Docker before deploying.