Dataset Version Processing Pipeline

This guide explains how to create, customize, test, and deploy a processing pipeline that runs on a dataset version, using the pxl-pipeline CLI with the dataset_version template.

These pipelines receive a dataset version as their target and can apply transformations, filters, or any custom logic to the data.


1. Initialize your pipeline

pxl-pipeline init my_dataset_pipeline --type processing --template dataset_version

This generates a pipeline folder with standard files. See project structure for details.
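
As a rough illustration (limited to the files this guide refers to; the template may generate more), the folder looks like:

my_dataset_pipeline/
├── pipeline.py          # entrypoint, runnable with --mode local
├── steps.py             # the process() step
├── pyproject.toml       # dependencies, managed with uv
├── run_config.toml      # target, inputs, and parameters for local runs
└── utils/
    ├── parameters.py    # custom ProcessingParameters
    └── inputs.py        # declared ProcessingInputs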

2. Customize your pipeline logic

steps.py

Contains the process() step where your core logic lives. The context gives you access to the target dataset version, parameters, and any inputs you've declared.

@step
def process():
    context: PicselliaDatasetProcessingContext = Pipeline.get_active_context()
    parameters = context.processing_parameters
    dataset_version = context.target

    # If you want to process only selected assets:
    asset_ids_to_process = context.asset_ids

    # Your logic goes here ...
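
As an illustration, here is a minimal sketch that narrows the run to the selected assets and downloads each one before transforming it. It assumes the Picsellia SDK methods dataset_version.list_assets() and asset.download(), so check the names against your SDK version:

@step
def process():
    context: PicselliaDatasetProcessingContext = Pipeline.get_active_context()
    dataset_version = context.target

    # List every asset in the target dataset version (Picsellia SDK call)
    assets = dataset_version.list_assets()

    # Honor a per-asset selection when the job was launched on a subset
    if context.asset_ids:
        selected = {str(asset_id) for asset_id in context.asset_ids}
        assets = [asset for asset in assets if str(asset.id) in selected]

    for asset in assets:
        asset.download("./downloads")  # fetch the image locally
        # ... apply your transformation to the downloaded file ...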

utils/parameters.py

Define custom parameters using a class that inherits from Parameters:

class ProcessingParameters(Parameters):
    def __init__(self, log_data):
        super().__init__(log_data)
        self.example_parameter = self.extract_parameter(
            ["example_parameter"], expected_type=str, default="default"
        )
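
extract_parameter also handles typed values. A sketch adding a numeric parameter alongside the default one (confidence_threshold is a hypothetical name used purely for illustration):

class ProcessingParameters(Parameters):
    def __init__(self, log_data):
        super().__init__(log_data)
        self.example_parameter = self.extract_parameter(
            ["example_parameter"], expected_type=str, default="default"
        )
        # Hypothetical numeric parameter, to illustrate typed extraction
        self.confidence_threshold = self.extract_parameter(
            ["confidence_threshold"], expected_type=float, default=0.5
        )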

See Working with pipeline parameters for more.

utils/inputs.py

Define the inputs your processing expects. Inputs are registered on the Picsellia platform when you deploy and are validated at launch time.

from picsellia.types.enums import ProcessingInputType
from picsellia_pipelines_cli.utils.inputs import PipelineInputs


class ProcessingInputs(PipelineInputs):
    def __init__(self):
        super().__init__()
        self.define_input(
            name="example_input",
            input_type=ProcessingInputType.TEXT,
            required=True,
        )
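
Inputs do not have to be required. A sketch declaring a second, optional TEXT input (extra_input is a hypothetical name):

class ProcessingInputs(PipelineInputs):
    def __init__(self):
        super().__init__()
        self.define_input(
            name="example_input",
            input_type=ProcessingInputType.TEXT,
            required=True,
        )
        # Hypothetical optional input: launch proceeds even if it is empty
        self.define_input(
            name="extra_input",
            input_type=ProcessingInputType.TEXT,
            required=False,
        )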

See Working with pipeline inputs for the full guide.

3. Configure run_config.toml for local testing

When you run pxl-pipeline init, a run_config.toml is generated. It contains the target, inputs, and parameters needed to run locally:

override_outputs = true
target_id = ""

[job]
type = "DATASET_VERSION_CREATION"

[inputs]
example_input = "example_value"

[parameters]
example_parameter = "default"

Fill in the target_id with the UUID of the dataset version you want to process.
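
For example (the UUID below is a placeholder, substitute your own):

target_id = "00000000-0000-0000-0000-000000000000"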

4. Manage dependencies with uv

uv add opencv-python --project my_dataset_pipeline

Dependencies are declared in pyproject.toml. See dependency management with uv.
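
After the command above, uv records the dependency in the [project] table of pyproject.toml, roughly like this (the version pin is illustrative):

[project]
name = "my_dataset_pipeline"
dependencies = [
    "opencv-python>=4.10",
]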

5. Test your pipeline locally

pxl-pipeline test my_dataset_pipeline

This will:

  • Prompt for the target dataset version if not set in the run config
  • Scaffold any missing inputs with empty defaults
  • Run the pipeline via pipeline.py --mode local
  • Save everything under runs/runX/

To reuse the same folder and avoid re-downloading assets:

pxl-pipeline test my_dataset_pipeline --reuse-dir

See how the runs/ folder works for more details.

6. Deploy to Picsellia

pxl-pipeline deploy my_dataset_pipeline

This will:

  • Build and push the Docker image
  • Register the pipeline in Picsellia
  • Sync the declared inputs to the platform (add new inputs, update existing ones, remove stale ones)

See deployment lifecycle.