# Simple Training Pipeline
This guide explains how to create, customize, test, and deploy a training pipeline using the `simple_training` template from the pxl-pipeline CLI.
This is the recommended starting point for training pipelines. It provides a minimal, framework-agnostic scaffold that you can extend with any ML framework of your choice.
## 1. Initialize your pipeline
```bash
pxl-pipeline init my_training_pipeline --type training --template simple_training
```
This generates a pipeline folder with standard files. See project structure for details.
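A typical generated layout looks like this (illustrative, assembled from the files covered in the sections below; see the project structure docs for the authoritative list):

```
my_training_pipeline/
├── pipeline.py
├── steps.py
├── utils/
│   └── parameters.py
├── pyproject.toml
├── run_config.toml
└── Dockerfile
```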
During init, you'll be prompted to:
- Create a new model version or select an existing one
- If you create one, default parameters from `TrainingHyperParameters` will be used
- If you use an existing model, ensure the parameter class matches the version's expected inputs
## 2. Customize your pipeline
### pipeline.py
The entry point creates a training context from your config and runs the pipeline:
```python
context = create_training_context_from_config(
    hyperparameters_cls=TrainingHyperParameters,
    augmentation_parameters_cls=AugmentationParameters,
    export_parameters_cls=ExportParameters,
    mode=args.mode,
    config_file_path=args.config_file,
)

@pipeline(context=context, log_folder_path="logs/", remove_logs_on_completion=False)
def my_training_pipeline():
    datasets = list_training_datasets()
    print(context.hyperparameters.epochs)
    # Your training code goes here ...
```
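The decorated function is then called to run the pipeline. A minimal sketch of how the bottom of the file might look (the generated template may differ):

```python
if __name__ == "__main__":
    my_training_pipeline()
```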
### steps.py
Contains your training steps. The template includes a simple step to list attached datasets:
```python
@step()
def list_training_datasets() -> list[DatasetVersion]:
    context = Pipeline.get_active_context()
    experiment = context.experiment
    datasets = experiment.list_attached_dataset_versions()
    return datasets
```
You can add more steps for data preprocessing, model building, training loops, evaluation, and artifact saving.
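For example, a follow-up step might read the resolved hyperparameters and hand them to your framework. This is only a sketch: `build_model` is a hypothetical step name, and the actual model construction is left to whichever framework you use; it relies only on the `Pipeline.get_active_context()` pattern shown above.

```python
@step()
def build_model():
    # Sketch only: read hyperparameters from the active context and
    # construct a model with the ML framework of your choice.
    context = Pipeline.get_active_context()
    params = context.hyperparameters
    print(f"Training for {params.epochs} epochs at batch size {params.batch_size}")
    # model = build_backbone(input_size=params.image_size)  # framework-specific
```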
### utils/parameters.py
This file defines the training hyperparameters for the pipeline:
```python
class TrainingHyperParameters(HyperParameters):
    def __init__(self, log_data: LogDataType):
        super().__init__(log_data=log_data)
        self.epochs = self.extract_parameter(["epochs"], expected_type=int, default=3)
        self.batch_size = self.extract_parameter(["batch_size"], expected_type=int, default=8)
        self.image_size = self.extract_parameter(["image_size"], expected_type=int, default=640)
```
To add a new hyperparameter (e.g., learning rate):
```python
self.learning_rate = self.extract_parameter(["lr"], expected_type=float, default=0.001)
```
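The strings in the first argument name the config keys the value is read from, so this parameter maps to an `lr` entry under `[hyperparameters]` in run_config.toml (following the same pattern as `epochs` above) and is then available as `context.hyperparameters.learning_rate` inside your steps.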
See Working with pipeline parameters for more advanced usage.
### pyproject.toml: Customize your dependencies
Dependencies are managed with uv. To add a new package to the pipeline environment:
```bash
uv add torch --project my_training_pipeline
```
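uv also accepts standard version specifiers, so you can pin a release the same way (the version shown here is only an example):

```bash
uv add "torch==2.3.1" --project my_training_pipeline
```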
To install a Git-based package:
```bash
uv add git+https://github.com/picselliahq/picsellia-cv-engine.git --project my_training_pipeline
```
This updates the pyproject.toml and uv.lock.
The CLI will automatically install everything on the next test or deploy.
See dependency management with uv for full details.
## 3. Configure run_config.toml for local testing
When you run pxl-pipeline init, a run_config.toml is generated:
```toml
override_outputs = true

[job]
type = "TRAINING"

[input.train_dataset_version]
id = ""

[input.model_version]
id = ""

[output.experiment]
name = "my_training_pipeline_exp1"
project_name = "my_training_pipeline"

[hyperparameters]
epochs = 3
batch_size = 8
image_size = 640
```
Fill in `input.train_dataset_version.id` and `input.model_version.id` with the UUIDs from your Picsellia workspace. The `[output.experiment]` section defines where the experiment results will be stored.
## 4. Test your pipeline locally
```bash
pxl-pipeline test my_training_pipeline
```
This will:
- Create a `.venv` in the pipeline folder
- Install dependencies using uv
- Prompt for an `experiment_id`
You must create the experiment manually in the Picsellia UI and attach the correct model version and training datasets.
Outputs will be saved under:
```
my_training_pipeline/runs/<runX>/
├── run_config.toml
├── dataset/
└── models/
```
See how runs/ work for details on configuration reuse.
## 5. Deploy to Picsellia
```bash
pxl-pipeline deploy my_training_pipeline
```
This will:
- Build a Docker image (based on your Dockerfile)
- Push it to your Docker registry
- Register the pipeline with the selected model version in Picsellia
Your Dockerfile installs picsellia-cv-engine and any other dependencies from pyproject.toml.