# frameworks.clip.services.trainer

Classes:

| Name | Description |
| --- | --- |
| `ClipModelTrainer` | CLIP model trainer using BLIP-generated captions for fine-tuning. |

Functions:

| Name | Description |
| --- | --- |
| `prepare_caption_model` | Load the BLIP processor and model for caption generation. |
| `generate_caption` | Generate a caption from an image using BLIP. |
| `export_dataset_to_clip_json` | Convert a COCO-format dataset to a JSONL file for CLIP training. |
| `build_clip_command` | Build the CLI command for CLIP training. |
| `parse_and_log_training_output` | Parse the stdout of the training subprocess and log relevant training metrics. |
| `run_clip_training` | Run CLIP training with the provided hyperparameters and log the output. |

## ClipModelTrainer(model, context)

CLIP model trainer using BLIP-generated captions for fine-tuning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `CLIPModel` | The Picsellia model wrapper. | *required* |
| `context` | `PicselliaTrainingContext \| LocalTrainingContext` | Training context containing experiment, paths, and hyperparameters. | *required* |

Methods:

| Name | Description |
| --- | --- |
| `train_model` | Run the full CLIP fine-tuning process using BLIP captions. |
| `save_best_checkpoint` | Save the best checkpoint by selecting the latest one. |

Attributes:

- `model = model` *instance-attribute*
- `context = context` *instance-attribute*
- `model_dir = os.path.join(model.results_dir, 'clip_finetuned')` *instance-attribute*
- `run_script_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'clip_utils.py')` *instance-attribute*

### train_model(dataset_collection)

Run the full CLIP fine-tuning process using BLIP captions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset_collection` | `DatasetCollection` | Collection with train, validation, and test datasets. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `CLIPModel` | The trained model with exported weights set. |
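A minimal usage sketch. The `clip_model`, `training_context`, and `dataset_collection` objects come from the surrounding Picsellia pipeline, and the way they are constructed here is an assumption, not part of this module:

```python
# Hypothetical wiring; the real objects come from the Picsellia pipeline.
trainer = ClipModelTrainer(model=clip_model, context=training_context)

# Runs caption generation, dataset export, and CLIP fine-tuning end to end,
# then returns the same model wrapper with its exported weights set.
trained_model = trainer.train_model(dataset_collection=dataset_collection)
```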

### save_best_checkpoint(output_dir, context)

Save the best checkpoint by selecting the latest one.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output_dir` | `str` | Directory where checkpoints are stored. | *required* |
| `context` | `PicselliaTrainingContext \| LocalTrainingContext` | Training context for logging. | *required* |
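The docstring states that the "best" checkpoint is simply the latest one. A sketch of that selection strategy, assuming checkpoints are written as `checkpoint-*` subdirectories (a layout borrowed from Hugging Face `Trainer` conventions, not confirmed by this module):

```python
import os

def latest_checkpoint(output_dir: str) -> str | None:
    """Pick the most recently modified checkpoint-* subdirectory (assumed layout)."""
    candidates = [
        os.path.join(output_dir, d)
        for d in os.listdir(output_dir)
        if d.startswith("checkpoint-") and os.path.isdir(os.path.join(output_dir, d))
    ]
    # "Latest" interpreted as most recent modification time.
    return max(candidates, key=os.path.getmtime) if candidates else None
```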

## prepare_caption_model()

Load the BLIP processor and model for caption generation.

Returns:

| Type | Description |
| --- | --- |
| `tuple[PreTrainedModel, PreTrainedTokenizer]` | A tuple containing the model and processor. |
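A sketch of what this loading step typically looks like with Hugging Face `transformers`. The specific BLIP checkpoint name is an assumption; the module may pin a different one:

```python
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor

def prepare_caption_model_sketch():
    # Checkpoint name is illustrative, not confirmed by this module.
    checkpoint = "Salesforce/blip-image-captioning-base"
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)
    model.to("cuda" if torch.cuda.is_available() else "cpu")
    return model, processor
```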

## generate_caption(model, processor, image_path, prompt, device)

Generate a caption from an image using BLIP.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `PreTrainedModel` | Captioning model. | *required* |
| `processor` | `PreTrainedTokenizer` | Processor for BLIP input formatting. | *required* |
| `image_path` | `str` | Path to the image. | *required* |
| `prompt` | `str` | Prompt to guide the captioning. | *required* |
| `device` | `str` | Target device. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str` | The generated caption. |
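A sketch of the prompt-conditioned captioning call with BLIP via `transformers`; generation arguments such as `max_new_tokens` are assumptions:

```python
from PIL import Image

def generate_caption_sketch(model, processor, image_path: str, prompt: str, device: str) -> str:
    image = Image.open(image_path).convert("RGB")
    # The prompt conditions BLIP's output (conditional captioning).
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=40)  # length cap is illustrative
    return processor.decode(output_ids[0], skip_special_tokens=True)
```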

## export_dataset_to_clip_json(model, processor, dataset, output_path, device, prompt)

Convert a COCO-format dataset to a JSONL file for CLIP training.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `PreTrainedModel` | Captioning model. | *required* |
| `processor` | `PreTrainedTokenizer` | Processor for image and prompt. | *required* |
| `dataset` | `CocoDataset` | Dataset to process. | *required* |
| `output_path` | `str` | Where to save the JSONL file. | *required* |
| `device` | `str` | Target device. | *required* |
| `prompt` | `str` | Prompt to use for all captions. | *required* |
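The target format is JSONL: one JSON object per line pairing an image with its generated caption. A self-contained sketch; the iteration over image paths stands in for walking the `CocoDataset`, and the `image`/`caption` field names are assumptions about what the downstream CLIP training script expects:

```python
import json
import os
from typing import Callable, Iterable

def export_to_clip_jsonl_sketch(
    image_paths: Iterable[str],
    caption_fn: Callable[[str], str],
    output_path: str,
) -> None:
    """Write one {"image": ..., "caption": ...} object per line (assumed field names)."""
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as f:
        for image_path in image_paths:  # stand-in for iterating the CocoDataset
            record = {"image": image_path, "caption": caption_fn(image_path)}
            f.write(json.dumps(record) + "\n")  # JSONL: one JSON object per line
```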

## build_clip_command(model_name_or_path, script_path, output_dir, train_file, val_file, test_file, epochs, batch_size, learning_rate, warmup_steps, weight_decay)

Build the CLI command for CLIP training.

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of command-line arguments. |
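The return value is a plain `argv` list suitable for `subprocess`. A sketch of how such a command is typically assembled; the flag names mirror Hugging Face's `run_clip.py` example script and are assumptions about what `clip_utils.py` actually accepts:

```python
def build_clip_command_sketch(
    model_name_or_path: str,
    script_path: str,
    output_dir: str,
    train_file: str,
    val_file: str,
    test_file: str,
    epochs: int,
    batch_size: int,
    learning_rate: float,
    warmup_steps: int,
    weight_decay: float,
) -> list[str]:
    # Flag names are assumed; every value is stringified because
    # subprocess expects a list of strings.
    return [
        "python", script_path,
        "--model_name_or_path", model_name_or_path,
        "--output_dir", output_dir,
        "--train_file", train_file,
        "--validation_file", val_file,
        "--test_file", test_file,
        "--num_train_epochs", str(epochs),
        "--per_device_train_batch_size", str(batch_size),
        "--learning_rate", str(learning_rate),
        "--warmup_steps", str(warmup_steps),
        "--weight_decay", str(weight_decay),
    ]
```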

## parse_and_log_training_output(process, context, log_file_path)

Parse the stdout of the training subprocess and log relevant training metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `process` | `Popen[str]` | Running training process. | *required* |
| `context` | `PicselliaTrainingContext \| LocalTrainingContext` | Training context to log metrics. | *required* |
| `log_file_path` | `str` | Path to write full logs. | *required* |
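A sketch of the stream-and-parse pattern: read stdout line by line, write everything to the log file, and forward any line matching an assumed metric pattern to a logging callback. Both the regex and the `log_metric` callback are stand-ins; the real metric format and the context's logging API are not documented here:

```python
import re
from subprocess import Popen
from typing import Callable

METRIC_RE = re.compile(r"(loss|accuracy)\s*[=:]\s*([0-9.]+)")  # illustrative pattern

def parse_and_log_sketch(
    process: Popen[str],
    log_metric: Callable[[str, float], None],  # stand-in for context logging
    log_file_path: str,
) -> None:
    with open(log_file_path, "w", encoding="utf-8") as log_file:
        assert process.stdout is not None
        for line in process.stdout:  # streams until the process closes stdout
            log_file.write(line)     # keep the full, unfiltered log
            match = METRIC_RE.search(line)
            if match:
                log_metric(match.group(1), float(match.group(2)))
    process.wait()
```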

## run_clip_training(run_script_path, output_dir, train_json, val_json, test_json, batch_size, epochs, context)

Run CLIP training with the provided hyperparameters and log the output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `run_script_path` | `str` | Path to training script. | *required* |
| `output_dir` | `str` | Output directory for results. | *required* |
| `train_json` | `str` | Path to training JSON file. | *required* |
| `val_json` | `str` | Path to validation JSON file. | *required* |
| `test_json` | `str` | Path to test JSON file. | *required* |
| `batch_size` | `int` | Batch size for training. | *required* |
| `epochs` | `int` | Number of training epochs. | *required* |
| `context` | `PicselliaTrainingContext \| LocalTrainingContext` | Context holding hyperparameters and experiment. | *required* |
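Putting the pieces together: a sketch of how the runner plausibly launches the training script as a subprocess with line-buffered text output so metrics can be streamed as they appear. The flag names and the inline `print` (standing in for `parse_and_log_training_output`) are assumptions:

```python
import subprocess

def run_clip_training_sketch(
    run_script_path: str,
    output_dir: str,
    train_json: str,
    val_json: str,
    test_json: str,
    batch_size: int,
    epochs: int,
) -> None:
    # Flag names mirror Hugging Face's run_clip.py example and are assumptions.
    command = [
        "python", run_script_path,
        "--output_dir", output_dir,
        "--train_file", train_json,
        "--validation_file", val_json,
        "--test_file", test_json,
        "--per_device_train_batch_size", str(batch_size),
        "--num_train_epochs", str(epochs),
    ]
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the log captures everything
        text=True,
        bufsize=1,                 # line-buffered, so output can be parsed live
    )
    assert process.stdout is not None
    for line in process.stdout:    # each line would go to the metric parser/logger
        print(line, end="")        # stand-in for parse_and_log_training_output
    process.wait()
```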