
frameworks.clip.services.evaluator

evaluator

Functions:

run_umap_dbscan_clustering: Run UMAP dimensionality reduction followed by DBSCAN clustering with automatic eps search.
save_clustering_visualizations: Save clustering plots, cluster image grids, and outlier grids; optionally log them to an experiment.
generate_embeddings_from_results: Combine image paths and embeddings from batched inference results.
load_stored_embeddings: Load stored embeddings and image paths from a .npz file.
reduce_dimensionality_umap: Reduce embedding dimensionality using UMAP.
apply_dbscan_clustering: Apply DBSCAN clustering on embeddings.
save_clustering_plots: Save an annotated DBSCAN clustering plot.
save_cluster_images_plot: Save a grid of images for each cluster.
save_outliers_images: Save a grid of images classified as outliers.
find_best_eps: Find the best epsilon value for DBSCAN using the silhouette score.

run_umap_dbscan_clustering(embeddings, min_samples=5, initial_eps_list=None, fallback_eps_list=None, default_eps=0.3)

Run UMAP dimensionality reduction followed by DBSCAN clustering with automatic eps search.
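The signature suggests a staged eps search: try the candidates in initial_eps_list, fall back to fallback_eps_list if none of them is acceptable, and use default_eps as a last resort. The sketch below shows only that fallback order and assumes it; the actual order and the default candidate lists used when the arguments are None are not documented here. `choose_eps` and its `search` argument are hypothetical: `search` stands in for a scorer such as find_best_eps, and the UMAP step is assumed to have already produced `reduced`.

```python
from typing import Any, Callable, Optional, Sequence

# search(reduced, eps_list) -> best eps, or None if no candidate was acceptable
EpsSearch = Callable[[Any, Sequence[float]], Optional[float]]


def choose_eps(
    reduced: Any,
    search: EpsSearch,
    initial_eps_list: Optional[Sequence[float]] = None,
    fallback_eps_list: Optional[Sequence[float]] = None,
    default_eps: float = 0.3,
) -> float:
    """Try the initial candidates, then the fallback ones, then default_eps.

    Assumption: this ordering is inferred from the parameter names only.
    """
    for eps_list in (initial_eps_list, fallback_eps_list):
        if eps_list:  # skip candidate lists that are None or empty
            best = search(reduced, eps_list)
            if best is not None:
                return best
    return default_eps
```

The chosen value would then presumably be passed to DBSCAN together with min_samples.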

save_clustering_visualizations(reduced_embeddings, cluster_labels, image_paths, results_dir, log_images=True, experiment=None)

Save clustering plots, cluster image grids, and outlier grids; optionally log them to an experiment.

Parameters:

reduced_embeddings (ndarray, required): UMAP-reduced 2D embeddings.
cluster_labels (ndarray, required): DBSCAN-assigned cluster labels.
image_paths (list[str], required): Image file paths corresponding to the embeddings.
results_dir (str, required): Output directory for the saved plots.
log_images (bool, default True): Whether to log the saved images to the experiment.
experiment (Experiment | None, default None): Experiment to log images to when logging is enabled.

generate_embeddings_from_results(image_batches, batch_results)

Combine image paths and embeddings from batched inference results.

Parameters:

image_batches (Sequence[list[str]], required): List of image path batches.
batch_results (Sequence[list[dict[str, Any]]], required): List of inference result batches.

Returns:

tuple[ndarray, list[str]]: A tuple of (embeddings array, image path list).
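A minimal sketch of how batched results could be flattened into one embeddings array and one path list. It assumes each result dict stores its vector under an "embedding" key; that key name, the float32 dtype, and the helper name `combine_batches` are hypothetical, since the result schema is not documented here.

```python
from typing import Any, Sequence

import numpy as np


def combine_batches(
    image_batches: Sequence[list[str]],
    batch_results: Sequence[list[dict[str, Any]]],
    key: str = "embedding",  # hypothetical key name in each result dict
) -> tuple[np.ndarray, list[str]]:
    """Flatten batched paths and results into an (N, D) array plus N paths."""
    if len(image_batches) != len(batch_results):
        raise ValueError("image_batches and batch_results must have the same length")
    embeddings: list[np.ndarray] = []
    paths: list[str] = []
    for batch_paths, results in zip(image_batches, batch_results):
        if len(batch_paths) != len(results):
            raise ValueError("each path batch must match its result batch in length")
        for path, result in zip(batch_paths, results):
            paths.append(path)
            embeddings.append(np.asarray(result[key], dtype=np.float32))
    # Row i of the stacked array belongs to paths[i].
    return np.stack(embeddings), paths
```

Checking batch lengths up front keeps every embedding row aligned with its image path, which the later plotting steps rely on.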

load_stored_embeddings(file_path)

Load stored embeddings and image paths from a .npz file.

Parameters:

file_path (str, required): Path to the .npz file.

Returns:

tuple[ndarray, list[str]]: Tuple of (embeddings array, image paths).
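A round-trip sketch using NumPy's .npz format. The array names "embeddings" and "image_paths", and both helper names, are assumptions for illustration; the layout of the module's stored files is not documented here.

```python
import numpy as np


def save_embeddings(file_path: str, embeddings: np.ndarray, image_paths: list[str]) -> None:
    # Hypothetical array names inside the .npz archive.
    np.savez(file_path, embeddings=embeddings, image_paths=np.asarray(image_paths))


def load_embeddings(file_path: str) -> tuple[np.ndarray, list[str]]:
    # Paths are saved as a unicode array, so allow_pickle can stay False.
    with np.load(file_path) as data:
        embeddings = data["embeddings"]
        image_paths = [str(p) for p in data["image_paths"]]
    return embeddings, image_paths
```

Storing the paths as a plain unicode array, rather than an object array, avoids needing allow_pickle=True when loading.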

reduce_dimensionality_umap(embeddings, n_components)

Reduce embedding dimensionality using UMAP.

Parameters:

embeddings (ndarray, required): High-dimensional embeddings.
n_components (int, required): Target number of dimensions.

Returns:

ndarray: UMAP-reduced embeddings, with n_components columns.

apply_dbscan_clustering(embeddings, dbscan_eps, dbscan_min_samples)

Apply DBSCAN clustering on embeddings.

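Parameters:

embeddings (ndarray, required): 2D array of points to cluster, shaped (n_samples, n_features).
dbscan_eps (float, required): DBSCAN eps: the maximum distance between two points for one to count as a neighbor of the other.
dbscan_min_samples (int, required): DBSCAN min_samples: the number of points (including the point itself) that must lie within dbscan_eps of a point for it to be a core point.

Returns:

ndarray: Array of cluster labels, one per point.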
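A sketch using scikit-learn's DBSCAN, assuming that is the backend. In scikit-learn, fit_predict returns one integer label per row and marks noise points with -1, which is what an outlier view can filter on. The helper name is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_labels(embeddings: np.ndarray, dbscan_eps: float, dbscan_min_samples: int) -> np.ndarray:
    # One label per row; -1 marks noise points that belong to no cluster.
    return DBSCAN(eps=dbscan_eps, min_samples=dbscan_min_samples).fit_predict(embeddings)
```

For example, two tight groups of three points plus one distant point, with eps=0.5 and min_samples=3, yield labels [0, 0, 0, 1, 1, 1, -1].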

save_clustering_plots(reduced_embeddings, cluster_labels, results_dir)

Save an annotated DBSCAN clustering plot.

Parameters:

reduced_embeddings (ndarray, required): 2D UMAP-reduced embeddings.
cluster_labels (ndarray, required): Cluster label for each point.
results_dir (str, required): Directory to save the plot in.
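One possible shape for an annotated cluster plot, sketched with matplotlib: a scatter of the 2D points colored by cluster, noise drawn in light gray, and each cluster id written at its centroid. The styling, the output file name "dbscan_clusters.png", the -1 noise convention, and the helper name are all assumptions, not the module's documented output.

```python
import os

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def save_cluster_scatter(reduced_embeddings: np.ndarray, cluster_labels: np.ndarray, results_dir: str) -> str:
    os.makedirs(results_dir, exist_ok=True)
    fig, ax = plt.subplots(figsize=(8, 8))
    for label in np.unique(cluster_labels):
        pts = reduced_embeddings[cluster_labels == label]
        if label == -1:  # noise points, drawn without an annotation
            ax.scatter(pts[:, 0], pts[:, 1], s=8, c="lightgray", label="noise")
        else:
            ax.scatter(pts[:, 0], pts[:, 1], s=8, label=f"cluster {label}")
            cx, cy = pts.mean(axis=0)  # annotate each cluster at its centroid
            ax.annotate(str(label), (cx, cy), fontsize=12, fontweight="bold")
    ax.set_title("DBSCAN clusters (UMAP 2D)")
    ax.legend(loc="best", fontsize="small")
    out_path = os.path.join(results_dir, "dbscan_clusters.png")  # hypothetical file name
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # free the figure when called repeatedly
    return out_path
```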

required

save_cluster_images_plot(image_paths, cluster_labels, results_dir, max_images_per_cluster=25, grid_size=(5, 5))

Save a grid of images for each cluster.

Parameters:

image_paths (list[str], required): List of image file paths.
cluster_labels (ndarray, required): Cluster label for each image.
results_dir (str, required): Directory to save the plots in.
max_images_per_cluster (int, default 25): Maximum number of images shown per cluster.
grid_size (tuple[int, int], default (5, 5)): Size of the plot grid as (rows, cols).
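Before any images are drawn, a per-cluster grid needs to decide which images go in it. The sketch below shows one selection rule, under three assumptions: noise points (label -1) are left out because outliers get their own grid; the first images of each cluster are taken in input order; and the cap is the smaller of max_images_per_cluster and rows * cols. The helper name is hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def group_for_grids(
    image_paths: Sequence[str],
    cluster_labels: Sequence[int],
    max_images_per_cluster: int = 25,
    grid_size: tuple[int, int] = (5, 5),
) -> dict[int, list[str]]:
    """Map each non-noise cluster id to the paths that would fill its grid."""
    rows, cols = grid_size
    limit = min(max_images_per_cluster, rows * cols)  # never exceed the grid capacity
    groups: dict[int, list[str]] = defaultdict(list)
    for path, label in zip(image_paths, cluster_labels):
        label = int(label)
        # Skip noise; keep the first `limit` images seen for each cluster.
        if label != -1 and len(groups[label]) < limit:
            groups[label].append(path)
    return dict(sorted(groups.items()))
```

Capping by the grid capacity as well as max_images_per_cluster means a mismatched pair, such as 25 images on a 3 x 3 grid, cannot overflow the figure.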

save_outliers_images(image_paths, cluster_labels, results_dir, max_images=25, grid_size=(5, 5))

Save a grid of images classified as outliers.

Parameters:

image_paths (list[str], required): List of image file paths.
cluster_labels (ndarray, required): Cluster label for each image.
results_dir (str, required): Directory to save the output in.
max_images (int, default 25): Maximum number of outlier images to display.
grid_size (tuple[int, int], default (5, 5)): Size of the output grid as (rows, cols).
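A sketch of rendering an outlier grid with Pillow and matplotlib, assuming outliers are the images DBSCAN labeled -1. The output file name "outliers.png", the choice to return None when there are no outliers, and the helper name are hypothetical.

```python
import os
from typing import Optional, Sequence

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def save_outlier_grid(
    image_paths: Sequence[str],
    cluster_labels: Sequence[int],
    results_dir: str,
    max_images: int = 25,
    grid_size: tuple[int, int] = (5, 5),
) -> Optional[str]:
    rows, cols = grid_size
    # Noise points (label -1) are treated as outliers; keep at most one per grid cell.
    outliers = [p for p, lab in zip(image_paths, np.asarray(cluster_labels)) if lab == -1]
    outliers = outliers[: min(max_images, rows * cols)]
    if not outliers:
        return None  # nothing labeled as noise, so no figure is written
    os.makedirs(results_dir, exist_ok=True)
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks and frames, including unused cells
    for ax, path in zip(axes.flat, outliers):
        with Image.open(path) as img:
            ax.imshow(img.convert("RGB"))
        ax.set_title(os.path.basename(path), fontsize=6)
    out_path = os.path.join(results_dir, "outliers.png")  # hypothetical file name
    fig.savefig(out_path, dpi=100, bbox_inches="tight")
    plt.close(fig)
    return out_path
```

squeeze=False keeps axes two-dimensional even for a 1 x N grid, so the same loop works for any grid_size.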

find_best_eps(reduced, eps_list)

Find the best epsilon value for DBSCAN using the silhouette score.

Parameters:

reduced (ndarray, required): 2D array of reduced embeddings.
eps_list (list[float], required): Candidate epsilon values.

Returns:

float | None: The candidate epsilon value with the highest silhouette score, or None.
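A sketch of silhouette-based eps selection with scikit-learn, assuming DBSCAN and silhouette_score from that library. The min_samples parameter, the choice to score only non-noise points, and the helper name are assumptions. In this sketch, None is returned when no candidate produces at least two clusters, since silhouette_score requires 2 <= n_clusters <= n_samples - 1. That is one plausible reading of the float | None return type, not a documented one.

```python
from typing import Optional, Sequence

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def best_eps_by_silhouette(
    reduced: np.ndarray, eps_list: Sequence[float], min_samples: int = 5
) -> Optional[float]:
    best_eps, best_score = None, float("-inf")
    for eps in eps_list:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced)
        mask = labels != -1  # assumption: score clustered points only, ignoring noise
        n_clusters = len(set(labels[mask]))
        # silhouette_score needs 2 <= n_clusters <= n_samples - 1
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(reduced[mask], labels[mask])
        if score > best_score:
            best_eps, best_score = float(eps), score
    return best_eps
```

Excluding noise before scoring can inflate the silhouette for small eps values that discard many points. Counting noise as its own label is an alternative, with the opposite bias.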