frameworks.clip.services.evaluator
evaluator
Functions:
| Name | Description |
|---|---|
| run_umap_dbscan_clustering | Run UMAP dimensionality reduction followed by DBSCAN clustering with automatic eps search. |
| save_clustering_visualizations | Save clustering plots, cluster image grids, and outlier grids. Optionally logs them to an experiment. |
| generate_embeddings_from_results | Combine image paths and embeddings from batched inference results. |
| load_stored_embeddings | Load stored embeddings and image paths from a .npz file. |
| reduce_dimensionality_umap | Reduce embedding dimensionality using UMAP. |
| apply_dbscan_clustering | Apply DBSCAN clustering on embeddings. |
| save_clustering_plots | Save annotated DBSCAN clustering plot. |
| save_cluster_images_plot | Save a grid of images for each cluster. |
| save_outliers_images | Save a grid of images classified as outliers. |
| find_best_eps | Find the best epsilon value for DBSCAN using silhouette score. |
run_umap_dbscan_clustering(embeddings, min_samples=5, initial_eps_list=None, fallback_eps_list=None, default_eps=0.3)
Run UMAP dimensionality reduction followed by DBSCAN clustering with automatic eps search.
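The signature suggests a fallback chain for the eps search: try the initial candidates, then the fallback candidates, then `default_eps`. The sketch below illustrates that control flow only. It is an assumption about the ordering, not the module's actual implementation; `choose_eps`, `toy_search`, and the `search` callable (standing in for something like `find_best_eps`) are hypothetical names.

```python
from typing import Callable, Optional, Sequence


def choose_eps(
    search: Callable[[Sequence[float]], Optional[float]],
    initial_eps_list: Optional[Sequence[float]] = None,
    fallback_eps_list: Optional[Sequence[float]] = None,
    default_eps: float = 0.3,
) -> float:
    """Return the first eps found by `search`, else `default_eps` (assumed order)."""
    for candidates in (initial_eps_list, fallback_eps_list):
        if not candidates:
            continue  # list not provided or empty: skip it
        best = search(candidates)
        if best is not None:
            return best
    return default_eps


# Toy search (hypothetical): accept only candidates >= 0.5 and pick the smallest.
def toy_search(candidates: Sequence[float]) -> Optional[float]:
    valid = [eps for eps in candidates if eps >= 0.5]
    return min(valid) if valid else None


# The initial list has no valid candidate, so the fallback list is searched.
chosen = choose_eps(toy_search, initial_eps_list=[0.1, 0.2], fallback_eps_list=[0.4, 0.6, 0.8])
# Neither list has a valid candidate, so default_eps is used.
fallback = choose_eps(toy_search, initial_eps_list=[0.1], fallback_eps_list=[0.2])
```

Passing the search as a callable keeps the fallback logic separate from the clustering itself.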
save_clustering_visualizations(reduced_embeddings, cluster_labels, image_paths, results_dir, log_images=True, experiment=None)
Save clustering plots, cluster image grids, and outlier grids. Optionally logs them to an experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reduced_embeddings | ndarray | UMAP-reduced 2D embeddings. | required |
| cluster_labels | ndarray | DBSCAN-assigned cluster labels. | required |
| image_paths | list[str] | Corresponding image file paths. | required |
| results_dir | str | Output directory for saved plots. | required |
| log_images | bool | Whether to log images via experiment. | True |
| experiment | Experiment \| None | Experiment to log to if logging is enabled. | None |
generate_embeddings_from_results(image_batches, batch_results)
Combine image paths and embeddings from batched inference results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_batches | Sequence[list[str]] | List of image path batches. | required |
| batch_results | Sequence[list[dict[str, Any]]] | List of inference result batches. | required |
Returns:
| Type | Description |
|---|---|
| tuple[ndarray, list[str]] | A tuple of (embeddings array, image path list). |
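A minimal sketch of how batched paths and results might be flattened into one array and one path list. The result schema is not documented here, so the `"embedding"` key and the helper name `combine_batches` are assumptions made for illustration.

```python
from typing import Any, Sequence

import numpy as np


def combine_batches(
    image_batches: Sequence[list[str]],
    batch_results: Sequence[list[dict[str, Any]]],
    key: str = "embedding",  # hypothetical key; the real result schema may differ
) -> tuple[np.ndarray, list[str]]:
    """Flatten batched paths and results into one array and one path list."""
    embeddings: list[Any] = []
    paths: list[str] = []
    for batch_paths, results in zip(image_batches, batch_results):
        if len(batch_paths) != len(results):
            raise ValueError("each batch needs one result per image path")
        for path, result in zip(batch_paths, results):
            embeddings.append(result[key])
            paths.append(path)
    return np.asarray(embeddings, dtype=np.float32), paths


batches = [["a.jpg", "b.jpg"], ["c.jpg"]]
results = [
    [{"embedding": [1.0, 0.0]}, {"embedding": [0.0, 1.0]}],
    [{"embedding": [0.5, 0.5]}],
]
embeddings, image_paths = combine_batches(batches, results)
```

Row `i` of the returned array corresponds to `image_paths[i]`, which is the alignment the clustering and plotting functions rely on.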
load_stored_embeddings(file_path)
Load stored embeddings and image paths from a .npz file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | Path to the .npz file. | required |
Returns:
| Type | Description |
|---|---|
| tuple[ndarray, list[str]] | Tuple of (embeddings array, image paths). |
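The round trip below shows one way such a file could be written and read back with NumPy. The array names `embeddings` and `image_paths` inside the archive are assumptions; the layout the function actually expects is not stated here.

```python
import os
import tempfile

import numpy as np

# Array names "embeddings" and "image_paths" are assumptions about the file layout.
embeddings = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
image_paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg"]

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "embeddings.npz")
    np.savez(file_path, embeddings=embeddings, image_paths=np.array(image_paths))

    # np.load returns a lazy NpzFile; read the arrays before the file is closed.
    with np.load(file_path) as data:
        loaded_embeddings = data["embeddings"]
        loaded_paths = data["image_paths"].tolist()
```

Storing the paths as a fixed-width string array avoids pickling, so the file loads with NumPy's default `allow_pickle=False`.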
reduce_dimensionality_umap(embeddings, n_components)
Reduce embedding dimensionality using UMAP.
apply_dbscan_clustering(embeddings, dbscan_eps, dbscan_min_samples)
Apply DBSCAN clustering on embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | ndarray | 2D array of points to cluster. | required |
| dbscan_eps | float | Epsilon (neighborhood radius) parameter for DBSCAN. | required |
| dbscan_min_samples | int | Minimum number of samples in a point's neighborhood for it to count as a core point. | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Array of cluster labels. |
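For orientation, this is what DBSCAN does on a small synthetic dataset when called directly through scikit-learn. Whether `apply_dbscan_clustering` wraps `sklearn.cluster.DBSCAN` in exactly this way is an assumption; the data and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight blobs plus one far-away point that should be flagged as noise.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.05, size=(20, 2))
points = np.vstack([blob_a, blob_b, [[50.0, 50.0]]])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)

n_clusters = len(set(labels.tolist()) - {-1})  # -1 is scikit-learn's noise label
n_outliers = int(np.sum(labels == -1))
```

The isolated point has no neighbors within `eps`, so scikit-learn labels it `-1` rather than assigning it to a cluster.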
save_clustering_plots(reduced_embeddings, cluster_labels, results_dir)
Save annotated DBSCAN clustering plot.
save_cluster_images_plot(image_paths, cluster_labels, results_dir, max_images_per_cluster=25, grid_size=(5, 5))
Save a grid of images for each cluster.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | list[str] | List of image file paths. | required |
| cluster_labels | ndarray | Cluster label for each image. | required |
| results_dir | str | Directory to save plots. | required |
| max_images_per_cluster | int | Maximum number of images per plot. | 25 |
| grid_size | tuple[int, int] | Size of the plot grid (rows, cols). | (5, 5) |
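The grouping step behind a per-cluster image grid can be sketched as below. This is an illustration, not the module's code: the helper name `group_paths_by_cluster` is hypothetical, and skipping label `-1` assumes noise points are handled by the separate outlier grid.

```python
from collections import defaultdict

import numpy as np


def group_paths_by_cluster(image_paths, cluster_labels, max_images_per_cluster=25):
    """Map each non-noise cluster label to at most `max_images_per_cluster` paths."""
    groups = defaultdict(list)
    for path, label in zip(image_paths, np.asarray(cluster_labels).tolist()):
        if label == -1:
            continue  # assumption: -1 marks DBSCAN noise, shown in the outlier grid
        if len(groups[label]) < max_images_per_cluster:
            groups[label].append(path)
    return dict(groups)


paths = [f"img_{i}.jpg" for i in range(7)]
labels = np.array([0, 0, 1, -1, 0, 1, 0])
groups = group_paths_by_cluster(paths, labels, max_images_per_cluster=3)
```

The default cap of 25 images matches the 25 cells of the default 5 x 5 grid.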
save_outliers_images(image_paths, cluster_labels, results_dir, max_images=25, grid_size=(5, 5))
Save a grid of images classified as outliers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | list[str] | List of image file paths. | required |
| cluster_labels | ndarray | Cluster label for each image. | required |
| results_dir | str | Directory to save the output. | required |
| max_images | int | Maximum number of outlier images to display. | 25 |
| grid_size | tuple[int, int] | Size of the output grid (rows, cols). | (5, 5) |
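How `max_images` and `grid_size` might interact can be sketched as a mapping from outlier images to grid cells. The helper `outlier_grid_cells` is hypothetical, and it assumes outliers carry the label `-1` (scikit-learn's DBSCAN noise label) and that the grid is filled row by row.

```python
import numpy as np


def outlier_grid_cells(image_paths, cluster_labels, max_images=25, grid_size=(5, 5)):
    """Return (row, col, path) for each outlier that fits in the grid."""
    rows, cols = grid_size
    capacity = min(max_images, rows * cols)
    labels = np.asarray(cluster_labels)
    outliers = [p for p, is_noise in zip(image_paths, labels == -1) if is_noise]
    # divmod fills the grid row by row: index 0 -> (0, 0), index cols -> (1, 0).
    return [(*divmod(i, cols), path) for i, path in enumerate(outliers[:capacity])]


paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]
labels = np.array([-1, 0, -1, -1, 1, -1])
cells = outlier_grid_cells(paths, labels, max_images=3, grid_size=(2, 2))
```

Capping at the smaller of `max_images` and `rows * cols` keeps the selection from overflowing the grid.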
find_best_eps(reduced, eps_list)
Find the best epsilon value for DBSCAN using silhouette score.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reduced | ndarray | 2D array of reduced embeddings. | required |
| eps_list | list[float] | List of candidate epsilon values. | required |
Returns:
| Type | Description |
|---|---|
| float \| None | The epsilon value with the highest silhouette score, or None if no suitable value is found. |
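A silhouette-based eps search could look like the sketch below. It is written under stated assumptions rather than taken from the module: the helper name `best_eps_by_silhouette`, the `min_samples` argument, and the choice to score clustered points only (excluding noise) are all illustrative.

```python
from typing import Optional, Sequence

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def best_eps_by_silhouette(
    reduced: np.ndarray, eps_list: Sequence[float], min_samples: int = 5
) -> Optional[float]:
    """Return the eps with the highest silhouette score, or None if none is scorable."""
    best_eps, best_score = None, -1.0
    for eps in eps_list:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced)
        mask = labels != -1  # assumption: score clustered points only
        n_labels = len(set(labels[mask].tolist()))
        # silhouette_score needs at least 2 clusters and fewer clusters than samples.
        if not 2 <= n_labels < int(mask.sum()):
            continue
        score = silhouette_score(reduced[mask], labels[mask])
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps


# Two 3x3 grids (spacing 0.1) placed far apart.
grid = np.array([(x * 0.1, y * 0.1) for x in range(3) for y in range(3)])
points = np.vstack([grid, grid + 5.0])

# 0.05 leaves every point as noise, 10.0 merges everything into one cluster.
best = best_eps_by_silhouette(points, [0.05, 0.15, 10.0])
none_case = best_eps_by_silhouette(points, [0.05])
```

Candidates that produce fewer than two clusters are skipped because the silhouette score is undefined for them, which is one way a `None` result can arise.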