steps.base.datalake.loader
loader
Functions:
Name | Description |
---|---|
load_datalake | Loads and prepares data from a Picsellia Datalake. |
load_datalake()
Loads and prepares data from a Picsellia Datalake.
This function retrieves input and output datalakes from an active processing job and downloads all associated data (e.g., images). It supports both single datalake extraction (input only) and dual datalake extraction (input & output).
Usage:
- Extracts one or two datalakes from the active processing job.
- Downloads all associated data and organizes them into a structured object.
- Ideal for data processing tasks requiring images from a Datalake.
Behavior:
- If only an input datalake is available, it downloads that datalake and returns a Datalake.
- If both input and output datalakes exist, it returns a DatalakeCollection, allowing access to both.
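The return-type rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the Datalake and DatalakeCollection classes below are simplified stand-ins that model only the attributes used on this page, and select_result is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Datalake:
    """Hypothetical stand-in: models only the image_dir attribute."""
    image_dir: str


@dataclass
class DatalakeCollection:
    """Hypothetical stand-in pairing an input and an output datalake."""
    input: Datalake
    output: Datalake


def select_result(
    input_datalake: Datalake,
    output_datalake: Optional[Datalake] = None,
) -> Union[Datalake, DatalakeCollection]:
    """Return the single datalake, or a collection when an output exists."""
    if output_datalake is None:
        # Input only: the caller gets the datalake itself.
        return input_datalake
    # Input and output: the caller gets both, wrapped in a collection.
    return DatalakeCollection(input=input_datalake, output=output_datalake)
```

Because the return type depends on the job configuration, callers need to check which of the two shapes they received before accessing attributes.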
Requirements:
- The processing job must have at least one attached datalake.
- Ensure job_id is set in the active processing context.
- Data assets should be stored in the Picsellia Datalake.
Returns:
Type | Description |
---|---|
Datalake or DatalakeCollection | A Datalake when only an input datalake is available, or a DatalakeCollection when both input and output datalakes exist. |
Example:
from picsellia_cv_engine.steps.data_extraction.processing.datalake import load_datalake

# Note: DatalakeCollection must also be imported for the isinstance check below;
# its module path is not given in this reference.

# Load datalake data from the active processing job
datalake_data = load_datalake()

# Check if the function returned a single datalake or a collection
if isinstance(datalake_data, DatalakeCollection):
    print("Using both input and output datalakes.")
    print(f"Input datalake images: {datalake_data.input.image_dir}")
    print(f"Output datalake images: {datalake_data.output.image_dir}")
else:
    print("Using only input datalake.")
    print(f"Input datalake images: {datalake_data.image_dir}")
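When downstream code should not branch on the return type, the result can be normalized once into a uniform mapping. The sketch below is one possible approach, not part of the library: image_dirs_by_role is a hypothetical helper, the two classes are simplified stand-ins for the real ones, and image_dir is assumed to be a plain string path.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Datalake:
    """Hypothetical stand-in: models only the image_dir attribute."""
    image_dir: str


@dataclass
class DatalakeCollection:
    """Hypothetical stand-in pairing an input and an output datalake."""
    input: Datalake
    output: Datalake


def image_dirs_by_role(
    data: Union[Datalake, DatalakeCollection],
) -> Dict[str, str]:
    """Map each role ("input", "output") to its image directory."""
    if isinstance(data, DatalakeCollection):
        return {
            "input": data.input.image_dir,
            "output": data.output.image_dir,
        }
    # A single datalake is always the input datalake.
    return {"input": data.image_dir}


# Usage with stand-in objects; in a real step, pass the load_datalake() result.
for role, path in image_dirs_by_role(Datalake("/tmp/input/images")).items():
    print(f"{role} datalake images: {path}")
```

The rest of the processing step can then iterate over the mapping without repeating the isinstance check.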