Core Components

Benchmark

Bases: BenchmarkEntity

Benchmark class represents a benchmark in the LunaBench system.

This class is responsible for managing benchmark-related operations, including creating and deleting benchmarks. It provides methods for interacting with the benchmark data and executing benchmark runs.

create(name: str) -> Benchmark staticmethod

Create a new benchmark with the given name.

The name of a benchmark must be unique. The returned Benchmark object can be used to interact with and configure the new benchmark.

Parameters:

  • name (str) –

    The name of the new benchmark.

Returns:

  • Benchmark

    The newly created Benchmark object.

open(name: str) -> Benchmark staticmethod

Load a benchmark if it exists, otherwise create a new one.

Parameters:

  • name (str) –

    The name of the benchmark.

Returns:

  • Benchmark

    The loaded or newly created Benchmark object.

load(name: str) -> Benchmark staticmethod

Load a benchmark from the database by its name.

Parameters:

  • name (str) –

    The name of the benchmark to load.

Returns:

  • Benchmark

    The loaded Benchmark object.

load_all() -> list[Benchmark] staticmethod

Load all benchmarks from the database.

Loading all benchmarks from the database can be a slow operation and should be used sparingly.

Returns:

  • list[Benchmark]

    A list of Benchmark objects representing all benchmarks in the database. If no benchmarks are found, an empty list is returned.

set_modelset(modelset: str | ModelSet) -> None

Set the modelset for the benchmark.

This method sets the modelset for the benchmark. Changing the modelset can affect the results of the benchmark. Therefore, it is recommended not to change the modelset after the benchmark has been created. If a change is necessary, the results of the benchmark should be deleted and the benchmark should be re-run.

Parameters:

  • modelset (str | ModelSet) –

    Set the modelset for the benchmark to this modelset. It can be the name of the modelset or the modelset itself.

remove_modelset() -> None

Remove the modelset from the benchmark.

This method removes the modelset from the benchmark. If the modelset is not set, this method does nothing. After removing the modelset, the results of the benchmark may be invalid.

get_feature(name: str) -> FeatureEntity

Get a feature by its name from a benchmark.

If the feature is not present, an error will be raised.

Parameters:

  • name (str) –

    The name of the feature to be retrieved.

Raises:

  • DataNotExistError

    Raised if the feature could not be retrieved by its name.

add_feature(name: str, feature: BaseFeature) -> FeatureEntity

Add a feature to the benchmark with a given name.

This method adds a feature to the benchmark. The name must be unique within the benchmark. When the benchmark is rerun, the feature will be used to calculate the metrics for each algorithm result.

The feature must also be defined in the registry. If it is not, an error is raised. See the documentation on registering features for how to fix this.

Parameters:

  • name (str) –

    Name of the feature to add.

  • feature (BaseFeature) –

    The feature to add.

Returns:

  • FeatureEntity

    The added feature.

remove_feature(feature: str | FeatureEntity) -> None

Remove a feature from the benchmark.

Parameters:

  • feature (str | FeatureEntity) –

    The name of the feature to remove or the feature object itself. Make sure to use the FeatureUserModel object and not only an IFeature object. This is important because the feature name is used to identify the feature.
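The name-based semantics shared by get_feature, add_feature, and remove_feature (and by the metric, algorithm, and plot methods that follow the same pattern) can be sketched with a small stand-in container. Everything here is hypothetical: `NamedComponents` is not part of LunaBench, `DataNotExistError` is a local stand-in for the library's error of the same name, and raising `ValueError` on a duplicate name and ignoring unknown names on removal are assumptions.

```python
class DataNotExistError(KeyError):
    """Local stand-in for the library's error of the same name."""


class NamedComponents:
    """Hypothetical name-keyed container mimicking add/get/remove semantics."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}

    def add(self, name: str, component: object) -> object:
        # Names must be unique within one benchmark (assumed to raise on a clash).
        if name in self._items:
            raise ValueError(f"{name!r} is already in use")
        self._items[name] = component
        return component

    def get(self, name: str) -> object:
        # A missing name raises DataNotExistError instead of returning None.
        try:
            return self._items[name]
        except KeyError:
            raise DataNotExistError(name) from None

    def remove(self, name: str) -> None:
        # Removal is keyed by name; unknown names are ignored (assumption).
        self._items.pop(name, None)
```

Because lookup and removal are keyed by name, the caller needs either the name itself or an object that carries it.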

get_metric(name: str) -> MetricEntity

Get a metric by its name from a benchmark.

If the metric is not present, an error will be raised.

Parameters:

  • name (str) –

    The name of the metric to be retrieved.

Raises:

  • DataNotExistError

    Raised if the metric could not be retrieved by its name.

add_metric(name: str, metric: BaseMetric) -> MetricEntity

Add a metric to the benchmark with a given name.

This method adds a metric to the benchmark. The name must be unique within the benchmark. When the benchmark is rerun, the metric will be calculated for each algorithm result.

The metric must also be defined in the registry. If it is not, an error is raised. See the documentation on registering metrics for how to fix this.

Parameters:

  • name (str) –

    The name of the metric to add.

  • metric (BaseMetric) –

    An instance of the metric to add.

Returns:

  • MetricEntity

    The added metric.

remove_metric(metric: str | MetricEntity) -> None

Remove a metric from the benchmark.

Parameters:

  • metric (str | MetricEntity) –

    The name of the metric to remove or the metric object itself. Make sure to use the MetricUserModel object and not only an IMetric object. This is important because the metric name is used to identify the metric.

get_algorithm(name: str) -> AlgorithmEntity

Get an algorithm by its name from a benchmark.

If the algorithm is not present, an error will be raised.

Parameters:

  • name (str) –

    The name of the algorithm to be retrieved.

Raises:

  • DataNotExistError

    Raised if the algorithm could not be retrieved by its name.

add_algorithm(name: str, algorithm: IAlgorithm[Any] | BaseAlgorithmSync | BaseAlgorithmAsync[Any]) -> AlgorithmEntity

Add an algorithm to the benchmark with a given name.

This method adds an algorithm to the benchmark. The name must be unique within the benchmark. When the benchmark is rerun, the results for this algorithm will be calculated.

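The algorithm must also be defined in the registry. If it is not, an error is raised. See the documentation on registering algorithms for how to fix this.

Parameters:

  • name (str) –

    The name of the algorithm to add.

  • algorithm (IAlgorithm[Any] | BaseAlgorithmSync | BaseAlgorithmAsync[Any]) –

    The algorithm to add.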

Returns:

  • AlgorithmEntity

    The added algorithm.

remove_algorithm(algorithm: str | AlgorithmEntity) -> None

Remove an algorithm from the benchmark.

Parameters:

  • algorithm (str | AlgorithmEntity) –

    The name of the algorithm to remove or the algorithm object itself. Make sure to use the AlgorithmUserModel object and not only an IAlgorithm object. This is important because the algorithm name is used to identify the algorithm.

get_plot(name: str) -> PlotEntity

Get a plot by its name from a benchmark.

If the plot is not present, an error will be raised.

Parameters:

  • name (str) –

    The name of the plot to be retrieved.

Raises:

  • DataNotExistError

    Raised if the plot could not be retrieved by its name.

add_plot(name: str, plot: BasePlot) -> PlotEntity

Add a plot to the benchmark with a given name.

This method adds a plot to the benchmark. The name must be unique within the benchmark. When the benchmark is rerun, this plot will be generated.

The plot must also be defined in the registry. If it is not, an error is raised. See the documentation on registering plots for how to fix this.

Parameters:

  • name (str) –

    The name of the plot to add.

  • plot (BasePlot) –

    The plot to add.

Returns:

  • PlotEntity

    The added plot.

remove_plot(plot: str | PlotEntity) -> None

Remove a plot from the benchmark.

Parameters:

  • plot (str | PlotEntity) –

    The name of the plot to remove or the plot object itself. Make sure to use the Plot object and not only an IPlot object. This is important because the plot name is used to identify the plot.

run_features() -> None

Calculate all configured features for all models of this benchmark.

Parameters:

  • benchmark_run_features

run_algorithms() -> None

Run all configured algorithms for all models of this benchmark.

run_plots() -> None

Execute all plots registered in the benchmark.

Iterates through all plots in the benchmark, validates each plot against the benchmark data, and executes the plot generation. Each plot is validated before execution to ensure required data (metrics, features, etc.) is available. Plot execution is sequential and follows the order defined in the benchmark configuration.

Raises:

  • RuntimeError

    If plot validation or execution fails. The RuntimeError wraps the underlying error, which may be PlotRunError (for validation failures) or UnknownLunaBenchError (for unexpected execution errors). Only raised in FAIL_ON_ERROR mode; in CONTINUE_ON_ERROR mode, errors are logged as warnings instead.

Notes

In FAIL_ON_ERROR mode, the method stops at the first validation or execution error. In CONTINUE_ON_ERROR mode, errors are logged and execution continues with remaining plots.
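The difference between the two error modes can be sketched as follows. This is an illustration of the described control flow only, not the library's implementation: the `ErrorMode` enum, the `run_all` function, and the representation of plots as zero-argument callables are all hypothetical.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("plot_sketch")


class ErrorMode(Enum):
    """Hypothetical enum mirroring the two modes named above."""

    FAIL_ON_ERROR = "fail_on_error"
    CONTINUE_ON_ERROR = "continue_on_error"


def run_all(plots: dict[str, Callable[[], None]], mode: ErrorMode) -> list[str]:
    """Run plots sequentially in insertion order; return the names that completed."""
    completed: list[str] = []
    for name, render in plots.items():
        try:
            render()
        except Exception as exc:
            if mode is ErrorMode.FAIL_ON_ERROR:
                # Stop at the first failure and wrap the underlying error.
                raise RuntimeError(f"plot {name!r} failed") from exc
            # Otherwise log a warning and continue with the remaining plots.
            logger.warning("plot %r failed: %s", name, exc)
            continue
        completed.append(name)
    return completed
```

With `raise ... from exc`, the underlying error stays reachable through the `__cause__` attribute of the RuntimeError, which matches the wrapping behaviour described above.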

add_dependencies() -> None

Add any required dependencies for the benchmark execution.

run() -> None

Execute the benchmark.

results_to_dataframe(*, inlcude_solution: bool = False) -> pd.DataFrame

Return all benchmark results as a single DataFrame.

Builds individual DataFrames for each feature (see .features_to_dataframe()), algorithm (see .algorithms_to_dataframe()), and metric entity (see .metrics_to_dataframe()), then merges them. Features merge on model; metrics merge on (algorithm, model). Feature values are repeated across algorithms for the same model, since features are model-level.

Returns:

  • DataFrame

    A DataFrame with columns algorithm, model, plus one column per result field of each feature and metric.
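The merge keys can be illustrated without pandas, using plain dicts as rows. This is a sketch of the join logic described above, not the library's code: `merge_results`, the sample algorithm and model names, and the `num_vars` and `runtime` fields are made up, and leaving out missing values (rather than filling them with NaN) is a simplification.

```python
def merge_results(
    pairs: list[tuple[str, str]],
    features: dict[str, dict[str, object]],
    metrics: dict[tuple[str, str], dict[str, object]],
) -> list[dict[str, object]]:
    """Build one row per (algorithm, model) pair."""
    rows: list[dict[str, object]] = []
    for algorithm, model in pairs:
        row: dict[str, object] = {"algorithm": algorithm, "model": model}
        # Features are model-level: joined on model only, so they repeat
        # for every algorithm that ran on the same model.
        row.update(features.get(model, {}))
        # Metrics are joined on the full (algorithm, model) key.
        row.update(metrics.get((algorithm, model), {}))
        rows.append(row)
    return rows


# Hypothetical sample data.
pairs = [("sa", "m1"), ("sa", "m2"), ("tabu", "m1")]
features = {"m1": {"num_vars": 3}, "m2": {"num_vars": 5}}
metrics = {("sa", "m1"): {"runtime": 0.2}, ("tabu", "m1"): {"runtime": 0.4}}
rows = merge_results(pairs, features, metrics)
```

Here both rows for model `m1` carry the same `num_vars` value, while each row has its own `runtime`.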

features_to_dataframe(feature_entity: FeatureEntity) -> pd.DataFrame

Return results for a single feature entity as a DataFrame with one row per model.

all_features_to_dataframe() -> pd.DataFrame

Return all feature results merged into a single DataFrame on model.

metrics_to_dataframe(metric_entity: MetricEntity) -> pd.DataFrame

Return results for a single metric entity as a DataFrame with one row per (algorithm, model).

all_metrics_to_dataframe() -> pd.DataFrame

Return all metric results merged into a single DataFrame on (algorithm, model).

algorithms_to_dataframe(exclude: set[str] | None = None) -> pd.DataFrame

Return all (algorithm, model) combinations as a DataFrame.

list_feature_classes() -> list[type[BaseFeature]]

Return the feature classes registered on this benchmark.

list_metrics_classes() -> list[type[BaseMetric]]

Return the metric classes registered on this benchmark.

list_plots_classes() -> list[type[BasePlot]]

Return the plot classes registered on this benchmark.

list_algorithms() -> list[tuple[type[BaseAlgorithmSync | BaseAlgorithmAsync[Any]], dict[str, Any]]]

Return the algorithm classes registered on this benchmark.

ModelSet

Bases: ModelSetEntity

Set of models.

Represents a collection of models with operations for creating, loading, adding, removing, and deleting models.

Attributes:

  • id (int) –

    The unique identifier for the model set.

  • name (str) –

    The name of the model set.

  • models (list[ModelMetadata]) –

    A list of ModelMetadata objects representing the models in this set.

create(modelset_name: str) -> ModelSet staticmethod

Create a new model set with the given name.

Creates a new model set using the provided name and a model set creation use case.

Parameters:

  • modelset_name (str) –

    The name of the model set.

  • modelset_create ((ModelSetCreateUc, injected)) –

    The use case for creating model sets, by default provided by dependency injection.

Returns:

  • ModelSet

    An instance of ModelSet representing the successfully created model set.

load(name: str) -> ModelSet staticmethod

Load a model set by its name.

Retrieves a model set from the database using its unique name.

Parameters:

  • name (str) –

    The unique name of the model set to load.

  • modelset_load ((ModelSetLoadUc, injected)) –

    The use case for loading model sets, by default provided by dependency injection.

Returns:

  • ModelSet

    The loaded ModelSet object.

load_all() -> list[ModelSet] staticmethod

Load all model sets from the database.

Retrieves all model sets stored in the database.

Returns:

  • list[ModelSet]

    A list of ModelSet objects representing all model sets in the database.

load_all_models() -> list[ModelMetadata] staticmethod

Load all models from the database.

Retrieves all models stored in the database, regardless of which model set they belong to.

Parameters:

  • model_all ((ModelAllUc, injected)) –

    The use case for retrieving all models, by default provided by dependency injection.

Returns:

  • list[ModelMetadata]

    A list of ModelMetadata objects representing all models in the database.

add(model: Model) -> None

Add a model to this model set.

Adds the specified model to this model set and updates the model set's state.

Parameters:

  • model (Model) –

    The model to add to this model set.

  • modelset_add ((ModelSetAddUc, injected)) –

    The use case for adding models to a model set, by default provided by dependency injection.

remove_model(model: Model) -> None

Remove a model from this model set.

Removes the specified model from this model set and updates the model set's state.

Parameters:

  • model (Model) –

    The model to remove from this model set.

  • modelset_remove ((ModelSetRemoveUc, injected)) –

    The use case for removing models from a model set, by default provided by dependency injection.

delete() -> None

Delete this model set from the database.

Permanently removes this model set from the database.

Parameters:

  • modelset_delete_uc ((ModelSetDeleteUc, injected)) –

    The use case for deleting model sets, by default provided by dependency injection.