Metrics
Overview
Metrics evaluate the quality and characteristics of algorithm solutions produced during a
benchmark run. Each metric runs once per (algorithm, model) pair, receiving the
Solution object together with any previously computed feature results. Metrics are the
primary mechanism for quantifying solver performance in Luna Bench.
All metrics subclass BaseMetric (from luna_bench.custom) and implement a single method, run(solution, feature_results), which returns the metric's result object.
Metrics are registered with the @metric decorator and declare their feature dependencies by
passing the feature class(es) straight to that decorator.
Built-in Metrics
Luna Bench ships with six built-in metrics covering the most common evaluation scenarios.
Runtime
Module: luna_bench.metrics.runtime
Captures the total wall-clock runtime of the algorithm in seconds.
Returns: RuntimeResult(runtime_seconds: float)
FeasibilityRatio
Module: luna_bench.metrics.feasibility_ratio
Computes the fraction of feasible solutions in the sample set. A value of 1.0 indicates that every sampled solution satisfies all constraints.
Returns: FeasibilityRatioResult(feasibility_ratio: float)
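As a rough illustration of the quantity described above, the sketch below computes a feasibility ratio from a list of per-sample feasibility flags. It is not Luna Bench's implementation: the function name, the flag-list input, and the choice to raise on an empty sample set are all assumptions made for this example.

```python
from typing import Iterable


def feasibility_ratio(feasible_flags: Iterable[bool]) -> float:
    """Return the fraction of samples flagged as feasible (hypothetical helper)."""
    flags = list(feasible_flags)
    if not flags:
        # Assumption: the ratio is treated as undefined for an empty sample set.
        raise ValueError("feasibility ratio is undefined for zero samples")
    # True counts as 1 and False as 0, so the sum is the number of feasible samples.
    return sum(flags) / len(flags)


# Three of four samples satisfy all constraints.
example_ratio = feasibility_ratio([True, True, False, True])
```

A ratio of 1.0 is only reached when every flag is True, which matches the interpretation given above.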
ApproximationRatio
Module: luna_bench.metrics.approximation_ratio
Computes the ratio of the solution quality relative to the known optimal.
Requires: OptSolFeature
Returns: ApproximationRatioResult(approximation_ratio: float)
Parameters:
Name | Type | Default | Description
abt_diff | float | 1e-3 | Absolute tolerance used in ratio comparison.
BestSolutionFound
Module: luna_bench.metrics.best_solution_found
Records the best objective function value found by the algorithm.
Requires: OptSolFeature
Returns: BestSolutionFoundResult(best_solution_found: float)
Parameters:
Name | Type | Default | Description
abs_tol | float | 1e-3 | Absolute tolerance for treating a value as zero (avoids divide-by-zero).
TimeToSolution
Module: luna_bench.metrics.time_to_solution
Estimates how long the solver needs to find the optimal solution with high probability, based on the fraction of samples that reached the optimum.
Requires: OptSolFeature
Returns: TimeToSolutionResult(time_to_solution: float, probability_optimal: float, num_optimal_found: int, num_samples: int)
Parameters:
Name | Type | Default | Description
target_probability | float | 0.99 | Target probability of finding the optimal solution.
abs_tol | float | 1e-6 | Absolute tolerance for comparing objective values.
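A common definition of time-to-solution in solver benchmarking multiplies the runtime of one run by the number of repetitions needed to reach the target probability: runtime * ln(1 - target_probability) / ln(1 - probability_optimal). The sketch below applies that formula. Whether Luna Bench uses exactly this formula or these edge-case conventions is an assumption; the function name, the TtsSketch type, and the input arguments are hypothetical and only mirror the documented result fields.

```python
import math
from typing import NamedTuple, Sequence


class TtsSketch(NamedTuple):
    """Hypothetical container mirroring the documented result fields."""
    time_to_solution: float
    probability_optimal: float
    num_optimal_found: int
    num_samples: int


def time_to_solution(
    runtime_seconds: float,
    objective_values: Sequence[float],
    optimal_value: float,
    target_probability: float = 0.99,
    abs_tol: float = 1e-6,
) -> TtsSketch:
    num_samples = len(objective_values)
    # A sample counts as optimal when it matches the known optimum within abs_tol.
    num_optimal = sum(1 for v in objective_values if abs(v - optimal_value) <= abs_tol)
    p = num_optimal / num_samples if num_samples else 0.0

    if p == 0.0:
        tts = math.inf  # assumption: the optimum was never observed
    elif p >= target_probability:
        tts = runtime_seconds  # assumption: a single run already suffices
    else:
        repetitions = math.log(1.0 - target_probability) / math.log(1.0 - p)
        tts = runtime_seconds * repetitions
    return TtsSketch(tts, p, num_optimal, num_samples)


# Two of four samples hit the optimum, so several repetitions are needed.
example = time_to_solution(2.0, [1.0, 1.0, 3.0, 4.0], optimal_value=1.0)
```

With half of the samples optimal, the estimate is several times the single-run runtime; when no sample reaches the optimum, this sketch reports infinity rather than guessing.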
FractionOfOverallBestSolution
Module: luna_bench.metrics.fraction_of_overall_best_solution
Computes the fraction of the solution quality relative to the best solution found across all algorithms in the benchmark, enabling cross-solver comparison.
Requires: OptSolFeature
Returns: FractionOfOverallBestSolutionResult(fraction_of_overall_best_solution: float)
Parameters:
Name | Type | Default | Description
abs_tol | float | 1e-6 | Absolute tolerance for treating two values as equal.
Feature Dependencies
Metrics can depend on features that must be computed before the metric runs. Declare those
dependencies by passing the feature class (or a list of classes) to the @metric decorator:

    from luna_bench.custom import BaseMetric, metric
    from luna_bench.features import OptSolFeature


    @metric(OptSolFeature)
    class MyMetric(BaseMetric[MyMetricResult]):  # MyMetricResult: your MetricResult subclass
        ...

When a metric declares dependencies, Luna Bench guarantees those features have been computed
and their results are available in the FeatureResultContainer passed to run().
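To make the ordering guarantee concrete, here is a dependency-free toy model of the pattern: a decorator records the declared feature classes, and a small driver computes them before calling the metric. Everything in it (toy_metric, OptimalValueFeature, GapMetric, evaluate, the dict used as a result container) is a hypothetical stand-in, not Luna Bench's API or implementation.

```python
class OptimalValueFeature:
    """Hypothetical feature: the smallest candidate value in a toy model."""

    def run(self, model: dict) -> float:
        return min(model["candidates"])


def toy_metric(*feature_classes):
    """Toy decorator: store the declared dependencies on the decorated class."""

    def decorate(cls):
        cls.required_features = feature_classes
        return cls

    return decorate


@toy_metric(OptimalValueFeature)
class GapMetric:
    """Hypothetical metric: distance between a solution value and the optimum."""

    def run(self, solution_value: float, feature_results: dict) -> float:
        return solution_value - feature_results[OptimalValueFeature]


def evaluate(metric_cls, model: dict, solution_value: float) -> float:
    # Compute every declared feature first, then hand the results to the metric.
    feature_results = {f: f().run(model) for f in metric_cls.required_features}
    return metric_cls().run(solution_value, feature_results)


gap = evaluate(GapMetric, {"candidates": [3.0, 1.5, 2.0]}, solution_value=2.0)
```

Because the metric reads only from the results it is handed, it never needs to know how or when the feature was computed.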
Accessing feature results
FeatureResultContainer offers a few ways to retrieve computed feature data:

    # The single result for a feature class
    result = feature_results.first(OptSolFeature)

    # A specific result by feature name
    result = feature_results.get(OptSolFeature, "optimal-solution")

    # Every result for a feature class, keyed by name
    all_results = feature_results.get_all(OptSolFeature)

Each method has a *_with_config variant (first_with_config, get_with_config,
get_all_with_config) that also returns the feature instance that produced the result.
Writing Custom Metrics
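You can write a custom metric as a class or as a plain function. The class form gives you a typed result and configuration parameters; the function form is a quick way to return a single value. The two paths differ: the function form needs only the first step below (Write the Function), while the class form uses the steps that follow it (Define a Result Type, Implement the Metric, Configuration Parameters).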
Write the Function
Decorate a function that takes (solution, feature_results) and returns a value. A float
or int is auto-wrapped in a MetricResult, so there is no separate result type to define.
Declare feature dependencies by passing their classes to @metric, just as in the class
form:

    from luna_bench.custom import FeatureResultContainer, metric
    from luna_bench.features import OptSolFeature
    from luna_quantum import Solution


    @metric(OptSolFeature)
    def my_metric(solution: Solution, feature_results: FeatureResultContainer) -> float:
        opt_sol = feature_results.first(OptSolFeature)
        # The returned float is auto-wrapped in a MetricResult
        return solution.expectation_value() / opt_sol.best_sol

Define a Result Type
Every metric returns an instance of a MetricResult subclass. Define the fields that your metric produces; the MyMetric example in the next step returns a MyMetricResult with a single score field.
Implement the Metric
Subclass BaseMetric, implement run, and register with @metric. If the metric depends
on features, pass them to the decorator:

    from luna_bench.custom import BaseMetric, FeatureResultContainer, metric
    from luna_bench.features import OptSolFeature
    from luna_quantum import Solution


    @metric(OptSolFeature)
    class MyMetric(BaseMetric[MyMetricResult]):  # MyMetricResult: the result type from the previous step
        threshold: float = 0.5  # configuration parameter with a default

        def run(self, solution: Solution, feature_results: FeatureResultContainer) -> MyMetricResult:
            opt_sol = feature_results.first(OptSolFeature)
            score = solution.expectation_value() / opt_sol.best_sol
            return MyMetricResult(score=score)

Configuration Parameters
Metrics are Pydantic models, so configuration parameters are class-level attributes with
default values. In the example above, threshold: float = 0.5 can be overridden when
adding the metric to a benchmark.
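The override pattern can be sketched without any dependencies by using a standard-library dataclass as a stand-in for the Pydantic base class. ThresholdMetric and its simplified run signature are hypothetical; only the idea of a class-level default that a keyword argument replaces carries over.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMetric:
    """Hypothetical stand-in for a BaseMetric subclass with one parameter."""

    threshold: float = 0.5  # class-level default, as in the example above

    def run(self, score: float) -> bool:
        # Report whether the score meets the configured threshold.
        return score >= self.threshold


default_metric = ThresholdMetric()              # uses threshold=0.5
strict_metric = ThresholdMetric(threshold=0.9)  # overrides the default
```

The same score can pass under the default configuration and fail under the stricter one, which is why the threshold is exposed as a parameter instead of being hard-coded.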
Adding Metrics to a Benchmark
Once a metric is defined and registered, include it in your benchmark configuration. The
framework resolves feature dependencies automatically: any features you passed to @metric
are scheduled to run before the metric.
Refer to the Getting Started guide for details on how to wire metrics into a full benchmark run.