K-Medoids Clustering Example
K-Medoids clustering partitions data points into k clusters, each represented by a medoid (actual data point), minimizing the total distance from each point to its assigned medoid. Unlike k-means, it is more robust to outliers.
import getpass
import os
import numpy as np
from dotenv import load_dotenv
from luna_quantum.algorithms import SCIP
from luna_usecases.k_medoids_clustering import (
KMedoidsClusteringCollection,
KMedoidsClusteringData,
KMedoidsClusteringFormulation,
KMedoidsClusteringInstance,
)
load_dotenv()
if "LUNA_API_KEY" not in os.environ:
os.environ["LUNA_API_KEY"] = getpass.getpass("Enter your Luna API key: ")
Create Data
Define pairwise distances between 5 points and cluster into k=2 groups.
distance_matrix = np.array(
[
[0.0, 2.0, 6.0, 7.0, 5.0],
[2.0, 0.0, 5.0, 8.0, 4.0],
[6.0, 5.0, 0.0, 3.0, 7.0],
[7.0, 8.0, 3.0, 0.0, 6.0],
[5.0, 4.0, 7.0, 6.0, 0.0],
]
)
data = KMedoidsClusteringData.from_distance_matrix(
distance_matrix=distance_matrix,
k=2,
node_names=["P1", "P2", "P3", "P4", "P5"],
)
print(data.to_string())
Plot Data
Visualize pairwise distance matrix.
Create Formulation
Select k medoid points minimizing total distance from each point to its nearest medoid.
K-Medoids Clustering Formulation:
Points: 5
Clusters: 2
Indices:
i = point index (i = 0, ..., 4)
j = medoid index (j = 0, ..., 4)
Decision Variables:
z[i] in {0,1} for i = 0, ..., 4
z[i] = 1 if point i is a medoid
y[i,j] in {0,1} for i = 0, ..., 4, j = 0, ..., 4
y[i,j] = 1 if point i is assigned to medoid j
Total: 5 + 25 = 30 binary variables
Objective:
minimize sum_{i,j} distance[i,j] * y[i,j]
Constraints:
1. Exactly k medoids (1 constraint):
sum_i z[i] == 2
2. Each point assigned to exactly one medoid (5 constraints):
sum_j y[i,j] == 1 for all i = 0, ..., 4
3. Assign only to medoids (25 constraints):
y[i,j] <= z[j] for all i = 0, ..., 4, j = 0, ..., 4
4. Medoid self-assignment (5 constraints):
y[j,j] >= z[j] for all j = 0, ..., 4
Create Instance
Combine data and formulation into a solvable instance.
instance = KMedoidsClusteringInstance(data=data, formulation=formulation)
print(instance.to_string())
Data:K-Medoids Clustering Data:
Points: 5
Clusters (k): 2
Node names: ['P1', 'P2', 'P3', 'P4', 'P5']
Formulation:K-Medoids Clustering Formulation:
Points: 5
Clusters: 2
Indices:
i = point index (i = 0, ..., 4)
j = medoid index (j = 0, ..., 4)
Decision Variables:
z[i] in {0,1} for i = 0, ..., 4
z[i] = 1 if point i is a medoid
y[i,j] in {0,1} for i = 0, ..., 4, j = 0, ..., 4
y[i,j] = 1 if point i is assigned to medoid j
Total: 5 + 25 = 30 binary variables
Objective:
minimize sum_{i,j} distance[i,j] * y[i,j]
Constraints:
1. Exactly k medoids (1 constraint):
sum_i z[i] == 2
2. Each point assigned to exactly one medoid (5 constraints):
sum_j y[i,j] == 1 for all i = 0, ..., 4
3. Assign only to medoids (25 constraints):
y[i,j] <= z[j] for all i = 0, ..., 4, j = 0, ..., 4
4. Medoid self-assignment (5 constraints):
y[j,j] >= z[j] for all j = 0, ..., 4
Formulate Model
Translate the instance into a mathematical optimization model.
Solve and Interpret
Solve the model with SCIP and interpret the raw result into a use-case-specific solution.
scip = SCIP()
job = scip.run(model)
sol = job.result()
uc_solution = instance.interpret(sol)
print(uc_solution.to_string())
/Users/maximilianjanetschek/PycharmProjects/luna-usecases/.venv/lib/python3.13/site-packages/rich/live.py:260:
UserWarning: install "ipywidgets" for Jupyter support
warnings.warn('install "ipywidgets" for Jupyter support')
K-Medoids Clustering Solution:
Medoids: ['P2', 'P3']
Total objective: 9.0
Valid: True
Assignments: {'P1': 'P2', 'P2': 'P2', 'P3': 'P3', 'P4': 'P3', 'P5': 'P2'}
Plot Solution
Visualize the optimal solution.
Collections
Generate a benchmark collection of random instances for batch processing.
collection = KMedoidsClusteringCollection.from_random(min_size=5, max_size=8, seed=42)
model = collection.instances[0].formulate()
print(model)
Model: k_medoids_clustering<k_medoids_clustering>
Minimize
0.8688069106061098 * y_0_1 + 0.17188939807344877 * y_0_2
+ 15.455985385363086 * y_0_3 + 14.900406680376998 * y_0_4
+ 0.8688069106061098 * y_1_0 + 1.008323032006659 * y_1_2
+ 14.71531403287725 * y_1_3 + 14.154918604568424 * y_1_4
+ 0.17188939807344877 * y_2_0 + 1.008323032006659 * y_2_1
+ 15.626107385193926 * y_2_3 + 15.070250556066696 * y_2_4
+ 15.455985385363086 * y_3_0 + 14.71531403287725 * y_3_1
+ 15.626107385193926 * y_3_2 + 0.5800841951928127 * y_3_4
+ 14.900406680376998 * y_4_0 + 14.154918604568424 * y_4_1
+ 15.070250556066696 * y_4_2 + 0.5800841951928127 * y_4_3
Subject To
exactly_k_medoids: z_0 + z_1 + z_2 + z_3 + z_4 == 2
assign_one_0: y_0_0 + y_0_1 + y_0_2 + y_0_3 + y_0_4 == 1
assign_one_1: y_1_0 + y_1_1 + y_1_2 + y_1_3 + y_1_4 == 1
assign_one_2: y_2_0 + y_2_1 + y_2_2 + y_2_3 + y_2_4 == 1
assign_one_3: y_3_0 + y_3_1 + y_3_2 + y_3_3 + y_3_4 == 1
assign_one_4: y_4_0 + y_4_1 + y_4_2 + y_4_3 + y_4_4 == 1
only_medoid_0_0: -z_0 + y_0_0 <= 0
only_medoid_0_1: y_0_1 - z_1 <= 0
only_medoid_0_2: y_0_2 - z_2 <= 0
only_medoid_0_3: y_0_3 - z_3 <= 0
only_medoid_0_4: y_0_4 - z_4 <= 0
only_medoid_1_0: -z_0 + y_1_0 <= 0
only_medoid_1_1: -z_1 + y_1_1 <= 0
only_medoid_1_2: y_1_2 - z_2 <= 0
only_medoid_1_3: y_1_3 - z_3 <= 0
only_medoid_1_4: y_1_4 - z_4 <= 0
only_medoid_2_0: -z_0 + y_2_0 <= 0
only_medoid_2_1: -z_1 + y_2_1 <= 0
only_medoid_2_2: -z_2 + y_2_2 <= 0
only_medoid_2_3: y_2_3 - z_3 <= 0
only_medoid_2_4: y_2_4 - z_4 <= 0
only_medoid_3_0: -z_0 + y_3_0 <= 0
only_medoid_3_1: -z_1 + y_3_1 <= 0
only_medoid_3_2: -z_2 + y_3_2 <= 0
only_medoid_3_3: -z_3 + y_3_3 <= 0
only_medoid_3_4: y_3_4 - z_4 <= 0
only_medoid_4_0: -z_0 + y_4_0 <= 0
only_medoid_4_1: -z_1 + y_4_1 <= 0
only_medoid_4_2: -z_2 + y_4_2 <= 0
only_medoid_4_3: -z_3 + y_4_3 <= 0
only_medoid_4_4: -z_4 + y_4_4 <= 0
self_assign_0: -z_0 + y_0_0 >= 0
self_assign_1: -z_1 + y_1_1 >= 0
self_assign_2: -z_2 + y_2_2 >= 0
self_assign_3: -z_3 + y_3_3 >= 0
self_assign_4: -z_4 + y_4_4 >= 0
Binary
z_0 y_0_0 y_0_1 y_0_2 y_0_3 y_0_4 z_1 y_1_0 y_1_1 y_1_2 y_1_3 y_1_4 z_2 y_2_0
y_2_1 y_2_2 y_2_3 y_2_4 z_3 y_3_0 y_3_1 y_3_2 y_3_3 y_3_4 z_4 y_4_0 y_4_1
y_4_2 y_4_3 y_4_4