Skip to content

K-Medoids Clustering Example

Download Notebook


K-Medoids clustering partitions data points into k clusters, each represented by a medoid (actual data point), minimizing the total distance from each point to its assigned medoid. Unlike k-means, it is more robust to outliers.

import getpass
import os

import numpy as np
from dotenv import load_dotenv
from luna_quantum.algorithms import SCIP

from luna_usecases.k_medoids_clustering import (
    KMedoidsClusteringCollection,
    KMedoidsClusteringData,
    KMedoidsClusteringFormulation,
    KMedoidsClusteringInstance,
)

load_dotenv()
if "LUNA_API_KEY" not in os.environ:
    os.environ["LUNA_API_KEY"] = getpass.getpass("Enter your Luna API key: ")

Create Data

Define pairwise distances between 5 points and cluster into k=2 groups.

distance_matrix = np.array(
    [
        [0.0, 2.0, 6.0, 7.0, 5.0],
        [2.0, 0.0, 5.0, 8.0, 4.0],
        [6.0, 5.0, 0.0, 3.0, 7.0],
        [7.0, 8.0, 3.0, 0.0, 6.0],
        [5.0, 4.0, 7.0, 6.0, 0.0],
    ]
)
data = KMedoidsClusteringData.from_distance_matrix(
    distance_matrix=distance_matrix,
    k=2,
    node_names=["P1", "P2", "P3", "P4", "P5"],
)
print(data.to_string())
K-Medoids Clustering Data:
  Points: 5
  Clusters (k): 2
  Node names: ['P1', 'P2', 'P3', 'P4', 'P5']

Plot Data

Visualize pairwise distance matrix.

data.plot()

<Axes: title={'center': 'K-Medoids Clustering — 5 points, k=2'}>
png

Create Formulation

Select k medoid points minimizing total distance from each point to its nearest medoid.

formulation = KMedoidsClusteringFormulation()
print(formulation.to_string(data))
K-Medoids Clustering Formulation:
  Points: 5
  Clusters: 2

Indices:
  i = point index (i = 0, ..., 4)
  j = medoid index (j = 0, ..., 4)

Decision Variables:
  z[i] in {0,1} for i = 0, ..., 4
  z[i] = 1 if point i is a medoid
  y[i,j] in {0,1} for i = 0, ..., 4, j = 0, ..., 4
  y[i,j] = 1 if point i is assigned to medoid j
  Total: 5 + 25 = 30 binary variables

Objective:
  minimize sum_{i,j} distance[i,j] * y[i,j]

Constraints:
  1. Exactly k medoids (1 constraint):
     sum_i z[i] == 2
  2. Each point assigned to exactly one medoid (5 constraints):
     sum_j y[i,j] == 1  for all i = 0, ..., 4
  3. Assign only to medoids (25 constraints):
     y[i,j] <= z[j]  for all i = 0, ..., 4, j = 0, ..., 4
  4. Medoid self-assignment (5 constraints):
     y[j,j] >= z[j]  for all j = 0, ..., 4

Create Instance

Combine data and formulation into a solvable instance.

instance = KMedoidsClusteringInstance(data=data, formulation=formulation)
print(instance.to_string())
Data:K-Medoids Clustering Data:
  Points: 5
  Clusters (k): 2
  Node names: ['P1', 'P2', 'P3', 'P4', 'P5']
Formulation:K-Medoids Clustering Formulation:
  Points: 5
  Clusters: 2

Indices:
  i = point index (i = 0, ..., 4)
  j = medoid index (j = 0, ..., 4)

Decision Variables:
  z[i] in {0,1} for i = 0, ..., 4
  z[i] = 1 if point i is a medoid
  y[i,j] in {0,1} for i = 0, ..., 4, j = 0, ..., 4
  y[i,j] = 1 if point i is assigned to medoid j
  Total: 5 + 25 = 30 binary variables

Objective:
  minimize sum_{i,j} distance[i,j] * y[i,j]

Constraints:
  1. Exactly k medoids (1 constraint):
     sum_i z[i] == 2
  2. Each point assigned to exactly one medoid (5 constraints):
     sum_j y[i,j] == 1  for all i = 0, ..., 4
  3. Assign only to medoids (25 constraints):
     y[i,j] <= z[j]  for all i = 0, ..., 4, j = 0, ..., 4
  4. Medoid self-assignment (5 constraints):
     y[j,j] >= z[j]  for all j = 0, ..., 4

Formulate Model

Translate the instance into a mathematical optimization model.

model = instance.formulate()

Solve and Interpret

Solve the model with SCIP and interpret the raw result into a use-case-specific solution.

scip = SCIP()
job = scip.run(model)
sol = job.result()
uc_solution = instance.interpret(sol)
print(uc_solution.to_string())
/Users/maximilianjanetschek/PycharmProjects/luna-usecases/.venv/lib/python3.13/site-packages/rich/live.py:260:
UserWarning: install "ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')








K-Medoids Clustering Solution:
  Medoids: ['P2', 'P3']
  Total objective: 9.0
  Valid: True
  Assignments: {'P1': 'P2', 'P2': 'P2', 'P3': 'P3', 'P4': 'P3', 'P5': 'P2'}

Plot Solution

Visualize the optimal solution.

uc_solution.plot(data)

<Axes: title={'center': 'K-Medoids Clustering — objective: 9.0000'}>
png

Collections

Generate a benchmark collection of random instances for batch processing.

collection = KMedoidsClusteringCollection.from_random(min_size=5, max_size=8, seed=42)
model = collection.instances[0].formulate()
print(model)
Model: k_medoids_clustering<k_medoids_clustering>
Minimize
  0.8688069106061098 * y_0_1 + 0.17188939807344877 * y_0_2
  + 15.455985385363086 * y_0_3 + 14.900406680376998 * y_0_4
  + 0.8688069106061098 * y_1_0 + 1.008323032006659 * y_1_2
  + 14.71531403287725 * y_1_3 + 14.154918604568424 * y_1_4
  + 0.17188939807344877 * y_2_0 + 1.008323032006659 * y_2_1
  + 15.626107385193926 * y_2_3 + 15.070250556066696 * y_2_4
  + 15.455985385363086 * y_3_0 + 14.71531403287725 * y_3_1
  + 15.626107385193926 * y_3_2 + 0.5800841951928127 * y_3_4
  + 14.900406680376998 * y_4_0 + 14.154918604568424 * y_4_1
  + 15.070250556066696 * y_4_2 + 0.5800841951928127 * y_4_3
Subject To
  exactly_k_medoids: z_0 + z_1 + z_2 + z_3 + z_4 == 2
  assign_one_0: y_0_0 + y_0_1 + y_0_2 + y_0_3 + y_0_4 == 1
  assign_one_1: y_1_0 + y_1_1 + y_1_2 + y_1_3 + y_1_4 == 1
  assign_one_2: y_2_0 + y_2_1 + y_2_2 + y_2_3 + y_2_4 == 1
  assign_one_3: y_3_0 + y_3_1 + y_3_2 + y_3_3 + y_3_4 == 1
  assign_one_4: y_4_0 + y_4_1 + y_4_2 + y_4_3 + y_4_4 == 1
  only_medoid_0_0: -z_0 + y_0_0 <= 0
  only_medoid_0_1: y_0_1 - z_1 <= 0
  only_medoid_0_2: y_0_2 - z_2 <= 0
  only_medoid_0_3: y_0_3 - z_3 <= 0
  only_medoid_0_4: y_0_4 - z_4 <= 0
  only_medoid_1_0: -z_0 + y_1_0 <= 0
  only_medoid_1_1: -z_1 + y_1_1 <= 0
  only_medoid_1_2: y_1_2 - z_2 <= 0
  only_medoid_1_3: y_1_3 - z_3 <= 0
  only_medoid_1_4: y_1_4 - z_4 <= 0
  only_medoid_2_0: -z_0 + y_2_0 <= 0
  only_medoid_2_1: -z_1 + y_2_1 <= 0
  only_medoid_2_2: -z_2 + y_2_2 <= 0
  only_medoid_2_3: y_2_3 - z_3 <= 0
  only_medoid_2_4: y_2_4 - z_4 <= 0
  only_medoid_3_0: -z_0 + y_3_0 <= 0
  only_medoid_3_1: -z_1 + y_3_1 <= 0
  only_medoid_3_2: -z_2 + y_3_2 <= 0
  only_medoid_3_3: -z_3 + y_3_3 <= 0
  only_medoid_3_4: y_3_4 - z_4 <= 0
  only_medoid_4_0: -z_0 + y_4_0 <= 0
  only_medoid_4_1: -z_1 + y_4_1 <= 0
  only_medoid_4_2: -z_2 + y_4_2 <= 0
  only_medoid_4_3: -z_3 + y_4_3 <= 0
  only_medoid_4_4: -z_4 + y_4_4 <= 0
  self_assign_0: -z_0 + y_0_0 >= 0
  self_assign_1: -z_1 + y_1_1 >= 0
  self_assign_2: -z_2 + y_2_2 >= 0
  self_assign_3: -z_3 + y_3_3 >= 0
  self_assign_4: -z_4 + y_4_4 >= 0
Binary
  z_0 y_0_0 y_0_1 y_0_2 y_0_3 y_0_4 z_1 y_1_0 y_1_1 y_1_2 y_1_3 y_1_4 z_2 y_2_0
  y_2_1 y_2_2 y_2_3 y_2_4 z_3 y_3_0 y_3_1 y_3_2 y_3_3 y_3_4 z_4 y_4_0 y_4_1
  y_4_2 y_4_3 y_4_4