Credit Scoring Feature Selection Example

Download Notebook

Feature Selection identifies the most informative variables for a predictive model. In credit scoring, the goal is typically to predict whether a borrower will default or not, which is represented by binary labels (e.g., 0 = no default, 1 = default).

This formulation selects a subset of features that maximizes correlation with the target variable (predictive power) while minimizing redundancy between features (to avoid overlapping information). The trade-off between these objectives is controlled by a parameter, encouraging solutions that balance relevance and diversity among the selected features.

import getpass
import os

import numpy as np
from dotenv import load_dotenv
from luna_quantum.algorithms import SCIP

from luna_usecases.credit_scoring_feature_selection import (
    CreditScoringFeatureSelectionCollection,
    CreditScoringFeatureSelectionData,
    CreditScoringFeatureSelectionFormulation,
    CreditScoringFeatureSelectionInstance,
)

load_dotenv()
if "LUNA_API_KEY" not in os.environ:
    os.environ["LUNA_API_KEY"] = getpass.getpass("Enter your Luna API key: ")

Create Data

Define a credit scoring dataset with 6 applicants and 4 features (age, income, score, history). labels

design_matrix = np.array(
    [
        [25, 50000, 720, 2],
        [35, 80000, 780, 4],
        [45, 60000, 690, 1],
        [30, 90000, 760, 2],
        [50, 40000, 650, 0],
        [28, 70000, 740, 4],
    ]
)

labels = [0, 0, 1, 0, 1, 0]

data = CreditScoringFeatureSelectionData.from_values(design_matrix=design_matrix, labels=labels, alpha=0.5)
print(data.to_string())

Credit Scoring Feature Selection Data:
  Samples: 6
  Features: 4
  Alpha: 0.5
  Label balance: 0=4, 1=2

Plot Data

Visualize feature distributions across credit classes.

data.plot()

<Axes: title={'center': 'Feature-Feature Correlation (|corr|)'}>

Create Formulation

Select the most predictive feature subset using a QUBO formulation.

formulation = CreditScoringFeatureSelectionFormulation()
print(formulation.to_string(data))

Credit Scoring Feature Selection Formulation:
  Features: 4
  Samples: 6
  Alpha: 0.5

Decision Variables:
  x[i] in {0,1} for i = 0, ..., 3
  x[i] = 1 if feature i is selected
  Total: 4 binary variables

Objective:
  maximize alpha * sum_i corr_label[i] * x[i]
           - (1 - alpha) * sum_False corr_feat[i,j] * x[i] * x[j]

Constraints:
  None (unconstrained)

Create Instance

Combine data and formulation into a solvable instance.

instance = CreditScoringFeatureSelectionInstance(data=data, formulation=formulation)
print(instance.to_string())

Data:Credit Scoring Feature Selection Data:
  Samples: 6
  Features: 4
  Alpha: 0.5
  Label balance: 0=4, 1=2
Formulation:Credit Scoring Feature Selection Formulation:
  Features: 4
  Samples: 6
  Alpha: 0.5

Decision Variables:
  x[i] in {0,1} for i = 0, ..., 3
  x[i] = 1 if feature i is selected
  Total: 4 binary variables

Objective:
  maximize alpha * sum_i corr_label[i] * x[i]
           - (1 - alpha) * sum_False corr_feat[i,j] * x[i] * x[j]

Constraints:
  None (unconstrained)

Formulate Model

Translate the instance into a mathematical optimization model.

model = instance.formulate()

Solve and Interpret

Solve the model with SCIP and interpret the raw result into a use-case-specific solution.

scip = SCIP()
job = scip.run(model)
sol = job.result()
uc_solution = instance.interpret(sol)
print(uc_solution.to_string())

/Users/maximilianjanetschek/PycharmProjects/luna-usecases/.venv/lib/python3.13/site-packages/rich/live.py:260:
UserWarning: install "ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

2026-05-29 11:33:30 INFO     Sleeping for 5.0 seconds. Waiting and checking a function in a loop.

2026-05-29 11:33:37 INFO     Sleeping for 10.0 seconds. Waiting and checking a function in a loop.

2026-05-29 11:33:48 INFO     Sleeping for 15.0 seconds. Waiting and checking a function in a loop.

Credit Scoring Feature Selection Solution:
  Selected features: [0]
  Influence score: 0.9318485437788233
  Independence score: 0.0
  Valid: True

Plot Solution

Visualize the optimal solution.

uc_solution.plot(data)

<Axes: title={'center': 'Feature Selection Trade-off Space\nselected=1, influence=0.932, independence=0.000'}, xlabel='Redundancy (mean |corr| with other features)', ylabel='Influence (|corr| with labels)'>

Collections

Generate a benchmark collection of random instances for batch processing.

collection = CreditScoringFeatureSelectionCollection.from_random(
    min_size=4, max_size=8, num_instances=1, n_samples=20, alpha=0.5, seed=42
)
model = collection.instances[0].formulate()
print(model)

Model: credit_scoring_feature_selection<credit_scoring_feature_selection>
Maximize
  -0.05525998531698857 * x_0 * x_1 - 0.06282849724852296 * x_0 * x_2
  - 0.14383103386753227 * x_0 * x_3 - 0.41721170409928643 * x_1 * x_2
  - 0.2520532722752783 * x_1 * x_3 - 0.1766337591341169 * x_2 * x_3
  + 0.36634960890022616 * x_0 + 0.10244618603962699 * x_1
  + 0.1096902833547969 * x_2 + 0.0357968007481356 * x_3
Binary
  x_0 x_1 x_2 x_3