Multi-Armed Bandit

Multi-armed bandit (MAB) experiments dynamically reallocate traffic toward better-performing variants as data accumulates, rather than maintaining fixed allocations throughout the experiment. This maximizes reward during the experiment itself at the cost of some statistical power.


A/B Test vs. Bandit — When to Use Which

| Scenario | Use A/B Test | Use Bandit |
|---|---|---|
| Need unbiased causal estimates for a permanent change | ✓ | |
| Regulatory or compliance reporting | ✓ | |
| Short-lived promotions where maximizing conversions matters | | ✓ |
| Content recommendation / personalization | | ✓ |
| Rapidly iterating with many variants | | ✓ |
| Statistical power and precise effect sizes required | ✓ | |

Algorithms

Thompson Sampling (Default)

Maintains a Beta distribution for each variant based on its observed successes and total pulls. At each assignment, samples from each distribution and routes the user to the variant with the highest sample. Naturally balances exploration and exploitation.

Best for: most MAB use cases, especially when you want smooth convergence.
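As a concrete illustration, the per-assignment step can be sketched in Python, assuming binary conversion rewards and a uniform Beta(1, 1) prior. This mirrors the algorithm described above, not the service's internal code:

```python
import random

def thompson_sample(variants):
    """Pick a variant by drawing from each arm's Beta posterior.

    `variants` maps variant name -> (successes, pulls). With a uniform
    prior, the posterior for a conversion rate is
    Beta(successes + 1, failures + 1).
    """
    best_name, best_draw = None, -1.0
    for name, (successes, pulls) in variants.items():
        failures = pulls - successes
        draw = random.betavariate(successes + 1, failures + 1)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name

# Two arms with clearly different observed rates: as evidence
# accumulates, the stronger arm wins most draws.
arms = {"control": (82, 224), "red_button": (389, 638)}
picks = [thompson_sample(arms) for _ in range(1000)]
```

Early on, wide posteriors mean every variant still receives traffic; as pulls accumulate, the distributions narrow and the best-performing arm wins an increasing share of draws.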

UCB1 (Upper Confidence Bound)

Selects the variant with the highest upper confidence bound: `mean + sqrt(2 * ln(total_pulls) / variant_pulls)`. Explores variants with high uncertainty (few pulls) while exploiting high-performing ones.

Best for: situations where you want a deterministic, easily auditable algorithm.
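The selection rule above translates directly into code. A minimal, illustrative sketch, with an explicit rule for arms that have never been pulled:

```python
import math

def ucb1_select(variants, total_pulls):
    """Pick the variant with the highest upper confidence bound.

    `variants` maps variant name -> (successes, pulls). An arm with
    zero pulls has unbounded uncertainty, so it is tried first.
    """
    best_name, best_bound = None, float("-inf")
    for name, (successes, pulls) in variants.items():
        if pulls == 0:
            return name  # explore untried arms before anything else
        mean = successes / pulls
        bound = mean + math.sqrt(2 * math.log(total_pulls) / pulls)
        if bound > best_bound:
            best_name, best_bound = name, bound
    return best_name

arms = {"control": (82, 224), "red_button": (389, 638)}
choice = ucb1_select(arms, total_pulls=862)
```

The exploration bonus shrinks as an arm accumulates pulls, so a variant keeps receiving traffic only while its uncertainty or its mean remains competitive.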

Epsilon-Greedy

With probability epsilon, selects a random variant (exploration). Otherwise selects the current best variant (exploitation). Simple and fast.

Best for: high-volume, low-latency scenarios where simplicity matters.
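A sketch of the rule (the epsilon value of 0.1 here is a common default, used purely for illustration):

```python
import random

def epsilon_greedy_select(variants, epsilon=0.1):
    """With probability `epsilon`, explore a uniformly random variant;
    otherwise exploit the variant with the best observed rate.

    `variants` maps variant name -> (successes, pulls).
    """
    if random.random() < epsilon:
        return random.choice(list(variants))
    # max(..., 1) guards against division by zero for unpulled arms.
    return max(variants, key=lambda n: variants[n][0] / max(variants[n][1], 1))

arms = {"control": (82, 224), "red_button": (389, 638)}
choice = epsilon_greedy_select(arms, epsilon=0.0)  # pure exploitation
```

Note that a fixed epsilon keeps exploring at the same rate forever, which is part of why this variant trades some efficiency for simplicity.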


Creating a MAB Experiment

Set `optimization_type` to one of `thompson_sampling`, `ucb1`, or `epsilon_greedy` when creating an experiment. All other experiment fields (variants, metrics, targeting) work the same as standard A/B experiments.

```bash
curl -X POST "http://localhost:8000/api/v1/experiments" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CTA Button Bandit",
    "optimization_type": "thompson_sampling",
    "variants": [
      {"name": "control", "is_control": true},
      {"name": "red_button"},
      {"name": "green_button"}
    ]
  }'
```

API Reference

GET /api/v1/bandit/{experiment_id}

Returns the current bandit state: variant weights, pulls, successes, and conversion rates.

Example Response

```json
{
  "experiment_id": "exp-uuid",
  "optimization_type": "thompson_sampling",
  "total_pulls": 12480,
  "last_updated": "2026-03-02T14:32:00Z",
  "estimated_regret_pct": 2.1,
  "variant_weights": [
    {
      "variant_id": "v1-uuid",
      "variant_name": "control",
      "current_weight": 0.18,
      "successes": 820,
      "pulls": 2246,
      "conversion_rate": 0.365
    },
    {
      "variant_id": "v2-uuid",
      "variant_name": "red_button",
      "current_weight": 0.61,
      "successes": 3890,
      "pulls": 6382,
      "conversion_rate": 0.610
    },
    {
      "variant_id": "v3-uuid",
      "variant_name": "green_button",
      "current_weight": 0.21,
      "successes": 1240,
      "pulls": 3852,
      "conversion_rate": 0.322
    }
  ]
}
```

Response Fields

| Field | Description |
|---|---|
| `current_weight` | Current traffic allocation fraction (0–1; sums to 1.0 across variants) |
| `estimated_regret_pct` | Estimated cumulative regret as a % of optimal reward. Lower is better. |
| `pulls` | Total assignments to this variant |
| `successes` | Conversion events recorded for this variant |
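The documented invariants (weights summing to 1.0, per-variant pulls summing to `total_pulls`) can be checked client-side. A sketch against the example response above, with field names taken from the response schema; the helper itself is illustrative:

```python
def summarize_bandit_state(state):
    """Sanity-check a bandit-state payload and return the current
    leader as (variant_name, current_weight)."""
    weights = state["variant_weights"]
    assert abs(sum(v["current_weight"] for v in weights) - 1.0) < 1e-6
    assert sum(v["pulls"] for v in weights) == state["total_pulls"]
    leader = max(weights, key=lambda v: v["current_weight"])
    return leader["variant_name"], leader["current_weight"]

# Trimmed copy of the example response above.
state = {
    "total_pulls": 12480,
    "variant_weights": [
        {"variant_name": "control", "current_weight": 0.18, "pulls": 2246},
        {"variant_name": "red_button", "current_weight": 0.61, "pulls": 6382},
        {"variant_name": "green_button", "current_weight": 0.21, "pulls": 3852},
    ],
}
leader = summarize_bandit_state(state)
```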

POST /api/v1/bandit/{experiment_id}/update

Manually trigger a weight recalculation outside the scheduler cycle. Requires DEVELOPER or ADMIN role.

```bash
curl -X POST "http://localhost:8000/api/v1/bandit/exp-uuid/update" \
  -H "Authorization: Bearer $TOKEN"
```

PUT /api/v1/bandit/{experiment_id}/weights

Override variant weights manually. Requires ADMIN role. Use with caution — overrides are replaced on the next scheduler cycle unless the experiment is paused.

```bash
curl -X PUT "http://localhost:8000/api/v1/bandit/exp-uuid/weights" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "weights": {
      "v1-uuid": 0.1,
      "v2-uuid": 0.8,
      "v3-uuid": 0.1
    }
  }'
```

Scheduler

The bandit scheduler runs on a background cycle and recalculates weights for all active MAB experiments. The interval is configurable via `BANDIT_SCHEDULER_INTERVAL_SECONDS` in settings. Default: 300 seconds (5 minutes).


Permissions

| Action | Minimum Role |
|---|---|
| View bandit status | VIEWER |
| Trigger weight update | DEVELOPER |
| Override weights | ADMIN |