Multi-Armed Bandit
Multi-armed bandit (MAB) experiments dynamically reallocate traffic toward better-performing variants as data accumulates, rather than holding allocations fixed for the whole experiment. This increases the reward earned during the experiment itself, at the cost of some statistical power: variants that lose traffic early collect fewer observations, so their effect estimates are less precise.
A/B Test vs. Bandit — When to Use Which
| Scenario | Use A/B Test | Use Bandit |
|---|---|---|
| Need unbiased causal estimates for a permanent change | ✅ | ❌ |
| Regulatory or compliance reporting | ✅ | ❌ |
| Short-lived promotions where maximizing conversions matters | ❌ | ✅ |
| Content recommendation / personalization | ❌ | ✅ |
| Rapidly iterating with many variants | ❌ | ✅ |
| Statistical power and precise effect sizes required | ✅ | ❌ |
Algorithms
Thompson Sampling (Default)
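Maintains a Beta distribution for each variant, parameterized by its observed successes and failures (pulls minus successes). At each assignment, it draws one sample from each distribution and routes the user to the variant with the highest draw. This naturally balances exploration and exploitation: variants with little data have wide distributions and still win some draws, while well-measured winners win most of them.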
Best for: most MAB use cases, especially when you want smooth convergence.
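The selection step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the platform's implementation: the function name `thompson_select`, the `stats` layout, and the uniform Beta(1, 1) prior are all assumptions, since the source does not state which prior is used.

```python
import random

def thompson_select(stats, rng=random):
    """Pick a variant by drawing one sample from each variant's Beta posterior.

    `stats` maps variant name -> (successes, pulls).
    Assumption: a uniform Beta(1, 1) prior, hence the +1 on both parameters.
    """
    best_name, best_draw = None, -1.0
    for name, (successes, pulls) in stats.items():
        failures = pulls - successes
        draw = rng.betavariate(successes + 1, failures + 1)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name

# Counts taken from the example response later in this document.
stats = {
    "control": (820, 2246),
    "red_button": (3890, 6382),
    "green_button": (1240, 3852),
}
rng = random.Random(0)  # seeded only to make the sketch reproducible
picks = [thompson_select(stats, rng) for _ in range(1000)]
```

With this much data the posteriors barely overlap, so nearly every draw favors `red_button`; with only a handful of pulls per variant the picks would be spread much more evenly.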
UCB1 (Upper Confidence Bound)
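Selects the variant with the highest upper confidence bound: mean + sqrt(2 * ln(total_pulls) / variant_pulls), where mean is the variant's observed conversion rate. The second term shrinks as a variant accumulates pulls, so the algorithm explores variants with high uncertainty (few pulls) while exploiting high-performing ones.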
Best for: situations where you want a deterministic, easily auditable algorithm.
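As a rough illustration of the formula above, the following Python sketch computes the bound for each variant and picks the largest. The name `ucb1_select` and the `stats` layout are hypothetical. Trying unpulled variants first is the usual UCB1 initialization and is assumed here; the source does not describe how the platform handles variants with zero pulls.

```python
import math

def ucb1_select(stats):
    """Return the variant with the highest UCB1 bound.

    `stats` maps variant name -> (successes, pulls).
    """
    # Assumption: a variant with no pulls is tried first, because its
    # bound would otherwise require dividing by zero.
    for name, (_, pulls) in stats.items():
        if pulls == 0:
            return name

    total_pulls = sum(pulls for _, pulls in stats.values())

    def upper_bound(name):
        successes, pulls = stats[name]
        mean = successes / pulls
        return mean + math.sqrt(2 * math.log(total_pulls) / pulls)

    return max(stats, key=upper_bound)

# Counts taken from the example response later in this document.
stats = {
    "control": (820, 2246),
    "red_button": (3890, 6382),
    "green_button": (1240, 3852),
}
choice = ucb1_select(stats)
```

Because no randomness is involved, the same counts always produce the same choice, which is what makes the algorithm easy to audit.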
Epsilon-Greedy
With probability epsilon, selects a random variant (exploration). Otherwise selects the current best variant (exploitation). Simple and fast.
Best for: high-volume, low-latency scenarios where simplicity matters.
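A minimal Python sketch of this rule follows. The function name `epsilon_greedy_select`, the `stats` layout, and the default `epsilon=0.1` are assumptions for illustration; the source does not state the platform's default epsilon.

```python
import random

def epsilon_greedy_select(stats, epsilon=0.1, rng=random):
    """Explore with probability `epsilon`, otherwise exploit the best variant.

    `stats` maps variant name -> (successes, pulls).
    """
    names = list(stats)
    if rng.random() < epsilon:
        return rng.choice(names)  # exploration: uniform random variant

    def rate(name):
        successes, pulls = stats[name]
        # Variants with no data are treated as 0.0 here (an assumption).
        return successes / pulls if pulls else 0.0

    return max(names, key=rate)  # exploitation: best observed rate

# Counts taken from the example response later in this document.
stats = {
    "control": (820, 2246),
    "red_button": (3890, 6382),
    "green_button": (1240, 3852),
}
greedy_pick = epsilon_greedy_select(stats, epsilon=0.0)
```

Unlike Thompson Sampling and UCB1, the exploration rate here does not shrink on its own as evidence accumulates, so a fixed share of traffic keeps going to random variants.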
Creating a MAB Experiment
Set optimization_type to one of thompson_sampling, ucb1, or epsilon_greedy when creating an experiment. All other experiment fields (variants, metrics, targeting) work the same as standard A/B experiments.
curl -X POST "http://localhost:8000/api/v1/experiments" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CTA Button Bandit",
    "optimization_type": "thompson_sampling",
    "variants": [
      {"name": "control", "is_control": true},
      {"name": "red_button"},
      {"name": "green_button"}
    ]
  }'
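The same request can be built from Python with only the standard library. This is a sketch: `build_create_request` is a hypothetical helper, the client-side check on `optimization_type` is an added convenience rather than documented API behavior, and the request is constructed but not sent.

```python
import json
import urllib.request

ALLOWED_TYPES = {"thompson_sampling", "ucb1", "epsilon_greedy"}

def build_create_request(base_url, token, optimization_type="thompson_sampling"):
    """Build (but do not send) the POST request that creates a MAB experiment."""
    if optimization_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported optimization_type: {optimization_type!r}")
    payload = {
        "name": "CTA Button Bandit",
        "optimization_type": optimization_type,
        "variants": [
            {"name": "control", "is_control": True},
            {"name": "red_button"},
            {"name": "green_button"},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/experiments",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_create_request("http://localhost:8000", "example-token")
# To actually send it: urllib.request.urlopen(req)
```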
API Reference
GET /api/v1/bandit/{experiment_id}
Returns the current bandit state: variant weights, pulls, successes, and conversion rates.
Example Response
{
  "experiment_id": "exp-uuid",
  "optimization_type": "thompson_sampling",
  "total_pulls": 12480,
  "last_updated": "2026-03-02T14:32:00Z",
  "estimated_regret_pct": 2.1,
  "variant_weights": [
    {
      "variant_id": "v1-uuid",
      "variant_name": "control",
      "current_weight": 0.18,
      "successes": 820,
      "pulls": 2246,
      "conversion_rate": 0.365
    },
    {
      "variant_id": "v2-uuid",
      "variant_name": "red_button",
      "current_weight": 0.61,
      "successes": 3890,
      "pulls": 6382,
      "conversion_rate": 0.610
    },
    {
      "variant_id": "v3-uuid",
      "variant_name": "green_button",
      "current_weight": 0.21,
      "successes": 1240,
      "pulls": 3852,
      "conversion_rate": 0.322
    }
  ]
}
Response Fields
| Field | Description |
|---|---|
| current_weight | Current traffic allocation fraction (0–1, sums to 1.0 across variants) |
| estimated_regret_pct | Estimated cumulative regret as a % of optimal reward. Lower is better. |
| pulls | Total assignments to this variant |
| successes | Conversion events recorded for this variant |
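A client might sanity-check a response before acting on it. The sketch below uses a hypothetical helper, `check_bandit_state`, and relies on two relationships that hold in the example response but are not stated as guarantees by the source: `total_pulls` equals the sum of per-variant `pulls`, and `conversion_rate` equals `successes / pulls` rounded to three decimals.

```python
def check_bandit_state(state, tol=1e-3):
    """Return a list of inconsistencies found in a bandit state (empty if none)."""
    problems = []
    variants = state["variant_weights"]

    weight_sum = sum(v["current_weight"] for v in variants)
    if abs(weight_sum - 1.0) > tol:
        problems.append("weights do not sum to 1.0")

    # Assumption: total_pulls is the sum of per-variant pulls.
    if sum(v["pulls"] for v in variants) != state["total_pulls"]:
        problems.append("per-variant pulls do not add up to total_pulls")

    # Assumption: conversion_rate is successes / pulls, rounded.
    for v in variants:
        if v["pulls"] and abs(v["successes"] / v["pulls"] - v["conversion_rate"]) > tol:
            problems.append(f"{v['variant_name']}: conversion_rate != successes / pulls")
    return problems

# The fields used here are copied from the example response.
state = {
    "total_pulls": 12480,
    "variant_weights": [
        {"variant_name": "control", "current_weight": 0.18,
         "successes": 820, "pulls": 2246, "conversion_rate": 0.365},
        {"variant_name": "red_button", "current_weight": 0.61,
         "successes": 3890, "pulls": 6382, "conversion_rate": 0.610},
        {"variant_name": "green_button", "current_weight": 0.21,
         "successes": 1240, "pulls": 3852, "conversion_rate": 0.322},
    ],
}
problems = check_bandit_state(state)
```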
POST /api/v1/bandit/{experiment_id}/update
Manually trigger a weight recalculation outside the scheduler cycle. Requires DEVELOPER or ADMIN role.
curl -X POST "http://localhost:8000/api/v1/bandit/exp-uuid/update" \
  -H "Authorization: Bearer $TOKEN"
PUT /api/v1/bandit/{experiment_id}/weights
Override variant weights manually. Requires ADMIN role. Use with caution: the next scheduler cycle replaces any override unless the experiment is paused.
curl -X PUT "http://localhost:8000/api/v1/bandit/exp-uuid/weights" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "weights": {
      "v1-uuid": 0.1,
      "v2-uuid": 0.8,
      "v3-uuid": 0.1
    }
  }'
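Because an override changes live traffic, it can be worth validating the weights on the client before sending them. The sketch below uses a hypothetical helper, `validate_weights`. The source does not say which checks the server performs, so these rules (every variant present, each weight in 0–1, total of 1.0) are assumptions based on the `current_weight` description above.

```python
import math

def validate_weights(weights, known_variant_ids):
    """Check an override and return the request body for PUT .../weights.

    `weights` maps variant ID -> fraction of traffic.
    Raises ValueError if the override looks malformed.
    """
    if set(weights) != set(known_variant_ids):
        raise ValueError("weights must cover exactly the experiment's variants")
    for variant_id, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {variant_id} is outside 0-1")
    # isclose absorbs floating-point rounding such as 0.1 + 0.8 + 0.1.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return {"weights": dict(weights)}

variant_ids = {"v1-uuid", "v2-uuid", "v3-uuid"}
body = validate_weights(
    {"v1-uuid": 0.1, "v2-uuid": 0.8, "v3-uuid": 0.1},
    variant_ids,
)
```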
Scheduler
The bandit scheduler runs on a background cycle and recalculates weights for all active MAB experiments. The interval is configurable via BANDIT_SCHEDULER_INTERVAL_SECONDS in settings. The default is 300 seconds (5 minutes).
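The source does not describe how the scheduler turns observed counts into traffic weights. One common approach for Thompson Sampling, shown here purely as an assumption, is to estimate each variant's probability of being the best by repeated posterior sampling and use those probabilities as weights. The name `thompson_weights`, the Beta(1, 1) prior, and the number of draws are all hypothetical.

```python
import random

def thompson_weights(stats, draws=2000, rng=random):
    """Estimate P(variant is best) by Monte Carlo sampling of Beta posteriors.

    `stats` maps variant name -> (successes, pulls).
    Assumption: a uniform Beta(1, 1) prior for every variant.
    """
    wins = dict.fromkeys(stats, 0)

    def posterior_draw(name):
        successes, pulls = stats[name]
        return rng.betavariate(successes + 1, pulls - successes + 1)

    for _ in range(draws):
        # One simulated assignment: the variant with the highest draw wins.
        wins[max(stats, key=posterior_draw)] += 1

    # Win fractions sum to 1.0 and can serve as allocation weights.
    return {name: count / draws for name, count in wins.items()}

rng = random.Random(0)  # seeded only to make the sketch reproducible
even = thompson_weights({"a": (10, 100), "b": (10, 100)}, rng=rng)
lopsided = thompson_weights({"a": (90, 100), "b": (10, 100)}, rng=rng)
```

Two variants with identical data end up with roughly equal weights, while a clear winner takes almost all of the traffic. Production systems often add a minimum weight per variant so that no variant is starved entirely, but whether this platform does so is not stated.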
Permissions
| Action | Minimum Role |
|---|---|
| View bandit status | VIEWER |
| Trigger weight update | DEVELOPER |
| Override weights | ADMIN |
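The "Minimum Role" column implies an ordering in which each role includes the permissions of the ones below it. A sketch of such a check follows. The ordering VIEWER < DEVELOPER < ADMIN is inferred from the table, and the action keys and the `is_allowed` helper are hypothetical names, not part of the documented API.

```python
from enum import IntEnum

class Role(IntEnum):
    # Assumption: roles are strictly ordered, so a higher role can do
    # everything a lower one can.
    VIEWER = 1
    DEVELOPER = 2
    ADMIN = 3

# Hypothetical action keys mirroring the rows of the permissions table.
MINIMUM_ROLE = {
    "view_bandit_status": Role.VIEWER,
    "trigger_weight_update": Role.DEVELOPER,
    "override_weights": Role.ADMIN,
}

def is_allowed(role, action):
    """Return True if `role` meets the minimum role for `action`."""
    return role >= MINIMUM_ROLE[action]

developer_can_update = is_allowed(Role.DEVELOPER, "trigger_weight_update")
developer_can_override = is_allowed(Role.DEVELOPER, "override_weights")
```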