Core Concepts

This guide explains the fundamental concepts you need to understand before working with the platform. Whether you are a product manager designing experiments, an engineer integrating feature flags, or an analyst reading results, this reference will clarify what each term means and how the pieces fit together.


Feature Flags

A feature flag (also called a feature toggle or feature switch) is a configuration key that controls whether a piece of functionality is visible or active for a given user. Instead of deploying code to enable or disable a feature, you change a flag value at runtime — no deployment required.

On/Off Flags

The simplest type. A boolean value controls whether the feature is shown at all.

dark-mode: true   →  show dark theme
dark-mode: false  →  show light theme

Common uses: gradual rollouts, kill switches, beta programs, maintenance mode.
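As a minimal sketch, a boolean flag check might look like the following. The in-memory `FLAG_STATE` store and the `is_enabled` helper are illustrative, not the SDK's real API:

```python
# Illustrative on/off flag check. In a real integration, the flag value
# would come from the platform SDK rather than a local dictionary.

FLAG_STATE = {"dark-mode": True}  # value changed at runtime, no deploy needed

def is_enabled(flag_key: str, default: bool = False) -> bool:
    """Return the current value of a boolean flag, falling back to a default."""
    return FLAG_STATE.get(flag_key, default)

theme = "dark" if is_enabled("dark-mode") else "light"
```

Note the explicit default: if the flag is missing or the flag service is unreachable, the application should fall back to a safe, known experience.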

Multivariate Flags

Instead of a boolean, the flag returns one of several string values. Each value maps to a different experience.

checkout-layout: "classic"
checkout-layout: "simplified"
checkout-layout: "one-page"

Common uses: testing more than two variants, configuring UI themes, enabling different feature tiers per plan.
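Dispatching on a multivariate flag value can be sketched as below. The flag store, layout names, and `get_variation` helper mirror the example above and are illustrative assumptions:

```python
# Illustrative multivariate flag dispatch. Unknown or unexpected flag
# values fall back to the default ("classic") experience.

FLAG_STATE = {"checkout-layout": "simplified"}

def get_variation(flag_key: str, default: str) -> str:
    """Return the current string value of a multivariate flag."""
    return FLAG_STATE.get(flag_key, default)

def render_checkout() -> str:
    layout = get_variation("checkout-layout", default="classic")
    renderers = {
        "classic": "render classic checkout",
        "simplified": "render simplified checkout",
        "one-page": "render one-page checkout",
    }
    # Guard against values the client does not know how to render.
    return renderers.get(layout, renderers["classic"])
```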

Why Use Feature Flags

  • Deploy code that is hidden behind a flag before it is ready to ship
  • Roll out to a small percentage of users first, then expand gradually
  • Instantly disable a problematic feature without a code deploy
  • Enable features for internal testers before public launch
  • Target features to specific segments (premium users, US users, mobile users)

Experiments / A/B Tests

An experiment (also called an A/B test or controlled experiment) is the process of splitting users into groups, exposing each group to a different version of a feature, and measuring which version performs better on a defined metric.

Hypothesis

Every experiment starts with a hypothesis: a precise, falsifiable statement of what you expect to happen and why.

Good hypothesis: "Changing the checkout CTA button from blue to green will increase checkout completion rate by 5% because green creates stronger visual contrast against the white background."

A good hypothesis specifies the change, the expected direction, the magnitude, and the reason.

Control vs Treatment

  • Control: the existing experience, unchanged. Serves as the baseline.
  • Treatment (or Variant): the new experience you are testing.

An experiment can have one control and one or more treatments. With two variants (control + one treatment) it is called an A/B test. With more than two variants it is called an A/B/n test or multivariate test.

Statistical Significance

Statistical significance tells you how unlikely the observed difference between control and treatment would be if there were no true effect, i.e. whether the result can be distinguished from random noise. It is expressed as a p-value.

  • p-value < 0.05: less than a 5% chance of seeing a difference this large if there were no real effect. The standard threshold for most experiments.
  • p-value < 0.01: less than a 1% chance. Use for high-stakes decisions.

A result is statistically significant when p < your chosen alpha threshold (usually 0.05).

Important: statistical significance does not tell you the effect is large or meaningful. Always check effect size alongside p-value.
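As a concrete illustration, the p-value for a conversion metric can be computed with a two-proportion z-test. This is a standard textbook test, not necessarily the exact statistics the platform runs:

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    return math.erfc(abs(z) / math.sqrt(2))

# 10.0% vs 11.0% conversion on 10,000 users each.
p = two_proportion_p_value(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)
significant = p < 0.05  # compare against the chosen alpha
```

With these sample numbers the p-value comes out around 0.02, so the result would clear the standard 0.05 threshold; whether a one-percentage-point lift matters is still an effect-size question.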


Variants

A variant is one of the possible experiences in an experiment. Every experiment has at least two variants:

  • Control variant: the baseline experience (usually what users see today)
  • Treatment variant(s): the new experience(s) you are testing

Each variant is assigned a traffic weight — the fraction of eligible users who will be shown that variant. Weights must sum to 100% (or less if you want to exclude some users from the experiment entirely).

Variants vs Feature Flags

A feature flag controls access to a feature for a user. A variant controls which version of a feature a user sees within an experiment. Experiments are temporary; feature flags are often permanent configuration.


Metrics

A metric is the measurable outcome you use to evaluate whether a variant is better or worse than the control.

Metric Types

  • Conversion: binary, did the event happen or not? Examples: clicked button, completed checkout, signed up.
  • Revenue: dollar value. Examples: order value, lifetime value.
  • Count: how many times the event occurred. Examples: page views per session, items added to cart.
  • Duration: time measurement. Examples: session length, time to first action.

Primary vs Secondary Metrics

Each experiment should have exactly one primary metric — the north star for your ship/no-ship decision. All other metrics are secondary and used to check for unexpected side effects.

Example: primary = checkout completion rate; secondary = cart abandonment rate, average order value, customer support contacts.

Guardrail Metrics

Guardrail metrics are secondary metrics with a minimum acceptable threshold. If a treatment improves your primary metric but degrades a guardrail metric beyond its threshold, the experiment should not ship.

Example guardrail: "Support ticket rate must not increase by more than 5%."
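A guardrail check can be sketched as a simple threshold comparison. The function below assumes each guardrail is a maximum allowed relative increase in a metric where higher is worse (like support ticket rate); the function and data shapes are illustrative:

```python
def guardrails_pass(control: dict, treatment: dict, limits: dict) -> bool:
    """Return False if any metric degraded beyond its allowed relative change.

    `limits` maps metric name -> maximum allowed relative increase
    (0.05 means "must not increase by more than 5%").
    """
    for metric, max_increase in limits.items():
        relative_change = (treatment[metric] - control[metric]) / control[metric]
        if relative_change > max_increase:
            return False
    return True

# "Support ticket rate must not increase by more than 5%."
ok = guardrails_pass(
    control={"support_ticket_rate": 0.020},
    treatment={"support_ticket_rate": 0.022},  # +10%, breaches the guardrail
    limits={"support_ticket_rate": 0.05},
)
```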


User Bucketing

User bucketing is the process of assigning each user to a variant. The platform uses consistent hash-based bucketing.

Consistent Hash Assignment

The assignment is computed as a deterministic hash of experiment_key + user_id. This means:

  • The same user always gets the same variant for a given experiment (sticky assignment)
  • No database lookup is needed to retrieve the assignment — it can be computed on the fly
  • Sticky assignment survives server restarts, cache clears, and SDK reinitializations

The hash output is mapped to a bucket number (0–99). Variant boundaries are set by traffic weights: if control has 50% weight and treatment has 50% weight, users with bucket 0–49 get control and users with bucket 50–99 get treatment.
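The scheme above can be sketched in a few lines. The hash function (SHA-256) and key separator here are illustrative choices, not the platform's actual internals:

```python
import hashlib

def bucket(experiment_key: str, user_id: str) -> int:
    """Deterministically map experiment_key + user_id to a bucket in 0-99."""
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def assign_variant(experiment_key: str, user_id: str, weights: dict) -> str:
    """Assign a variant using cumulative weight boundaries over buckets 0-99."""
    b = bucket(experiment_key, user_id)
    boundary = 0
    for variant, weight in weights.items():
        boundary += weight
        if b < boundary:
            return variant
    raise ValueError("weights must sum to 100")

variant = assign_variant("checkout-test", "user-123",
                         {"control": 50, "treatment": 50})
```

Because the assignment is a pure function of the experiment key and user ID, calling it again anywhere, at any time, returns the same variant, which is exactly the sticky behavior described above.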

Why Consistent Bucketing Matters

Without consistent bucketing, a user could see the control variant on Monday and the treatment variant on Wednesday. This would contaminate your results and create a confusing user experience.


Traffic Allocation vs Rollout Percentage

These two concepts are related but distinct:

Traffic Allocation (Experiments)

Traffic allocation is the fraction of users who are included in the experiment at all. An experiment with 80% traffic allocation means 20% of eligible users are excluded from the experiment entirely and see the default experience.

Within the 80% who are included, users are split between variants according to their weights.

Rollout Percentage (Feature Flags)

Rollout percentage is the fraction of users for whom a feature flag evaluates to true (enabled). A feature flag at 10% rollout means 10% of users see the feature and 90% do not.

Unlike experiment traffic allocation, rollout percentage is usually increased over time as confidence grows.
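The two checks can be contrasted in code, assuming the same hash-bucket approach described under User Bucketing; the salting scheme here is an illustrative assumption:

```python
import hashlib

def bucket(key: str, user_id: str) -> int:
    """Deterministic bucket in 0-99 for a given key and user."""
    return int(hashlib.sha256(f"{key}:{user_id}".encode()).hexdigest(), 16) % 100

def in_experiment(experiment_key: str, user_id: str, allocation_pct: int) -> bool:
    """Traffic allocation: is this user included in the experiment at all?"""
    # Salt the key so the allocation cut is independent of the variant split.
    return bucket(experiment_key + ":alloc", user_id) < allocation_pct

def flag_enabled(flag_key: str, user_id: str, rollout_pct: int) -> bool:
    """Rollout percentage: does this flag evaluate to true for this user?"""
    return bucket(flag_key, user_id) < rollout_pct
```

Note that raising `rollout_pct` only ever flips users from disabled to enabled, which is why a rollout can be safely increased over time without reshuffling existing users.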


Targeting Rules

Targeting rules define which users are eligible for an experiment or feature flag. A user who does not match the targeting rules is excluded entirely — they are not assigned to any variant and do not count in the results.

Attribute-Based Rules

Rules are evaluated against user attributes that your application passes to the platform at evaluation time.

country IN [US, CA, GB]
plan EQUALS premium
account_age_days GREATER_THAN 30
app_version SEMVER_GTE 3.0.0
email ENDS_WITH @company.com

Combining Rules

Rules can be combined using AND (all must match), OR (at least one must match), and NOT (inverts the result).

AND:
  country IN [US, CA]
  plan EQUALS enterprise
  account_age_days GREATER_THAN 14

This would target enterprise users in the US or Canada whose accounts are more than two weeks old.
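Rule evaluation with AND semantics can be sketched as below. The operator names follow the examples in this section; the evaluation code itself is an illustrative assumption, not the platform's implementation:

```python
# Map rule operators to predicates over (user attribute value, rule argument).
OPERATORS = {
    "IN": lambda value, arg: value in arg,
    "EQUALS": lambda value, arg: value == arg,
    "GREATER_THAN": lambda value, arg: value > arg,
    "ENDS_WITH": lambda value, arg: str(value).endswith(arg),
}

def matches_all(user: dict, rules: list) -> bool:
    """AND semantics: every rule must match; a missing attribute fails the rule."""
    for attribute, op, arg in rules:
        if attribute not in user or not OPERATORS[op](user[attribute], arg):
            return False
    return True

eligible = matches_all(
    {"country": "US", "plan": "enterprise", "account_age_days": 30},
    [("country", "IN", ["US", "CA"]),
     ("plan", "EQUALS", "enterprise"),
     ("account_age_days", "GREATER_THAN", 14)],
)
```

Treating a missing attribute as a non-match (rather than an error) is a deliberate choice here: a user the application knows nothing about is simply ineligible.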

Common Targeting Attributes

  • user_id (string): target a specific beta tester
  • country (string): geographic targeting
  • plan (string): subscription tier
  • account_age_days (number): target new vs returning users
  • app_version (semver): target specific app versions
  • device_type (string): mobile vs desktop
  • email (string): target internal users by domain

Environments

The platform supports multiple environments so you can test experiments and flags in staging before they reach production users.

Staging vs Production

  • staging: test experiment logic, SDK integration, and targeting rules before launch
  • production: live experiments and flags serving real users

Each environment has separate feature flag states and separate experiment assignments. A feature flag that is 100% enabled in staging has no effect in production unless you also enable it there.

Environment Variables

Your SDK client is initialized with an API key that is scoped to a specific environment. Use separate API keys for staging and production to prevent accidental cross-environment leakage.
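A hypothetical initialization sketch is below. `PlatformClient` and the `PLATFORM_API_KEY` variable name are illustrative; substitute your SDK's real constructor and whatever secret-management scheme you already use:

```python
import os

class PlatformClient:
    """Stand-in for the real SDK client; the API key selects the environment."""

    def __init__(self, api_key: str):
        if not api_key:
            raise ValueError("missing API key for this environment")
        self.api_key = api_key

# Each deployment injects only its own environment's key, so staging code
# cannot accidentally evaluate against production flag state.
api_key = os.environ.get("PLATFORM_API_KEY", "")
client = PlatformClient(api_key) if api_key else None
```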


Glossary

  • A/B test: experiment comparing two variants (control and treatment)
  • A/B/n test: experiment comparing more than two variants
  • Alpha (α): the significance threshold; probability of a false positive you are willing to accept (typically 0.05)
  • Bucketing: the process of assigning users to variants
  • Confidence interval: range of values within which the true effect size likely falls
  • Control: the baseline variant; typically the existing experience
  • Conversion rate: fraction of users who completed a defined action
  • Covariate: pre-experiment user attribute used in CUPED variance reduction
  • CUPED: Controlled-experiment Using Pre-Experiment Data; a variance reduction technique
  • Effect size: the magnitude of the difference between variants
  • Experiment: a controlled test that assigns users to variants and measures outcomes
  • Feature flag: a runtime configuration key that controls feature visibility
  • Guardrail metric: a secondary metric that must not degrade beyond a threshold
  • Holdout: a group of users excluded from all experiments, used to measure cumulative experiment impact
  • Hypothesis: a falsifiable statement predicting what will happen in an experiment and why
  • MAB: Multi-Armed Bandit; an adaptive experiment that shifts traffic toward better-performing variants
  • MDE: Minimum Detectable Effect; the smallest improvement worth measuring
  • Metric: a measurable outcome used to evaluate experiment results
  • Mutual exclusion group: a set of experiments configured so users are enrolled in at most one at a time
  • p-value: probability of observing results as extreme as these if there were no true effect
  • Primary metric: the single metric that determines the ship/no-ship decision
  • Rollout percentage: fraction of users for whom a feature flag is enabled
  • Segment: a subset of users defined by shared attributes
  • Statistical significance: p-value below the chosen alpha threshold
  • Targeting rule: an attribute-based condition that determines experiment eligibility
  • Traffic allocation: fraction of eligible users included in an experiment
  • Treatment: a non-control variant in an experiment
  • Variant: one of the possible experiences in an experiment