Feature Flags and Progressive Delivery: Safe Releases at Scale

Feature flags — also called feature toggles or feature switches — decouple code deployment from feature release. You deploy code to production with the new feature disabled. When you're ready, you enable it for 1% of users, watch metrics, enable for 10%, verify, then 100%. If something goes wrong, you flip a switch and it's gone — no rollback deploy, no database migration, no 2am deployment.

At scale, feature flags become a core part of your deployment infrastructure. Companies like Facebook, LinkedIn, and Spotify deploy dozens of times per day, with every significant change behind a feature flag. This article covers the implementation patterns, not the philosophy.

Flag Types and Use Cases

Release flags:
  → Hide incomplete features in production (trunk-based development)
  → "newCheckoutFlow": false in production, code exists but inaccessible

Kill switches:
  → Emergency disablement of a feature causing incidents
  → "paymentService": false → fallback to manual processing

Ops flags:
  → Control infrastructure behavior (circuit breakers, cache TTLs)
  → "enableRedisCaching": true/false

Experiment flags:
  → A/B testing: 50% users see variant A, 50% see variant B
  → "checkoutButtonColor": { "control": "blue", "treatment": "green" }

Permission flags:
  → Enable features for specific users (beta, premium, internal)
  → "advancedAnalytics": enabled for { tier: "enterprise" }
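Permission flags, like the `advancedAnalytics` example, are evaluated against attributes of the caller. Below is a minimal sketch of that evaluation. It assumes a simplified rule model in which each rule is one attribute plus a set of allowed values. `TargetingRule` and `PermissionFlag` are hypothetical names, and real flag services support richer operators (segments, ranges, percentage splits):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical targeting rule: matches when the user's attribute equals one of the allowed values.
record TargetingRule(String attribute, Set<String> allowedValues) {
    boolean matches(Map<String, String> userAttributes) {
        String actual = userAttributes.get(attribute);
        // Guard against null: Set.of(...).contains(null) throws NullPointerException.
        return actual != null && allowedValues.contains(actual);
    }
}

// Hypothetical permission flag: true if any rule matches, otherwise the flag's default.
record PermissionFlag(String key, List<TargetingRule> rules, boolean defaultValue) {
    boolean evaluate(Map<String, String> userAttributes) {
        for (TargetingRule rule : rules) {
            if (rule.matches(userAttributes)) {
                return true;
            }
        }
        return defaultValue;
    }
}
```

Consider a flag built as `new PermissionFlag("advancedAnalytics", List.of(new TargetingRule("tier", Set.of("enterprise"))), false)`. It evaluates to `true` for `Map.of("tier", "enterprise")`. It evaluates to `false` for a free-tier user or for a user with no `tier` attribute at all.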

Architecture: Evaluation and Storage

Flag evaluation architecture:

SDK (in application) → Local cache (in-memory)
                              ↑
                    Background polling/streaming
                              ↑
                    Flag service (LaunchDarkly / Flagsmith / internal)
                              ↑
                    Flag storage (database / config service)

Critical design constraint: Flag evaluation must be SYNCHRONOUS and LOCAL.
Calling a remote API for each flag evaluation adds a network round trip to every request.
Instead, the SDK keeps an in-memory copy of all flags, refreshed by background polling
(for example, every 30s) or by streaming (Server-Sent Events). Evaluation is then a local lookup: < 1ms.
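The polling half of this design can be sketched with only the JDK. This is a minimal illustration, not any vendor SDK's implementation. `LocalFlagCache` is a hypothetical name, and the `remoteFetch` supplier is a hypothetical stand-in for the HTTP call to the flag service:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of an SDK-style local flag cache. A background task polls the flag
// service and swaps in an immutable snapshot. Evaluation never leaves the process.
final class LocalFlagCache implements AutoCloseable {
    private final Supplier<Map<String, Boolean>> remoteFetch; // hypothetical remote call
    private final ScheduledExecutorService poller =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "flag-poller");
            t.setDaemon(true); // don't keep the JVM alive just for polling
            return t;
        });
    private volatile Map<String, Boolean> snapshot = Map.of();

    LocalFlagCache(Supplier<Map<String, Boolean>> remoteFetch, long pollSeconds) {
        this.remoteFetch = remoteFetch;
        refresh(); // initial load; if it fails, lookups return call-site defaults
        poller.scheduleAtFixedRate(this::refresh, pollSeconds, pollSeconds, TimeUnit.SECONDS);
    }

    // Replace the snapshot. On any failure, keep serving the last known values.
    // Catching here also stops scheduleAtFixedRate from cancelling future runs.
    void refresh() {
        try {
            snapshot = Map.copyOf(remoteFetch.get());
        } catch (RuntimeException e) {
            // flag service unreachable or bad payload: keep the previous snapshot
        }
    }

    // Local, synchronous lookup: no network I/O on the request path.
    boolean isEnabled(String flagKey, boolean defaultValue) {
        return snapshot.getOrDefault(flagKey, defaultValue);
    }

    @Override
    public void close() {
        poller.shutdownNow();
    }
}
```

The cache publishes a new immutable map through a `volatile` field. As a result, request threads never see a half-updated set of flags and never take a lock. A streaming variant could call `refresh` (or apply a delta) when an update event arrives rather than on a timer.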

OpenFeature: Vendor-Neutral Flag SDK

OpenFeature is a CNCF project that defines a vendor-neutral API and SDKs for feature flag evaluation. Coding against it avoids vendor lock-in:

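// Maven: dev.openfeature:sdk plus a vendor provider
// (here LaunchDarkly's OpenFeature server provider; Flagsmith, Unleash, etc. ship their own)

import com.launchdarkly.openfeature.serverprovider.Provider;
import com.launchdarkly.sdk.server.LDConfig;
import dev.openfeature.sdk.Client;
import dev.openfeature.sdk.FeatureProvider;
import dev.openfeature.sdk.OpenFeatureAPI;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FeatureFlagConfig {

    @Bean
    public OpenFeatureAPI openFeatureAPI() {
        // The provider is the only vendor-specific piece. It can be swapped
        // without changing code that evaluates flags through the Client interface.
        FeatureProvider provider = new Provider(
            System.getenv("LAUNCHDARKLY_SDK_KEY"),
            new LDConfig.Builder().build()
        );

        OpenFeatureAPI api = OpenFeatureAPI.getInstance();
        api.setProvider(provider);
        return api;
    }

    @Bean
    public Client featureFlagClient(OpenFeatureAPI api) {
        return api.getClient("order-service");
    }
}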

// Flag evaluation in services:
import dev.openfeature.sdk.Client;
import dev.openfeature.sdk.EvaluationContext;
import dev.openfeature.sdk.ImmutableContext;
import dev.openfeature.sdk.Value;
import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class CheckoutService {

    @Autowired
    private Client featureFlags;

    @Autowired
    private NewCheckoutService newCheckoutService;       // new code path

    @Autowired
    private LegacyCheckoutService legacyCheckoutService; // existing code path

    public CheckoutResult checkout(CartRequest cart, User user) {
        // Boolean flag evaluation (with default):
        boolean useNewCheckoutFlow = featureFlags.getBooleanValue(
            "new-checkout-flow",
            false,  // Default: old flow (fail-safe)
            buildContext(user)
        );

        if (useNewCheckoutFlow) {
            return newCheckoutService.process(cart);
        } else {
            return legacyCheckoutService.process(cart);
        }
    }

    // Multivariate flag (A/B/C testing):
    public String getRecommendationAlgorithm(User user) {
        // Returns "collaborative-filtering", "content-based", or "hybrid"
        // depending on the targeting rules configured in the flag service
        return featureFlags.getStringValue(
            "recommendation-algorithm",
            "collaborative-filtering",  // Default
            buildContext(user)
        );
    }

    // Evaluation context: the targeting key (user ID) plus attributes that
    // targeting rules can match on. Assumes tier and region are Strings.
    private EvaluationContext buildContext(User user) {
        return new ImmutableContext(user.getId(), Map.of(
            "email", new Value(user.getEmail()),
            "tier", new Value(user.getTier()),
            "region", new Value(user.getRegion()),
            "betaUser", new Value(user.isBetaOptIn())
        ));
    }
}

Percentage Rollouts

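Percentage rollout implementation:

User ID: "user-12345"
Flag name: "new-checkout-flow"
Target percentage: 10%

Hash: SHA256("user-12345:new-checkout-flow")
    = a1b2c3d4e5... (illustrative; deterministic for a given input)

Bucket: uint32(hash[0:8]) % 10000
      = 0xa1b2c3d4 % 10000 = 2712847316 % 10000 = 7316

7316 / 10000 = 73.16% → User is NOT in the 10% rollout (only buckets 0-999 are in)

Why 8 hex characters: 4 hex characters span only 0-65535, so % 10000 would make
buckets 0-5535 appear 7 times and buckets 5536-9999 only 6 times. With 32 bits,
that skew is negligible.

Properties:
- The same user always gets the same result (consistent experience)
- Increasing the percentage from 10% → 20% adds new users and keeps the existing 10% in
- No server-side state needed: a pure function of userId + flagName + percentage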

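// Simple percentage rollout without an external flag service
// (MurmurHash3 from Apache Commons Codec):
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.digest.MurmurHash3;
import org.springframework.stereotype.Component;

@Component
public class FeatureFlagEvaluator {

    public boolean isEnabled(String flagName, String userId, int targetPercent) {
        byte[] input = (userId + ":" + flagName).getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the bucket in [0, 10000) even for negative hashes.
        // Math.abs(Integer.MIN_VALUE) is still negative, so Math.abs is not safe here.
        int bucket = Math.floorMod(MurmurHash3.hash32x86(input), 10000);
        return bucket < targetPercent * 100;
    }
}

// Usage:
boolean showNewUI = flagEvaluator.isEnabled("new-ui", user.getId(), 15);
// 15% of users deterministically get the new UI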
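The Murmur-based evaluator uses a different hash than the SHA-256 worked example. Below is a JDK-only sketch that follows the worked example step by step, using `MessageDigest` for SHA-256 over `userId + ":" + flagKey` and reducing the first 32 bits to a bucket in [0, 10000). It also reuses the same bucket for a weighted multivariate split, such as the three recommendation algorithms shown earlier. `RolloutBuckets` and its methods are hypothetical names. Hosted flag services use their own hashing schemes, so these buckets will not match theirs:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

final class RolloutBuckets {
    private RolloutBuckets() {}

    // SHA-256(userId + ":" + flagKey). The first 4 bytes (8 hex chars) are read
    // as an unsigned 32-bit integer and reduced to a bucket in [0, 10000).
    static int bucket(String flagKey, String userId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                .digest((userId + ":" + flagKey).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
        long first32 = ((digest[0] & 0xFFL) << 24) | ((digest[1] & 0xFFL) << 16)
            | ((digest[2] & 0xFFL) << 8) | (digest[3] & 0xFFL);
        return (int) (first32 % 10_000);
    }

    // Percentage rollout: in if bucket < percent * 100 (10% means buckets 0..999).
    static boolean isInRollout(String flagKey, String userId, int percent) {
        return bucket(flagKey, userId) < percent * 100;
    }

    // Multivariate split: weights in basis points that must sum to 10000.
    // The map's insertion order decides which bucket range each variant owns.
    static String assignVariant(String flagKey, String userId,
                                LinkedHashMap<String, Integer> weights) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        if (total != 10_000) {
            throw new IllegalArgumentException("weights must sum to 10000, got " + total);
        }
        int b = bucket(flagKey, userId);
        int upperBound = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            upperBound += e.getValue();
            if (b < upperBound) {
                return e.getKey();
            }
        }
        throw new AssertionError("unreachable: weights cover [0, 10000)");
    }
}
```

`isInRollout` compares the bucket against a threshold, so raising the percentage only adds buckets and users already in the rollout stay in. The flag key is hashed together with the user ID, which keeps rollouts of different flags independent, so the same 10% of users are not always first in line.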

Flag Lifecycle: Avoiding "Flag Debt"

Feature flags accumulate. A codebase with 200 flags, half of them fully rolled out and forgotten, becomes unmaintainable. Each flag adds a branch to your code, and n independent boolean flags allow up to 2^n combinations, far more than any test suite can cover.

Flag lifecycle stages:
1. Created     → default false, no targeting
2. Testing     → enabled for QA/internal users only
3. Canary      → 1-5% production users
4. Rollout     → gradual increase: 10% → 25% → 50% → 100%
5. Cleanup     → flag removed from code, flag config deleted

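When a flag reaches 100% rollout (or 0%, meaning permanently disabled), it must be cleaned up. Order matters: if you delete the flag from the flag service first, running code falls back to its default (often the old path). Clean up in this order:

1. Remove the flag evaluation from code
2. Delete the unused code path
3. Deploy, then delete the flag from the flag service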

// Code BEFORE cleanup (flag at 100%):
if (featureFlags.getBooleanValue("new-checkout-flow", false, context)) {
    return newCheckoutService.process(cart);
} else {
    return legacyCheckoutService.process(cart);  // Dead code
}

// Code AFTER cleanup:
return newCheckoutService.process(cart);  // Permanent — no flag check

Track flag cleanup as a first-class engineering task. Some teams use automatic expiry dates — flags that aren't cleaned up by their expiry date trigger alerts.
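An expiry check like this can run as a scheduled job or a CI step. Below is a minimal sketch. It assumes a hypothetical registry in which each flag is recorded with an owner and an expiry date. `FlagRecord` and `FlagExpiryChecker` are illustrative names, and alert delivery is left out:

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical registry entry: every flag carries an owner and an expiry date.
record FlagRecord(String key, String owner, LocalDate expiresOn) {}

final class FlagExpiryChecker {
    private FlagExpiryChecker() {}

    // Flags whose expiry date is on or before `today`: candidates for an alert or cleanup ticket.
    static List<FlagRecord> expired(List<FlagRecord> flags, LocalDate today) {
        return flags.stream()
            .filter(f -> !f.expiresOn().isAfter(today))
            .toList();
    }
}
```

A nightly job would call `FlagExpiryChecker.expired(allFlags, LocalDate.now())` and notify each returned flag's `owner`. Long-lived kill switches are meant to stay, so they would need a far-future date or an exemption.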

Kill Switches: Emergency Degradation

Kill switches are flags designed for emergency use — they should be evaluated extremely quickly and fail safe:

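import dev.openfeature.sdk.Client;
import dev.openfeature.sdk.ImmutableContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    @Autowired
    private Client featureFlags;

    @Autowired
    private ManualPaymentService manualPaymentService;

    @Autowired
    private StripeService stripeService;

    public PaymentResult processPayment(PaymentRequest request) {
        // Kill switch: if the payment provider is having issues, use the manual fallback
        boolean paymentServiceEnabled = featureFlags.getBooleanValue(
            "payment-service-enabled",
            true,                   // Default TRUE: service is enabled unless explicitly switched off
            new ImmutableContext()  // Empty context: no user targeting needed for kill switches
        );

        if (!paymentServiceEnabled) {
            log.warn("Payment service kill switch active, using manual fallback");
            return manualPaymentService.queue(request);
        }

        return stripeService.charge(request);
    }
}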

Flag defaults, for kill switches and release flags alike, must encode the safe state, i.e. the behavior that is acceptable during an incident:

payment-service-enabled: default true (payments work normally)
new-search-algorithm: default false (new algorithm is disabled by default)

If the flag service itself is unavailable (network partition, outage), the SDK keeps serving its last cached values. If it has no cached values (for example, a cold start during the outage), it returns the default passed at the call site. Design your defaults for that worst case.
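The fallback order can be made explicit in a few lines. The sketch below assumes that the last values fetched from the flag service are available as a map. This is a hypothetical interface: real SDKs keep this state internal and report their own resolution reasons. `FallbackResolver`, `Resolution`, and `ResolutionReason` are illustrative names:

```java
import java.util.Map;

// Where a resolved value came from.
enum ResolutionReason { CACHED, DEFAULT }

record Resolution(boolean value, ResolutionReason reason) {}

// Fallback order: the last value fetched from the flag service (if any),
// otherwise the default passed at the call site.
final class FallbackResolver {
    private final Map<String, Boolean> lastKnown; // empty if the service was never reachable

    FallbackResolver(Map<String, Boolean> lastKnown) {
        this.lastKnown = Map.copyOf(lastKnown);
    }

    Resolution resolve(String flagKey, boolean callSiteDefault) {
        Boolean cached = lastKnown.get(flagKey);
        return cached != null
            ? new Resolution(cached, ResolutionReason.CACHED)
            : new Resolution(callSiteDefault, ResolutionReason.DEFAULT);
    }
}
```

Take a cold start with the flag service down (an empty `lastKnown`). `resolve("payment-service-enabled", true)` returns `true`, and `resolve("new-search-algorithm", false)` returns `false`, both with reason `DEFAULT`. Payments keep working, and the unproven algorithm stays off.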

Metrics and Flag Evaluation Tracking

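// Track flag evaluations for analysis. Methods annotated with
// @FeatureFlagCheck(flag = "...") are expected to return the evaluated value.

// FeatureFlagCheck.java
import java.lang.annotation.*;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface FeatureFlagCheck {
    String flag();
}

// FeatureFlagMetricsAspect.java
import io.micrometer.core.instrument.MeterRegistry;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class FeatureFlagMetricsAspect {

    @Autowired
    private MeterRegistry meterRegistry;

    @Around("@annotation(featureFlagCheck)")
    public Object trackFlagEvaluation(ProceedingJoinPoint joinPoint,
                                       FeatureFlagCheck featureFlagCheck) throws Throwable {
        String flagName = featureFlagCheck.flag();
        Object result = joinPoint.proceed();

        meterRegistry.counter("feature_flag.evaluation",
            "flag", flagName,
            "value", String.valueOf(result)  // String.valueOf: no NPE on null results
        ).increment();

        return result;
    }
}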

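// Or record every evaluation in an OpenFeature hook (no annotation needed);
// register with OpenFeatureAPI.getInstance().addHooks(new MetricsHook(meterRegistry))
import dev.openfeature.sdk.*;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.Map;

public class MetricsHook implements Hook<Object> {
    private final MeterRegistry registry;
    public MetricsHook(MeterRegistry registry) { this.registry = registry; }

    @Override
    public void after(HookContext<Object> ctx, FlagEvaluationDetails<Object> details,
                      Map<String, Object> hints) {
        // Count every evaluation, tagged with flag key, value, and resolution reason
        registry.counter("feature_flag.evaluation",
            "flag", details.getFlagKey(),
            "value", String.valueOf(details.getValue()),
            "reason", String.valueOf(details.getReason())
        ).increment();
    }
}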

Track flag evaluations in Grafana/Datadog, correlated with:

Error rate (did enabling this flag increase errors?)
Latency (did the new code path change P99?)
Business metrics (did the A/B test variant convert better?)
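In production this correlation usually happens in Grafana or Datadog, but the comparison underneath is simple. Below is a minimal in-process sketch. It assumes each request is tagged with the flag variant it received. `VariantErrorTracker` and the absolute-margin threshold are illustrative choices. A real rollout gate would also need a minimum sample size or a statistical test:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch: count requests and errors per (flag, variant) so the error rate of
// the new code path can be compared against the old one.
final class VariantErrorTracker {
    private record Key(String flag, String variant) {}

    private static final class Counts {
        final LongAdder requests = new LongAdder();
        final LongAdder errors = new LongAdder();
    }

    private final Map<Key, Counts> counts = new ConcurrentHashMap<>();

    void record(String flag, String variant, boolean failed) {
        Counts c = counts.computeIfAbsent(new Key(flag, variant), k -> new Counts());
        c.requests.increment();
        if (failed) {
            c.errors.increment();
        }
    }

    // Error rate for one variant; 0.0 if it has seen no traffic yet.
    // The two sums are read separately, so the rate is approximate under concurrent writes.
    double errorRate(String flag, String variant) {
        Counts c = counts.get(new Key(flag, variant));
        if (c == null || c.requests.sum() == 0) {
            return 0.0;
        }
        return (double) c.errors.sum() / c.requests.sum();
    }

    // True if the treatment's error rate exceeds the control's by more than
    // `margin` (absolute, e.g. 0.01 = one percentage point).
    boolean regressed(String flag, String control, String treatment, double margin) {
        return errorRate(flag, treatment) - errorRate(flag, control) > margin;
    }
}
```

For `new-checkout-flow`, the control variant would be `"false"` (legacy flow) and the treatment `"true"`. If `regressed(...)` returns true during a canary, that is a signal to turn the flag back off.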

This telemetry turns flag evaluation into a decision-making tool, not just a deployment switch.

Self-Hosted vs. Managed Flag Service

Factor	Self-Hosted (Flagsmith, Unleash)	Managed (LaunchDarkly, Split.io)
Cost	Infrastructure only ($0-$200/mo)	$0-$50k/year depending on tier
Setup	Moderate (deploy + maintain)	None (SaaS)
Data privacy	All data stays in your infra	Data sent to vendor
Reliability	Your responsibility	Vendor SLA (99.99%+)
Features	Core + open source ecosystem	Full-featured (A/B stats, etc.)

Self-host Flagsmith or Unleash if you have data residency requirements, budget constraints, or fewer than 50 flags. Use LaunchDarkly if you run large A/B testing programs or many flags, and the engineering time spent maintaining a self-hosted service would cost more than the subscription.

Feature flags are an investment in deployment safety. The teams that implement them stop having "all hands on deck" deployment nights. When something goes wrong, they turn a flag off instead of rolling back a deployment. The operational maturity that comes with progressive delivery — canary deployments, A/B testing, kill switches — is only possible when code changes can be separated from feature releases.

Production Readiness Checklist

Before deploying this architecture to a production environment, ensure the following Staff-level criteria are met:

High Availability: Have we eliminated single points of failure across all layers?
Observability: Are we exporting structured JSON logs, custom Prometheus metrics, and OpenTelemetry traces?
Circuit Breaking: Do all synchronous service-to-service calls have timeouts and fallbacks (e.g., via Resilience4j)?
Idempotency: Can our APIs handle retries safely without causing duplicate side effects?
Backpressure: Does the system gracefully degrade or return HTTP 429 when resources are saturated?

Verbal Interview Script

Interviewer: "How would you ensure high availability and fault tolerance for this specific architecture?"

Candidate: "To achieve 'Five Nines' (99.999%) availability, we must eliminate all Single Points of Failure (SPOF). I would deploy the API Gateway and stateless microservices across multiple Availability Zones (AZs) behind an active-active load balancer. For the data layer, I would use asynchronous replication to a read-replica in a different region for disaster recovery. Furthermore, it's not enough to just deploy redundantly; we must protect the system from cascading failures. I would implement strict timeouts, retry mechanisms with exponential backoff and jitter, and Circuit Breakers (using a library like Resilience4j) on all synchronous network calls between microservices."