
A/B Testing

Experiment-driven product development through controlled testing

A/B testing is a method of comparing two versions of a product to determine which performs better based on measurable outcomes.

Core Concepts

What is A/B Testing?

A/B testing (split testing) randomly shows users different versions of a feature to measure which performs better against a goal metric.

Key Components (see the sketch after this list):

  • Control (A): Current version
  • Variant (B): New version
  • Metric: What you're measuring
  • Sample Size: Number of users needed
  • Statistical Significance: Confidence in results
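
One way to capture these pieces in code is sketched below; the field names are illustrative, not tied to any particular tool.

// Minimal shape tying the key components of an experiment together
interface Experiment {
  id: string
  hypothesis: string
  control: string             // identifier of the current version (A)
  variant: string             // identifier of the new version (B)
  primaryMetric: string       // the metric the experiment is judged on
  minimumSampleSize: number   // users needed per variant
  significanceLevel: number   // e.g. 0.05 for 95% confidence
}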

Experiment Design

1. Define Hypothesis

## Hypothesis Template

**Current Situation:** 
Users abandon checkout at a 45% rate

**Proposed Change:**
Add trust badges to checkout page

**Expected Outcome:**
Reduce abandonment by 10%

**Success Metric:**
Checkout completion rate

2. Choose Metrics

interface ExperimentMetrics {
  // Primary metric (one only)
  primary: {
    name: 'conversion_rate'
    target: 0.15 // 15% improvement
  }
  
  // Secondary metrics
  secondary: [
    { name: 'average_order_value', target: null },
    { name: 'time_to_checkout', target: null }
  ]
  
  // Guardrail metrics (shouldn't decrease)
  guardrails: [
    { name: 'page_load_time', threshold: 2000 }, // ms
    { name: 'error_rate', threshold: 0.01 } // 1%
  ]
}

3. Calculate Sample Size

function calculateSampleSize(
  baselineRate: number,
  minimumDetectableEffect: number,
  significance: number = 0.05,
  power: number = 0.8
): number {
  // Simplified calculation: the z-scores below are hardcoded for the default
  // arguments (two-sided alpha = 0.05, power = 0.8); other significance/power
  // values would require an inverse-normal lookup
  const p1 = baselineRate
  const p2 = baselineRate * (1 + minimumDetectableEffect)
  
  const z_alpha = 1.96 // z for two-sided alpha = 0.05 (95% confidence)
  const z_beta = 0.84  // z for 80% power
  
  const pooled = (p1 + p2) / 2
  
  const n = Math.pow(
    (z_alpha * Math.sqrt(2 * pooled * (1 - pooled)) + 
     z_beta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))),
    2
  ) / Math.pow(p1 - p2, 2)
  
  return Math.ceil(n)
}

// Example
const sampleSize = calculateSampleSize(
  0.10,  // 10% baseline conversion
  0.15,  // detect a 15% relative improvement (10% → 11.5%)
  0.05,  // 95% confidence
  0.8    // 80% power
)
// Result: ~6,700 users per variant

Implementation

Feature Flag Integration

import { useFeatureFlag } from './flags'

function CheckoutPage() {
  // `user` is assumed to be available from an auth context or hook
  const showTrustBadges = useFeatureFlag('checkout-trust-badges', {
    userId: user.id,
    attributes: {
      country: user.country,
      plan: user.plan
    }
  })
  
  return (
    <div>
      {showTrustBadges ? (
        <CheckoutWithBadges />
      ) : (
        <CheckoutOriginal />
      )}
    </div>
  )
}

Event Tracking

interface ExperimentEvent {
  experimentId: string
  variant: 'control' | 'treatment'
  userId: string
  timestamp: Date
  event: string
  value?: number
}

class ExperimentTracker {
  track(event: ExperimentEvent) {
    // Forward to the app's analytics client (assumed to exist)
    analytics.track('experiment_event', {
      experiment_id: event.experimentId,
      variant: event.variant,
      user_id: event.userId,
      event_name: event.event,
      value: event.value,
      timestamp: event.timestamp
    })
  }
  
  trackConversion(experimentId: string, variant: string, value: number) {
    this.track({
      experimentId,
      variant: variant as 'control' | 'treatment',
      userId: getCurrentUserId(), // assumed app-level helper for the current user
      timestamp: new Date(),
      event: 'conversion',
      value
    })
  }
}

Statistical Analysis

Calculate Results

interface ExperimentResults {
  control: {
    users: number
    conversions: number
    rate: number
  }
  treatment: {
    users: number
    conversions: number
    rate: number
  }
  improvement: number
  pValue: number
  significant: boolean
}

function analyzeExperiment(data: ExperimentData): ExperimentResults {
  const controlRate = data.control.conversions / data.control.users
  const treatmentRate = data.treatment.conversions / data.treatment.users
  
  const improvement = (treatmentRate - controlRate) / controlRate
  
  // Two-sided p-value from a two-proportion z-test
  // (one possible implementation is sketched below)
  const pValue = calculatePValue(data)
  
  return {
    control: {
      users: data.control.users,
      conversions: data.control.conversions,
      rate: controlRate
    },
    treatment: {
      users: data.treatment.users,
      conversions: data.treatment.conversions,
      rate: treatmentRate
    },
    improvement,
    pValue,
    significant: pValue < 0.05
  }
}
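
One way the calculatePValue call above could be implemented is as a two-proportion z-test; the sketch below, including the assumed ExperimentData shape and the polynomial normal-CDF approximation, is one option rather than a prescribed implementation.

// Shape assumed for ExperimentData throughout this page
interface ExperimentData {
  control: { users: number; conversions: number }
  treatment: { users: number; conversions: number }
}

// Two-proportion z-test: pooled standard error under the null hypothesis,
// then a two-sided p-value from the standard normal distribution
function calculatePValue(data: ExperimentData): number {
  const n1 = data.control.users
  const n2 = data.treatment.users
  const p1 = data.control.conversions / n1
  const p2 = data.treatment.conversions / n2
  
  const pooled = (data.control.conversions + data.treatment.conversions) / (n1 + n2)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
  const z = Math.abs(p2 - p1) / se
  
  return 2 * (1 - normalCdf(z))
}

// Abramowitz & Stegun polynomial approximation of the standard normal CDF
function normalCdf(z: number): number {
  const t = 1 / (1 + 0.2316419 * Math.abs(z))
  const density = Math.exp(-z * z / 2) / Math.sqrt(2 * Math.PI)
  const poly = t * (0.319381530 + t * (-0.356563782 +
    t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))))
  const upperTail = density * poly
  return z >= 0 ? 1 - upperTail : upperTail
}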

Common Pitfalls

1. Peeking at Results

// ❌ Bad: Check results daily and stop early
if (pValue < 0.05 && daysSinceStart < 7) {
  stopExperiment() // Don't do this!
}

// ✅ Good: Wait for planned duration
if (daysSinceStart >= plannedDuration && sampleSizeReached) {
  const results = analyzeExperiment()
  if (results.significant) {
    rolloutWinner()
  }
}

2. Multiple Testing

// ❌ Bad: Test many metrics without correction
const significantMetrics = metrics.filter(m => m.pValue < 0.05)

// ✅ Good: Bonferroni correction
const adjustedAlpha = 0.05 / metrics.length
const significantMetrics = metrics.filter(m => m.pValue < adjustedAlpha)

Testing Tools

LaunchDarkly Example

import { useEffect } from 'react'
import { useLDClient } from 'launchdarkly-react-client-sdk'

function ProductPage() {
  const ldClient = useLDClient()
  const newLayout = ldClient?.variation('product-page-layout', false)
  
  useEffect(() => {
    if (newLayout) {
      ldClient?.track('product-page-viewed', {
        variant: 'new-layout'
      })
    }
  }, [newLayout, ldClient])
  
  return newLayout ? <NewLayout /> : <OldLayout />
}

Optimizely Example

// The component tree must be wrapped in <OptimizelyProvider> for the hook to work
import { OptimizelyProvider, useExperiment } from '@optimizely/react-sdk'

function CheckoutButton() {
  const [variation, clientReady] = useExperiment('checkout-button-color')
  
  const buttonColor = variation === 'red' ? '#ff0000' : '#0070f3'
  
  return (
    <button style={{ backgroundColor: buttonColor }}>
      Checkout
    </button>
  )
}

Best Practices

Do

✅ Have a clear hypothesis
✅ Calculate sample size beforehand
✅ Run for full business cycle
✅ Test one thing at a time
✅ Wait for statistical significance
✅ Consider seasonality
✅ Document everything
✅ Segment results

Don't

❌ Peek at results early
❌ Stop test too soon
❌ Change test mid-flight
❌ Test too many things
❌ Ignore guardrail metrics
❌ Run without power analysis
❌ Deploy without verification
❌ Forget about novelty effect

Multivariate Testing

interface MultivariateTest {
  factors: {
    buttonColor: ['blue', 'red', 'green']
    buttonText: ['Buy Now', 'Add to Cart', 'Purchase']
    trustBadge: [true, false]
  }
  // Total combinations: 3 × 3 × 2 = 18 variants
}

// Requires a much larger sample size (calculateMVTSampleSize is a hypothetical helper)
const mvtSampleSize = calculateMVTSampleSize({
  variants: 18,
  baselineRate: 0.10,
  effect: 0.15
})

Advanced Techniques

Sequential Testing

// SPRT (Sequential Probability Ratio Test)
function shouldStopExperiment(data: ExperimentData): {
  stop: boolean
  winner?: 'control' | 'treatment'
} {
  const ratio = calculateLikelihoodRatio(data)
  
  if (ratio > upperBound) {
    return { stop: true, winner: 'treatment' }
  }
  
  if (ratio < lowerBound) {
    return { stop: true, winner: 'control' }
  }
  
  return { stop: false }
}
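
A sketch of how the undefined pieces above might be filled in, treating this as a simple SPRT on the treatment arm against the control's observed rate; the Wald boundaries, the log-space formulation, and the default minimum detectable effect are assumptions.

// Wald boundaries for the log-likelihood ratio (alpha = 0.05, power = 0.8)
const ALPHA = 0.05
const BETA = 0.2
const upperBound = Math.log((1 - BETA) / ALPHA)  // crossed upward: accept H1 (treatment wins)
const lowerBound = Math.log(BETA / (1 - ALPHA))  // crossed downward: accept H0 (control wins)

// Log-likelihood ratio of the treatment observations under
// H1: p = baseline * (1 + mde) versus H0: p = baseline
function calculateLikelihoodRatio(data: ExperimentData, mde = 0.15): number {
  const p0 = data.control.conversions / data.control.users
  const p1 = Math.min(p0 * (1 + mde), 0.999)
  const successes = data.treatment.conversions
  const failures = data.treatment.users - successes
  
  return successes * Math.log(p1 / p0) + failures * Math.log((1 - p1) / (1 - p0))
}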

Bayesian A/B Testing

function bayesianAnalysis(data: ExperimentData) {
  // Calculate probability that B beats A
  const probBBeatsA = calculatePosterior(data)
  
  return {
    probabilityOfSuperiority: probBBeatsA,
    expectedLift: calculateExpectedLift(data),
    credibleInterval: calculateCredibleInterval(data)
  }
}
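
A sketch of how calculatePosterior might work under uniform Beta(1, 1) priors, approximating each Beta posterior with a normal distribution; it reuses the normalCdf helper from the analysis section above, and calculateExpectedLift / calculateCredibleInterval are left abstract here.

// Probability that treatment beats control, assuming Beta(1, 1) priors and a
// normal approximation to both posteriors (reasonable for large sample sizes)
function calculatePosterior(data: ExperimentData): number {
  const betaPosterior = (conversions: number, users: number) => {
    const a = 1 + conversions           // posterior alpha
    const b = 1 + users - conversions   // posterior beta
    return {
      mean: a / (a + b),
      variance: (a * b) / ((a + b) ** 2 * (a + b + 1))
    }
  }
  
  const control = betaPosterior(data.control.conversions, data.control.users)
  const treatment = betaPosterior(data.treatment.conversions, data.treatment.users)
  
  // P(p_treatment > p_control) ≈ Phi((mean_t - mean_c) / sqrt(var_t + var_c))
  const z = (treatment.mean - control.mean) /
    Math.sqrt(treatment.variance + control.variance)
  return normalCdf(z)
}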

Tools & Platforms

  • Optimizely: Enterprise A/B testing
  • Google Optimize: Free A/B testing (sunset by Google in 2023)
  • VWO: Conversion optimization
  • LaunchDarkly: Feature flagging + experiments
  • Split.io: Feature delivery platform
  • Statsig: Experimentation platform

Resources

  • "Trustworthy Online Controlled Experiments" - Kohavi et al.
  • Evan Miller's A/B testing tools
  • GrowthBook (open source)