How to Design Valid Tests
Principles for running scientifically sound experiments
1. Test One Variable at a Time
A flawed test changes several things at once:
- Version A: Blue button, short copy, technical headline
- Version B: Red button, long copy, benefit headline
If B wins, you can't tell which change drove the improvement.
- Version A: "Swap tokens with 40% lower fees"
- Version B: "Earn more on every trade you make"
Only headline changed. Clear attribution of results.
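If variants live in code or config, one way to keep yourself honest is to derive each variant from a shared base and assert that only one field differs. A minimal sketch; the field names are illustrative, not from any particular testing tool.

```python
# Minimal sketch: derive variants from a shared base so only one variable changes.
# Field names are illustrative, not tied to any specific testing tool.
BASE_VARIANT = {
    "button_color": "blue",
    "copy_length": "short",
    "headline": "Swap tokens with 40% lower fees",
}

# Variant B changes the headline and nothing else.
VARIANT_B = {**BASE_VARIANT, "headline": "Earn more on every trade you make"}

# Guard against accidentally changing more than one variable per test.
changed = {k for k in BASE_VARIANT if BASE_VARIANT[k] != VARIANT_B[k]}
assert changed == {"headline"}, f"Test changes more than one variable: {changed}"
```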
2. Ensure Adequate Sample Size
Don't call a test early just because one version is ahead. You need enough data to determine whether the difference is real or just random chance. Aim for:
- 100+ conversions per variation (rule of thumb, but more is better)
- At least 1 week of data (accounts for day-of-week variation)
- Statistical significance (typically 95% confidence level)
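To turn these rules of thumb into a concrete number, a standard two-proportion power calculation estimates how many visitors each variation needs. A minimal sketch; the baseline rate, detectable lift, significance level, and power below are illustrative assumptions.

```python
# Minimal sketch: visitors needed per variation to detect a given relative lift.
# Baseline rate, lift, alpha, and power below are illustrative assumptions.
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Standard two-proportion sample-size formula for a two-tailed test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Detecting a 25% relative lift on a 0.30% baseline conversion rate:
print(sample_size_per_variation(0.003, 0.25))  # roughly 94,000 visitors per variation
```

On those assumptions, the 40,000-impression example below is underpowered, which is consistent with it falling short of significance.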
Version A: 120 conversions from 40,000 impressions (0.30% CR)
Version B: 150 conversions from 40,000 impressions (0.375% CR)
B appears 25% better, but is the difference real? Check with a statistical significance calculator: a two-proportion z-test on these numbers gives roughly 93% confidence (two-tailed) - just short of the 95% threshold. Keep the test running.
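If you'd rather not depend on an online calculator, the same check is a few lines of standard-library Python. A sketch of a two-proportion z-test, using the numbers from the example above:

```python
# Minimal sketch: two-proportion z-test for the A/B numbers above.
from math import sqrt
from statistics import NormalDist

def ab_confidence(conv_a, n_a, conv_b, n_b):
    """Return the z-score and two-tailed confidence level for a conversion-rate test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                 # pooled conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))   # standard error of the difference
    z = (p_b - p_a) / se
    confidence = 1 - 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed
    return z, confidence

z, conf = ab_confidence(120, 40_000, 150, 40_000)
print(f"z = {z:.2f}, confidence = {conf:.1%}")  # z ≈ 1.83, confidence ≈ 93.3%
```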
3. Run Tests Simultaneously
Don't test Version A for a week, then Version B the next week. Market conditions change, audience composition shifts, and external factors confound results.
Split traffic 50/50 and run both versions at the same time for a valid comparison.
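A common way to do this is deterministic bucketing: hash each visitor's ID so assignment is effectively random across users but stable for any one user, while both versions collect traffic simultaneously. The experiment name and user-ID format below are illustrative assumptions.

```python
# Minimal sketch: deterministic 50/50 split so both versions run simultaneously.
# The experiment name and user ID format are illustrative assumptions.
import hashlib

def assign_variant(user_id: str, experiment: str = "headline-test") -> str:
    """Hash the user ID so each visitor always lands in the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variant("user-123"))  # same user, same experiment -> same variant every time
```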
4. Account for Time Variations
Conversion rates vary by:
- Day of week (weekends often different)
- Time of day (activity peaks and valleys)
- Market conditions (price movements affect behavior)
Run tests for at least one full week to capture these variations.
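Before trusting anything shorter than a week, it's worth checking how much your conversion rate actually swings by day. A minimal sketch with pandas, assuming an events.csv log with timestamp and converted columns (both the file and column names are hypothetical):

```python
# Minimal sketch: day-of-week breakdown of conversion rate.
# events.csv and its timestamp/converted columns are hypothetical.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["timestamp"])
by_day = (
    events.groupby(events["timestamp"].dt.day_name())["converted"]
    .mean()
    .sort_values(ascending=False)
)
print(by_day)  # large weekday/weekend gaps mean a sub-week test can mislead
```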