Controlled experiments are the most rigorous method available for establishing what advertising truly caused. Here’s what a geo experiment is, why it works, and what separates a result you can act on from an expensive piece of noise.
Thought Leadership - 7 min read - MASS Analytics
The principle behind a geo experiment is ancient and simple: divide your markets, treat some and not others, measure what’s different between them. That difference, after controlling for everything else that was happening at the same time, is what your advertising caused. That’s your incrementality number.
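The arithmetic at the core of that principle is just a difference in means between treated and untreated markets. A minimal sketch with made-up numbers, using an idealised fixed lift so the causal effect is known in advance:

```python
import random

# Hypothetical sketch: simulate sales in control markets, then give treated
# markets a known +8 unit lift. All numbers are illustrative, not real data.
random.seed(0)

control = [100 + random.gauss(0, 5) for _ in range(20)]  # baseline market sales
test = [s + 8 for s in control]                          # same markets, treated

# The difference in means recovers the effect the advertising caused
lift = sum(test) / len(test) - sum(control) / len(control)
print(round(lift, 1))  # -> 8.0, the incrementality number
```

In a real experiment the lift is unknown and noisy, which is exactly why the design choices discussed below matter.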
The complexity lies in executing it well enough that the number is trustworthy. A poorly designed experiment gives you a confident answer that’s wrong. A well-designed one gives you an incrementality estimate precise enough to calibrate your MMM and defend major budget decisions.
Why Geography is the Right Unit
Geo experiments have been used in advertising measurement since the 1950s, when P&G and Coca-Cola pioneered regional ad campaigns to measure sales lifts. They fell out of fashion when digital marketing made user-level testing possible. They're back now, more important than ever, because the landscape has shifted: privacy regulation and platform signal loss have made user-level measurement unreliable, while geography remains a unit of exposure that advertisers can still control cleanly and observe completely.
The Four Non-negotiables of a Valid Experiment
1 – Designed before the campaign starts
An incrementality experiment is not something you run and then analyse retrospectively. Test and control groups must be assigned before any media runs. Post-hoc regional analysis is observation, not experimentation — and it does not establish causality.
2 – Randomised assignment
Regions are randomly allocated to test and control groups. Randomisation balances confounding factors across the groups, including ones you didn't know existed. Markets are messy; randomisation levels the playing field, and it's what separates a rigorous estimate from a biased one.
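Mechanically, randomised assignment can be as simple as a seeded shuffle of the market list. A sketch with hypothetical market names (not real DMA identifiers):

```python
import random

# Hypothetical sketch: randomly split a list of markets into test and control.
markets = ["Market-%02d" % i for i in range(1, 21)]

rng = random.Random(42)   # fixed seed so the assignment is reproducible and auditable
shuffled = markets[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
test_group = sorted(shuffled[:half])
control_group = sorted(shuffled[half:])

assert not set(test_group) & set(control_group)  # groups never overlap
print(len(test_group), len(control_group))       # 10 10
```

Recording the seed alongside the assignment is a cheap way to prove, later, that the split was decided before the media ran.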
3 – Sufficient scale and duration
Small tests are statistically fragile. Too few markets or too short a window and the experiment lacks the power to detect a real effect. Best practice is large-scale randomisation: all DMAs in the US, large sets of cities or postal clusters elsewhere. Scale strengthens both internal validity and the ability to generalise results.
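Statistical power can be checked before committing budget. A simulation sketch, with illustrative assumptions throughout (a +5% lift, 10% market-level noise, a one-sided 5% significance threshold):

```python
import random

# Hypothetical power simulation: how often would a simple z-test detect a
# +5% lift, given n markets per arm? All parameters are illustrative.
def detection_rate(n_markets, lift=0.05, noise=0.10, sims=2000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        control = [rng.gauss(1.0, noise) for _ in range(n_markets)]
        test = [rng.gauss(1.0 + lift, noise) for _ in range(n_markets)]
        diff = sum(test) / n_markets - sum(control) / n_markets
        se = (2 * noise ** 2 / n_markets) ** 0.5  # known-variance standard error
        if diff / se > 1.64:                      # one-sided 5% threshold
            hits += 1
    return hits / sims

print(detection_rate(5))    # few markets per arm: badly underpowered
print(detection_rate(50))   # many markets per arm: high power
```

If the detection rate at your planned scale is low, a real effect will usually look like noise, which is the expensive failure mode this section warns against.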
4 – Clean Execution
If control regions are accidentally exposed to the treatment, or ads don’t deliver correctly in test regions, the incrementality estimate is contaminated. A pre-registered analysis plan, live monitoring during the campaign, and confirmed delivery separation are essential.
What a Good Experiment Gives You
Incremental ROAS (iROAS)
Revenue generated per pound of additional spend in test regions versus control, with confidence intervals. This is a direct causal measurement, not an inferred one.
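One common way to attach a confidence interval to iROAS is a bootstrap over markets. A sketch with fabricated per-market figures, assuming equal incremental spend per test market:

```python
import random

# Hypothetical sketch: iROAS point estimate plus a percentile-bootstrap CI.
# Per-market incremental revenue and spend figures are made up.
rng = random.Random(7)
incremental_revenue = [rng.gauss(300.0, 60.0) for _ in range(30)]  # per test market
incremental_spend = 100.0                                          # per test market

iroas = sum(incremental_revenue) / (incremental_spend * len(incremental_revenue))

# Resample markets with replacement to estimate the sampling distribution
boots = []
for _ in range(2000):
    sample = [rng.choice(incremental_revenue) for _ in incremental_revenue]
    boots.append(sum(sample) / (incremental_spend * len(sample)))
boots.sort()
ci_lo = boots[int(0.025 * len(boots))]
ci_hi = boots[int(0.975 * len(boots))]
print(f"iROAS {iroas:.2f}, 95% CI [{ci_lo:.2f}, {ci_hi:.2f}]")
```

The width of that interval is what determines whether the result can defend a budget decision or merely suggest one.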
Sales uplift curves
Week-by-week trajectory of test versus control, with a counterfactual estimate of what control regions would have sold had they been treated.
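A simple version of that counterfactual scales control-region sales by the pre-period test/control ratio. A sketch with illustrative weekly numbers:

```python
# Hypothetical sketch: weekly uplift as test minus counterfactual, where the
# counterfactual scales control sales by the pre-period ratio. Numbers invented.
pre_test, pre_control = 480.0, 400.0     # pre-period sales totals
scale = pre_test / pre_control           # 1.2: test regions are bigger at baseline

test_weekly    = [130, 138, 145, 150]    # campaign-period test-region sales
control_weekly = [100, 102, 101, 103]    # campaign-period control-region sales

counterfactual = [scale * c for c in control_weekly]
uplift = [t - cf for t, cf in zip(test_weekly, counterfactual)]
print([round(u, 1) for u in uplift])     # week-by-week incremental sales
```

Production methods (synthetic control, time-based regression) are more sophisticated, but the output has the same shape: a weekly uplift trajectory rather than a single number.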
Response curve shape
By testing at different spend levels, you can establish where diminishing returns begin, directly measured rather than statistically estimated.
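With cells at several spend levels, diminishing returns show up as falling marginal return between consecutive levels. A sketch where the simulated lifts follow a saturating curve by construction:

```python
import math

# Hypothetical sketch: test cells at four spend levels. The simulated uplifts
# follow a saturating log curve, so marginal return should fall with spend.
spend  = [10.0, 20.0, 40.0, 80.0]
uplift = [50.0 * math.log1p(s / 15.0) for s in spend]  # illustrative lifts

# Marginal return between consecutive spend levels
marginal = [(uplift[i + 1] - uplift[i]) / (spend[i + 1] - spend[i])
            for i in range(len(spend) - 1)]
print([round(m, 2) for m in marginal])  # monotonically falling: diminishing returns
```

The spend level where marginal return drops below your hurdle rate is, directly measured, where the budget should stop growing.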
MMM calibration input
Formatted as a Bayesian prior or coefficient constraint, designed to feed directly into the modelling workflow and anchor the model’s estimates in causal reality.
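One way that handoff can look: convert the experiment's point estimate and confidence interval into a normal prior for the matching MMM channel coefficient. A sketch with illustrative numbers; the CI-to-standard-deviation mapping assumes the estimate is approximately normal:

```python
# Hypothetical sketch: turn an experiment's iROAS result into a Bayesian prior
# for the corresponding MMM channel. All figures are illustrative.
iroas_point = 3.1            # experiment point estimate
ci_low, ci_high = 2.2, 4.0   # experiment 95% confidence interval

prior_mean = iroas_point
prior_sd = (ci_high - ci_low) / (2 * 1.96)   # CI half-width over z_0.975

prior = {"channel": "tv", "dist": "normal",
         "mean": prior_mean, "sd": round(prior_sd, 3)}
print(prior)
```

A tight experimental CI yields a tight prior that firmly anchors the model; a wide one leaves the model freer to learn from the observational data.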
When Experiment and Model Disagree
Sometimes an incrementality experiment and your MMM tell different stories. This is not a failure; it's a signal to investigate. The gap often has a reason: the experimental setup may have introduced bias (a contaminated control, non-homogeneous samples), or the experiment may be measuring a short-term window that doesn't capture the carryover effects the model accounts for.
One useful approach when results diverge: treat the experiment as an upper bound of the incremental uplift. The experiment shows what’s possible under controlled conditions; the model shows what’s plausible for planning, given the full market context. Used together as a triangulation rather than a competition, both outputs become more reliable over time.
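That upper-bound logic reduces to a simple comparison once both estimates cover the same window. A trivial sketch, with invented numbers:

```python
# Hypothetical sketch: treat the experiment as an upper bound and flag
# model estimates that exceed it. Both figures are illustrative.
experiment_iroas = 3.1   # causal measurement from the geo experiment
mmm_iroas = 3.8          # model estimate over the same window and KPI

flag = "investigate" if mmm_iroas > experiment_iroas else "consistent"
print(flag)
```

A model estimate below the experimental bound is plausible; one above it points at carryover mismatch, a contaminated control, or a model that needs recalibration.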
What Makes an Incrementality Estimate Trustworthy
- Random assignment of regions, not convenience sampling
- Designed and pre-registered before media runs
- Sufficient scale and duration to achieve statistical power
- Clean delivery separation confirmed during the campaign
- KPIs and geography aligned with the MMM to enable direct calibration
- Carryover effects understood and accounted for when comparing with the model
Previous article: The Incrementality Gap: What Your Model Can’t Tell You
Next article: The $1 Billion Incrementality Question
