Controlled experiments are the most rigorous method available for establishing what advertising truly caused. Here’s what a geo experiment is, why it works, and what separates a result you can act on from an expensive piece of noise.
Thought Leadership - 7 min read - MASS Analytics
The principle behind a geo experiment is ancient and simple: divide your markets, treat some and not others, measure what’s different between them. That difference, after controlling for everything else that was happening at the same time, is what your advertising caused. That’s your incrementality number.
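The arithmetic at the core of that principle is just a difference in means between treated and untreated markets. A minimal sketch with made-up numbers, using an idealised fixed lift so the causal effect is known in advance:

```python
import random

# Hypothetical sketch: simulate sales in control markets, then give treated
# markets a known +8 unit lift. All numbers are illustrative, not real data.
random.seed(0)

control = [100 + random.gauss(0, 5) for _ in range(20)]  # baseline market sales
test = [s + 8 for s in control]                          # same markets, treated

# The difference in means recovers the effect the advertising caused
lift = sum(test) / len(test) - sum(control) / len(control)
print(round(lift, 1))  # -> 8.0, the incrementality number
```

In a real experiment the lift is unknown and noisy, which is exactly why the design choices discussed below matter.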
The complexity lies in executing it well enough that the number is trustworthy. A poorly designed experiment gives you a confident answer that’s wrong. A well-designed one gives you an incrementality estimate precise enough to calibrate your MMM and defend major budget decisions.
Why Geography is the Right Unit
Geo experiments have been used in advertising measurement since the 1950s, when P&G and Coca-Cola pioneered regional ad campaigns to measure sales lifts. They fell out of fashion when digital marketing made user-level testing possible. They're back now, more important than ever, because the landscape has shifted: privacy regulation and platform signal loss have made user-level measurement unreliable, while geography remains a unit of exposure that advertisers can still control cleanly and observe completely.
The Four Non-negotiables of a Valid Experiment
1 – Designed before the campaign starts
An incrementality experiment is not something you run and then analyse retrospectively. Test and control groups must be assigned before any media runs. Post-hoc regional analysis is observation, not experimentation — and it does not establish causality.
2 – Randomised assignment
Regions are randomly allocated to test and control groups. Randomisation balances confounding factors across the groups, including ones you didn't know existed. Markets are messy; randomisation levels the playing field, and it's what separates a rigorous estimate from a biased one.
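Mechanically, randomised assignment can be as simple as a seeded shuffle of the market list. A sketch with hypothetical market names (not real DMA identifiers):

```python
import random

# Hypothetical sketch: randomly split a list of markets into test and control.
markets = ["Market-%02d" % i for i in range(1, 21)]

rng = random.Random(42)   # fixed seed so the assignment is reproducible and auditable
shuffled = markets[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
test_group = sorted(shuffled[:half])
control_group = sorted(shuffled[half:])

assert not set(test_group) & set(control_group)  # groups never overlap
print(len(test_group), len(control_group))       # 10 10
```

Recording the seed alongside the assignment is a cheap way to prove, later, that the split was decided before the media ran.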
3 – Sufficient scale and duration
Small tests are statistically fragile. Too few markets or too short a window and the experiment lacks the power to detect a real effect. Best practice is large-scale randomisation: all DMAs in the US, large sets of cities or postal clusters elsewhere. Scale strengthens both internal validity and the ability to generalise results.
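Statistical power can be checked before committing budget. A simulation sketch, with illustrative assumptions throughout (a +5% lift, 10% market-level noise, a one-sided 5% significance threshold):

```python
import random

# Hypothetical power simulation: how often would a simple z-test detect a
# +5% lift, given n markets per arm? All parameters are illustrative.
def detection_rate(n_markets, lift=0.05, noise=0.10, sims=2000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        control = [rng.gauss(1.0, noise) for _ in range(n_markets)]
        test = [rng.gauss(1.0 + lift, noise) for _ in range(n_markets)]
        diff = sum(test) / n_markets - sum(control) / n_markets
        se = (2 * noise ** 2 / n_markets) ** 0.5  # known-variance standard error
        if diff / se > 1.64:                      # one-sided 5% threshold
            hits += 1
    return hits / sims

print(detection_rate(5))    # few markets per arm: badly underpowered
print(detection_rate(50))   # many markets per arm: high power
```

If the detection rate at your planned scale is low, a real effect will usually look like noise, which is the expensive failure mode this section warns against.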
4 – Clean Execution
If control regions are accidentally exposed to the treatment, or ads don’t deliver correctly in test regions, the incrementality estimate is contaminated. A pre-registered analysis plan, live monitoring during the campaign, and confirmed delivery separation are essential.
What a Good Experiment Gives You
Incremental ROAS (iROAS)
Revenue generated per pound of additional spend in test regions versus control, with confidence intervals. This is a direct causal measurement, not an inferred one.
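One common way to attach a confidence interval to iROAS is a bootstrap over markets. A sketch with fabricated per-market figures, assuming equal incremental spend per test market:

```python
import random

# Hypothetical sketch: iROAS point estimate plus a percentile-bootstrap CI.
# Per-market incremental revenue and spend figures are made up.
rng = random.Random(7)
incremental_revenue = [rng.gauss(300.0, 60.0) for _ in range(30)]  # per test market
incremental_spend = 100.0                                          # per test market

iroas = sum(incremental_revenue) / (incremental_spend * len(incremental_revenue))

# Resample markets with replacement to estimate the sampling distribution
boots = []
for _ in range(2000):
    sample = [rng.choice(incremental_revenue) for _ in incremental_revenue]
    boots.append(sum(sample) / (incremental_spend * len(sample)))
boots.sort()
ci_lo = boots[int(0.025 * len(boots))]
ci_hi = boots[int(0.975 * len(boots))]
print(f"iROAS {iroas:.2f}, 95% CI [{ci_lo:.2f}, {ci_hi:.2f}]")
```

The width of that interval is what determines whether the result can defend a budget decision or merely suggest one.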
Sales uplift curves
Week-by-week trajectory of test versus control, with a counterfactual estimate of what control regions would have sold had they been treated.
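A simple version of that counterfactual scales control-region sales by the pre-period test/control ratio. A sketch with illustrative weekly numbers:

```python
# Hypothetical sketch: weekly uplift as test minus counterfactual, where the
# counterfactual scales control sales by the pre-period ratio. Numbers invented.
pre_test, pre_control = 480.0, 400.0     # pre-period sales totals
scale = pre_test / pre_control           # 1.2: test regions are bigger at baseline

test_weekly    = [130, 138, 145, 150]    # campaign-period test-region sales
control_weekly = [100, 102, 101, 103]    # campaign-period control-region sales

counterfactual = [scale * c for c in control_weekly]
uplift = [t - cf for t, cf in zip(test_weekly, counterfactual)]
print([round(u, 1) for u in uplift])     # week-by-week incremental sales
```

Production methods (synthetic control, time-based regression) are more sophisticated, but the output has the same shape: a weekly uplift trajectory rather than a single number.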
Response curve shape
By testing at different spend levels, you can establish where diminishing returns begin, directly measured rather than statistically estimated.
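With cells at several spend levels, diminishing returns show up as falling marginal return between consecutive levels. A sketch where the simulated lifts follow a saturating curve by construction:

```python
import math

# Hypothetical sketch: test cells at four spend levels. The simulated uplifts
# follow a saturating log curve, so marginal return should fall with spend.
spend  = [10.0, 20.0, 40.0, 80.0]
uplift = [50.0 * math.log1p(s / 15.0) for s in spend]  # illustrative lifts

# Marginal return between consecutive spend levels
marginal = [(uplift[i + 1] - uplift[i]) / (spend[i + 1] - spend[i])
            for i in range(len(spend) - 1)]
print([round(m, 2) for m in marginal])  # monotonically falling: diminishing returns
```

The spend level where marginal return drops below your hurdle rate is, directly measured, where the budget should stop growing.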
MMM calibration input
Formatted as a Bayesian prior or coefficient constraint, designed to feed directly into the modelling workflow and anchor the model’s estimates in causal reality.
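One way that handoff can look: convert the experiment's point estimate and confidence interval into a normal prior for the matching MMM channel coefficient. A sketch with illustrative numbers; the CI-to-standard-deviation mapping assumes the estimate is approximately normal:

```python
# Hypothetical sketch: turn an experiment's iROAS result into a Bayesian prior
# for the corresponding MMM channel. All figures are illustrative.
iroas_point = 3.1            # experiment point estimate
ci_low, ci_high = 2.2, 4.0   # experiment 95% confidence interval

prior_mean = iroas_point
prior_sd = (ci_high - ci_low) / (2 * 1.96)   # CI half-width over z_0.975

prior = {"channel": "tv", "dist": "normal",
         "mean": prior_mean, "sd": round(prior_sd, 3)}
print(prior)
```

A tight experimental CI yields a tight prior that firmly anchors the model; a wide one leaves the model freer to learn from the observational data.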
When Experiment and Model Disagree
Sometimes an incrementality experiment and your MMM tell different stories. This is not a failure; it's a signal to investigate. The gap often has a reason: the experimental setup may have introduced bias (a contaminated control, non-homogeneous samples), or the experiment may be measuring a short-term window that doesn't capture the carryover effects the model accounts for.
One useful approach when results diverge: treat the experiment as an upper bound of the incremental uplift. The experiment shows what’s possible under controlled conditions; the model shows what’s plausible for planning, given the full market context. Used together as a triangulation rather than a competition, both outputs become more reliable over time.
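That upper-bound logic reduces to a simple comparison once both estimates cover the same window. A trivial sketch, with invented numbers:

```python
# Hypothetical sketch: treat the experiment as an upper bound and flag
# model estimates that exceed it. Both figures are illustrative.
experiment_iroas = 3.1   # causal measurement from the geo experiment
mmm_iroas = 3.8          # model estimate over the same window and KPI

flag = "investigate" if mmm_iroas > experiment_iroas else "consistent"
print(flag)
```

A model estimate below the experimental bound is plausible; one above it points at carryover mismatch, a contaminated control, or a model that needs recalibration.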
What Makes an Incrementality Estimate Trustworthy
- Random assignment of regions, not convenience sampling
- Designed and pre-registered before media runs
- Sufficient scale and duration to achieve statistical power
- Clean delivery separation confirmed during the campaign
- KPIs and geography aligned with the MMM to enable direct calibration
- Carryover effects understood and accounted for when comparing with the model
Previous article: The Incrementality Gap: What Your Model Can’t Tell You
Next article: The $1 Billion Incrementality Question
