Incrementality Testing in Marketing: Closing the Causal Gap

What this article argues
  • How MMM and incrementality testing function as complementary layers, each doing a distinct job
  • How geo-based randomized controlled trials work as the standard incrementality test design
  • The Bayesian calibration loop that feeds experimental results back into the model
  • Why MMM should guide experimental design, not the other way round
  • How to treat divergence between experiments and MMM as a signal, not a failure

MMM is, for most organizations, the most complete picture of marketing performance they have. It reads the full media mix — including channels that attribution cannot see. Furthermore, it separates media effects from pricing, seasonality, and competitive activity, providing strategic answers at the scale a CFO can act on.

For over two decades, MMM has been the measurement method that serious marketing organizations build their planning around.

However, there is one question MMM does not settle on its own: causation. The model produces strong statistical inference — it tells you what correlated with your sales outcome. That correlation is not the same as proving the channel caused the outcome.

In most business decisions, that distinction does not matter. But in genuinely contested decisions — where budget is large, stakeholder skepticism is high, and someone in the room demands proof — it matters enormously.

Why Causation Changes Everything

Incrementality testing is what produces that proof. Not as a replacement for MMM, but as a precision layer on top of it. And it is only useful when MMM is doing its job first.

I have spent twenty years watching brands make expensive decisions on the wrong read of their data. The pattern is consistent: a channel looks high-ROI in the model, gets more budget, and the business grows. Everyone assumes the channel is working. Then someone cuts it in a downturn and nothing changes — or someone defends it in a board meeting and a skeptical CFO asks for causal proof, and the room goes quiet.

Incrementality testing is what ends that silence. Here is how the methodology actually works.

The Measurement Gap That Matters

An MMM that estimates a channel’s contribution at 8% produces a number with a confidence interval. In contrast, an experiment that withholds the channel in matched markets and observes a 7.4% revenue decline produces a causal estimate: the decline was observed under controlled conditions rather than inferred from historical patterns. When MMM is challenged by a skeptical CFO, an experimental result is what closes the conversation.

Diagram titled "The Integrated Measurement Architecture" showing three method columns — Attribution, MMM, and Experimentation — each compared across four dimensions: Granularity, Coverage, and Evidence, with a use-case label at the bottom. Attribution (cyan border) is the tactical reader: user-level, real-time, touchpoint granularity, digitally tracked coverage, correlational credit evidence, used for digital tactical adjustments. MMM (orange border) is the strategic anchor: aggregate, long horizon, channel and campaign granularity, all activity tracked or not, strong statistical inference, used for board defence and planning. Experimentation (dark navy border) is the causal validator: controlled, point-in-time, test versus control granularity, treatment under test coverage, causal proof evidence, used for calibration and contested decisions. To the right, three dashed arrows point to four stacked outcome boxes: Real-time Layer (in-campaign adjustment), Strategic Answer (allocation and scenarios), Causal Anchor (contested decisions), and One Evidence Base (three coherent readings).
Figure 1 – The integrated measurement architecture. MMM is the strategic anchor, providing full-channel coverage and board-level answers. Attribution reads tactical granularity in the digital layer. Experimentation provides causal proof for contested decisions.

What Incrementality Testing Actually Measures

Incrementality is a simple concept. Specifically, it asks: how much of this revenue would have happened without our advertising? Rather than identifying which channels look like they are performing, or which platform reports the highest ROAS, it asks what our marketing actually, causally produced.

The incrementality test answers that question through a controlled experiment. It withholds the advertising treatment from one group, applies it to another, and measures the difference in outcome. That difference is the incremental effect. Essentially, the channel either caused additional revenue or it did not.

This is fundamentally different from what attribution models and MMM produce. Attribution assigns credit across the digital touchpoints it can observe, while MMM uses statistical inference to separate the contribution of each factor from historical data. Both are powerful. However, neither is a controlled experiment.

Incrementality testing is therefore the only method that establishes whether a causal relationship exists — not just whether a correlational one does.

The distinction has real teeth. A channel can look high-ROI in your MMM and still be capturing sales that would have happened regardless. Conversely, it can look low-ROI and be driving significant incremental revenue through a pathway the model cannot fully see. MMM gives you a very strong read; incrementality testing tells you whether that read holds up under causal scrutiny.

Diagram explaining how an incrementality test works. Two panels side by side: Test Regions (advertising on) covering Regions A, B, C, and D with a measured revenue of 120 units; and Control Regions (advertising off) covering Regions E, F, G, and H with a measured revenue of 85 units. A dashed vertical line separates the two panels. An orange banner at the bottom reads: Incremental Lift = 120 − 85 = 35 units. Causal proof: the advertising caused 35 additional units of revenue.
Figure 2 – How an incrementality test works. Test regions receive advertising; matched control regions do not. The revenue difference between the two groups, normalized for baseline trends, is the incremental lift: causal proof that the advertising produced the outcome.

Correlation vs. Causation

MMM tells you what correlated with your sales outcome. Incrementality testing tells you what caused it. The gap between those two statements is where the most expensive decisions get made.

How MMM and Incrementality Testing Compound Each Other

MMM and incrementality testing are not alternatives. Rather, they cover different blind spots and make each other more valuable when combined.

Why MMM Coefficients Need Experimental Grounding

MMM coefficients carry uncertainty. When a channel has little variation in spend over the modelling period, the model struggles to isolate its contribution precisely. As a result, the coefficient is statistically weak — you have a range of plausible estimates rather than a precise number. This is not a flaw in the methodology; it is an honest reflection of what the data can support.

Incrementality testing fills exactly that gap. Where the model has a wide, uncertain coefficient, you run an experiment. The experiment provides a causal lift estimate for that channel under controlled conditions. That estimate is then used to calibrate the model: the uncertain coefficient is anchored to observed causal evidence rather than left floating in statistical inference.
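
To see why low spend variation leaves a coefficient wide, consider a deliberately simplified sketch. It assumes a single-channel linear regression, where the standard error of the slope is the residual noise divided by the square root of the summed squared deviations in spend. A production MMM has many more moving parts (adstock, saturation, multiple regressors), and the spend series and noise level below are hypothetical, but the mechanism is the same: less variation, less precision.

```python
import math

def slope_standard_error(spend, residual_sd):
    """Standard error of a simple-regression slope: sd / sqrt(sum((x - mean)^2))."""
    mean_spend = sum(spend) / len(spend)
    sum_sq_dev = sum((x - mean_spend) ** 2 for x in spend)
    return residual_sd / math.sqrt(sum_sq_dev)

# Hypothetical weekly spend (in thousands) for two channels over eight weeks.
flat_spend = [100, 101, 99, 100, 102, 98, 100, 100]      # almost no variation
varied_spend = [60, 140, 80, 120, 100, 150, 50, 100]     # deliberate variation

residual_sd = 20.0  # assumed noise in weekly sales, identical for both channels

se_flat = slope_standard_error(flat_spend, residual_sd)
se_varied = slope_standard_error(varied_spend, residual_sd)

print(f"SE with flat spend:   {se_flat:.3f}")
print(f"SE with varied spend: {se_varied:.3f}")
```

With identical noise, the flat-spend channel ends up with a far wider coefficient. That is the kind of channel worth flagging for an experiment.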

How Each Method Strengthens the Other

The relationship works in both directions. MMM guides better experiment design — it tells you how many weeks a test needs to run for a given channel, which regions provide the best test-control match, and what level of spend variation gives the experiment its best chance of yielding a usable estimate. Moreover, MMM anchors the experiment in commercial reality. The experiment, in turn, provides causal proof that strengthens the model. Together they compound.

How Geo-Based Randomized Controlled Trials Work in Practice

The standard implementation for incrementality testing in MMM calibration is the geographic randomized controlled trial. Specifically, it uses designated market areas (or equivalent geographic units) as the experimental unit. Some regions receive the advertising treatment; others do not. The difference in commercial outcomes between groups — after normalizing for baseline differences — is the estimated incremental effect.
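
One common way to normalize for baseline differences is a difference-in-differences calculation: the change in test regions minus the change in control regions over the same period. The sketch below assumes that approach and uses hypothetical per-region revenue figures; it is an illustration of the arithmetic, not a description of any specific vendor's estimator.

```python
from statistics import mean

def diff_in_diff_lift(test_pre, test_post, control_pre, control_post):
    """Change in test regions minus change in control regions."""
    test_change = mean(test_post) - mean(test_pre)
    control_change = mean(control_post) - mean(control_pre)
    return test_change - control_change

# Hypothetical average revenue per region, before and during the test.
test_pre = [88, 92, 90, 90]         # test regions, pre-period
test_post = [118, 122, 121, 119]    # test regions, advertising on
control_pre = [84, 86, 85, 85]      # control regions, pre-period
control_post = [84, 86, 87, 83]     # control regions, advertising off

naive_lift = mean(test_post) - mean(control_post)
normalized_lift = diff_in_diff_lift(test_pre, test_post, control_pre, control_post)

print(f"Naive difference:      {naive_lift}")
print(f"Baseline-normalized:   {normalized_lift}")
```

In this made-up data the test regions were already running ahead of control before the test started, so the naive post-period difference overstates the lift. The normalized figure removes that head start.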

Notably, the method is privacy-safe, omnichannel, and fully auditable. It requires no individual-level tracking and no vendor compliance beyond standard geo-targeting capability.

Scale matters critically. In the United States, all 210 nationally defined DMAs should be included wherever possible. More units mean greater statistical power and more precise lift estimates. The temptation to run small, fast experiments on a handful of markets produces estimates too uncertain to be trusted as calibration inputs.
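
The link between market count and precision can be sketched with a textbook minimum-detectable-effect calculation. The version below assumes independent markets, equal variance across markets, and a two-sided normal approximation; the per-market standard deviation is a hypothetical value. Real geo designs usually do better than this by using matching and pre-period covariates, so treat it as a rough lower bound on how much scale helps.

```python
import math
from statistics import NormalDist

def minimum_detectable_effect(n_test, n_control, market_sd, alpha=0.05, power=0.8):
    """Smallest true difference in means detectable at the given alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    standard_error = market_sd * math.sqrt(1 / n_test + 1 / n_control)
    return (z_alpha + z_power) * standard_error

market_sd = 15.0  # assumed standard deviation of the outcome across markets

mde_small = minimum_detectable_effect(10, 10, market_sd)    # 20 markets in total
mde_full = minimum_detectable_effect(105, 105, market_sd)   # all 210 DMAs, split evenly

print(f"MDE with 20 markets:  {mde_small:.2f}")
print(f"MDE with 210 markets: {mde_full:.2f}")
```

Under these assumptions the detectable effect shrinks with the square root of the number of markets, so the 210-market design can detect an effect roughly three times smaller than the 20-market one.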

The $1.5 billion branded search case we have written about elsewhere demonstrates precisely what is at stake when the methodology is applied at full scale. In that case, a channel flagged for cuts was generating revenue an order of magnitude larger than attribution had measured, through conversion paths that never appeared in the digital funnel.

Let MMM Guide the Experiment Design

Before running an experiment, ask MMM three questions: How many weeks does it need to run to produce a reliable signal? What split between test and control markets gives enough statistical power? What range of spend variation will produce a measurable lift? Without those inputs, experiments are often too short, too small, and too uncertain to be usable.

The Calibration Loop: How Experimental Results Enter the Model

The output of a geo-experiment is not a final verdict. Rather, it is structured causal evidence with a known confidence interval. The way that evidence enters the model is through Bayesian updating.

The experimental lift estimate and its standard error serve as the prior. The model’s likelihood from the observed historical data provides the updating step. As a result, the posterior is the calibrated coefficient that enters the next model run — reflecting both the observational evidence from the data and the causal evidence from the experiment, weighted by the precision of each.
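
The weighting by precision can be made concrete with the simplest possible case. The sketch below assumes both the experimental prior and the model's evidence can be summarized as normal distributions (a mean and a standard error), which gives a closed-form posterior. A full Bayesian MMM would instead place the prior inside the model and sample the posterior; the function name and all numbers here are hypothetical.

```python
def calibrate_coefficient(prior_mean, prior_se, model_mean, model_se):
    """Normal-normal update: combine two estimates, weighting each by its precision."""
    prior_precision = 1 / prior_se ** 2
    model_precision = 1 / model_se ** 2
    posterior_precision = prior_precision + model_precision
    posterior_mean = (
        prior_precision * prior_mean + model_precision * model_mean
    ) / posterior_precision
    posterior_se = posterior_precision ** -0.5
    return posterior_mean, posterior_se

# Hypothetical inputs: a tight experimental estimate and a wider model estimate.
experiment_mean, experiment_se = 6.1, 0.5   # prior from the geo-experiment
model_mean, model_se = 4.2, 1.0             # evidence from historical data

posterior_mean, posterior_se = calibrate_coefficient(
    experiment_mean, experiment_se, model_mean, model_se
)
print(f"Calibrated coefficient: {posterior_mean:.2f} (SE {posterior_se:.2f})")
```

Because the experiment is four times as precise as the model in this example, the calibrated coefficient lands much closer to the experimental value, and its standard error is smaller than either input's.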

Diagram showing the Bayesian-experiment loop: three connected stages — Bayesian MMM (posterior per coefficient estimated, high-uncertainty channels flagged), Experimental Design (MMM guides duration, test/control and variation, designed to detect causal lift), and Prior Update (result formalised as prior for the next model run, continuous knowledge growth). Solid arrows connect the stages left to right; a cyan dashed feedback arrow loops from Prior Update back to Bayesian MMM, illustrating that the posterior from each run becomes the prior for the next.
Figure 3 – The Bayesian-experiment loop. The Bayesian MMM estimates a posterior for each coefficient and flags high-uncertainty channels. MMM then guides the experiment's duration, test/control split, and spend variation. The result is formalised as a prior for the next model run, so the posterior from each run becomes the prior for the next.

The Compounding Cycle

The loop this creates is the most important structural feature of a mature measurement program:

  1. The model estimates channel contributions and identifies where coefficient uncertainty is greatest.
  2. Those channels consequently become the priority candidates for experimentation.
  3. The experiment produces a causal lift estimate.
  4. That estimate then calibrates the model.
  5. The improved model subsequently identifies the next most valuable experiment.

Each cycle compounds the precision and commercial credibility of both the model and the experimental evidence base.
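
The prioritization step in this cycle can be sketched in a few lines. The example assumes each channel's coefficient is summarized as a mean and a standard error, and ranks channels by relative uncertainty (standard error divided by the absolute mean). That ranking rule, the channel names, and the numbers are all hypothetical; a real program would likely also weight by spend at stake.

```python
def next_experiment(posteriors):
    """Return the channel whose coefficient has the widest relative uncertainty."""
    return max(
        posteriors,
        key=lambda channel: posteriors[channel][1] / abs(posteriors[channel][0]),
    )

# Hypothetical (mean, standard error) per channel from the current model run.
posteriors = {
    "tv": (3.0, 0.3),
    "paid_social": (2.0, 1.0),
    "branded_search": (4.0, 0.8),
}

first_pick = next_experiment(posteriors)
print(f"Test first: {first_pick}")

# After an experiment calibrates that channel, its uncertainty shrinks
# and the next-most-uncertain channel moves to the front of the queue.
posteriors[first_pick] = (2.4, 0.2)
second_pick = next_experiment(posteriors)
print(f"Test next:  {second_pick}")
```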

Diagram of the MPE Cycle — a four-stage continuous loop connecting MMM and incrementality testing. Stage 1, Build/Refine Model (MASS Analytics · Marketing Mix Modelling): estimate channel contributions from observational data and identify where coefficient uncertainty is greatest. An arrow labelled "identify uncertainty" leads to Stage 2, Design Experiment (Central Control, Inc. · Rolling Thunder framework): prioritise the highest-uncertainty channel and design a geo-RCT at national scale aligned to the model's KPI structure. An arrow labelled "execute" leads to Stage 3, Run Experiment (Central Control, Inc. · iROAS measurement): execute the geo-RCT across all markets and measure incremental lift with full confidence intervals pre-specified. An arrow labelled "causal evidence" leads to Stage 4, Calibrate Model (MASS Analytics · Bayesian coefficient update): use the lift estimate as a Bayesian prior, update the coefficient distribution, and return to Step 1 with improved inputs. An arrow labelled "update" closes the loop back to Stage 1.
Figure 4 – The MPE continuous learning cycle. Build and refine the model, design the experiment, run it, calibrate with the result. Each iteration compounds precision across the measurement program.

Why Sustained Programs Beat One-Off Experiments

The most common mistake in implementing this approach is treating it as a one-time exercise. A sustained program of experiments that progressively builds a causal evidence library across every major channel is qualitatively different from a single calibration run. Crucially, it is an asset no competitor can replicate quickly, because it requires time, data, and discipline in sequence.

A System, Not a Project

The model identifies where uncertainty is highest. The experiment then targets exactly that uncertainty. The result calibrates the model, and the improved model subsequently identifies the next uncertainty most worth testing. This is a continuously compounding system. A single experiment is a data point; a sustained program is a strategic asset.

When Experiments and MMM Disagree

Sometimes they tell different stories. The experiment may signal a larger uplift than the model estimates, or the model’s coefficient implies more contribution than the experiment can find.

The first instinct is to decide which one is right. That is usually the wrong frame. Divergence between experiments and MMM is a signal to investigate, not a tie-breaker to call.
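
Before investigating, it helps to know whether the gap is larger than the noise in the two estimates. A minimal sketch, assuming the two estimates are independent and approximately normal: divide the gap by the combined standard error. The standard errors below are hypothetical, and the independence assumption no longer holds once the model has been calibrated on that same experiment.

```python
import math

def divergence_z(model_estimate, model_se, experiment_estimate, experiment_se):
    """Gap between the two estimates in units of their combined standard error."""
    combined_se = math.sqrt(model_se ** 2 + experiment_se ** 2)
    return (experiment_estimate - model_estimate) / combined_se

# Illustrative iROAS readings with assumed standard errors.
z = divergence_z(
    model_estimate=4.2, model_se=0.6,
    experiment_estimate=6.1, experiment_se=0.8,
)
print(f"Divergence: {z:.2f} combined standard errors")
```

A gap of around two combined standard errors or more is a reasonable, if arbitrary, trigger for the checks that follow. A much smaller gap may simply be two noisy readings of the same underlying effect.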

Diagram titled "Treating Divergence as a Signal" showing how to respond when MMM and an incrementality experiment produce different results. On the left, the MMM Estimate box shows iROAS = 4.2 (strong statistical inference, full historical period, all channels). On the right, the Experiment Result box shows iROAS = 6.1 (causal evidence, point-in-time, 6-week geo-RCT, Q3 only). Both feed into a central diamond labelled Divergence — Different readings. A dashed arrow points downward to "Investigate Both Sources," which branches into two diagnostic boxes: Check the Experiment Design (control group contamination, baseline trend matching, test duration, sample size and statistical power) and Check the Model Assumptions (representative period, seasonal effects, structural breaks, time period overlap). A blue banner at the bottom reads: Result: a refinement that makes both sources stronger. Use each as a bound, not a verdict.
Figure 5 – Treating divergence as a signal, not a failure. When MMM and the experiment produce different estimates, investigate both sources. Check the experimental design for contamination, sample size, and duration. Check the model assumptions for seasonal effects and structural breaks. The answer almost always produces a refinement that makes both sources stronger.

1. Check the Experimental Design

A contaminated control group, markets that were not trending identically before the test launched, or a sample size too small to produce a reliable estimate — each of these can inflate or deflate the measured uplift. Specifically, an experiment that runs for six weeks on 20 markets is not equivalent evidence to a geo-RCT across 210 DMAs.

2. Check the Model Assumptions

If the experiment covers a specific season or market condition that does not represent the broader historical period, the model’s coefficient may reflect a different environment than the one the experiment measured. In such cases, seasonal or structural mismatch is often the culprit.

3. Use Each as a Bound, Not a Verdict

The experiment shows what is possible under test conditions. The model, in contrast, shows what is plausible for planning given the full commercial structure. Rather than overriding each other, the two should inform each other. The goal is calibration that a team can act on with confidence.

“When experiments and MMM tell different stories, treat the difference as the most valuable signal in your measurement program. The answer is almost always a refinement that makes both stronger.”

The Question Your Next Budget Review Will Ask

Marketing budget decisions are defended on evidence. The quality of that evidence determines how much latitude you get. A model result with a confidence interval defends a position; a causal result from a controlled experiment closes the conversation entirely.

The brands that build an integrated measurement architecture — MMM for the strategic view and continuous incrementality testing for the causal layer — are therefore building an evidence base that compounds over time. Each experiment calibrates the model. Each model run, in turn, identifies the next experiment most worth commissioning. As a result, the architecture grows more precise and more credible with every cycle.

If you are running MMM without an incrementality program alongside it, you are answering correlation questions with precision and leaving the causal questions open. Your CFO will eventually ask the causal question. The gap between your model and your proof is ultimately the gap they will stand in.

Key Takeaways

  • MMM is the strategic backbone. It covers the full media mix, produces board-defensible ROI estimates, and identifies where experimental evidence would add the most value.
  • Incrementality testing adds a causal layer on top. It is the only method that establishes whether advertising caused an outcome under controlled conditions.
  • Geographic RCTs are the standard design: privacy-compliant, omnichannel, independent of vendor compliance, and fully auditable.
  • The calibration loop is a compounding system. The model identifies uncertain coefficients; the experiment targets them; the result calibrates the model; and the improved model identifies the next experiment worth commissioning.
  • Divergence between experiments and MMM is a refinement signal. Investigate both sources rather than adjudicating. The answer almost always makes both stronger.