Marketing Mix Modeling: Measuring Marketing Beyond Last-Click
How adstock and saturation turn aggregate spend and sales data into actionable budget decisions — with a 26-week simulated example you can reproduce.
On this page
- Why it matters
- Background
- Data Requirements
- Method: the model in three pieces
- 1. Adstock (carryover)
- 2. Saturation (diminishing returns)
- A 26-week worked example
- Decomposition: where did the conversions come from?
- Saturation curves: what each channel can give you
- Reallocation: a small, honest lift
- Reproduction
- Business Insights
- Common pitfalls
- Further reading
Why it matters
Last-click attribution gives all the credit to whichever ad someone happened to tap before the conversion event. It is convenient because it is observable, and useful for seeing where demand was captured. But it is usually misleading as causal measurement: the final click is rarely the whole cause. Channels that close the sale (branded search, retargeting, affiliate, email) can look heroic, while the marketing that created demand earlier (TV, audio, OOH, YouTube, podcasts, display) gets systematically undercounted.
Marketing Mix Modeling (MMM) is an econometric approach for estimating how marketing, media, seasonality, pricing, promotions, and external factors contribute to business outcomes. Many MMM projects focus heavily on media budget allocation, which is why “media mix modeling” is sometimes used interchangeably. MMM sidesteps the last-click problem by modeling spend and outcomes at an aggregate daily or weekly level. The job of the model is to recover three things that last-click cannot see:
- Carryover: the model applies adstock, the decaying residual effect of past advertising on current sales, so this week’s effective TV is not only this week’s GRPs (gross rating points: reach × frequency, where one GRP means 1% of the target audience reached once, on average). It is current exposure plus a decaying share of prior exposures: for example, if a TV flight runs in weeks 1–3 and then goes dark, week 4 can still include part of week 3’s weight, week 5 part of week 4’s remaining stock, and so on. Outcomes in week t therefore pick up media from t − 1, t − 2, … until that stock dies off, because consumers remember the ad, search later, discuss it, or delay purchase.
- Saturation: the tenth million in TV doesn’t work as hard as the first.
- Incrementality: the true causal lift, meaning the conversions that would not have happened without the marketing, as opposed to conversions that were going to happen anyway and just followed an ad. Estimating it requires a baseline for what would have happened with zero marketing. Holdout tests measure it directly; MMM estimates it econometrically; multi-touch attribution does not measure it at all.
Get those three right and budget decisions stop being arguments about credit and start being arithmetic on marginal returns.
Background
Marketing Mix Modeling sits at the intersection of econometrics, marketing science, and brand management. Its intellectual roots go back to mid-20th-century marketing research, including the marketing-mix frameworks of the 1950s and 1960s. Early computerized marketing-mix models emerged in the 1970s, and commercial MMM adoption accelerated among CPG and Fortune 500 brands in the late 1980s and 1990s. The shape of the modern formulation (base + adstocked, saturated media + controls) has been stable since the 1990s. What has changed is the tooling: frameworks like Meta’s Robyn automate regularized regression workflows with adstock, saturation, and hyperparameter search, while Bayesian frameworks like Google’s Meridian make priors and uncertainty first-class.
Unlike user-level attribution, MMM does not need cookies, MAIDs, or a logged-in graph. It needs a time series. That makes it the measurement workhorse of the post-IDFA, post-third-party-cookie era. (IDFA, the Identifier for Advertisers, is Apple’s per-device advertising ID on iOS. Since iOS 14.5 in 2021, apps must request user permission to read it through App Tracking Transparency, and most users decline, which broke user-level mobile attribution and pushed measurement back toward aggregate techniques like MMM.)
Data Requirements
A minimum-viable MMM needs three blocks of weekly (or daily) data:
| Block | Examples | Granularity |
|---|---|---|
| Media | Spend by channel; impressions, GRPs, clicks where available | Per channel per week |
| Outcome | Sales, orders, sign-ups, revenue | Same cadence as media |
| Controls | Price, promo, distribution, holidays, weather, competitor activity | Same cadence |
Rule of thumb: two to three years of history, enough variation in spend by channel (no channel always-on at the same level), and a clean outcome series. If TV ran the same flight every week for three years, no model can disentangle its effect from base demand.
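A quick way to screen for the "always-on at the same level" problem is to compute how much each channel's spend actually varies. The sketch below uses the coefficient of variation (standard deviation divided by mean); the spend series, the function name `spend_variation`, and the 0.10 threshold are all hypothetical choices for illustration, not an established standard.

```python
from statistics import mean, pstdev


def spend_variation(spend: list[float]) -> float:
    """Coefficient of variation (std / mean) of a weekly spend series."""
    avg = mean(spend)
    return pstdev(spend) / avg if avg > 0 else 0.0


# Hypothetical weekly spend in $k, for illustration only.
weekly_spend = {
    "TV_always_on": [100.0] * 12,
    "TV_flighted": [300.0, 300.0, 300.0, 0.0, 0.0, 0.0] * 2,
}

MIN_CV = 0.10  # assumed screening threshold, not a standard

report = {name: spend_variation(series) for name, series in weekly_spend.items()}
flagged = [name for name, cv in report.items() if cv < MIN_CV]

for name, cv in report.items():
    print(f"{name}: CV = {cv:.2f}")
print("Too flat to separate from base:", flagged)
```

A channel that is flagged here is not necessarily ineffective; the data simply contain too little movement for a regression to tell its effect apart from base demand.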
Method: the model in three pieces
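The standard MMM is a regression where the response is decomposed into base demand, transformed media, and controls. One common way to write that decomposition, with a noise term added, is:

$$
y_t = \text{base}_t + \sum_{c} f_c\left(A_{c,t}\right) + \sum_{j} \gamma_j z_{j,t} + \varepsilon_t
$$

where $y_t$ is the outcome in period $t$, $A_{c,t}$ is the adstocked media for channel $c$, $f_c$ is that channel’s saturation curve, and $z_{j,t}$ are the controls.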
Two transformations carry most of the meaning.
1. Adstock (carryover)
The simplest form is geometric: this week’s effective media is what you spent plus a decayed fraction of last week’s effective media.

$$
A_t = M_t + \lambda A_{t-1}, \qquad 0 \le \lambda < 1
$$

- λ ≈ 0.7 (TV, brand): a flight keeps working for weeks.
- λ ≈ 0.2 (paid search, direct response): mostly same-week.
The steady-state multiplier is 1 / (1 − λ), so a channel with λ = 0.7 accumulates roughly 3.3× its weekly spend in effective media before saturation.
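The 1 / (1 − λ) multiplier can be checked numerically. The sketch below feeds a constant weekly spend through geometric adstock and compares the final stock with the closed form; the $100k spend level and the 30-week horizon are illustrative assumptions, not values from the worked example.

```python
def adstock(spend: list[float], lam: float) -> list[float]:
    """Geometric carryover: A_t = M_t + lam * A_{t-1}."""
    out: list[float] = []
    for x in spend:
        out.append(x + lam * (out[-1] if out else 0.0))
    return out


WEEKLY_SPEND = 100.0  # hypothetical constant spend, in $k
constant_plan = [WEEKLY_SPEND] * 30  # assumed horizon, long enough to converge

results = {}
for lam in (0.2, 0.7):
    simulated = adstock(constant_plan, lam)[-1]   # stock after 30 weeks
    closed_form = WEEKLY_SPEND / (1.0 - lam)      # steady-state limit
    results[lam] = (simulated, closed_form)
    print(f"lam={lam}: simulated={simulated:.2f}, closed form={closed_form:.2f}")
```

The lower the decay rate, the faster the simulated stock reaches its limit; with λ = 0.7 it takes noticeably more weeks to converge than with λ = 0.2.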
More flexible variants, such as Weibull adstock or a delayed adstock with a separate delay parameter, let the peak land a week or two after exposure (think TV creative that needs time to be remembered).
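As one concrete illustration of a delayed peak, the sketch below uses the delayed-adstock weighting described in Jin et al. (listed under Further reading), where the weight at lag l is decay^((l − delay)²), rather than the Weibull form. The weights are normalized to sum to one, so this version is a weighted average of past spend and is not on the same scale as the unnormalized geometric recursion above. The decay, delay, and maximum-lag values are assumptions chosen for illustration.

```python
def delayed_adstock(
    spend: list[float], decay: float, delay: int, max_lag: int = 13
) -> list[float]:
    """Normalized delayed adstock: weight at lag l is decay ** ((l - delay) ** 2)."""
    weights = [decay ** ((lag - delay) ** 2) for lag in range(max_lag)]
    total = sum(weights)
    out: list[float] = []
    for t in range(len(spend)):
        acc = 0.0
        for lag, w in enumerate(weights):
            if t - lag >= 0:  # ignore lags that reach before the series starts
                acc += w * spend[t - lag]
        out.append(acc / total)
    return out


# Hypothetical single burst of spend in week 0, then nothing.
pulse = [100.0] + [0.0] * 7
response = delayed_adstock(pulse, decay=0.7, delay=2)
peak_week = response.index(max(response))

for week, value in enumerate(response):
    print(f"week {week}: effective media = {value:.1f}")
print("peak lands in week", peak_week)
```

With geometric adstock the same pulse would peak in week 0 and decay from there; the delay parameter is what moves the peak later.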
2. Saturation (diminishing returns)
Doubling spend rarely doubles sales. The Hill function is the workhorse:

$$
f(x) = \frac{\beta\, x^{\alpha}}{K^{\alpha} + x^{\alpha}}
$$

- β: the asymptote (max incremental conversions the channel can drive).
- K: the half-saturation point (spend at which response is β/2).
- α: steepness; α > 1 gives an S-curve, α ≤ 1 gives pure diminishing returns.
The combination of adstock then saturation is the right order: media accumulates first, then runs into the response wall.
A 26-week worked example
Below is a single hypothetical brand running TV, Digital, and Print over 26 weeks. The numbers in every chart that follows come from running this simulation — same seed, same parameters, same data.
```python
# Adstock + Hill response, three channels
channels = {
    "TV": {"lam": 0.70, "alpha": 2.2, "K": 220, "beta": 200},
    "Digital": {"lam": 0.25, "alpha": 1.6, "K": 90, "beta": 110},
    "Print": {"lam": 0.50, "alpha": 1.8, "K": 60, "beta": 55},
}


def adstock(spend: list[float], lam: float) -> list[float]:
    """Geometric carryover: A_t = M_t + lam * A_{t-1}."""
    out: list[float] = []
    for x in spend:
        out.append(x + lam * (out[-1] if out else 0.0))
    return out


def hill(x: float, *, alpha: float, K: float, beta: float) -> float:
    """Saturation: beta * x^alpha / (K^alpha + x^alpha)."""
    if x <= 0:
        return 0.0
    return (beta * x**alpha) / (K**alpha + x**alpha)


# contribution_t = hill(adstock(spend, lam)[t], alpha=alpha, K=K, beta=beta)
# Pass only the Hill parameters: hill() does not accept lam.
```

The TV plan is deliberately flighted (three weeks on, three weeks off) so you can see carryover with the naked eye. Digital is steady with a year-end push. Print is small and periodic.
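To see carryover in numbers rather than on a chart, the sketch below runs the TV parameters through a three-weeks-on, three-weeks-off plan. The $350k on-week spend is a hypothetical value chosen so the plan averages $175k per week; the article's actual weekly plan is not listed, so the output illustrates the mechanism rather than reproducing the chart.

```python
def adstock(spend: list[float], lam: float) -> list[float]:
    """Geometric carryover: A_t = M_t + lam * A_{t-1}."""
    out: list[float] = []
    for x in spend:
        out.append(x + lam * (out[-1] if out else 0.0))
    return out


def hill(x: float, *, alpha: float, K: float, beta: float) -> float:
    """Saturation: beta * x^alpha / (K^alpha + x^alpha)."""
    if x <= 0:
        return 0.0
    return (beta * x**alpha) / (K**alpha + x**alpha)


tv = {"lam": 0.70, "alpha": 2.2, "K": 220, "beta": 200}

# Hypothetical flighted plan in $k: three weeks on, three weeks off, twice.
tv_spend = [350.0, 350.0, 350.0, 0.0, 0.0, 0.0] * 2

stock = adstock(tv_spend, tv["lam"])
hill_params = {k: v for k, v in tv.items() if k != "lam"}  # hill() takes no lam
contribution = [hill(a, **hill_params) for a in stock]

for week, (spent, conv) in enumerate(zip(tv_spend, contribution), start=1):
    print(f"week {week:2d}: spend={spent:5.0f}  contribution={conv:6.1f}")
```

In the dark weeks spend is zero, yet the contribution stays above 100 conversions because the adstocked media decays gradually instead of dropping to zero.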
Decomposition: where did the conversions come from?
Two things to notice:
- Adstock is visible. TV spend is zero in weeks 4–6, but the TV bar is still ~100 conversions tall. That is last month’s flight still selling.
- The base layer carries roughly a third of demand. In this simulation it is 35%. In real engagements, base typically lands between 30% and 70% depending on category. Brands that confuse base with marketing-driven conversions chronically overestimate ROI.
Over the full 26 weeks the share of conversions breaks down to Base 35%, TV 43%, Digital 18%, Print 4%. Any last-click report on this same brand would award almost all of the credit to Digital (and within Digital, mostly to branded search).
Saturation curves: what each channel can give you
Plotting steady-state response against weekly spend on a common axis makes diminishing returns concrete.
Read the curves left to right:
- Print climbs almost vertically up to $60k, then flattens. It is small but the most efficient at low budgets.
- Digital rolls over around $120–150k weekly spend. Past that point, more spend is mostly waste.
- TV needs scale: the first $50k buys very little, but the curve keeps climbing past $300k. TV rewards commitment, which is why brands that “test” it with low budgets usually conclude it doesn’t work.
The slope of each curve at the current spend level is the marginal ROI. The channel with the steepest slope is the one you should fund next.
Reallocation: a small, honest lift
At the average weekly spend in this simulation (TV $175k, Digital $93k, Print $18k), the marginal conversions per extra $1k of weekly spend are:
| Channel | Marginal conversions per +$1k/week | Reading |
|---|---|---|
| Print | 1.12 | Severely underspent |
| Digital | 0.45 | Near optimum |
| TV | 0.24 | Past the bend |
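The table values can be approximated from the channel parameters listed in the worked example. The sketch below assumes the marginal is the slope of the steady-state response, that is, the Hill curve evaluated at spend / (1 − λ), and estimates it with a central finite difference. The names `steady_state_response` and `marginal_per_1k` and the step size are choices introduced here for illustration.

```python
CHANNELS = {
    "TV": {"lam": 0.70, "alpha": 2.2, "K": 220, "beta": 200},
    "Digital": {"lam": 0.25, "alpha": 1.6, "K": 90, "beta": 110},
    "Print": {"lam": 0.50, "alpha": 1.8, "K": 60, "beta": 55},
}
AVG_SPEND = {"TV": 175.0, "Digital": 93.0, "Print": 18.0}  # $k per week


def hill(x: float, *, alpha: float, K: float, beta: float) -> float:
    """Saturation: beta * x^alpha / (K^alpha + x^alpha)."""
    if x <= 0:
        return 0.0
    return (beta * x**alpha) / (K**alpha + x**alpha)


def steady_state_response(
    spend: float, *, lam: float, alpha: float, K: float, beta: float
) -> float:
    """Response when the same spend repeats every week (stock = spend / (1 - lam))."""
    return hill(spend / (1.0 - lam), alpha=alpha, K=K, beta=beta)


def marginal_per_1k(spend: float, params: dict, step: float = 0.01) -> float:
    """Central-difference slope: extra conversions per extra $1k of weekly spend."""
    up = steady_state_response(spend + step, **params)
    down = steady_state_response(spend - step, **params)
    return (up - down) / (2.0 * step)


marginals = {
    name: marginal_per_1k(AVG_SPEND[name], params)
    for name, params in CHANNELS.items()
}
for name, value in sorted(marginals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f} conversions per +$1k/week")
```

Under these assumptions the three slopes land close to the table: Print is highest, Digital sits in the middle, and TV is lowest.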
Holding the total weekly budget at $286k and grid-searching for the allocation that maximizes steady-state response:
The optimizer wants to cut TV by ~$35k/week and roughly triple Print, leaving Digital essentially unchanged. The projected lift is +5.2% incremental conversions — same budget, same channels, better mix.
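A brute-force version of that search is short enough to sketch. It assumes the objective is the sum of steady-state Hill responses across channels, searches in $1k steps, and holds the total at $286k. The function names and the step size are illustrative choices, and the optimizer behind the numbers above may differ in detail.

```python
CHANNELS = {
    "TV": {"lam": 0.70, "alpha": 2.2, "K": 220, "beta": 200},
    "Digital": {"lam": 0.25, "alpha": 1.6, "K": 90, "beta": 110},
    "Print": {"lam": 0.50, "alpha": 1.8, "K": 60, "beta": 55},
}
CURRENT_MIX = {"TV": 175, "Digital": 93, "Print": 18}  # $k per week
TOTAL_BUDGET = sum(CURRENT_MIX.values())               # 286


def steady_state_response(spend: float, lam: float, alpha: float, K: float, beta: float) -> float:
    """Hill response at the steady-state stock spend / (1 - lam)."""
    x = spend / (1.0 - lam)
    if x <= 0:
        return 0.0
    return (beta * x**alpha) / (K**alpha + x**alpha)


def total_response(mix: dict) -> float:
    """Sum of media-driven conversions across channels (base excluded)."""
    return sum(steady_state_response(mix[name], **CHANNELS[name]) for name in CHANNELS)


def best_allocation(total: int, step: int = 1) -> tuple[float, dict]:
    """Exhaustive search over TV and Digital; Print takes the remainder."""
    best_value, best_mix = -1.0, {}
    for tv in range(0, total + 1, step):
        for digital in range(0, total - tv + 1, step):
            mix = {"TV": tv, "Digital": digital, "Print": total - tv - digital}
            value = total_response(mix)
            if value > best_value:
                best_value, best_mix = value, mix
    return best_value, best_mix


baseline = total_response(CURRENT_MIX)
best_value, best_mix = best_allocation(TOTAL_BUDGET)
lift = best_value / baseline - 1.0

print("current mix:", CURRENT_MIX, f"-> {baseline:.1f} conversions/week")
print("best mix:   ", best_mix, f"-> {best_value:.1f} conversions/week")
print(f"lift: {lift:.1%}")
```

Run as written, the search should move budget out of TV and into Print, with a lift in media-driven conversions of roughly 5% over the average-spend mix. Exhaustive search is practical here only because there are three channels; with more channels a constrained numerical optimizer would be the usual choice.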
That number is intentionally modest. Real MMM engagements that promise “+25% from reallocation” are usually conflating reallocation gains with the gain from also turning off ineffective channels, or they are working off uncertainty bands wide enough to drive a truck through. A well-calibrated MMM delivers small, repeatable lifts that compound quarter over quarter.
Reproduction
The full simulation, the Hill and adstock helpers, and the optimizer grid-search are about 60 lines of Python. The snippet in the worked example supplies the channel parameters and the two helpers; combined in a .py file with the weekly spend plan, the random seed, and the grid search, it reproduces the numbers, tables, and charts on this page.
A companion Colab walking through the same example with PyMC priors and posterior predictive checks is on the roadmap. For now, see the further reading below for production-grade libraries.
Business Insights
- Stop reading channel ROI in a vacuum. A channel’s average ROI is its total incremental response divided by its spend: the height of the response curve over the spend, not its slope. The number that drives decisions is the slope at your current spend, i.e. the marginal ROI.
- Underinvested channels matter more than over-saturated ones. In the example above, Print is 6% of spend but generates the biggest marginal returns. Most “optimization” wins come from moving the last dollar, not the first.
- Always-on TV is a measurement problem, not just a media one. Without flighting (gaps where TV is off), the model cannot separate TV from base. Build small holdouts into the plan.
- Quote uncertainty. A point estimate that says “TV ROI = 2.4” is half a number. “TV ROI = 2.4 with 80% credible interval [1.7, 3.1]” is a decision-ready number. Bayesian MMM gives you the second for free.
Common pitfalls
- Multicollinearity. If TV and OOH always flight together, regularization (ridge, lasso, or Bayesian priors) is non-optional. Otherwise the model splits credit arbitrarily.
- Endogenous spend. Spending more when you expect a good week (planned promos) makes channels look better than they are. Use price, promo depth, and seasonality as controls.
- Aggregation bias. A national MMM hides geo-level heterogeneity. If you have DMA-level data, fit a hierarchical model: you’ll get tighter estimates and free experiments out of the natural variation.
- Calibration drift. Models go stale. Refit quarterly; calibrate annually against a real incrementality test (a geo holdout or matched-market test).
Further reading
- Meta — Robyn open-source MMM
- Google — Meridian (Bayesian MMM)
- Jin et al. — Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects (Google Research)
- Chan & Perry — Challenges and Opportunities in Media Mix Modeling (Google Research)
MMM is not the only measurement tool: incrementality tests, MTA (multi-touch attribution, a user-level approach that splits observed credit for a conversion across the ads a user saw), and lift studies each answer questions MMM cannot. But for the recurring question of “where should the next dollar go?”, an honestly built MMM is still, decades after it was invented, the best answer marketing has.