Why it exists

The MMM Dataset is the shared lab dataset for this marketing mix modeling (MMM) curriculum. It is synthetic on purpose: the data-generating process is known, so each lesson can compare a model, chart, or business claim against the source of truth.

Real marketing data almost never gives you that luxury. You usually observe spend, controls, and outcomes, then infer carryover, diminishing returns, and contribution under uncertainty. A transparent synthetic dataset makes the mechanics inspectable before we move into messier validation work.

Downloads

In the repo, the public files live in /public/data/mmm/; the website serves them from /data/mmm/. The generator lives at scripts/generate-mmm-dataset.mjs.

Business scenario

The scenario is a direct-to-consumer (DTC) subscription brand with three years of weekly observations.

| Field | Value |
| --- | --- |
| Cadence | Weekly |
| Rows | 156 |
| Outcome | New paid subscriptions |
| Spend unit | Thousands of dollars |
| Channels | Paid search, paid social, CTV/video, podcast/audio, influencer |
| Controls | Trend, seasonality, holiday season, promo flag, price index, competitor pressure |

The observed modeling dataset is the subset a real analyst would plausibly have: date, controls, media spend, and observed subscriptions. The truth columns are included for teaching and validation.

Column groups

| Group | Columns | Use |
| --- | --- | --- |
| Keys | `week`, `date` | Time index |
| Controls | `trend`, `seasonality`, `holiday_season`, `promo_flag`, `price_index`, `competitor_pressure` | Base demand and business context |
| Observed media | `*_spend` | Inputs a normal MMM would use |
| Truth transforms | `*_adstock`, `*_contribution` | Synthetic source-of-truth for lessons |
| Outcome | `observed_subscriptions` | Noisy observed business outcome |
| Truth outcome | `base_demand`, `total_media_contribution`, `expected_subscriptions` | Decomposition used for validation |

When a lesson wants a realistic modeling exercise, use the observed columns only. When a lesson wants to explain the mechanism, use the truth columns.
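A minimal sketch of that split, using only the naming rules from the column groups. The function names and the sample header are hypothetical, including the `paid_search` prefix, since the exact media column names are not listed on this page.

```javascript
// Column-group rules taken from the column groups described above.
// classifyColumn and observedColumns are illustrative helper names.
const KEYS = ["week", "date"];
const CONTROLS = [
  "trend", "seasonality", "holiday_season",
  "promo_flag", "price_index", "competitor_pressure",
];
const TRUTH_OUTCOME = [
  "base_demand", "total_media_contribution", "expected_subscriptions",
];

function classifyColumn(name) {
  if (KEYS.includes(name)) return "keys";
  if (CONTROLS.includes(name)) return "controls";
  if (name === "observed_subscriptions") return "outcome";
  // Checked before the suffix rules: total_media_contribution also ends in "_contribution".
  if (TRUTH_OUTCOME.includes(name)) return "truth_outcome";
  if (name.endsWith("_spend")) return "observed_media";
  if (name.endsWith("_adstock") || name.endsWith("_contribution")) {
    return "truth_transforms";
  }
  return "unknown";
}

// Groups a real analyst would plausibly have.
const OBSERVED_GROUPS = new Set(["keys", "controls", "observed_media", "outcome"]);

function observedColumns(header) {
  return header.filter((name) => OBSERVED_GROUPS.has(classifyColumn(name)));
}

// Hypothetical header row; the paid_search prefix is an assumption.
const sampleHeader = [
  "week", "date", "trend", "promo_flag",
  "paid_search_spend", "paid_search_adstock", "paid_search_contribution",
  "observed_subscriptions", "base_demand",
  "total_media_contribution", "expected_subscriptions",
];

console.log(observedColumns(sampleHeader));
```

Filtering the header once, up front, makes it harder for a truth column to leak into a model that is supposed to see only what an analyst would see.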

Source-of-truth parameters

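| Channel | Lambda | Half-life | Beta | Half-saturation | Slope |
| --- | --- | --- | --- | --- | --- |
| Paid Search | 0.18 | 0.40 | 310 | 135 | 1.25 |
| Paid Social | 0.42 | 0.80 | 250 | 165 | 1.35 |
| CTV / Video | 0.72 | 2.11 | 420 | 360 | 1.80 |
| Podcast / Audio | 0.62 | 1.45 | 210 | 175 | 1.55 |
| Influencer | 0.35 | 0.66 | 160 | 105 | 1.30 |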
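The Lambda and Half-life columns are tied together: under geometric decay, the half-life is the number of periods after which a one-time effect has fallen to half, which is ln(0.5) / ln(lambda). The sketch below recomputes it from the lambda values in the table; `halfLife` is a helper name chosen here, not something taken from the generator.

```javascript
// Lambda values copied from the source-of-truth parameter table.
const lambdas = {
  "Paid Search": 0.18,
  "Paid Social": 0.42,
  "CTV / Video": 0.72,
  "Podcast / Audio": 0.62,
  "Influencer": 0.35,
};

// Periods until a geometrically decaying effect falls to half its size.
function halfLife(lambda) {
  return Math.log(0.5) / Math.log(lambda);
}

for (const [channel, lambda] of Object.entries(lambdas)) {
  console.log(channel, halfLife(lambda).toFixed(2));
}
```

The printed values agree with the Half-life column to two decimals. Because the data is weekly, that column reads as weeks.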

The media transform is recursive geometric adstock:
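In its standard unnormalized form (assumed here; the generator script is the authority on the exact variant and on the initial condition), with $x_{c,t}$ the spend for channel $c$ in week $t$, $a_{c,t}$ the adstocked spend, and $\lambda_c$ the channel's Lambda:

```latex
a_{c,t} = x_{c,t} + \lambda_c \, a_{c,t-1}, \qquad a_{c,0} = 0
```

Under this form, a one-time burst of spend keeps contributing to later weeks, shrinking by a factor of $\lambda_c$ each week.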

The saturated contribution is generated with a Hill function:
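A common parameterization that matches the Beta, Half-saturation, and Slope parameters is shown below. It is an assumed form, not a copy of the generator code. Here $a_{c,t}$ is the adstocked spend (`*_adstock`), $\beta_c$ is Beta, $k_c$ is Half-saturation, and $s_c$ is Slope:

```latex
\text{contribution}_{c,t} = \beta_c \, \frac{a_{c,t}^{\,s_c}}{a_{c,t}^{\,s_c} + k_c^{\,s_c}}
```

In this form $\beta_c$ is the ceiling the contribution approaches as adstock grows, and the contribution equals $\beta_c / 2$ exactly when $a_{c,t} = k_c$.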

The observed outcome adds controls and noise:
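The truth-outcome column names suggest an additive decomposition, sketched below as an assumption. How `base_demand` is built from the controls, and the distribution of the noise term $\varepsilon_t$, are defined in the generator and are not reproduced here.

```latex
\begin{aligned}
\text{total\_media\_contribution}_t &= \sum_{c} \text{contribution}_{c,t} \\
\text{expected\_subscriptions}_t &= \text{base\_demand}_t + \text{total\_media\_contribution}_t \\
\text{observed\_subscriptions}_t &= \text{expected\_subscriptions}_t + \varepsilon_t
\end{aligned}
```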

How modules should use it

  • Adstock as Memory uses *_spend, *_adstock, and *_contribution to show why effects persist after spend stops.
  • Saturation as Diminishing Attention can use the same adstocked media and Hill parameters to show why the next dollar does not behave like the first.
  • Validation modules can fit models using observed columns only, then compare estimated adstock and contribution against the truth columns.
  • Decision Systems modules can use the known response curves for budget scenarios and marginal return exercises.
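As a rough illustration of the first two exercises, the sketch below applies geometric adstock and a Hill curve to a single burst of spend, using the CTV / Video parameters from the source-of-truth table. The function names, the unnormalized adstock form, and the Hill parameterization are assumptions; the generator script remains the reference.

```javascript
// Assumed form: unnormalized geometric adstock, starting from zero carryover.
function geometricAdstock(spend, lambda) {
  const out = [];
  let carry = 0;
  for (const x of spend) {
    carry = x + lambda * carry;
    out.push(carry);
  }
  return out;
}

// Assumed form: Hill curve scaled by beta, equal to beta / 2 at halfSat.
function hillContribution(adstock, beta, halfSat, slope) {
  if (adstock <= 0) return 0;
  const a = Math.pow(adstock, slope);
  return (beta * a) / (a + Math.pow(halfSat, slope));
}

// Extra contribution from one more unit of adstocked spend at a given level.
function marginalReturn(level, p) {
  return (
    hillContribution(level + 1, p.beta, p.halfSat, p.slope) -
    hillContribution(level, p.beta, p.halfSat, p.slope)
  );
}

// CTV / Video row of the parameter table.
const ctv = { lambda: 0.72, beta: 420, halfSat: 360, slope: 1.8 };

// One burst of spend (thousands of dollars), then nothing.
const spend = [100, 0, 0, 0];
const adstock = geometricAdstock(spend, ctv.lambda);
const contribution = adstock.map((a) =>
  hillContribution(a, ctv.beta, ctv.halfSat, ctv.slope)
);

console.log(adstock);      // carryover persists after spend stops
console.log(contribution); // so contribution is nonzero in weeks 2 to 4
for (const level of [50, 180, 360, 900]) {
  console.log(level, marginalReturn(level, ctv).toFixed(3));
}
```

With a slope above 1 the assumed Hill curve is S-shaped, so the marginal return rises at low levels before it falls. Evaluating it at several levels shows that shape directly and avoids assuming returns diminish from the first dollar.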

Reproducibility

The generator is deterministic with seed 20260520. Running it again writes the same CSV and parameter JSON:

node scripts/generate-mmm-dataset.mjs

The generator has no external package dependencies. It uses plain Node.js so the dataset can remain part of the static site workflow.
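To illustrate why a fixed seed makes the output repeatable, here is a small seeded random number generator in plain JavaScript. It uses the mulberry32 algorithm as a stand-in; the generator's actual random number routine is not shown on this page, so treat the function below as an assumption, not a description of the script.

```javascript
// Illustrative stand-in: mulberry32, a small 32-bit seeded PRNG.
// This is NOT taken from scripts/generate-mmm-dataset.mjs.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Draw n values from a fresh generator with the given seed.
function draw(seed, n) {
  const next = mulberry32(seed);
  return Array.from({ length: n }, () => next());
}

const SEED = 20260520; // the seed quoted above

const runA = draw(SEED, 5);
const runB = draw(SEED, 5);

// Same seed, same sequence: the property behind the identical CSV and JSON.
console.log(runA.every((value, i) => value === runB[i])); // true
```

Unlike `Math.random()`, which cannot be seeded, a generator like this returns the same sequence on every run, which is what lets repeated runs write identical files.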