Scientific Modeling Made Simple

A practical guide for everyone—no PhD required

Statistical models are tools that help us understand what's really happening with our marketing. This guide explains the core principles in plain language, with visual demonstrations you can explore yourself. By the end, you'll understand why our approach produces more reliable insights than traditional methods.

Where this page fits

This is the guided read for first-time analysts. When you want to work through the same workflow hands-on, the interactive step-by-step version walks through all nine stages with live controls; for the underlying philosophy, see Scientific Statistical Modeling.

What is a Statistical Model?

At its core, a statistical model is simply a story about how your data came to be. It's a mathematical way of saying "we think sales happen because of these factors, in this way."

Think of it like a recipe

A cooking recipe tells you: "If you combine these ingredients in these proportions, you'll get this dish." A statistical model does the same thing for business outcomes: "If TV spend is this much, and digital is this much, and it's this time of year, you'll get approximately this many sales."

Just like recipes can be simplified (leaving out optional garnishes) or detailed (specifying exact temperatures), models can be simple or complex. The goal isn't to capture every possible factor—it's to capture the ones that matter for your question.
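To make the recipe idea concrete, here is a minimal Python sketch of a model written as a formula. Everything in it is hypothetical. The function name `expected_sales`, the purely linear form and the coefficient values are illustrative assumptions, not estimates from any real model. Real marketing models usually add carryover and saturation effects that this sketch leaves out.

```python
def expected_sales(tv_spend: float, digital_spend: float, is_holiday_season: bool,
                   baseline: float = 1000.0, tv_effect: float = 1.8,
                   digital_effect: float = 1.2, season_lift: float = 300.0) -> float:
    """A toy 'recipe': expected weekly sales from a few ingredients.

    All default values are made-up illustrative numbers (assumptions).
    """
    sales = baseline                          # sales we'd expect with no media at all
    sales += tv_effect * tv_spend             # each unit of TV spend adds tv_effect
    sales += digital_effect * digital_spend   # each unit of digital spend adds digital_effect
    if is_holiday_season:
        sales += season_lift                  # the time of year shifts sales up
    return sales

print(expected_sales(100, 50, is_holiday_season=False))  # 1240.0
print(expected_sales(100, 50, is_holiday_season=True))   # 1540.0
```

Each argument is one "ingredient," and each default coefficient is one "proportion" in the recipe. Fitting a model means learning those proportions from data rather than making them up.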

The Map Analogy

Models are like maps, not photographs

A subway map doesn't show every building, tree, or street in the city. It leaves out most details to focus on what matters: how to get from A to B.

Similarly, our marketing models don't capture every factor that affects sales. They focus on the factors we can measure and control—like ad spend—while acknowledging that other things matter too.

This perspective is actually freeing: we're not trying to find some "perfect" model that captures all of reality. We're building useful tools for making better decisions, while being honest about what they can and can't tell us.

All Models Are Wrong (And That's OK)

There's a famous saying in statistics: "All models are wrong, but some are useful." This isn't pessimism—it's wisdom.

The weather forecast comparison

Weather forecasts are "wrong" in the sense that they can't predict exactly when and where each raindrop will fall. But they're incredibly useful—you probably check the forecast before planning outdoor activities.

Marketing models work the same way. We won't predict your exact sales next Tuesday, but we can tell you whether TV is likely working better than social media, and give you a realistic range of what to expect.

The key question isn't "Is this model perfect?"

It's "Is this model useful enough for the decision I need to make?" A model that tells you TV's ROI is somewhere between 1.5 and 2.5 is useful for deciding whether to keep investing in TV. It doesn't need to tell you the ROI is exactly 1.87.

The Story-Telling Approach

The best models start with a clear story about how things work. We call this a "generative" approach—we can use the model to generate fake data that should look like real data if our story is correct.

Like a flight simulator

Pilots train in flight simulators before flying real planes. A simulator is a "model" of how flying works—it generates realistic experiences based on the rules of physics and aerodynamics.

If the simulator produces wildly unrealistic results (planes flying backwards, for instance), you'd know something is wrong with the underlying model. Our statistical models work the same way: we can run them "forward" to generate synthetic data, and check whether it looks plausible.
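Running a model "forward" can be sketched in a few lines of Python with NumPy. The story here is a fixed baseline plus a linear TV effect plus random noise. That story and every number in it are assumptions for illustration. The second call deliberately breaks the story with a large negative TV effect, which is the statistical equivalent of a plane flying backwards. The crude plausibility check catches it.

```python
import numpy as np

def simulate_weekly_sales(tv_effect: float, n_weeks: int = 52, baseline: float = 1000.0,
                          noise_sd: float = 50.0, seed: int = 0) -> np.ndarray:
    """Generate one year of synthetic weekly sales from a simple assumed story."""
    rng = np.random.default_rng(seed)
    tv_spend = rng.uniform(0, 200, n_weeks)        # hypothetical weekly TV spend
    noise = rng.normal(0, noise_sd, n_weeks)       # everything the story leaves out
    return baseline + tv_effect * tv_spend + noise

def looks_plausible(sales: np.ndarray, max_reasonable: float = 5000.0) -> bool:
    """Crude sanity check: no negative weeks and nothing absurdly large."""
    return bool((sales > 0).all() and (sales < max_reasonable).all())

sensible = simulate_weekly_sales(tv_effect=1.8)
broken = simulate_weekly_sales(tv_effect=-20.0)    # sign error: TV destroys sales
print(looks_plausible(sensible), looks_plausible(broken))  # True False
```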

Checking Your Assumptions

Before we even look at real data, we can test whether our assumptions make sense. This is called a prior predictive check—we're checking what our model predicts based only on our assumptions, before learning from the data.

🔬 Interactive: See Your Assumptions in Action

Move the sliders to change how confident we are about media effects. Watch how this changes the range of sales our model thinks are possible—before seeing any real data.

Slider settings shown: media effect confidence 0.30; baseline sales confidence 0.20.

What you're seeing: Each gray line shows a possible sales trajectory based on our assumptions. If the gray band covers impossible values (like negative sales), our assumptions need adjustment. The green dashed lines show a realistic range for actual sales.

This might seem like extra work, but it catches problems early. If our assumptions imply that weekly sales could be negative or could be $1 billion, we know something is wrong before we waste time fitting the model.
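Here is a minimal prior predictive check in Python with NumPy that mirrors the idea behind the sliders. It is a sketch built on stated assumptions:

- Sales are rescaled so that a typical week equals 1.0.
- The media effect gets a half-normal prior, so it can't be negative.
- `media_sd` and `baseline_sd` are hypothetical stand-ins for the two slider settings, not the page's actual implementation.

No real data is used. We only sample from the priors and look at the sales those priors imply.

```python
import numpy as np

def prior_predictive(media_sd: float, baseline_sd: float, n_draws: int = 2000,
                     n_weeks: int = 52, noise_sd: float = 0.1, seed: int = 0) -> np.ndarray:
    """Simulate sales trajectories using only prior assumptions (no data)."""
    rng = np.random.default_rng(seed)
    spend = np.linspace(0.0, 1.0, n_weeks)                        # hypothetical scaled spend
    baseline = rng.normal(1.0, baseline_sd, size=(n_draws, 1))    # prior on baseline sales
    media = np.abs(rng.normal(0.0, media_sd, size=(n_draws, 1)))  # half-normal prior on media effect
    noise = rng.normal(0.0, noise_sd, size=(n_draws, n_weeks))
    return baseline + media * spend + noise                       # one row per possible trajectory

tight = prior_predictive(media_sd=0.3, baseline_sd=0.2)
loose = prior_predictive(media_sd=0.3, baseline_sd=2.0)
for name, sims in (("tight", tight), ("loose", loose)):
    print(f"{name}: {np.mean(sims < 0):.1%} of simulated weeks have negative sales")
```

If a prior setting produces a sizable share of negative weeks, that is the signal to tighten or rethink the assumption before fitting anything.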

Start With Questions, Not Data

The most common mistake in analysis is jumping straight to the data without first clarifying what question you're trying to answer. The question determines everything else: what data you need, what model is appropriate, and how to evaluate success.

Navigation without a destination

Imagine opening Google Maps before deciding where you want to go. You'd just be zooming around randomly. You need a destination first—then the tool becomes useful.

Similarly, "What does the data say?" is the wrong first question. Better questions are: "Should we increase TV spend?" or "Which channel drives the most incremental sales?"

Different questions require different approaches. If you want to know whether TV ads cause sales increases (vs. just happening to coincide with them), that requires different methods than simply describing what happened.
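A small simulation shows why describing what happened differs from estimating what TV caused. In this hypothetical setup, peak-season weeks get both more TV spend and higher sales, for reasons unrelated to TV. All the numbers are assumptions. A naive regression of sales on TV credits TV with the seasonal lift. Adding the season as a control recovers something close to the true effect of 1.0. This is a sketch of confounding, not a general recipe for causal inference.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
peak = (rng.uniform(size=n) < 0.2).astype(float)            # hypothetical peak-season weeks
tv = 5 + 5 * peak + rng.normal(0, 1, n)                     # more TV is bought in peak season
sales = 100 + 1.0 * tv + 20 * peak + rng.normal(0, 2, n)    # true TV effect is 1.0

def ols_coefficients(y: np.ndarray, *predictors: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols_coefficients(sales, tv)[1]            # "what happened": TV and sales rise together
adjusted = ols_coefficients(sales, tv, peak)[1]   # compares weeks within the same season
print(f"naive TV effect: {naive:.2f}, season-adjusted: {adjusted:.2f}")
```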

Learning Through Iteration

Scientific modeling is inherently iterative. We build a model, check if it makes sense, learn from what doesn't work, and improve. This isn't a bug—it's the process working as intended.

The Scientific Modeling Cycle

```mermaid
flowchart LR
    Q[🎯 Define Question] --> S[📖 Tell the Story]
    S --> B[🔧 Build Model]
    B --> C[✓ Check Assumptions]
    C -->|Problems?| S
    C -->|OK| F[📊 Fit to Data]
    F --> D[🔍 Diagnose Results]
    D -->|Issues?| B
    D -->|OK| R[📋 Report Findings]
    style Q fill:#f0f7e6,stroke:#6d8a4a
    style S fill:#e6f0f7,stroke:#4a6d8a
    style B fill:#f7f0e6,stroke:#8a6d4a
    style C fill:#e6f7f0,stroke:#4a8a6d
    style F fill:#f0e6f7,stroke:#6d4a8a
    style D fill:#f7e6f0,stroke:#8a4a6d
    style R fill:#f0f7e6,stroke:#6d8a4a
```

Each time through the cycle, we learn something. Maybe our assumptions were too loose (the model thinks sales could be negative). Maybe the model can't capture an important pattern in the data (it misses every December spike). Each failure teaches us what to fix.
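You can check for the "misses every December spike" failure directly. Look at the residuals (actual minus fitted sales) grouped by month. This Python sketch uses simulated data with a built-in December bump. It then fits a deliberately incomplete model that has no seasonal term. The rough week-to-month mapping and all the numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = np.arange(156)                                  # three years of weekly data
month = (weeks // 4.33).astype(int) % 12                # rough mapping: 0 = Jan, 11 = Dec
tv = rng.uniform(0, 10, weeks.size)                     # hypothetical TV spend
sales = 100 + 2.0 * tv + 30.0 * (month == 11) + rng.normal(0, 5, weeks.size)

# Deliberately incomplete model: intercept + TV only, no seasonality.
X = np.column_stack([np.ones(weeks.size), tv])
coef = np.linalg.lstsq(X, sales, rcond=None)[0]
residuals = sales - X @ coef

mean_resid_by_month = np.array([residuals[month == m].mean() for m in range(12)])
worst = int(np.argmax(np.abs(mean_resid_by_month)))
print(f"largest average miss in month {worst}: {mean_resid_by_month[worst]:+.1f}")
```

A large, consistent miss in one month is the cue to go back to the model and add a seasonal term. The change is driven by a visible pattern in the data, not by a preferred answer.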

Honest Iteration vs. Specification Shopping

Here's the crucial distinction: there's a right way and a wrong way to iterate on models. The difference isn't about the mechanical actions—it's about what's driving the changes.

✓ Honest Scientific Iteration

Changes driven by:

Model produces impossible predictions
Clear patterns the model misses
Domain expertise suggests missing factors
Pre-planned alternatives to test

The goal: Make the model better at capturing reality

✗ Specification Shopping

Changes driven by:

"That coefficient should be positive"
"The ROI needs to be above 1.0"
"This doesn't match our expectations"
"The client won't like this result"

The goal: Get the results you wanted

⚠️ Why Specification Shopping Is Dangerous

When you adjust your model until you get the results you want, you're not discovering anything—you're just manufacturing the answer you already had in mind. The statistics look legitimate, but they've lost their meaning.

It's like weighing yourself on different scales until you find one that shows the number you want. The scale you settle on may be perfectly accurate; the problem is that you chose it because of the reading it gave you.

The Fishing Problem

Imagine fishing in a pond with 100 fishing rods. If you cast all 100 and report only the one that caught a fish, you haven't proven you're a good angler—you've just tried a lot of times. This is the multiple testing problem.

⚠️ Interactive: Watch False Positives Multiply

When analysts test many model specifications and report the best one, the chance of finding something "significant" by pure luck skyrockets. Move the slider to see the effect.

Slider setting and readout shown: 20 specifications tested; effective false positive rate 64.2%, versus the 5% the report claims.

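The gap is staggering: if the 20 specifications behaved like independent tests, reporting the best one would push your actual false positive rate to 1 − 0.95^20, about 64%, not the 5% your statistics claim. That makes you more than 12 times as likely to be fooled by random noise. Specifications fit to the same data are usually correlated, which pulls the real figure down somewhat, but it typically stays well above 5%.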
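The 64% figure is easy to reproduce. Assume each specification is an independent test at the 5% level, run on data with no real effect. The chance that at least one comes out "significant" is then 1 − (1 − 0.05)^k. The Python sketch below computes that formula and checks it with a quick simulation, using the fact that p-values are uniformly distributed when there is no real effect. Real specifications share the same data and tend to be correlated, so treat this as an upper-end illustration.

```python
import numpy as np

def family_false_positive_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** k

def simulated_rate(k: int, alpha: float = 0.05, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo version: draw k null p-values per trial and keep only the best one."""
    rng = np.random.default_rng(seed)
    p_values = rng.uniform(size=(trials, k))              # no real effect: p ~ Uniform(0, 1)
    return float(np.mean(p_values.min(axis=1) < alpha))  # report only the best spec

print(f"{family_false_positive_rate(20):.1%}")   # 64.2%
print(f"{simulated_rate(20):.1%}")
```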

The birthday party trick

At a party with 23 people, there's just over a 50% chance that two people share a birthday. With 50 people, it's 97%. This feels surprising because we instinctively think about finding someone who shares our own birthday, but the question is whether any two people match.

Specification shopping works the same way. Each individual test has a small chance of a false positive. But when you run many tests and pick the best one, you're not asking "did this specification get lucky?" You're asking "did any specification get lucky?" And the answer is usually yes.
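You can check the birthday figures with a few lines of Python. Compute the chance that everyone has a different birthday, then subtract it from 1. This uses the standard simplification of 365 equally likely birthdays and ignores leap years.

```python
def p_shared_birthday(n_people: int, days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday."""
    p_all_different = 1.0
    for i in range(n_people):
        p_all_different *= (days - i) / days   # person i must avoid the i birthdays already taken
    return 1 - p_all_different

print(f"{p_shared_birthday(23):.1%}")   # 50.7%
print(f"{p_shared_birthday(50):.1%}")   # 97.0%
```

The structure matches specification shopping. The probability of "no match anywhere" shrinks multiplicatively with every extra chance you give it.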

The Winner's Curse

Even when there is a real effect, specification shopping makes you overestimate it. This is called the Winner's Curse: the specification that "wins" (looks best) is usually the one that got lucky with random noise.

📊 Interactive: See the Bias Build Up

Each specification estimates the same true effect, but random variation makes some estimates higher and some lower. When you pick the highest one, you systematically overestimate.

Slider settings and readout shown: true media effect 0.20, noise in each estimate 0.10, 20 specifications tested. Expected reported effect: 0.29, an overestimation of +45%.

This helps explain failed replications: Effects that were "discovered" through specification shopping often fail to replicate because the original estimate was inflated by lucky noise. The true effect is smaller than what was reported.
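The Winner's Curse is easy to simulate in Python with NumPy. Each simulated dataset produces 20 estimates of a true effect of 0.20, each with noise of 0.10. We then "report" the largest estimate and compare it with the truth. The `corr` parameter is our own assumption. It lets specifications share part of their noise, as they would when fit to the same data, and more sharing means less inflation. The exact numbers depend on these assumptions, so don't expect them to reproduce the interactive readout.

```python
import numpy as np

def winners_curse(true_effect: float = 0.20, noise: float = 0.10, n_specs: int = 20,
                  corr: float = 0.0, n_sims: int = 50_000, seed: int = 0):
    """Return (mean of all estimates, mean of the largest estimate per dataset)."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_sims, 1))          # noise every spec sees (same data)
    unique = rng.standard_normal((n_sims, n_specs))    # noise specific to each spec
    z = np.sqrt(corr) * shared + np.sqrt(1 - corr) * unique  # unit variance, pairwise corr
    estimates = true_effect + noise * z
    reported = estimates.max(axis=1)                   # specification shopping: keep the biggest
    return float(estimates.mean()), float(reported.mean())

TRUE_EFFECT = 0.20
for corr in (0.0, 0.8):
    avg_all, avg_reported = winners_curse(true_effect=TRUE_EFFECT, corr=corr)
    inflation = avg_reported / TRUE_EFFECT - 1
    print(f"corr={corr}: mean of all estimates {avg_all:.3f}, "
          f"mean reported {avg_reported:.3f} ({inflation:+.0%})")
```

The average over all specifications stays close to the truth. Only the act of picking the winner creates the bias.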

The sports tryout analogy

Imagine evaluating basketball players based on a single day's performance. The player who scores the most that day isn't necessarily the best; they might have had an unusually good day. If you sign them expecting that performance to continue, you'll be disappointed.

Specification shopping does exactly this with statistical estimates: it selects the estimate that performed best on this particular dataset, which is often unusually good due to chance.

Sensitivity Analysis: Stress-Testing Your Conclusions

The solution to these problems isn't to never make any modeling choices—that's impossible. It's to test how much your conclusions change across reasonable alternatives. This is called sensitivity analysis.

🔄 Interactive: How Robust Are Your Results?

Good results hold up across reasonable alternative assumptions. Fragile results flip-flop. Watch how honest reporting differs from cherry-picked reporting.

Slider settings shown: 10 specifications analyzed; variation across specs 0.15.

Honest reporting: The shaded band shows the full range of results across reasonable specifications. A finding is "robust" if it holds across this range (e.g., the effect is always positive). It's "fragile" if it could flip.

Robust findings vs. fragile findings

Some findings hold up no matter how we specify the model: "TV has a positive effect on sales" might be robust. Others are fragile: "TV's ROI is exactly 2.3" might vary wildly across specifications.

Honest analysis reports both—and clearly distinguishes which conclusions you can trust and which are more speculative.
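Here is a basic sensitivity analysis sketched in Python with NumPy: refit the same simple model under several reasonable assumptions and compare the answers. The assumption that varies is how long TV's effect carries over from week to week, expressed as a geometric "adstock" decay rate. The data are simulated. The decay values, the true decay of 0.5 and the function names are hypothetical choices for illustration. On this simulated data, the sign of the TV effect should hold across every specification while its size shifts with the decay assumption. That is the robust-versus-fragile split in miniature.

```python
import numpy as np

def geometric_adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Each week keeps `decay` times last week's carried-over effect, plus new spend."""
    out = np.zeros(len(spend))
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

def tv_effect(sales: np.ndarray, tv: np.ndarray, decay: float) -> float:
    """OLS slope of sales on adstocked TV spend (with an intercept) under one decay assumption."""
    X = np.column_stack([np.ones(len(tv)), geometric_adstock(tv, decay)])
    coef = np.linalg.lstsq(X, sales, rcond=None)[0]
    return float(coef[1])

rng = np.random.default_rng(0)
tv = rng.uniform(0, 10, 104)                                   # two years of hypothetical spend
sales = 100 + 2.0 * geometric_adstock(tv, 0.5) + rng.normal(0, 5, 104)  # simulated truth: decay 0.5

effects = {d: tv_effect(sales, tv, d) for d in (0.0, 0.2, 0.4, 0.6, 0.8)}
for d, b in effects.items():
    print(f"decay {d:.1f}: TV effect {b:.2f}")

same_sign = all(b > 0 for b in effects.values()) or all(b < 0 for b in effects.values())
print("sign robust" if same_sign else "sign fragile",
      f"| size ranges {min(effects.values()):.2f} to {max(effects.values()):.2f}")
```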

External Validation: The Ultimate Test

The best way to know if your model works is to test it against reality. This means using it to make predictions and then checking if those predictions come true.

Predictions create accountability

A weather forecaster who's always wrong eventually loses credibility. Models should face the same test: if they consistently make predictions that fail, we should trust them less.

We recommend running controlled experiments (like geo-tests) to validate model predictions whenever possible. Models that pass these tests earn more trust.
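At its simplest, the accountability check asks one thing: did the experiment's measured lift land inside the range the model predicted? This Python sketch reads a geo-test with basic difference-in-differences, meaning the change in test regions minus the change in control regions. All the numbers, including the model's predicted interval, are hypothetical. Real geo-test analyses typically use more careful designs and estimators.

```python
def geo_test_lift(test_pre: float, test_post: float,
                  control_pre: float, control_post: float) -> float:
    """Difference-in-differences: the extra change in test regions beyond the control trend."""
    return (test_post - test_pre) - (control_post - control_pre)

def prediction_confirmed(measured: float, predicted_low: float, predicted_high: float) -> bool:
    """True if the experiment's result falls inside the model's predicted range."""
    return predicted_low <= measured <= predicted_high

lift = geo_test_lift(test_pre=1000.0, test_post=1150.0, control_pre=980.0, control_post=1010.0)
print(lift, prediction_confirmed(lift, 80.0, 160.0))   # 120.0 True
```

A model whose predicted ranges keep containing the experimental results earns trust. One whose ranges keep missing should lose it.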

Communicating Uncertainty Honestly

Perhaps the most important skill in statistical modeling is honest communication. This means reporting what you actually know, what you don't know, and how confident you should be in each conclusion.

❌ False Confidence

"TV ROI is 2.3"

Implies precision that doesn't exist. Hides the uncertainty.

✓ Honest Uncertainty

"TV ROI is likely between 1.8 and 2.8"

Shows the actual range. Enables better decisions.

Counterintuitively, honest uncertainty is more useful than false precision. If TV's ROI could be anywhere from 1.5 to 3.0, you still know TV is clearly working: every value in that range is above 1.0, so it's worth investing in. But you also know not to optimize to the third decimal place, because that precision is an illusion.
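Turning model output into an honest range is mostly a matter of reporting percentiles instead of a single number. In this Python sketch the "posterior draws" are simulated stand-ins. Drawing them from a normal distribution centered at 2.3 is purely an assumption; in practice they would come from your fitted model. The sketch reports a 90% interval and the share of draws above 1.0, which answers the investment question without pretending to decimal-place precision.

```python
import numpy as np

rng = np.random.default_rng(0)
roi_draws = rng.normal(loc=2.3, scale=0.3, size=10_000)   # hypothetical posterior draws of TV ROI

low, high = np.percentile(roi_draws, [5, 95])             # central 90% interval
p_above_one = np.mean(roi_draws > 1.0)                    # how sure we are that ROI exceeds 1.0

print(f"TV ROI is likely between {low:.1f} and {high:.1f} (90% interval)")
print(f"Probability ROI exceeds 1.0: {p_above_one:.1%}")
```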

The GPS analogy

When GPS says "arrive in 45-55 minutes," that's more useful than "arrive in exactly 47 minutes"—because you know to plan for the range. Reporting a single number implies a confidence that doesn't exist and leads to worse planning.

Key Principles to Remember

The Five Pillars of Scientific Modeling

1. Start with questions, not data.

Know what you're trying to learn before you start analyzing. The question shapes everything.

2. Build models as stories.

A good model tells a clear story about how things work. If you can't explain the story, the model isn't ready.

3. Check assumptions before results.

Run prior predictive checks. Make sure your assumptions don't imply impossible things.

4. Iterate honestly.

Change models to fix problems, not to get desired results. Document what you tried and why.

5. Report uncertainty honestly.

Show ranges, not false precision. Distinguish robust findings from fragile ones. Let stakeholders make informed decisions.

These principles might seem like they lead to "less confident" answers—and they do! But they lead to answers you can actually trust. A confident-sounding answer that's wrong is worse than an honest range that captures reality.

The goal of scientific modeling isn't to produce the most impressive-looking numbers. It's to learn what's actually true about your marketing and make better decisions as a result.