Scientific Modeling Made Simple
A practical guide for everyone—no PhD required
Statistical models are tools that help us understand what's really happening with our marketing. This guide explains the core principles in plain language, with visual demonstrations you can explore yourself. By the end, you'll understand why our approach produces more reliable insights than traditional methods.
What is a Statistical Model?
At its core, a statistical model is simply a story about how your data came to be. It's a mathematical way of saying "we think sales happen because of these factors, in this way."
Think of it like a recipe
A cooking recipe tells you: "If you combine these ingredients in these proportions, you'll get this dish." A statistical model does the same thing for business outcomes: "If TV spend is this much, and digital is this much, and it's this time of year, you'll get approximately this many sales."
Just like recipes can be simplified (leaving out optional garnishes) or detailed (specifying exact temperatures), models can be simple or complex. The goal isn't to capture every possible factor—it's to capture the ones that matter for your question.
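To make the recipe idea concrete, here is a minimal sketch of a model written as a function. Everything in it is an assumption chosen for illustration: the function name `predicted_sales`, the baseline, the per-dollar effects, and the December lift are made-up values, not estimates from any real data.

```python
def predicted_sales(tv_spend, digital_spend, month,
                    baseline=100_000.0, tv_effect=1.5,
                    digital_effect=2.0, december_lift=0.25):
    """Toy 'recipe': ingredients in, approximate weekly sales out.

    All default values are illustrative assumptions.
    """
    # Time of year: sales are scaled up in December, unchanged otherwise.
    seasonal = 1.0 + (december_lift if month == 12 else 0.0)
    return seasonal * (baseline
                       + tv_effect * tv_spend
                       + digital_effect * digital_spend)

# Same spend, different time of year.
print(predicted_sales(tv_spend=20_000, digital_spend=10_000, month=6))
print(predicted_sales(tv_spend=20_000, digital_spend=10_000, month=12))
```

A simpler recipe would drop the seasonal term; a more detailed one would add more ingredients. Which one is right depends on the question being asked.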
The Map Analogy
This perspective is actually freeing: we're not trying to find some "perfect" model that captures all of reality. We're building useful tools for making better decisions, while being honest about what they can and can't tell us.
All Models Are Wrong (And That's OK)
There's a famous saying in statistics: "All models are wrong, but some are useful." This isn't pessimism—it's wisdom.
The weather forecast comparison
Weather forecasts are "wrong" in the sense that they can't predict exactly when and where each raindrop will fall. But they're incredibly useful—you probably check the forecast before planning outdoor activities.
Marketing models work the same way. We won't predict your exact sales next Tuesday, but we can tell you whether TV is likely working better than social media, and give you a realistic range of what to expect.
The key question isn't "Is this model perfect?"
It's "Is this model useful enough for the decision I need to make?" A model that tells you TV's ROI is somewhere between 1.5 and 2.5 is useful for deciding whether to keep investing in TV. It doesn't need to tell you the ROI is exactly 1.87.
The Story-Telling Approach
The best models start with a clear story about how things work. We call this a "generative" approach—we can use the model to generate fake data that should look like real data if our story is correct.
Like a flight simulator
Pilots train in flight simulators before flying real planes. A simulator is a "model" of how flying works—it generates realistic experiences based on the rules of physics and aerodynamics.
If the simulator produces wildly unrealistic results (planes flying backwards, for instance), you'd know something is wrong with the underlying model. Our statistical models work the same way: we can run them "forward" to generate synthetic data, and check whether it looks plausible.
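Running a model "forward" can be sketched in a few lines. This is a toy version under stated assumptions: the function name `simulate_weekly_sales`, the effect sizes, and the noise level are all hypothetical, and real marketing models are richer than a straight line plus noise.

```python
import random

def simulate_weekly_sales(tv_spend, digital_spend, n_weeks, rng,
                          baseline=100_000.0, tv_effect=1.5,
                          digital_effect=2.0, noise_sd=8_000.0):
    """Generate fake weekly sales from an assumed story about how sales arise."""
    expected = baseline + tv_effect * tv_spend + digital_effect * digital_spend
    # Each week is the expected value plus random week-to-week variation.
    return [expected + rng.gauss(0.0, noise_sd) for _ in range(n_weeks)]

rng = random.Random(42)  # fixed seed so the run is repeatable
fake_sales = simulate_weekly_sales(20_000, 10_000, n_weeks=52, rng=rng)
print(f"lowest week:  {min(fake_sales):,.0f}")
print(f"highest week: {max(fake_sales):,.0f}")
```

If the simulated weeks look nothing like weeks the business has actually seen, that is the equivalent of a plane flying backwards: a sign the story needs revisiting.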
Checking Your Assumptions
Before we even look at real data, we can test whether our assumptions make sense. This is called a prior predictive check—we're checking what our model predicts based only on our assumptions, before learning from the data.
🔬 Interactive: See Your Assumptions in Action
Move the sliders to change how confident we are about media effects. Watch how this changes the range of sales our model thinks are possible—before seeing any real data.
This might seem like extra work, but it catches problems early. If our assumptions imply that weekly sales could be negative or could be $1 billion, we know something is wrong before we waste time fitting the model.
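A prior predictive check can be sketched as follows. The setup is an assumption made for illustration: a single TV effect with a normal prior centered on 1.5, a fixed baseline and spend level, and hypothetical helper names. The `effect_sd` argument plays the role of "how confident we are about media effects": a larger value means a looser assumption.

```python
import random

def prior_predictive_sales(n_draws, effect_sd, rng,
                           baseline=100_000.0, tv_spend=20_000.0):
    """Draw a TV effect from the prior, then compute the sales it implies."""
    draws = []
    for _ in range(n_draws):
        tv_effect = rng.gauss(1.5, effect_sd)  # assumed prior, no data involved
        draws.append(baseline + tv_effect * tv_spend)
    return draws

def share_implausible(draws, low=0.0, high=1_000_000_000.0):
    """Fraction of simulated weeks that are negative or above $1 billion."""
    return sum(1 for d in draws if d < low or d > high) / len(draws)

rng = random.Random(0)
for effect_sd in (0.5, 5.0, 50.0):
    draws = prior_predictive_sales(10_000, effect_sd, rng)
    print(f"prior sd={effect_sd:>4}: "
          f"implausible share={share_implausible(draws):.1%}")
```

In this toy setup, a tight prior produces no impossible weeks, while a very loose one implies negative sales a large share of the time. That signal arrives before any real data has been touched.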
Start With Questions, Not Data
The most common mistake in analysis is jumping straight to the data without first clarifying what question you're trying to answer. The question determines everything else: what data you need, what model is appropriate, and how to evaluate success.
Different questions require different approaches. If you want to know whether TV ads cause sales increases (vs. just happening to coincide with them), that requires different methods than simply describing what happened.
Learning Through Iteration
Scientific modeling is inherently iterative. We build a model, check if it makes sense, learn from what doesn't work, and improve. This isn't a bug—it's the process working as intended.
The Scientific Modeling Cycle
Each time through the cycle, we learn something. Maybe our assumptions were too loose (the model thinks sales could be negative). Maybe the model can't capture an important pattern in the data (it misses every December spike). Each failure teaches us what to fix.
Honest Iteration vs. Specification Shopping
Here's the crucial distinction: there's a right way and a wrong way to iterate on models. The difference isn't about the mechanical actions—it's about what's driving the changes.
✓ Honest Scientific Iteration
Changes driven by:
- Model produces impossible predictions
- Clear patterns the model misses
- Domain expertise suggests missing factors
- Pre-planned alternatives to test
The goal: Make the model better at capturing reality
✗ Specification Shopping
Changes driven by:
- "That coefficient should be positive"
- "The ROI needs to be above 1.0"
- "This doesn't match our expectations"
- "The client won't like this result"
The goal: Get the results you wanted
⚠️ Why Specification Shopping Is Dangerous
When you adjust your model until you get the results you want, you're not discovering anything—you're just manufacturing the answer you already had in mind. The statistics look legitimate, but they've lost their meaning.
It's like weighing yourself on different scales until you find one that shows the number you want. No single scale is wrong. The problem is that you chose which reading to report based on the outcome.
The Fishing Problem
Imagine fishing in a pond with 100 rods. If you cast all 100 and report only the one that caught a fish, you haven't proven you're a good angler. You've just given luck 100 chances. This is the multiple testing problem: the more tests you run, the more likely it is that at least one looks like a success by chance alone.
⚠️ Interactive: Watch False Positives Multiply
When analysts test many model specifications and report the best one, the chance of finding something "significant" by pure luck skyrockets. Move the slider to see the effect.
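The arithmetic behind this effect is short enough to write out. The sketch below assumes each specification is an independent test with a 5% false positive rate, which is a simplification: specifications fitted to the same dataset are correlated, so real numbers will differ, but the direction is the same. The function name is hypothetical.

```python
def chance_of_any_false_positive(n_specs, alpha=0.05):
    """Probability that at least one of n_specs independent tests
    looks 'significant' when there is no real effect at all."""
    # Chance that every test correctly stays quiet, then take the complement.
    return 1.0 - (1.0 - alpha) ** n_specs

for n_specs in (1, 5, 20, 100):
    p = chance_of_any_false_positive(n_specs)
    print(f"{n_specs:>3} specifications: {p:.0%} chance of a lucky 'finding'")
```

Under these assumptions, trying 20 specifications already gives about a 64% chance of at least one false positive.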
The birthday party trick
At a party with 23 people, there's about a 50% chance that at least two of them share a birthday. With 50 people, it's about 97%. This feels surprising because we instinctively think about someone matching our own birthday, but the question is whether any two people match.
Specification shopping works the same way. Each individual test has a small chance of a false positive. But when you run many tests and pick the best one, you're not asking "did this specification get lucky?" You're asking "did any specification get lucky?" And the answer is usually yes.
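The birthday figures can be checked directly. This short calculation assumes 365 equally likely birthdays and ignores leap years; the function name is just for illustration.

```python
def shared_birthday_probability(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_different = 1.0
    for i in range(n_people):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

for n_people in (23, 50):
    print(f"{n_people} people: {shared_birthday_probability(n_people):.1%}")
```

The structure is the same as in the multiple testing problem: the chance of "no match anywhere" shrinks quickly as the number of opportunities for a match grows.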
The Winner's Curse
Even when there is a real effect, specification shopping makes you overestimate it. This is called the Winner's Curse: the specification that "wins" (looks best) is usually the one that got lucky with random noise.
📊 Interactive: See the Bias Build Up
Each specification estimates the same true effect, but random variation makes some estimates higher and some lower. When you pick the highest one, you systematically overestimate.
The sports tryout analogy
Imagine evaluating basketball players based on a single day's performance. The player who scores most that day isn't necessarily the best—they might have had an unusually good day. If you sign them expecting that performance to continue, you'll be disappointed.
Specification shopping does exactly this with statistical estimates: it selects the estimate that performed best on this particular dataset, which is often unusually good due to chance.
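A small simulation shows the bias building up. It is a sketch under simplifying assumptions: every specification estimates the same true effect of 2.0, and each estimate is off by independent random noise. Real specifications share data, so their errors are correlated and the bias is typically smaller, though still upward. All names and numbers are illustrative.

```python
import random
import statistics

def winners_curse(true_effect, noise_sd, n_specs, n_trials, rng):
    """Compare a pre-chosen specification with the best-looking one."""
    typical, best = [], []
    for _ in range(n_trials):
        estimates = [rng.gauss(true_effect, noise_sd) for _ in range(n_specs)]
        typical.append(estimates[0])  # specification picked in advance
        best.append(max(estimates))   # specification picked after peeking
    return statistics.mean(typical), statistics.mean(best)

rng = random.Random(1)
avg_typical, avg_best = winners_curse(
    true_effect=2.0, noise_sd=0.5, n_specs=20, n_trials=2_000, rng=rng)
print(f"true effect:              2.00")
print(f"pre-chosen specification: {avg_typical:.2f}")
print(f"best of 20:               {avg_best:.2f}")
```

The pre-chosen specification averages out close to the truth, while the "winner" is consistently too high, even though no individual specification is biased.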
Sensitivity Analysis: Stress-Testing Your Conclusions
The solution to these problems isn't to never make any modeling choices—that's impossible. It's to test how much your conclusions change across reasonable alternatives. This is called sensitivity analysis.
🔄 Interactive: How Robust Are Your Results?
Good results hold up across reasonable alternative assumptions. Fragile results flip-flop. Watch how honest reporting differs from cherry-picked reporting.
Robust findings vs. fragile findings
Some findings hold up no matter how we specify the model: "TV has a positive effect on sales" might be robust. Others are fragile: "TV's ROI is exactly 2.3" might vary wildly across specifications.
Honest analysis reports both—and clearly distinguishes which conclusions you can trust and which are more speculative.
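The reporting step of a sensitivity analysis can be as simple as the sketch below. The ROI values and specification names are made up for illustration, and the break-even level of 1.0 is an assumption about how ROI is defined. Fitting each specification is not shown here.

```python
def summarize_sensitivity(roi_by_spec, break_even=1.0):
    """Summarize how much a conclusion moves across specifications."""
    values = list(roi_by_spec.values())
    return {
        "lowest": min(values),
        "highest": max(values),
        # Robust claim: does every specification agree TV beats break-even?
        "all_above_break_even": all(v > break_even for v in values),
    }

# Hypothetical estimates, NOT real results.
roi_by_spec = {
    "base model": 2.3,
    "add seasonality": 1.9,
    "add pricing": 2.7,
    "exclude holiday weeks": 1.6,
}
summary = summarize_sensitivity(roi_by_spec)
print(f"TV ROI ranges from {summary['lowest']} to {summary['highest']}")
print(f"Above break-even in every specification: "
      f"{summary['all_above_break_even']}")
```

With these example numbers, "TV is above break-even" holds in every specification, while "TV's ROI is 2.3" holds in only one. Cherry-picked reporting would show the single number; honest reporting shows the whole table.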
External Validation: The Ultimate Test
The best way to know if your model works is to test it against reality. This means using it to make predictions and then checking if those predictions come true.
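One simple way to score predictions against reality is to count how often the actual outcome landed inside the range the model predicted. The sketch below uses invented numbers (weekly sales in thousands) and a hypothetical function name. It illustrates the idea and is not a complete validation procedure.

```python
def interval_coverage(actuals, predicted_ranges):
    """Share of actual outcomes that fell inside their predicted range."""
    hits = sum(low <= actual <= high
               for actual, (low, high) in zip(actuals, predicted_ranges))
    return hits / len(actuals)

# Made-up example: four weeks predicted in advance, then observed.
predicted_ranges = [(140, 160), (140, 160), (145, 165), (150, 170)]
actuals = [152, 149, 171, 160]

coverage = interval_coverage(actuals, predicted_ranges)
print(f"{coverage:.0%} of weeks landed inside the predicted range")
```

If a model claims 90% ranges but reality falls inside them only half the time, the model is overconfident, however good its fit to past data looked.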
Communicating Uncertainty Honestly
Perhaps the most important skill in statistical modeling is honest communication. This means reporting what you actually know, what you don't know, and how confident you should be in each conclusion.
❌ False Confidence
"TV ROI is 2.3"
Implies precision that doesn't exist. Hides the uncertainty.
✓ Honest Uncertainty
"TV ROI is likely between 1.8 and 2.8"
Shows the actual range. Enables better decisions.
Counterintuitively, honest uncertainty is more useful than false precision. If TV's ROI could be anywhere from 1.5 to 3.0, you know that even the low end of the range is worth investing in. But you also know not to optimize to the third decimal place, because that precision is an illusion.
The GPS analogy
When GPS says "arrive in 45-55 minutes," that's more useful than "arrive in exactly 47 minutes"—because you know to plan for the range. Reporting a single number implies a confidence that doesn't exist and leads to worse planning.
Key Principles to Remember
The Five Pillars of Scientific Modeling
1. Start with questions, not data.
Know what you're trying to learn before you start analyzing. The question shapes everything.
2. Build models as stories.
A good model tells a clear story about how things work. If you can't explain the story, the model isn't ready.
3. Check assumptions before results.
Run prior predictive checks. Make sure your assumptions don't imply impossible things.
4. Iterate honestly.
Change models to fix problems, not to get desired results. Document what you tried and why.
5. Report uncertainty honestly.
Show ranges, not false precision. Distinguish robust findings from fragile ones. Let stakeholders make informed decisions.
These principles might seem like they lead to "less confident" answers—and they do! But they lead to answers you can actually trust. A confident-sounding answer that's wrong is worse than an honest range that captures reality.
The goal of scientific modeling isn't to produce the most impressive-looking numbers. It's to learn what's actually true about your marketing and make better decisions as a result.