Correlation Still Isn't Causation. In 2026, Employers Are Finally Asking for the Difference.

For years, the standard data scientist job posting asked for Python, ML frameworks, and some vague mention of "statistical knowledge." Causal inference barely appeared. A/B testing was listed occasionally, usually as a nice-to-have buried below fifteen other requirements.

That's changed. A detailed analysis of 101 data science job postings comparing 2025 to 2026 found that A/B testing requirements grew by 14 percentage points year-on-year, and causal inference by 17. Those are among the largest skill-requirement jumps in the entire dataset — bigger than cloud tools, bigger than deep learning frameworks, bigger than LLM-related skills.

The question worth asking is why now.

What's driving it

Data science teams have matured. The first generation of analytics work at most companies was descriptive — what happened, when, how much. The second generation was predictive — what's likely to happen next. The third generation, which is where serious data organisations are now operating, is impact measurement: did this intervention actually cause that outcome, or did it just correlate with it?

That question is harder than it sounds. An ML model can tell you that users who see Feature X have 40% higher retention. It cannot tell you whether Feature X caused the retention lift, whether high-retention users simply happened to engage with Feature X more, or whether some third variable (onboarding quality, user segment, cohort timing) explains both. Conflating correlation with causation at this stage doesn't just produce wrong dashboards. It produces wrong product decisions, wrong budget allocations, wrong strategic bets. The stakes are high enough that companies are now explicitly hiring for the ability to tell the difference.

The core toolkit — without the textbook treatment

A/B testing is the cleanest causal tool when you can use it: randomly assign users to a treatment and control group, measure the outcome difference, and the randomisation handles confounding for you. The catch is that many A/B tests at real companies are compromised in practice: underpowered because nobody ran a power analysis, ended early when results look good, or kept too short to outlast novelty effects. A data scientist who can design a clean experiment, calculate the required sample size before running it, and interpret the results without p-hacking is genuinely rare and genuinely valuable.
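As a rough illustration of calculating the sample size before running a test, here is a minimal sketch using the standard normal approximation for comparing two conversion rates. The function name, the 10% baseline, and the 12% target are assumptions chosen for the example, not figures from any real experiment.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: 10% baseline conversion, hoping to detect a lift to 12%.
n_per_arm = sample_size_per_group(0.10, 0.12)
print(f"Users needed per arm: {n_per_arm}")
```

Under these assumed inputs the answer is a few thousand users per arm, and halving the detectable lift roughly quadruples it. That is why the calculation belongs before the launch, not after.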

Difference-in-differences (DiD) is what you reach for when you can't randomise. A feature rolled out to one region but not another. A policy change that affected some user cohorts before others. DiD compares the change in outcomes between an affected group and an unaffected group across the before-and-after period. It's powerful precisely because it doesn't require an experiment, but it rests on the parallel trends assumption: without the intervention, both groups would have moved in parallel. That can never be proven outright, but it can be checked against pre-intervention data rather than simply assumed.
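The basic two-group, two-period version of DiD can be sketched in a few lines. The weekly figures below are invented for illustration, and the pre-trend check is a deliberately crude one (comparing average per-period change before the rollout); a real analysis would typically use a regression with standard errors.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def pre_trend_gap(treated_pre, control_pre):
    """Difference in average per-period change before the intervention.

    A value near zero is consistent with parallel trends; it is not proof.
    """
    def avg_slope(series):
        return (series[-1] - series[0]) / (len(series) - 1)

    return avg_slope(treated_pre) - avg_slope(control_pre)


# Hypothetical weekly metric for a region that got the feature and one that did not.
treated_pre, treated_post = [10.0, 10.5, 11.0], [13.0, 13.5, 14.0]
control_pre, control_post = [8.0, 8.5, 9.0], [9.5, 10.0, 10.5]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
gap = pre_trend_gap(treated_pre, control_pre)
print(f"DiD estimate: {effect}, pre-trend gap: {gap}")
```

In this made-up data both regions were rising at the same rate before the rollout, so the extra change in the treated region is a plausible estimate of the effect.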

Propensity score matching (PSM) handles the problem of non-random selection. If you want to compare users who adopted a feature organically against those who didn't, those groups are almost certainly different in ways that matter. PSM constructs a matched control group by finding similar non-adopters for each adopter, based on observable characteristics. You're not getting randomisation: the match only balances what you measured, so any unobserved difference between adopters and non-adopters can still bias the estimate. When an experiment is off the table, though, it is often the next best thing.
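The matching step can be sketched as follows. This assumes the propensity scores have already been estimated elsewhere (commonly with a logistic regression of adoption on user characteristics), and it uses one-to-one nearest-neighbour matching with replacement and a caliper. The function name, the caliper of 0.05, and all data values are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Estimate the effect on adopters from (propensity_score, outcome) pairs.

    Each treated unit is paired with the control whose score is closest.
    Treated units with no control within the caliper are dropped.
    Returns (average outcome difference, number of matched treated units).
    """
    diffs = []
    for score, outcome in treated:
        nearest = min(controls, key=lambda c: abs(c[0] - score))
        if abs(nearest[0] - score) <= caliper:
            diffs.append(outcome - nearest[1])
    if not diffs:
        raise ValueError("no treated unit has a control within the caliper")
    return sum(diffs) / len(diffs), len(diffs)


# Hypothetical data: (estimated propensity score, retained after 30 days as 1.0 or 0.0).
adopters = [(0.80, 1.0), (0.60, 1.0), (0.30, 0.0)]
non_adopters = [(0.78, 1.0), (0.62, 0.0), (0.10, 0.0), (0.55, 1.0)]

att, n_matched = match_on_propensity(adopters, non_adopters)
print(f"Estimated effect on adopters: {att} from {n_matched} matched pairs")
```

The third adopter has no comparable non-adopter and is dropped. How many units fall outside the caliper is worth reporting alongside the estimate, because it shows how much of the adopter population the result actually covers.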

Synthetic control is the method of choice when your treatment group is a single unit — one country, one city, one business division. It builds a weighted combination of untreated units that mirrors the treated unit's pre-intervention trajectory, then compares what happened after.
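To make the idea concrete, here is a toy sketch that grid-searches donor weights (non-negative, summing to one) to match the treated unit's pre-intervention series. Real implementations use a constrained optimiser and usually match on covariates too; the brute-force search here only works for a handful of donors. All names and numbers are invented for illustration.

```python
from itertools import product


def synthetic_series(weights, donors):
    """Weighted combination of donor series, period by period."""
    periods = len(donors[0])
    return [sum(w * d[i] for w, d in zip(weights, donors)) for i in range(periods)]


def fit_weights(treated_pre, donors_pre, step=0.05):
    """Grid-search simplex weights minimising squared pre-period error."""
    ticks = round(1 / step)
    best, best_err = None, float("inf")
    for combo in product(range(ticks + 1), repeat=len(donors_pre) - 1):
        if sum(combo) > ticks:
            continue  # weights would exceed 1 in total
        weights = [c / ticks for c in combo] + [(ticks - sum(combo)) / ticks]
        fitted = synthetic_series(weights, donors_pre)
        err = sum((t - f) ** 2 for t, f in zip(treated_pre, fitted))
        if err < best_err:
            best, best_err = weights, err
    return best


# Hypothetical metric for one treated city and three untreated donor cities.
donors_pre = [[10.0, 12.0, 14.0], [20.0, 19.0, 18.0], [5.0, 9.0, 6.0]]
donors_post = [[16.0, 18.0], [17.0, 16.0], [7.0, 8.0]]
treated_pre, treated_post = [15.0, 15.5, 16.0], [19.5, 20.0]

weights = fit_weights(treated_pre, donors_pre)
counterfactual = synthetic_series(weights, donors_post)
gaps = [t - c for t, c in zip(treated_post, counterfactual)]
print(f"Weights: {weights}, post-period gaps: {gaps}")
```

The post-period gap between the treated unit and its synthetic twin is the effect estimate. It is only as credible as the pre-period fit, so the fit should be inspected before the gap is reported.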

The practical gap

Most data scientists know these methods exist. Far fewer have actually implemented them under messy real-world conditions — where the data has gaps, the treatment assignment wasn't clean, the experiment ran for two weeks instead of the required eight, and someone senior is pressuring you to report a positive result.

That gap between knowing the method and knowing how to apply it honestly under pressure is exactly what employers are testing for. The methods are teachable. The judgment about when an experiment's results are trustworthy and when they aren't takes longer to develop, and that judgment is increasingly what separates a competent data scientist from a senior one.

Experimentation and causal reasoning are among the fastest-growing skill requirements in data science. Prioritising A/B test design, power analysis, and causal inference methods now puts you ahead of where the market is heading, not just where it already is.