Tom Webster, writing and speaking

Fishtanks And Tentpoles: A Rant About "Margin Of Error"

Added on by Tom Webster.

Sometimes, what is blandly presented as "fact" can also prove to be a sophist trick, or at the very least not the whole story. This is certainly true of statistics, and one of the more commonly misused pieces of statistical data is "margin of error." This is not a blog about statistics, but if you are a marketer, you've undoubtedly been presented with marketing research data and been given a whole series of descriptive statistics about that data, including margin of error. Here's what you need to know to interpret that piece of information in the real world of business decision support.

Imagine that you are looking at a pre-election poll, and the results indicate that Candidate "A" has 52% of the respondents' support, and Candidate "B" has 48%. Your very next question should be, "what is the margin of error?" Let's say that the margin is +/- 3 points, which is about what you get with a 1000-person sample at a 95% confidence level. What that means is that if you ran this same poll 100 times, about 95 of those polls would land within 3 points of each candidate's true level of support. So, roughly speaking, Candidate A's real number is very likely somewhere between 49% and 55%, and Candidate B's between 45% and 51%. This is a fact, yes, but it's often used in some pretty cringeworthy ways.

If you learn about this poll from a supporter of Candidate "A," you are likely to hear it reported as "my guy is winning." If you hear about this poll from a supporter of Candidate "B," however, you can guess what they'll say: it's within the margin of error. In other words, statistically, it's still anybody's race, right? It's entirely possible that a race that polls 52% to 48% could end up 49% to 51%, with Candidate "B" winning, so the argument from margin of error basically maintains that if the gap between the candidates is within the margin of error, it's anybody's race.
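
If you want to see where that "+/- 3 with 1000 people" figure comes from, here is a minimal sketch of the usual textbook approximation. It assumes a simple random sample, and the function name `margin_of_error` is just something I made up for illustration; real pollsters layer weighting and design effects on top of this.

```python
import math

# 1.96 is the standard z-score for a 95% confidence level.
Z_95 = 1.96

def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate margin of error for a simple random sample.

    Hypothetical helper for illustration. proportion=0.5 is the
    worst case, which is the figure poll write-ups usually quote.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(1000)
print(f"+/- {moe * 100:.1f} points")  # about 3.1 points
```

Because the sample size sits under a square root, you have to quadruple the sample to cut the margin in half, which is why so many polls stop at around 1000 people.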

Now, technically, that is within the realm of possibility, but it's a gross oversimplification of what margin of error really is. The Candidate "B" supporter sees the race as a crapshoot, but if I'd bet a dollar on the frontrunner in every 52 to 48 race I'd ever seen with a 3-point margin of error, I'd be a centimillionaire. Because here's the truth - if the sampling is competent, Mr 52 is gonna beat Mr 48 far more often than not.
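
To put a rough number on "more often than not," here is a back-of-the-envelope sketch. It assumes a two-candidate race, a simple random sample of 1000, and a normal approximation, and it ignores everything that isn't sampling error (late deciders, turnout, bad questionnaires). The function name `chance_leader_is_ahead` is hypothetical.

```python
import math
from statistics import NormalDist

def chance_leader_is_ahead(leader_share, sample_size):
    """Rough chance that the poll leader really is ahead.

    Assumes only two candidates, so the lead is 2p - 1 and its
    standard error is twice the standard error of p.
    """
    lead = 2 * leader_share - 1
    se_lead = 2 * math.sqrt(leader_share * (1 - leader_share) / sample_size)
    return NormalDist().cdf(lead / se_lead)

prob = chance_leader_is_ahead(0.52, 1000)
print(f"{prob:.0%}")  # roughly 90%
```

Under those assumptions, a 52 to 48 poll is nowhere near a coin flip: the frontrunner is ahead in roughly nine cases out of ten.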

The folks who make the argument from margin of error typically fall into two camps. The more benign practitioners of this sort of argument are those who have what I would call an academic knowledge of statistics and margin of error, but not necessarily a practitioner's view. In other words, they may know from an academic perspective that a 52 to 48 race could end up as a 49 to 51 race, but they may not see enough of these types of surveys to know that it generally goes the way of Mr 52. That's innocent enough. But on the other side, you have the folks who have a dog in the fight somehow. Either they support the underdog, or they have a vested interest in a given number being higher or lower than reported - doesn't matter which. The important thing is that they are emotionally invested somehow in the reported data not being right.

It is in those cases that we see the most sinister uses of the argument from margin of error, because what you are getting is, in fact, spin - even though on the surface it's couched in statistical fact.

Let me give you an example that puts this more in perspective for a marketer. Say I surveyed 200 male Smurfs, and found that 20% of them cheat on their wives (in reality, this number is much higher, but work with me here). If I report this number, there are bound to be Smurf-lovers out there who either don't believe this, or have some kind of emotional investment in this not being true. With a sample of 200 (at a 95% confidence level) we have a margin of error of about 7 points, so the true number is likely to fall between 13% and 27%. So, our Smurf-lover maintains, this number is meaningless--it could be 13, it could be 27, so it's a bogus result. You just can't tell. In other words, because this number could be anywhere between 13 and 27, it's silly to even come out with a number like 20--you're just throwing a dart, right?

Here is where that isn't exactly true. The person making the argument from margin of error either doesn't understand probabilistic sampling, or is trying to willfully misrepresent it, in the guise of presenting an apparent "fact." What this argument would have you believe is that the range of answers is a box - let's call it a fish tank. Inside that fish tank we have 15 numbers swimming around (13 to 27, inclusive) and if you sample that fish tank to get a number, you reach your hand in and pull one out - it could be a 14, it could be a 23, it could be any of those numbers, right?

But that's not how margin of error works. Again, with proper sampling, the probability of pulling a given number lies along a curve, not in a big fish tank. The way to think of this is to imagine that the 20 I reported as the percentage of Smurfs who cheat on their wives is the center pole of a tent. So the tent is highest at the number 20, and the fabric curves down until it gets to the very edges of the tent, where it's staked to the ground, at numbers like 13 and 27. If you were to fill that tent with red rubber balls (and I don't know why you would...), what you would see very quickly is that most of the balls either touched or were just a few balls away from that big center pole - they'd be piled up from the ground to the very top of the tent close to our center pole at 20. Way out at the edges, near the 13 and 27, all the way around that tent, there is only room for a single layer of balls.

So most of the balls in the tent are a lot closer to 20 than they are to 13 or 27, and if you reached into that tent to grab a random ball 100 times, not only would that ball be between 13 and 27 about 95 times, it would be between, say, 17 and 23 the majority of the time. You don't have the same chance of pulling a 13 as you do of pulling another 20, because most of the responses are going to be closer to the middle of the tent than they will be to the edges.
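
If you'd rather see the tent in numbers than in rubber balls, here is a small sketch. It assumes the tent is a normal curve centered on the reported 20, with the 7-point margin of error sitting 1.96 standard deviations out; the names `tent` and `chance_between` are mine, purely for illustration.

```python
from statistics import NormalDist

reported = 20.0   # survey result, in percent
moe = 7.0         # quoted 95% margin of error, in points

# Assumption: the "tent" is a normal curve whose 95% range is +/- moe.
tent = NormalDist(mu=reported, sigma=moe / 1.96)

def chance_between(low, high):
    """Share of the tent's rubber balls lying between low and high."""
    return tent.cdf(high) - tent.cdf(low)

print(f"13 to 27: {chance_between(13, 27):.0%}")   # about 95%
print(f"17 to 23: {chance_between(17, 23):.0%}")   # about 60%
print(f"20 vs 13: {tent.pdf(20) / tent.pdf(13):.1f}x as likely")  # roughly 7x
```

In the fish tank version, 17 through 23 would cover only 7 of the 15 numbers, a bit less than half. Under the tent, that same narrow band holds roughly six balls in ten, and a result near 20 is about seven times as likely as one out at 13.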

So when you hear someone say that due to margin of error, you just can't tell what a given number really is, that's factually true, because survey statistics are only estimates of the population's true values. But again, it's a lot smarter to bet with the number than it is to bet against it (especially if you have tracking data and/or corroborative data from other research). Which is not only the law, it's bad news for Smurfette.