Tom Webster, writing and speaking

Estimates Vs. Assumptions In Social Media Measurement

Added on by Tom Webster.

Yesterday I participated in Social Slam, a tremendous event put on by the Social Media Club of Knoxville and Mark Schaefer. It was a very content-rich day, and the sellout crowd of 430 attendees got to hear from the likes of Jay Baer, Trey Pennington and your humble research geek. One of the presentations, though, stuck in my craw a bit, because it contained a piece of "data" that is emblematic of a particularly dangerous line of thinking. The presentation in question tracked the effectiveness of a social media campaign, and made the point that the messaging itself, disseminated mostly over Twitter and Facebook, had a reach of 2.7 million persons. This figure was derived using a tool that looks at retweets, shares, and the sizes of friend/follower audiences. The speaker then went on to disparage traditional media a little, for not being able to provide those kinds of metrics with the same precision.

The problem is that this 2.7 million "reach" figure is ludicrous. Could it potentially have reached that many people? I suppose there is some non-zero probability of that, but I'm still gonna call it zero. The fact is, when I tweet something to my followers, only a handful see it at that time. Others may find it through retweets, or through search, but in no way, shape or form did all of my followers see my message - and in fact, you can't even tell me how many "impressions" my message had. The same is true on Facebook - if someone shares a message there, Facebook's EdgeRank algorithm could very easily deemphasize the message to oblivion if there is little demonstrable engagement between messenger and recipient.

In fact, from this standpoint, social media isn't as "trackable" as you might think. The second I tweet a link out to this post, 5,000 people could see it. Far, far fewer will see it. Instead, what I will be able to track are retweets and link clicks, which will add up to a number that is, let's just say, markedly under 5,000. Now, those retweets will find their way onto other networks and circles of followers/friends, but all I am really going to know is this: the number of people who actually saw my message lies somewhere between the number of retweets it received and the number of followers it potentially reached. In the case of the campaign discussed in yesterday's presentation, that would put the "reach" between a couple of thousand and 2.7 million. I'm taking the "under" on the average of those two.
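The arithmetic of that bound is worth making explicit. Here's a minimal sketch using hypothetical numbers in the spirit of the campaign above - 2,000 standing in for "a couple of thousand" engagements, since the presentation's actual retweet count wasn't given:

```python
# All figures are illustrative, not from the actual campaign.
retweets = 2_000             # people who demonstrably engaged (lower bound)
potential_reach = 2_700_000  # sum of all follower counts (upper bound)

# The true audience lies somewhere in this interval; the raw social
# data alone cannot tell you where.
lower, upper = retweets, potential_reach
midpoint = (lower + upper) / 2

print(f"True reach is between {lower:,} and {upper:,}")
print(f"Taking the 'under' on the midpoint: fewer than {midpoint:,.0f}")
```

The interval spans three orders of magnitude, which is exactly the problem: calling the top of that range "reach" is a choice, not a measurement.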

Still don't buy this? Consider: 98% of America has a TV set. The "reach" of the Super Bowl was not 300 million Americans.

The truth is, measuring "reach" on networks like Twitter involves making a crapton of assumptions. One assumes, for instance, that there is some relationship between the potential reach (adding up all the followers that could have seen a message) and the actual message penetration (those that did see the message). You cannot, however, derive that relationship purely through mining unstructured data. The "reach" being measured by toting up followers is little different from circulation data - in that regard, Twitter is a lot like print, not digital. You know the circulation, but you don't know how many people read an article, or what the pass-along metrics are.

I'll close here by coming back to this particular speaker's disparagement of the measurability of traditional media. Let's set print aside for a moment and look at radio and TV. Do I have a way of tracking "clickstream" information about the reach of a given TV show? No. What I do have is an imperfect measurement system, the Nielsen ratings, which provides estimates based upon sampling. Web analytics gurus disparage these estimates, because in the world of online advertising we can know exactly how many banner ads were served, and how many clicks we generated. The estimates generated by traditional media measurement services are not perfect - but they are imperfect in predictable ways (sampling is magic, remember?).

For instance, if I ask 2,000 properly sampled TV viewers in Los Angeles if they watched the Super Bowl, and 700 said yes, I can produce an estimate that gives the Super Bowl a 35% share. It is an estimate, and the next time I ask the question, I'll likely get a different number. Thanks to the magic of sampling, however, 95% of the time that different number will be +/- 2% of what the actual population number should be. Granted, 5% of the time it could be really way off, but that too can be modeled.
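The "+/- 2%" isn't hand-waving; it falls straight out of the standard error of a proportion. A quick sketch of the math behind the example above, using the normal approximation:

```python
import math

# 700 of 2,000 properly sampled viewers said they watched.
n = 2000
yes = 700
p_hat = yes / n  # estimated share: 0.35

# Standard error of a sample proportion (normal approximation)
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * se  # half-width of the 95% confidence interval

print(f"Estimated share: {p_hat:.0%}")
print(f"95% margin of error: +/- {margin:.1%}")
```

With n = 2,000 the margin works out to roughly two points either side of 35% - which is the whole argument: the error is quantified in advance, before a single respondent picks up the phone.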

Estimates can vary, but they vary in predictable ways. Assumptions, however, are unpredictable. When we know the potential reach of a tweet, and the number of people who retweeted/clicked on that tweet, we have two numbers: one too high, and the other too low. Currently, the only way to back into the "right" answer - the number of people who saw the message - is to make an assumption, not an estimate. An estimate is science; an assumption is rectally-derived until you can make it science.
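"Estimates vary in predictable ways" is something you can verify yourself. A quick simulation: repeatedly poll 2,000 viewers from a population whose true Super Bowl share is 35%, and count how often the estimate lands within the roughly +/- 2.1% margin of error. The seed and trial count here are arbitrary choices:

```python
import random

random.seed(42)
true_share, n, margin, trials = 0.35, 2000, 0.021, 2000

# Each trial: simulate one poll of n viewers, then check whether the
# sample estimate falls within the margin of the true population share.
within = 0
for _ in range(trials):
    yes = sum(random.random() < true_share for _ in range(n))
    if abs(yes / n - true_share) <= margin:
        within += 1

coverage = within / trials
print(f"{coverage:.1%} of estimates fell within the margin")
```

Run it and the coverage comes out close to the promised 95% - the individual polls disagree, but the size of their disagreement is known ahead of time. No amount of re-running a follower count gets you that property.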