Last night, my company conducted the Iowa Caucus Entrance Poll on behalf of the National Election Pool (NBC, CNN, CBS, FOX, ABC, and the Associated Press). The results, you may already know: Mitt Romney beat Rick Santorum by eight votes. It was clear from the data, and from the actual vote count, that we were in for a long night - it was an insanely close race.
Except, of course, in social media. The "big data" being thrown off by the web told a very different story. Conventional measures of volume showed Ron Paul dominating the social web, with as many as five times the mentions of his nearest competitor. A straight-up search for relevant mentions on Topsy shows the race between Romney, Paul and Santorum as…well, not much of a race. If we look at search volume (even confining it to Iowa, which hardly any of the social measures did), it's still a one-horse race - Paul by a landslide. One sentiment measure I saw quoted ranked Romney sixth among the candidates in volume of positive sentiment.
To date, I have never seen a repeatable correlation between social media mentions/sentiment and the actual vote. Sure, you could back-test a model that was weighted to correct for the Ron Paul phenomenon, but the real test is to apply that exact same scheme the next time and see what you get - after all, most of these social tote-boards also got the relative gap and order wrong for the other candidates. I'm not discounting the possibility that it can be done; however, it hasn't been done, and doing it will be diabolically difficult.
In short, I have not yet been presented with a replicable model which shows, candidate for candidate, that the number of people tweeting about a politician has anything to do with the number of people in Cedar Falls, Iowa who actually got in a car, drove to a high school gymnasium and raised their hand. Can it be done? Maybe. Maybe not. I'm not pessimistic, I'm merely skeptical (in other words, don't prove me wrong - surprise and delight me). I do know that raw mention-counting is on a hiding to nothing.
My only point here is that in the case of the Iowa Caucus, there is an enormous gap between what people on social media say and what people in Iowa actually do.
Now, examine that last sentence. Replace "the Iowa Caucus" with the name of your brand, and replace "people in Iowa" with "your customers."
Do you know how big that gap is? It is knowable - as I've often said in this space, it isn't a black-box mystery if you do the work. But if you don't know that gap, you'll never make sound business decisions from social media data.
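For a rough sense of what "doing the work" might look like at its very simplest, here is a minimal sketch in Python. It compares each item's share of social mentions with its share of real-world outcomes (votes, purchases, whatever you actually measure). The function names and the numbers in the usage note are hypothetical placeholders, not data from the caucus or from any tool mentioned above, and a serious analysis would need far more than this.

```python
def shares(counts):
    """Convert raw counts into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: value / total for name, value in counts.items()}


def say_do_gap(mentions, outcomes):
    """Per-item gap: share of mentions minus share of actual outcomes.

    A positive gap means the item is over-represented in social media
    relative to what people actually did.
    """
    if set(mentions) != set(outcomes):
        raise ValueError("mentions and outcomes must cover the same items")
    mention_share = shares(mentions)
    outcome_share = shares(outcomes)
    return {name: mention_share[name] - outcome_share[name] for name in mentions}


def total_gap(gaps):
    """Half the sum of absolute gaps: 0 means identical shares, 1 means no overlap."""
    return 0.5 * sum(abs(g) for g in gaps.values())


def same_order(mentions, outcomes):
    """True if ranking by mentions matches ranking by outcomes."""
    by_mentions = sorted(mentions, key=mentions.get, reverse=True)
    by_outcomes = sorted(outcomes, key=outcomes.get, reverse=True)
    return by_mentions == by_outcomes
```

To use it, pass two dictionaries keyed by the same names, for example `say_do_gap({"A": 600, "B": 200, "C": 200}, {"A": 300, "B": 360, "C": 340})` (made-up counts). The real test, as argued above, is whether the gap you measure once holds up the next time you measure it.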