...is hard. The problem, as ever, is one of sampling. 99% of the social media 'data' I have seen is either anecdotal, qualitative (and I take nothing away from good qualitative work, so don't get me wrong), or derived from some form of convenience or self-selected sampling. In some cases the sampling methods are clear, so educated analysts have all the information they need to make the best use of the data. My only issue with some of those cases is not necessarily the sample, which is what it is, but the reporting. I wrote about this earlier in my take on the Pew numbers on Twitter, which were really numbers on every possible status update service on the web.
Take, for example, this highly readable white paper from Michael Stelzner, "Social Media Marketing Industry Report." Stelzner presents the top ten questions marketers want answered about social media, and the results of a survey of social media marketers about their activities and the benefits thereof. It's worth a read, and is a pretty good primer for marketers looking for the business case for wading into social media.
My only minor quibble with the report is the same persnickety quibble I have with 99% of the work I have seen in this area--reporting. In Stelzner's case, the sample is well-defined and accurately described: roughly 900 marketers who self-selected after responding to various requests distributed through social media channels. Stelzner's clarity there gives readers the power to process the data appropriately and to understand its limitations and strengths. In the narrative, however, the sample is frequently referred to in shorthand as "marketers." Again, if you read the whole report in context, you understand exactly who these marketers are. But if you lift a quote from the report that talks about what percentage of "marketers" are using social media, you get into trouble. Even calling them "Social Media Marketers" isn't kosher; because the sampling was not probabilistic, you cannot characterize the non-response bias--you have no way of knowing how the social media marketers who chose not to take the survey would have answered.
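To see why that matters, here's a toy simulation (all the numbers are invented for illustration, not drawn from Stelzner's report): if the people most engaged with social media are also the most likely to answer a survey distributed through social media channels, the respondent figures will overstate the population figures, and nothing in the sample itself reveals by how much.

```python
import random

random.seed(42)

# Hypothetical population of 10,000 marketers, 40% of whom
# actively use social media (made-up rate for illustration).
population = [random.random() < 0.40 for _ in range(10_000)]

# Self-selection: invitations circulate through social media
# channels, so active users are far likelier to respond
# (assumed 80% vs. 5% response rates).
respondents = [uses for uses in population
               if random.random() < (0.80 if uses else 0.05)]

pop_rate = sum(population) / len(population)
sample_rate = sum(respondents) / len(respondents)

print(f"True share of marketers using social media: {pop_rate:.0%}")
print(f"Share among survey respondents:             {sample_rate:.0%}")
# The respondent share lands far above the population share, and
# the gap cannot be estimated from the respondents alone.
```

The gap between the two percentages is exactly the non-response bias that a probabilistic sample would let you bound and a self-selected one does not.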
My easy out for this (and I've done self-selected/convenience sampling work as well as random-sample work) is simply to always refer to the sample as "respondents" and use no other shorthand. True, a third-party reporter of the data is still very likely to mischaracterize the sample, but at least the researcher has done no harm.
None of this is to take away from the white paper, which again I found highly useful and readable, so I recommend it here. But I will continue to beat the drum for anal-retentive reporting of numbers. When shorthand descriptions for samples are used, the door is open for a well-meaning and conscientious researcher's data to be used in a variety of unintended ways.