BrandSavant

Gaining Insight From Social Media Data

A Practical Sentiment Analysis Alternative For Social Media

by Tom Webster on March 14, 2010

I’ve gotten a ton of response to yesterday’s piece on the hidden bias of sentiment analysis (thank you!) with a lot of folks chiming in one way or the other on the current state of automated sentiment analysis. Some of you, quite correctly, sent me messages that went something like: “OK, smarty-pants, how would YOU do it?” I can’t give you the smarty-pants answer to this, but I CAN give you one alternative to tackle the problem today.

What I am about to describe, it should be noted, does NOT apply to other forms of text analysis–if you want to analyze a massive body of brand mentions for content, or to identify clusters or segments by subject, or customer service opportunities, etc., there are loads of great tools for this, and they all work fine. I’ve used everything from SPSS Text Analytics tools on large datasets, to basic concordance software and a copy of Tinderbox on small sets, and they all do what they say on the tin. There is a great need to automate that kind of work, and great tools to get that particular job done.

No, I’m talking specifically here about sentiment analysis–not what the brand mentions were about, but how the “mentioners” felt about the brand. There is an apocryphal story about the development of the Fisher Space Pen, which was developed literally so astronauts could write in space without their pens exploding or the ink not flowing. Hundreds of thousands of dollars were spent licking this problem, which led to the development of the Fisher Space Pen, a pressurized, sealed pen capable of writing in the most extreme environments. Meanwhile, so the story goes, the Russians licked the same problem by sending their cosmonauts up with pencils.

My method will appeal to those of you who appreciate the latter solution.

Step One: Pick a timeframe, and add up all of the brand mentions you have in your dataset for that timeframe.

Step Two: If there are fewer than a thousand, brew a pot of coffee, set aside a couple of hours, and sharpen your pencil (I am a big fan of the pencil). Code the sentiment of each mention by hand. Find some friends, and it will take less than an hour. You’ll be nearly 100% accurate, it won’t take as long as you think, and you’ll know the data cold. It will feel good. Trust me.

If there are more than a thousand brand mentions (which is likely for a Fortune 1000 company), then you need an intermediary step between the two just mentioned:

Step One-And-A-Half: Take a random sample of 1,000 brand mentions across a proportionate mix of social media channels.

Then, you can proceed to Step Two, brewing the coffee and sharpening the pencil, etc.
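
For those who would rather let a script do the drawing, Step One-And-A-Half might look something like the Python sketch below. It is only an illustration under a few assumptions: each mention is stored as a (channel, text) pair, the function name `proportionate_sample` is made up for this example, and leftover slots are handed out by largest remainder, which is one reasonable rounding rule among several.

```python
import random
from collections import defaultdict


def proportionate_sample(mentions, n=1000, seed=None):
    """Draw n mentions so each channel keeps its share of the total.

    mentions: list of (channel, text) pairs (a hypothetical layout).
    """
    rng = random.Random(seed)
    if len(mentions) <= n:
        return list(mentions)  # small enough to take a census

    # Group the mentions by the channel they came from.
    by_channel = defaultdict(list)
    for channel, text in mentions:
        by_channel[channel].append((channel, text))

    total = len(mentions)
    # Ideal (fractional) number of sampled mentions for each channel.
    ideal = {ch: n * len(rows) / total for ch, rows in by_channel.items()}
    quota = {ch: int(x) for ch, x in ideal.items()}

    # Rounding down leaves a few slots unfilled; give them to the
    # channels with the largest fractional remainders.
    leftover = n - sum(quota.values())
    by_remainder = sorted(ideal, key=lambda c: ideal[c] - quota[c], reverse=True)
    for ch in by_remainder[:leftover]:
        quota[ch] += 1

    # Simple random sample, without replacement, inside each channel.
    sample = []
    for ch, rows in by_channel.items():
        sample.extend(rng.sample(rows, quota[ch]))
    rng.shuffle(sample)  # so nobody codes one channel at a time
    return sample
```

Passing a fixed `seed` makes the draw repeatable, which helps if someone later asks exactly which 1,000 mentions were coded.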

See, if you have a few hundred or even a couple of thousand mentions, then by all means, take a census. But if you have more than that–the only census that matters in this country is done every ten years and will cost over 15 billion dollars in 2010. You don’t need a census–lucky you–because sampling works. Sampling is the law (I like to say that in a Sly Stallone voice). If I take a random sample of 1,000 mentions from a population of 10,000, then I could repeat the exercise 100 times, and about 95 of those would put my sentiment measure within about 3 percentage points of the true figure either way. Increase the total population to a million, and it’s still about +/- 3 points, 95% of the time. It’s reassuringly like magic.
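
If you want to check that arithmetic yourself, here is a small Python sketch. It uses the standard margin-of-error formula for a sampled proportion with a finite population correction; the function name `margin_of_error` is just a label for this example, and p = 0.5 is assumed because it is the worst case (any other split between positive and negative gives a smaller margin).

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Half-width of the confidence interval for a sampled proportion.

    n: sample size; population: total mentions (None = treat as infinite);
    p: assumed true share (0.5 is the worst case); z: 1.96 for 95% confidence.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: sampling a big slice of a small
        # population pins the answer down a little more tightly.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


for size in (10_000, 1_000_000):
    print(f"{size:>9,} mentions: +/- {margin_of_error(1000, size):.1%}")
```

With 1,000 sampled mentions this works out to roughly 2.9 points for a population of 10,000 and roughly 3.1 points for a million, which is why the size of the population barely matters once the sample is a decent size.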

Is it perfect? No. First of all, while it’s easy to generate a random sample within a given social media channel (tweets, for instance), it’s a bit trickier to do the same across all social media channels and ensure that the sample is not only random but representative. Also, five percent of the time (the flip side of the 95% confidence level I selected to get you +/- 3% at n=1,000) the data will not be within three percentage points either way. It could be really, disastrously wrong. But so could the alternatives.

Look, I’m no Luddite. Automated sentiment analysis is getting better–lots of folks trying to crack that nut have left passionate comments on this blog, and I have no doubt that they are getting closer and closer. Some of them are getting very close indeed. There is no perfect solution–I’m merely presenting an alternative.

With that, I throw open the door to you–how does your company measure sentiment today? Are you measuring it at all? What kind of success have you had? And–most importantly–how have you used that data? I’d love to hear your stories.

  • http://leverwealth.blogspot.com David Phillips

    Tom, I have been using a semantic engine http://bit.ly/cIB9GB which gives us the key concepts and concept phrases within the texts inside the corpus.
    One of the advantages is that it is possible to weight concepts, which gives us the option to create perspectives. This helps to provide that most human of issues, the ‘view from where I stand’.
    I have experience of evaluating large numbers of citations as the founder/MD of MediaMeasurement in the UK.
    Getting inter-coder consistency above 85% for all phrases is pretty hard, so neither people nor machines are 100% on the money.

  • http://blog.ianlyons.com Ian Lyons

    Hi Tom,

    Have to completely agree with this approach – did it recently for a client who put on an event which was – putting it mildly – not well received. I created a spreadsheet where each column (23) represented a distinct issue and each row (300) was a piece of online content mentioning the event – ranging from articles, tweets and blogs to our own site comments. For each data point I assigned a -3 to +3 sentiment rating.

    Summing vertically, we quickly saw which issues mattered most to our customers, prioritised them and crafted an appropriate response. I ended up adding a weighting multiplier column for content from an important media company, an influential individual or people who had gone to the trouble of analysing the situation. Summing horizontally allowed us to prioritise individual responses.

    This level of analysis was certainly time consuming but also well beyond the capability of any automated tool I’ve seen. I feel we owed it to our customers to do it this way because at the end of it, we really did know all the issues inside out.

  • Anonymous

    Hi Tom,

    Great suggestion to sample data. Some of our clients at Synthesio also prefer to work with us to analyze a sample of what people are saying rather than be overwhelmed by every single person who might possibly mention their brand. In theory, answering every single person would be great. In practice, it’s not quite so easy (or necessarily relevant).

    We’re publishing a white paper on sentiment analysis, as well. I’d love to send you a copy and get your feedback :)

    Best,
    Michelle @Synthesio

  • http://jasonkeath.com Jason Keath

    So, to ask a simple question (as I like the pencil approach as well), how exactly do you pull 1,000 random tweets from 10,000? I am sure there is a software solution, but what about a simple Excel or counting method? Give me your trick, Mr. Webster.

  • http://www.edisonresearch.com Tom Webster

    Many ways. Here’s a simple one: http://www.public.iastate.edu/~vardeman/book_site/excel/random_sample/random_sample.html

    Bigger datasets require more robust software (We are an SPSS shop) but this will get you pretty far.
