BrandSavant

Gaining Insight From Social Media Data

How To Measure Online Sentiment – A Definitive Guide

by Tom Webster on July 18, 2012

Seth Grimes posted a great article on Social Media Explorer today about the dubious claims relating to automated sentiment analysis, which has proven to be a popular topic here at BrandSavant. While automated techniques are getting better, they have a long way to go in terms of detecting sarcasm and intent, parsing complex conditional phrases, etc. I don’t doubt that machines will get there. When that day comes, however, our robot overlords will likely turn on us, rendering our need for sentiment analysis moot as we instead turn to foraging for food, hiding in caves and protecting the life of John Connor.

So what is today’s marketer to do? For a high-level overview of what I’m about to detail, let me direct you to this article I wrote over two years ago on a practical sentiment analysis alternative for social media. I’ve had requests for a more hands-on guide, though, so here you go.

What you need:

A pencil (I favor the Blackwing 602, pictured, but any writing instrument will do. If you don’t make mistakes, feel free to use a pen.)

Microsoft Excel, Apple Numbers or some other robust, modern spreadsheet software

Optional: 2-4 friends, martinis (see? My method is not only more accurate, it’s more fun).

Instructions:

1. Gather up all the snippets of text–tweets, blog posts, comments, whatever you have–that you want to analyze.

2. Open a spreadsheet. Dump the snippets into Column B, one snippet/piece of text data per row. If your text data is comma/tab/paragraph delimited, this is easy. (If it isn’t, then you’ve got a lot of copying and pasting to do, but you’d have to do that for automated sentiment analysis, too. Shame on you, non-delimited data hoarder!)

3. Scroll down to the bottom. If the total number of snippets you want to analyze is under 1,000, then grab your pencil, and any optional friends/coworkers who can pitch in for half an hour, and scan each snippet. If it’s positive, then put a ‘+’ next to it in Column D. If it’s negative, a ‘-’. And if it is neutral, then a period or whatever other punctuation mark you want to use will do nicely. The percentage of positive snippets is then simple math.

4. If the number of snippets exceeds 1,000, then you take a random sample. Sampling, as I often say, is magic. Sampling is why my airplane didn’t fall apart this week, why my 28-story apartment building in Boston doesn’t topple into the street, and why M&M’s don’t cost $500 a bag. Sampling works in matters of life and death; it’ll do fine on your tweets. If all you are looking to do is characterize sentiment, you don’t need a census, you need a sample.

5. While we use sledgehammer-y tools like SPSS here at Edison, you can do this with Excel easily. Highlight all the cells in Column A that have a corresponding snippet in Column B (so, if you have snippets in cells B1:B5000, you’ll highlight cells A1:A5000). With those cells highlighted, type the following into the formula box:

=RAND()

…and hit [Ctrl] + [Enter]. You’ll get a random number between 0 and 1 in every highlighted cell.

6. Now, you sort Column A. It doesn’t matter if you sort A-Z (ascending) or Z-A (descending)–whatever is easiest. You can use “Sort” under the data menu or just the big A-Z or Z-A button on your toolbar.

7. You can now return to step 3, and code the first 1,000 snippets. Disregard the rest. You can delete the excess snippets if you wish, and also Column A–you won’t need it anymore. You’ve just taken a thoroughly random sample of 1,000 data points, which is what makes this whole process work.

8. The magic of sampling: if you did the exercise above 100 times, about 95 of those times your calculation of % positive would land within roughly three points of the true figure. If you do 2,000 snippets, you shave nearly a point of error off, but let’s call that diminishing returns.

9. Of course, there is always the possibility of human error. You, or one of your helpers, might have coded something wrong. C’est la vie–error is error. In sentiment analysis, machines make plenty of errors, so don’t sweat it if you are off on a few.

10. For each person who helped with this task (which, I’m going to say, took an hour), finish the task in this fashion:

  • Fill a martini glass with crushed ice, and add water to the brim. Set aside.
  • Fill a cocktail shaker with cubed ice.
  • Into that shaker, add 4 oz. of Plymouth Gin, and 1/2 oz. of St. Germain.
  • Stir for 30 seconds. Do not shake. James Bond was a Philistine.
  • Dump the ice/water out of the martini glass.
  • Using a citrus stripper, peel a long ribbon from a lemon. For extra style points, twirl that ribbon around a cocktail stick. Do the peeling and twirling right over the glass, to catch all the lemon oil.
  • Place the peel/twist in the martini glass. Strain the cocktail into the glass.

You’ve just taken a pretty credible measure of sentiment with a pencil and Excel, and as a bonus, you’ve actually read through the text, which I hope you will agree was not only painless, but kinda valuable. If you do this exercise for the text generated around your brand during a one-month period and repeat the exercise monthly, you can trend this data, which is where it really becomes meaningful.
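If you'd rather script the sampling and the arithmetic than do them in a spreadsheet, here is a minimal Python sketch of steps 5 through 8. It is an illustration, not the method we use at Edison: the function names are hypothetical, the '+', '-' and '.' marks are the ones from step 3, and the margin of error assumes the standard worst-case formula for a simple random sample (proportion of 0.5, 95% confidence). The human still does the reading and coding.

```python
import math
import random


def sample_snippets(snippets, sample_size=1000, seed=None):
    """Mirror steps 5-7: give every snippet a random key, sort, keep the first N."""
    rng = random.Random(seed)
    if len(snippets) <= sample_size:
        # Under the threshold there is nothing to sample: code everything.
        return list(snippets)
    keyed = [(rng.random(), text) for text in snippets]  # like =RAND() in Column A
    keyed.sort(key=lambda pair: pair[0])                 # like sorting on Column A
    return [text for _, text in keyed[:sample_size]]     # keep the first N, drop the rest


def percent_positive(codes):
    """Share of '+' marks among hand-coded marks ('+', '-', '.'), in percent."""
    if not codes:
        raise ValueError("no coded snippets to tally")
    return 100.0 * sum(1 for mark in codes if mark == "+") / len(codes)


def margin_of_error(sample_size, z=1.96):
    """Worst-case margin of error in percentage points (assumes p = 0.5, 95% confidence)."""
    return 100.0 * z * math.sqrt(0.25 / sample_size)


if __name__ == "__main__":
    # Placeholder snippets standing in for real tweets or comments.
    population = [f"snippet {i}" for i in range(5000)]
    sample = sample_snippets(population, sample_size=1000, seed=42)
    print(len(sample), "snippets to code by hand")
    for n in (1000, 2000):
        print(n, "snippets: give or take", round(margin_of_error(n), 1), "points")
```

The margin figures line up with step 8: roughly three points at 1,000 snippets and a bit over two at 2,000. Passing a seed only makes the draw repeatable; leave it out for a fresh sample each time.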

Helpful? Other questions? The comments, as always, are yours.

  • http://www.facebook.com/people/Josh-Franklin/528236081 Josh Franklin

    A great read as always, Tom!

  • http://twitter.com/webby2001 Tom Webster

    Thanks, Josh!

  • Martin Weinberg

    The ideas in this article have left me stirred, but not shaken.

  • http://www.3hatscommunications.com/blog/ Davina K. Brewer

    IIRC, the “West Wing” called James Bond a snob about his weak martinis. Anyway.. Love the measured details you go to, in order to illustrate that accurate, usable data will take time and work to achieve. Not to mention a team of analysts, though their judgement might be suspect towards the end of the hour. FWIW.

  • http://twitter.com/theelusivefish Rob Clark

    Heh … this is almost exactly the process that I use (right down to mandatory drinks afterwards).  I generally use a 390 sample which provides around a 5% margin of error – but I save my full random sample instead of discarding so that if further precision is shown to be needed, we can keep going.

    The only addition I would suggest is that you need to firmly define your terms for the coders.  In other words, what do ‘Positive’ and ‘Negative’ mean?

    They said they were having a good time and also visited our store.  Positive!
    They mentioned one of our product’s features.  Positive!
    They only slightly misspelled the brand name … and that’s it.  Just the brand name.  But they used an exclamation mark.  Positive!
    They didn’t mention us at all but they trashed our competitor.  Positive!

    If you aren’t clear from the outset you’re going to get skewed and varied results depending on the reader’s own bias.  I tend to use the following definitions.

    Positive – a stated willingness to purchase or do business with (e.g. Widgets are great, I use mine all the time!)
    Mixed / Unclear – is not clear in terms of intent towards purchase or conducting business with.  Any item you internally debate on between pos/neg for more than 30 seconds goes here.  (e.g. Widgets – really?!)
    Neutral -  no indication one way or the other towards an intent to purchase or do business with (e.g. Widgets are a device utilized in hypothetical examples for more than 20 years)
    Negative – a stated unwillingness to purchase or do business with (e.g. Widgets are stupid and people should use real products as examples in their hypothetical scenarios instead).
    and of course  ‘False Positive’ for when your keywords suck in content that clearly has no connection to your brand (e.g. Widgets Widgets Widgets buy Viagara now Widgets!)

    ProTip:  so long as they’re reading 1000 items, have them code for common themes and conversation drivers.  Be able to identify WHAT people are talking about when they are positive, and what things lead to negative discussion.

    - Rob Clark
    Director, Insights & Measurement
    Edelman  

  • http://www.edisonresearch.com Tom Webster

    Nice tips, Rob!

    We once did a project at Edison where we had to code 12,000 20-minute snippets of audio for content, topics, sentiment, etc. For real. It took me years of therapy to live that project down. Martinis weren’t enough.

    Thanks for reading.
