Seth Grimes posted a great article on Social Media Explorer today about the dubious claims relating to automated sentiment analysis, which has proven to be a popular topic here at BrandSavant. While automated techniques are getting better, they have a long way to go in detecting sarcasm and intent, parsing complex conditional phrases, and so on. I don't doubt that machines will get there. When that day comes, however, our robot overlords will likely turn on us, rendering our need for sentiment analysis moot as we instead turn to foraging for food, hiding in caves and protecting the life of John Connor. So what is today's marketer to do? For a high-level overview of what I'm about to detail, let me direct you to this article I wrote over two years ago on a practical sentiment analysis alternative for social media. I've had requests for a more hands-on guide, though, so here you go.
What you need:
A pencil (I favor the Blackwing 602, but any writing instrument will do. If you don't make mistakes, feel free to use a pen.)
Microsoft Excel, Apple Numbers or some other robust, modern spreadsheet software
Optional: 2-4 friends, martinis (see? My method is not only more accurate, it's more fun).
1. Gather up all the snippets of text--tweets, blog posts, comments, whatever you have--that you want to analyze.
2. Open a spreadsheet. Dump the snippets into Column B, one snippet/piece of text data per row. If your text data is comma/tab/paragraph delimited, this is easy. (If it isn't, then you've got a lot of copying and pasting to do, but you'd have to do that for automated sentiment analysis, too. Shame on you, non-delimited data hoarder!)
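If your export is paragraph-delimited (snippets separated by blank lines), a few lines of Python can lay it out the way step 2 describes before you ever open the spreadsheet. This is a minimal sketch, assuming blank-line-separated text; `snippets_to_csv` is a hypothetical helper, not part of any tool. Save its output as a .csv file and open it in Excel or Numbers.

```python
import csv
import io

def snippets_to_csv(raw_text: str) -> str:
    """Hypothetical helper: split paragraph-delimited text into one snippet per row.

    Column A is left empty (it will hold random numbers later, if you sample)
    and each snippet goes in Column B.
    """
    # Assumption: snippets are separated by one or more blank lines.
    snippets = [chunk.strip() for chunk in raw_text.split("\n\n") if chunk.strip()]
    out = io.StringIO()
    writer = csv.writer(out)  # quotes any snippet that contains a comma
    for snippet in snippets:
        writer.writerow(["", snippet])  # Column A blank, Column B = snippet
    return out.getvalue()

raw = "Love this brand!\n\nWorst. Service. Ever.\n\n\nIt's fine, I guess."
print(snippets_to_csv(raw))
```

For comma- or tab-delimited exports, your spreadsheet's own import will usually do the same job without any code.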
3. Scroll down to the bottom. If the total number of snippets you want to analyze is under 1,000, then grab your pencil and any optional friends/coworkers who can pitch in for half an hour, and scan each snippet. If it's positive, put a '+' next to it in Column D. If it's negative, a '-'. And if it is neutral, a period or whatever other punctuation mark you want to use will do nicely. The percentage of positive snippets is then simple math: the number of '+' marks divided by the total number of snippets.
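The tally in step 3 is the same arithmetic you'd do with COUNTIF in the spreadsheet. As a minimal sketch in Python, assuming '+' for positive, '-' for negative and '.' for neutral (any other mark you choose would simply need to be swapped in), with `sentiment_shares` as a hypothetical name:

```python
from collections import Counter

def sentiment_shares(codes):
    """Return the percentage of positive, negative and neutral codes.

    Assumes '+' = positive, '-' = negative and '.' = neutral, as in step 3.
    Marks other than these three are counted in the total but not reported.
    """
    if not codes:
        raise ValueError("no coded snippets to tally")
    counts = Counter(codes)
    total = len(codes)
    return {mark: 100 * counts[mark] / total for mark in ("+", "-", ".")}

print(sentiment_shares(["+", "+", "-", "."]))  # {'+': 50.0, '-': 25.0, '.': 25.0}
```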
4. If the number of snippets exceeds 1,000, then you take a random sample. Sampling, as I often say, is magic. Sampling is why my airplane didn't fall apart this week, why my 28-story apartment building in Boston doesn't topple into the street, and why M&M's don't cost $500 a bag. Sampling works in matters of life and death; it'll do fine on your tweets. If all you are looking to do is characterize sentiment, you don't need a census, you need a sample.
5. While we use sledgehammer-y tools like SPSS here at Edison, you can do this easily with Excel. Highlight all the cells in Column A that have a corresponding snippet in Column B (so, if you have snippets in cells B1:B5000, you'll highlight cells A1:A5000). With those cells highlighted, type the following into the formula box:
…and hit [Ctrl] + [Enter]. You'll get a random number between 0 and 1 in every highlighted cell.
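6. Now, sort by Column A. It doesn't matter if you sort A-Z (ascending) or Z-A (descending)--whatever is easiest. You can use "Sort" under the Data menu or just the big A-Z or Z-A button on your toolbar. Just make sure the snippets in Column B move along with their random numbers; if Excel asks, choose to expand the selection.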
7. You can now return to step 3, and code the first 1,000 snippets. Disregard the rest. You can delete the excess snippets if you wish, and also Column A--you won't need it anymore. You've just taken a thoroughly random sample of 1,000 data points, which is what makes this whole process work.
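For anyone who'd rather script steps 4 through 7, here is a minimal Python sketch of the same random-key-and-sort technique. `random_sample` and its `seed` parameter are hypothetical names for illustration; the 1,000 cutoff comes straight from the steps above.

```python
import random

def random_sample(snippets, n=1000, seed=None):
    """Mimic steps 5-7: give every snippet a random key, sort by it, keep the first n.

    With n or fewer snippets, there's nothing to sample: code them all (step 3).
    """
    if len(snippets) <= n:
        return list(snippets)
    rng = random.Random(seed)  # pass a seed if you want a reproducible sample
    # Like filling Column A with a random number between 0 and 1...
    keyed = [(rng.random(), s) for s in snippets]
    # ...then sorting on Column A (ascending or descending makes no difference).
    keyed.sort(key=lambda pair: pair[0])
    return [s for _, s in keyed[:n]]

tweets = [f"tweet {i}" for i in range(5000)]
sample = random_sample(tweets, seed=42)
print(len(sample))  # 1000
```

Python's `random.sample(tweets, 1000)` gets you an equivalent sample in one call; the longhand version above just mirrors what the spreadsheet is doing.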
8. The magic of sampling: if you repeated the exercise above 100 times, in about 95 of those runs your % positive would land within roughly three points of the figure you'd get by coding every last snippet. Code 2,000 snippets instead and you shave nearly a point off that margin, but let's call that diminishing returns.
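Where do "three points" and "nearly a point" come from? A minimal sketch using the standard normal-approximation margin of error for a proportion, z·sqrt(p(1-p)/n), with z = 1.96 for 95% confidence and p = 0.5 as the worst case. It assumes a simple random sample and ignores coding error; `margin_of_error` is a hypothetical name.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a sampled proportion.

    p = 0.5 gives the widest (worst-case) margin; z = 1.96 is the 95% level.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1000), 1))  # 3.1
print(round(margin_of_error(2000), 1))  # 2.2
```

Doubling the sample only divides the margin by the square root of two, which is why going from 1,000 to 2,000 snippets buys you less than a point.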
9. Of course, there is always the possibility of human error. You, or one of your helpers, might have coded something wrong. C'est la vie--error is error. In sentiment analysis, machines make plenty of errors, so don't sweat it if you are off on a few.
10. For each person who helped with this task (which, I'm going to say, took an hour), finish the task in this fashion:
- Fill a martini glass with crushed ice, and add water to the brim. Set aside.
- Fill a cocktail shaker with cubed ice.
- Into that shaker, add 4 oz. of Plymouth Gin and 1/2 oz. of St. Germain.
- Stir for 30 seconds. Do not shake. James Bond was a Philistine.
- Dump the ice/water out of the martini glass.
- Using a citrus stripper, peel a long ribbon from a lemon. For extra style points, twirl that ribbon around a cocktail stick. Do the peeling and twirling right over the glass, to catch all the lemon oil.
- Place the peel/twist in the martini glass. Strain the cocktail into the glass.
You've just taken a pretty credible measure of sentiment with a pencil and Excel, and as a bonus, you've actually read through the text, which I hope you will agree was not only painless, but kinda valuable. If you do this exercise for the text generated around your brand during a one-month period and repeat the exercise monthly, you can trend this data, which is where it really becomes meaningful.
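Once you have a few months of coded samples, the trend line is just % positive per month. A minimal Python sketch, assuming each month's codes are stored under an ISO-style key like "2011-01" (so alphabetical order is chronological order) and the same '+' marking as step 3; `monthly_trend` is a hypothetical name.

```python
def monthly_trend(coded_by_month):
    """Turn {month: [codes]} into a list of (month, % positive), oldest first.

    Assumes month keys like "2011-01", which sort chronologically as strings.
    """
    trend = []
    for month in sorted(coded_by_month):
        codes = coded_by_month[month]
        positive = sum(1 for c in codes if c == "+")
        trend.append((month, 100 * positive / len(codes)))
    return trend

print(monthly_trend({"2011-02": ["+", "+", "+", "-"], "2011-01": ["+", "+", "-", "."]}))
# [('2011-01', 50.0), ('2011-02', 75.0)]
```

With samples of 1,000 per month, a shift of a point or two sits inside the give-or-take described in step 8, so look for sustained movement rather than single-month blips.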
Helpful? Other questions? The comments, as always, are yours.