BrandSavant

Gaining Insight From Social Media Data

The Hidden Bias Of Social Media Sentiment Analysis

by Tom Webster on March 11, 2010

Many of the leading social media monitoring suites come with some form of sentiment analysis technology, and this technology is used for a number of applications: to track buzz, to measure crisis management, or to gauge the efficacy of a campaign. I’ve hinted here in the past that I remain unsure about what to do with sentiment analysis, which has of course prompted a number of folks in the space to drop by and comment, for which I am grateful. I do, however, want to elaborate on my “discomfort” about automated sentiment analysis here, because it is something to which I’ve given a fair amount of thought, and it deserves an equally fair shake here.

One issue, of course, is the open question of what you actually do with sentiment data. I am sure there is some relationship between social media sentiment and other key business metrics, but the onus is on the sentiment analysis folks to show that. When social media sentiment goes up or down, what if anything does that translate to? For example, when the “Motrin Moms” controversy raged across the Twittersphere, social media sentiment for Motrin likely plummeted. But did sales go down? A survey found that actual opinion of average moms was markedly less negative (in fact, most knew nothing about it) so what would you do with this information if you were Motrin? The right answer is not “nothing,” I’ll grant you, but without some way to square the disconnect between social media analysis and other business measures, what do you then do with sentiment analysis? I’m not saying the sentiment analysis got this wrong–I’m merely saying that it got this different. Without knowing the cause of the delta or the extent of the correlation, I honestly don’t know what to do with the data!

Let me state this up front – sentiment analysis is getting better. As processing power grows and algorithms become more sophisticated, this will naturally happen. Still, every recent comparison I’ve seen in print between man and machine for determining sentiment has machine losing by enough of a gap that I don’t feel I could ever look at the results of an automated sentiment analysis and not feel like I have to go back and check it again–which defeats the purpose, I suppose.

Here’s what it’s good at: let’s say I post this on Twitter:

“I love my Toyota.”

I suspect that’s a no-brainer for any of the leading sentiment analysis tools. Where they have been getting better lately is with natural language processing and learning how consumers actually talk about specific categories and brands. So if I also post:

“Toyota FTW!”

…many of the leading tools will get this one right, as well. The next frontier for sentiment analysis is doing better with complex phrases that are comparative or conditional, like these:

“If my Toyota would stop when I pressed the brake, I’d LOVE it!”

or

“I love Toyota, but it’s pretty tough to beat my Yugo on looks alone!”

At this point, I expect some of the sentiment analysis folks to chime in and comment that yes, their tool can handle these as well. I’m not a computer scientist, so I won’t dispute any of that. But consider this: I actually do own a Toyota hybrid, and I am assuming that the accelerating problem and the braking problem will cancel out somehow and make my “average” ride safe. I don’t personally believe any of the examples I’ve just listed. In fact, I believe the opposite. If you’ve made it this far, you now know my sentiment about the brand in question, but how would a computer handle this particular blog post in an automated sentiment analysis? Am I expressing a sentiment about the car in question, or not–and if so, what?

The answer for many systems would be to take the logical step of avoiding sins of commission (labeling this as positive or negative) and instead risking the sin of omission–not categorizing it at all. In fact, that’s exactly what Ignite Social Media’s Brian Friedlander found when he examined a Radian6 sentiment analysis and found that 77% of the brand mentions he looked at were tagged as “neutral;” in other words, the algorithm didn’t make the wrong choice (labeling a positive as negative), rather in close or complex cases, it defaulted to neutral. From a computer science perspective, that’s probably the right choice. Again, I’m no computer scientist, and I am absolutely not picking on Radian6 here.

What I am, however, is a survey research guy, and it is when I put on my sampling methodology hat that I see the hidden bias inherent in this approach–the non-response bias. What Brian’s analysis also uncovered makes a lot of sense–while only 28% of the brand mentions tested came from microblogging (like Twitter), 61% of the posts marked with a positive or negative sentiment came from microblogs. This makes total sense–it is much easier for a computer to make the right call on 140 isolated characters of “Toyota FTW!” than it would in this blog post, which will absolutely show up on Toyota’s social media monitoring radar by dint of the number of times I’ve mentioned the brand. A computer scientist would rather have the machine make no choice than make the wrong choice–and that’s fair. But consider then what your sentiment profile sample looks like.

If roughly 60% of your identified sentiment comes from 28% of brand mentions, and those mentions are weighted towards Twitter and other microblogging solutions because it’s easier to be accurate, then the majority of your sentiment profile is being determined by a tiny universe of unrepresentative consumers (the small percentage of online users with Twitter profiles) and not by the significantly larger sample of consumers on Facebook, leaving comments on blogs or posting to message boards. Now, you can weight the responses derived from Twitter down in the mix, which would mitigate the impact of microblogging on your overall sentiment profile, but determining those weights is tricky, and even then you are left with the non-response bias of all the untagged/“neutral” mentions in other platforms, making comparisons difficult. I believe I have seen Conversition talk about weighting their data by source, so clearly I am not the only one thinking along these lines, but know that this expertise comes from a human, not a machine. And weighting by source to account for differential response rates is not the same thing as weighting to account for a differential in sentiment identification rates.
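To make the arithmetic concrete, here is a rough sketch in Python of how skewed the tagged sample is and what weighting by source would do to it. The 28% and 61% shares are the figures quoted above; the per-source positive rates are purely hypothetical numbers chosen for illustration, and the two-bucket split into "microblog" and "other" is a simplifying assumption.

```python
# Shares quoted in the post (from Brian Friedlander's look at Radian6 data).
share_of_all_mentions = {"microblog": 0.28, "other": 0.72}
share_of_tagged = {"microblog": 0.61, "other": 0.39}

sources = list(share_of_all_mentions)

# Representation ratio: a source's share of tagged mentions divided by
# its share of all mentions. 1.0 would mean proportional representation.
representation = {
    s: share_of_tagged[s] / share_of_all_mentions[s] for s in sources
}

# Weights that scale each source back to its share of all mentions.
weights = {
    s: share_of_all_mentions[s] / share_of_tagged[s] for s in sources
}

# HYPOTHETICAL: fraction of each source's tagged mentions that are positive.
# These two numbers are invented for illustration only.
positive_rate = {"microblog": 0.70, "other": 0.40}

# Positive share as a tool would report it from the tagged sample alone.
raw_positive = sum(share_of_tagged[s] * positive_rate[s] for s in sources)

# Positive share after weighting each source back to its true proportion.
weighted_positive = sum(
    share_of_tagged[s] * weights[s] * positive_rate[s] for s in sources
)

print(round(representation["microblog"], 2))  # 2.18
print(round(representation["other"], 2))      # 0.54
print(round(raw_positive, 3))                 # 0.583
print(round(weighted_positive, 3))            # 0.484
```

Under these made-up rates the reported positive share drops by about ten points once microblogs are weighted down. Note what the sketch does not fix: the weights assume the tagged mentions within each source are representative of that source, which is exactly the non-response problem described above.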

All of this is one man’s roundabout way of saying that I don’t have the computer science expertise to challenge the accuracy of sentiment analysis, so I won’t. But when you look at your sentiment analysis data, also consider the sources of that analysis. The iceberg analogy probably works best here–some small percentage of your sentiment is “visible” (i.e., easily categorizable) by a machine, while the rest lies submerged under the ocean. But allowing your sentiment profile to be disproportionately weighted by microbloggers, who are the few, and not adequately represented by other social media users, who are the many, may lead you to draw conclusions about the iceberg from the tip that aren’t, in fact, accurate. Sample is everything.

Your take? Believe me when I say this–I want to be proven wrong. But I want to be proven wrong by something that doesn’t involve a proprietary, black box solution, because that will only be the exception that proves the rule. What say you?

  • http://www.symscio.com/ Mike Layton

    Hi Tom, Sentiment has its place if you can trust the data. It serves as a *starting point* in better understanding how a brand is being positioned. Optimally, you want to compare this against the competition to better understand it in relation to the industry and know why a mention is positive or negative. While I’m not a CS major either, I do know that sentiment is not determined by an equation, as elaborate as it may be. If you take your Toyota example, a blog may state several positive things about them such as “have been known for product quality”, “an industry leader”, “responsive” and so on and so on. The problem is if that same blog discusses how a defect resulted in injury or death, I don’t care how many favorable things are said, it is negative. Does Toyota want this post to exist? Are readers more or less likely to do business with Toyota? These are some of the questions that automation would need to answer and for better or worse, the answer does not lie in a formula.

    Thanks for sharing, Mike.

  • Tom Webster

    Right on, Mike–thanks for your comment!

    I wonder if anyone from Toyota is reading this one :)

  • http://www.symscio.com/ Mike Layton

    I hope so and for the record, I also own a Toyota hybrid and am hoping that your cancellation theory has some truth to it. :)

  • http://www.manhattanmarketingmaven.com Danny Flamberg

    The key to the Twitter mining concept is defining terms and setting the filters to collect, sort and assess millions of conversations. At present everyone from PV to Radian6 to Dan Zarella does it their own way.

    The next great leap will be to create common understandings about how to analyze the Twitter stream. This will all just be alchemy until the social media and marketing community establish common definitions for tone, establish hypothetical thresholds for frequency, discuss ways to measure intensity and put forward best practices for weighing and reading the online tea leaves.

    Here are some of the outstanding questions I am sharing with those who are working with me in attempting to monitor, measure and respond to online conversations:

    1. If Jeff Jarvis tweets that your brand sucks, is that better or worse than 50 unknown tweets explicitly or implicitly expressing anger, disappointment, a sense of being ripped off or detailing service shortfalls?

    2. How do we weight intensity? Is it specific language, overall vehemence of the tweet, or should it also account for resonance (was it retweeted)?

    3. How do we understand or process authority; some people know much more than others or have experience or insights that would give their opinion more credence or more weight?

    4. If 50 or 100 of these tweets come through over a week, a month or a quarter, how seriously should a brand take it and what level of action or intervention is required? How much tweeting action over what time period helps or hurts you?

    5. How do we separate out frequent tweeters and blabbermouths from thoughtful or opinion-leading tweeters?

    6. Is frequency enough of an indicator? If not, how can we mine and measure the content of tweets?

    7. How much negative feedback is enough to cause genuine concern and prompt action? In highly transactional businesses someone is always complaining. Assuming that every brand has a few detractors, how much bad news or how many bad raps are necessary to call out the customer service or PR firemen?

    8. What is the interplay between brand advocates and loyalists tweeting in opposition to detractors? Is there a baseline balance of online commentary that brands should expect? How much frequency or intensity is needed to prompt a specific response? How do you know when you’re really in trouble?

    9. How do you weight tweets? And is it really in the best interests of a brand to air or fix these situations in a public forum or should brand tweets direct disgruntled customers offline?

    Few marketers doubt the ultimate value of Twitter and other social media for uncovering customer sentiment and improving customer engagement. But we are in the early pioneer days of data-mining and everyone should be conscious that what we are dredging up might or might not be a clear or accurate reflection of what’s really going on. For the near term: stay paranoid.

  • Tom Webster

    Danny, that’s an excellent set of questions to ask–you’ve taken this to the next level, clearly. For the near term, stay paranoid indeed.

    Still, I look at all of the questions you raise, and wonder if there were a computer system that could do all that, would it pass the Turing test? Do we need to build SkyNet for this? Would there be a nobler calling for such a machine than the processing of Tweets? :)

  • http://www.ignitesocialmedia.com/ Brian Friedlander

    Tom, I’m glad you got something out of my post. I agree wholeheartedly that current sentiment analysis is only the tip of the iceberg, and unfortunately, that tip still isn’t very accurate.

    I’ve spoken with a few companies recently who are taking interesting approaches toward achieving sentiment accuracy. Some are utilizing “artificially intelligent” learning engines to try to incorporate more colloquialisms and demographic-specific language. Although they are claiming very high accuracy rates (~90%), I am still very skeptical since many of the current sentiment tools advertise ~70% accuracy, and I still haven’t even seen that. Another approach that I think is more promising is crowd sourcing the actual reading and sentiment scoring of posts (through Amazon Turk or another crowd sourcing solution) – the questions here are how big it can scale and the cost. Of course, as Katie Paine has expressed on her blog, you don’t necessarily need to read *every* post. A good, random sampling (which makes the actual reading and scoring by a human possible) is still likely the most accurate way to go.

    Also, as Danny points out with the complexity of his questions, it really depends on what you’re trying to test by monitoring sentiment. All the work is for naught if it doesn’t provide actionable data.

  • http://ci.biz360.com Kevin Peterson

    Tom,

    I’ve seen the same thing in our competitors. From an engineering point of view, it’s an easy trap to fall into because a majority of mentions are actually neutral (around 60% in our content). This means that if your system is biased towards neutral, it makes the naive measurement of accuracy higher.

    When I tuned the sentiment engine for Biz360, I was going for maximum information. My training data was annotated by humans, and every item was marked positive, negative, neutral or mixed, and the system was trained to best match what the humans said. I used the F1 scores (that is, false positives and false negatives are equally bad) as my yardstick, and it turned out that training to make negative and mixed acceptable decreased the accuracy of neutral. That’s what I went with, because I’d rather err in assigning the occasional item that should be negative as positive or mixed than play it safe and score huge swathes of content as neutral, which as you point out isn’t of much value.

    Now, some of the examples you give are likely to be misclassified. Subtle sarcasm relying on things like a Yugo being an undesirable car isn’t something I’d expect the system to catch. I’ll be writing up some details on how we do sentiment soon. I’m sure our community manager @themaria

  • http://ci.biz360.com Kevin Peterson

    Whoops, hit tab and submitted. My post should end

    I’m sure our community manager @themaria can get you access to a demo account if you’d like to do some testing.

    Kevin Peterson
    Biz360
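Kevin’s point about naive accuracy versus F1 can be illustrated with a small sketch. The 60% neutral share is the figure from his comment; the split of the remaining 40% across positive, negative and mixed is hypothetical, and the "always neutral" classifier is a deliberately lazy baseline rather than any vendor’s actual system.

```python
def accuracy(truth, predicted):
    """Fraction of items where the predicted label matches the human label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def f1_per_class(truth, predicted):
    """F1 for each label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(truth) | set(predicted)):
        pairs = list(zip(truth, predicted))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# 100 human-annotated items: 60% neutral (per the comment above);
# the remaining split is HYPOTHETICAL.
truth = (["neutral"] * 60 + ["positive"] * 20
         + ["negative"] * 15 + ["mixed"] * 5)

# A lazy baseline that never commits to a sentiment.
always_neutral = ["neutral"] * len(truth)

acc = accuracy(truth, always_neutral)
f1 = f1_per_class(truth, always_neutral)
macro_f1 = sum(f1.values()) / len(f1)

print(acc)       # 0.6
print(f1)        # neutral scores 0.75, every other label scores 0.0
print(macro_f1)  # 0.1875
```

The baseline looks respectable on accuracy while telling you nothing about positive, negative or mixed mentions; averaging F1 across the labels exposes that.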

  • http://ci.biz360.com Maria Ogneva

    Thanks for raising a great issue! So much to respond to here!

    Sentiment analysis is a hotly debated and (sometimes) divisive issue. To be honest, even though our platform measures sentiment, I’d never advise any of our clients to base an entire strategy around it. It’s a metric that should be in your toolbox as you are taking a big-picture snapshot of your brand and the market. It is helpful when you are looking at thousands of items; it’s simply not possible to judge the sentiment of each one by eye. OTOH, if you are trying to figure out sentiment of an article, you can simply read it.

    Listening and measuring in aggregate is just one side of the issue. Interpreting the data and taking steps to engage (one on one) or redesign the product, pricing or messaging strategy (one on many) – this is where the art of this science comes in. For a community manager like myself, it’s helpful to isolate positive vs. negative and then dive in and in turn sort by importance / impact of each occurrence (you are right, not all tweets are created equal).

    To address the issue of accuracy, various vendors are doing different things. I can speak to what we do (and we are improving it over time, and our sentiment engineer Kevin should be providing his feedback on this article as well): we offer sentiment on the basis of the topic and the article. Sentiment is more accurate when calculated on the basis of the topic because we can account for proximity of the qualifier word to the topic word.

    Danny raises excellent points. The industry is still figuring out how to deal with relative importance of tweets; built-in influencer analysis helps to separate impactful thought leaders vs. “blabbermouths”. A lot of Twitter chatter is also an amplification (via retweets) of the one tweet, so providing a threaded environment and an indication of velocity of a tweet is definitely helpful to understanding the “trajectory” of each tweet and impact of each tweeter.

    How much negative impact should a brand worry about? That depends on the brand and product category, and it’s all relative, so you probably want to track your sentiment over time and vs. other competitors.

    And finally (because this comment is getting way too long :)), you are right: because there are many more tweets than blog posts and other SM articles, Twitter can skew counts. This is why it’s important for the platform you are using to slice and dice everything by source. If you want to look just at sentiment in blogs or discussion boards, without Twitter, that’s definitely helpful.

    Thanks for a great discussion and for opening this forum for us to participate.

    Maria Ogneva, Biz360
    @themaria @biz360

  • Pingback: Your turn: Can you trust automated social data? | SmartBlog On Social Media

  • http://ecgridos.com Alan Wilensky

    The fact is, the computational linguists have missed, big time, the key linguistic value marker called, “redress seeking phrase markers”.

    I can’t fight this one again and again, sentiment is a useless marker. When it comes to analysis of customer service and brand impressions, the weighting of words is purely influenced by generational idioms.

    However, redress seeking is unerringly accurate (if properly parsed and contextualized) for the true outcomes of a consumer’s impression of a product, a brand, or a service’s outcome.

    I leave you with: http://bizcast.typepad.com/clients/brandmonitoring/
    http://www.scribd.com/document_collections/2294726
    http://www.scribd.com/doc/907473/Master-Analysts-Report-of-Consumer-Generated-Media-Metrics-

  • http://www.bravenewme.com Magnus Nilsson

    Very good post, I’ve come across some odd results on this topic also: http://www.bravenewme.com/2010/02/social-media-sentiment-analaysis-accuracy/

  • Laura Book

    My two cents and experience testing sentiment with a few social monitoring tools out there. Of the few that I tested: Radian6, SM2, Wool.labs, I found that probably Wool.labs had the best sentiment feature because of its algorithm and human intervention, after that I liked SM2 because I could use their dictionary to reteach the tool (for example, by identifying certain keywords that were seen as negative to be treated as neutral from now on). Radian6 sentiment I found to be more challenging as even a lot of my posting returns showed up as both negative and positive even when there were no keywords or adjectives that indicated sentiment associated with them.

  • http://ecgridos.com Alan Wilensky

    I don’t want to hit this too hard, but there is a major and continuing disconnect in the business of mining meaning out of just about any corpus – but when analyzing a Facebook, twitter, or a blog / forum for real, action market in wild text stream, the science is fine, mathematically – the problem stems (get it?) from the lack of the practitioner’s understanding of how consumers of services and buyers of goods express their intent and impressions in an on-line forum or Social media mode. And this is the whole thing.

    The most elementary research shows that people do not express negative or positive language in the mention tally as the companies who pitch these quite flawed services would have you think. Why is this? So, one, we have the misapplication of text and corpus analytics to a bad metric – ok, we can fix that, these are smart people.

    Two, we have entrepreneurs from all kinds of previous ventures with success productizing, oh I dont know…..grocery delivery…. selling, pitching, and funding these “magic text moosh” services.

    In the world of big dollar capital equipment services, or high end consumer durable goods, a very strict set of metrics can be defined that returns real actionable data – and not just for marketing and customer response, but for inventory forecasting and product planning down to the RRJIT floor (Rapid Replenishment Just in Time Manufacturing Floor).

    My surprise is that the buyers of these services, besides polling negative as a group, are not disparaging of them, or are just looking to spend on anything that might give an edge.

    Now, they are not all bad, don’t get me wrong, but they can’t and have not really delivered on the promise – but they could if smart people would turn the science over to the product managers that a) understand and communicate with the comp linguists, and b) refocus the current analysis mess from sentiment to redress.

  • http://tritondigitalmedia.com Jim Kerr

    Your initial point about relevance is well-taken, but this is more an issue of penetration, not actual relevance. Twitter is still relatively small, and it is quite possible that a significant meme on Twitter may not be representative of mass-appeal awareness. That will change, and it is certainly not true of Facebook. Of course, all these social media metrics companies fail to access profile streams on Facebook, so it’s a non-starter in that world. But with Twitter growing and streams being public, we’re getting closer.

    As to the quality of sentiment data, it’s dependent a lot on which company/platform you use, of course. I spent two days meeting with CEOs of social metrics companies in January, and at least one company’s sentiment data was extraordinarily accurate (Peoplebrowsr). Biz360 CEO Brad Brodigan admitted that algorithmic sentiment analysis is difficult to get close to 70% accurate, let alone 90%, but it’s still impressive. His platform had to assess this tweet:

    I hate O’Hare. Southwest Airlines needs more flights from here.

    I daresay many algorithms would mark this as negative with “I hate” coming only one word away from “Southwest Airlines,” but his assessed it as “neutral.” That’s damn good.

    This “neutral” assessment is taken as a non-response bias, and I just don’t get that at all. First of all, the pure volume of activity on Twitter dwarfs blog and microblog posts, so implying that weighting toward Twitter is bad is not necessarily true. Now I DO agree that not having Facebook is an issue, but no one is measuring Facebook private stream data, so it’s not exactly a solvable problem at this point. You work with what you have. Secondly, the vast majority of brand mentions ARE neutral, so noting that a tiny universe of opinions determining a sentiment profile is an issue is not necessarily accurate as it is actually the reality.

    Ultimately, I think that sentiment and social media analysis is in its infancy and has a very long way to go (hence my CEO visits), but, that said, I am quite impressed with the quality of sentiment analysis we are seeing. The elephant in the room is that the biggest social network out there is not being measured at all.

  • http://ecgridos.com Alan Wilensky

    Meanwhile, back at the ranch, yeoman’s work is being done in customer service and warranty work by the low profile text analytics vendors, with streams coming from OCR forms, IM support streams, CS reps, some Twitter and even Facebook (at the account level).

    Could Facebook or Twitter make billions if they undertook a massive brand monitoring productization? Oh, no doubt, especially in consumer durable goods reliability and failure term data, which is easier (er), to analyze.

  • Tom Webster

    Jim–if Tweets were assessed as “neutral” to the same proportion as other online brand mentions, then I think your point would be dead on. But that isn’t the case–in the example I referenced, Tweets made up more than twice as large a share of the corpus of “tagged” mentions as they did of all brand mentions (61% versus 28%). You can either interpret that to mean that Tweets are more likely to express sentiment than other online brand mentions, which I would view as unlikely, or that Tweets are being overrepresented in the corpus and other online brand mentions are being correspondingly underrepresented due to a higher proportion of mentions not being tagged as positive or negative due to their complexity. Statistically, there’s just no other way to look at it. It’s a subtle point, but a valid one nonetheless. The non-response bias I am talking about isn’t the mentions that are tagged as “neutral” because they actually are neutral (as in your one-sentence example) but the posts that are tagged as “neutral” or just not counted because the program couldn’t ascertain sentiment one way or the other, and a higher percentage of those are contained in comments, message boards and blog posts, where statistically more people are sharing more content. It doesn’t invalidate the process, it’s just a warning to consider how you weight the data.

  • http://tritondigitalmedia.com Jim Kerr

    Tom, I would argue this point with you: “comments, message boards and blog posts, where statistically more people are sharing more content.” That may be true on a volume basis, but I must say that at the per user level, status updates are made more often than blog comments, blog posts, and message board posts.

    My educated guess is that the number one engagement online in terms of creating content is updating a Facebook or Twitter status.

  • Tom Webster

    You are dead on about Facebook, Jim–one thing that would make for an interesting study though would be exactly what we are missing there in terms of brand impact. True, Facebook streams are missing from sentiment analysis, but they are also missing from Google, and the platform’s symmetrical network nature means that most brand mentions don’t play as big as they might on other, asymmetrical platforms. But matter, they do–and it is an elephant in the room, especially as a component in word-of-mouth and how friends in your network on Facebook may be more likely to give such mentions more weight than your “friends” on Twitter. Love to tackle that problem!

  • http://tritondigitalmedia.com Jim Kerr

    re. Facebook.

    I told someone the other day that Social Metrics without Facebook is like doing a research study on what sports fans in the United States think but excluding NFL fans.

  • http://www.collectiveintellect.com/blog Laura Carroll

    Tom,

    You make some great points in your article. From the perspective of a company in this space, sample is everything. Using SM monitoring tools, it’s important to know the collection methods of the company creating the tool in order to understand what your sample will consist of. If the company collects all of Twitter and 20% of blogs, that’s going to bias your sample just as sentiment algorithms can.

    By listening in to online conversations we are inherently eliminating a certain amount of bias that traditional market research methods such as focus groups and panels can’t avoid. That doesn’t mean SM Research is bias-free, which you have certainly addressed in your post and comments.

    It’s important for researchers to stress transparency with their vendors, so that they can learn where the biases are in their research and how that will affect the outcome. This applies to sentiment as well as the source of the conversations. In such a new space, vendors must also be sure to prioritize the movement towards reducing bias in their data wherever possible. Just like a brand to a consumer, vendors need to mature with the needs of the social media research space.

    This is a really great article. Look forward to more writing from you.

    Best,
    Laura

  • http://www.clarabridge.com Justin Langseth

    We at Clarabridge find that the application of NLP full linguistic deep sentence/clause parsing is required to disentangle sentiments from each other and properly attribute them.

    For example a sentence “I love your comfortable beds and sheets, but your front desk staff are rude, not welcoming, and extremely unfriendly.”

    There is very positive sentiment about comfort of beds and sheets, and extreme negative sentiment about friendliness of staff. The only way to properly determine the sentiments in this sentence and attribute them properly to the correct objects and concepts is through a sentiment analysis engine that uses a full NLP linguistic deep sentence parse as one of its inputs.

    Also users need to be able to tune sentiments based on the topics they are looking at. If I said “the sheets, walls, and soup were thin” that is negative in the hospitality industry, but “my new iPad is thin” is a positive in the electronics industry.

    And correction for various forms of negation is critical. If my “iPad is too thin, I think it may snap in half”, that is negative… too much of a good thing is negative. If a hotel “used to be great” that means it probably isn’t anymore.

    For social media in particular, this is critical. A blog posting may express a whole variety of sentiments about a variety of topics, positive, neutral, and negative. If you make Corn Flakes, you only care about the sentiments relating to Corn Flakes (and maybe your competitors) in a blog, but not about the rest of the blog that may talk about something entirely unrelated.

    I don’t know how anyone is getting any decent precision and recall on sentiment without using full NLP deep parsing…

    - Justin Langseth, President & CTO, Clarabridge, Inc.

  • Pingback: A Practical Sentiment Analysis Alternative For Social Media | BrandSavant

  • http://blog.ecairn.com dominic

    Hi
    Sentiment analysis 1) doesn’t work 2) is a wrong approach both in marketing and in social media

    For the 1- no need to elaborate. People get barely 70% with a huge level of training, up to the point that using sampling and human rating is more cost effective. And most conversations are rated neutral. They would be about as close saying all conversations are neutral.

    As for 2
    - Take a brand like Benetton. In such a fragmented industry they are ok to piss off 90% of the world if the other 10% become fans.
    This all comes to positioning. If the positive sentiment comes from the wrong people or on the wrong attribute then it’s not positive.
    It’s good that exclusive brands are viewed as too pricey and discounted airlines are not expected to give champagne for free. -it shows they don’t cut on security …

    As for the “social” part. When consumer say they want brands to engage, they mean people, not algorithms. You can’t develop trust and relationship with a machine.

    Best

  • Tom Webster

    Dominic–As to being the wrong approach, sentiment analysis is certainly no panacea. And, in the Benetton example you give, a snapshot is irrelevant. But trending that data over time is never irrelevant. Genuinely knowing that you are moving the needle over time after some kind of reputation-killing crisis is certainly worth monitoring, for example. Again the trend is your friend, even if the snapshot is worthless.

    Obviously people want to develop relationships with people–but having a metric for the overall impact of those efforts over time is not a bad thing–it helps justify the effort to the C-level and encourage more companies to be good actors in the space. That ain’t all bad.

  • http://www.mattsnod.com Matthew Snodgrass

    Tom, thank you for raising this issue. I use one of the social media monitoring tools and have found huge inconsistencies with its automated sentiment tracking. On a small data sample, I found that it had accurately assigned posts as positive or negative only 11% of the time. With the set of computer-assigned positive and negative posts, I went back to manually assess them. All subjectivity aside, I found that the results were horrible, rendering the automated sentiment tracking virtually useless.

    The problem is, computers still can’t “understand” the nuance of human speech. People don’t communicate — especially online — in a concise, measurable fashion.

    Great piece, Tom.

  • Pingback: The value of the human touch | It's Open - Social Media Strategy Consultancy

  • http://www.theradicalear.com Thompson Morrison

    You’re very wise to be skeptical about sentiment analysis – it strikes me as a very sketchy solution to a complex problem – how to know what’s on your customers’ minds. Much more important is engagement with the customer, but that takes effort and strategy. Here are my recent comments on the overselling of social media.

  • http://upprdwnr.com Jared Macke

    Hi Tom,

    Excellent article! A few colleagues and I *do* have Computer Science degrees and *still* share your skepticism. Our answer to it is a service we’ve launched called “upprdwnr” (http://upprdwnr.com). Rather than automatically monitor everything that is said, we ask a little help of the community (just Twitter right now) – tell us if you’re thinking positive or negative.

    So, in your example above – if “Toyota FTW!” is tweeted instead as: “#Toyota FTW! #uppr” upprdwnr sees this as a positive thing for #Toyota. Similarly, “If my #Toyota would stop when I pressed the brake, I’d LOVE it! #dwnr” would be seen as a negative thing for #Toyota.
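
    A rough sketch of how tweets tagged this way could be read, in Python. This is not upprdwnr’s actual implementation: the function name, the +1/-1 scoring, and the rule of skipping tweets that have no vote tag or conflicting vote tags are all assumptions made for illustration.

```python
import re

# Vote tags as described in the comment: #uppr = positive, #dwnr = negative.
# The +1 / -1 scores are an assumption for this sketch.
VOTE_TAGS = {"uppr": 1, "dwnr": -1}
HASHTAG = re.compile(r"#(\w+)")


def score_tweet(text):
    """Map each topic hashtag in a tweet to +1 or -1 based on its vote tag.

    Returns an empty dict when the tweet carries no vote tag, or carries
    both #uppr and #dwnr (treated as ambiguous in this sketch).
    """
    tags = [tag.lower() for tag in HASHTAG.findall(text)]
    votes = {VOTE_TAGS[tag] for tag in tags if tag in VOTE_TAGS}
    if len(votes) != 1:
        return {}
    vote = votes.pop()
    return {tag: vote for tag in tags if tag not in VOTE_TAGS}


print(score_tweet("#Toyota FTW! #uppr"))  # -> {'toyota': 1}
```

    An untagged tweet such as “Toyota FTW!” produces no score at all, which is the mass adoption trade-off in practice.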

    Blunt? Sure. Old school? Kind of. But we think it addresses a lot of your concerns (even if it introduces a new one – mass adoption), and we think it is more in the spirit of the social space. These hashtags aren’t exactly asking you to do backbends in your tweets (in fact, we’ve found many situations where using them actually made our tweets shorter), and as a result we’ve got humans deciding what humans feel instead of robots in the ether.

    Would love to hear your take on our approach given your skepticism of the automated flavor. We’re just getting rolling, so we’ll take any feedback or advice to work toward that little mass adoption issue! :)

  • Pingback: It’s Official « upprdwnr blog

  • http://www.vquence.com/ Silvia Pfeiffer

    Jim Kerr: I have another addition to your list: “at the per user level, status updates are made more often than blog comments, blog posts, and message board posts.” You should not forget about comments on video sites either – there is a huge number of them.

    There is enormous brand exposure on YouTube and the comments provide valuable feedback from engaged users. I’ve just undertaken an analysis of the demographics of YouTube commenters and it’s most instructive, see http://www.vquence.com.au/2010/04/25/youtube-commenters-demography/ .

  • dt

    I do work in the algorithmic (applied research) side of sentiment analysis. It’s true that we are far from HAL 9000 and passing the Turing test, and I agree about the current state of sentiment analysis. It will obviously be difficult to identify Jorge Luis Borges style prose, including irony, behind opinions, but most opinions are similar in some way, and if you have a large amount of data to boost your algorithms you’re in a good situation.

    Then you need data, and exploratory tools to analyze new kinds of text rather than spending time on individual items.
    Who is in a better position for that? I think it’s Google; they excel in machine learning research too.

    BTW, that was an opinion; now it is nearly an assertion: Google’s New Review Search Option and Sentiment Analysis http://www.seobythesea.com/?p=1488

  • Tom Webster

    The Google sentiment analysis is performed on reviews, which is vastly different from attempting the same on random, unstructured text. I’m sure it is easier to gauge sentiment from explicit reviews!

    Big thumbs-up for the Borges reference, though – a big fave.

  • dt

    Hi Tom,

    What I mean is: Google surely is working on sentiment analysis beyond product reviews because they are in the best position to do it.

  • APL

    Hi Tom,

    Interesting post and quite thought-provoking. I think that it’s actually incorrect to look at Twitter posts and blogs as “unstructured” or “random” text. Most text and sentiment analysis is done on a topic, or brand mention. That brand or theme is the linguistic frame. The posts and tweets around a particular theme are structured in reference to a frame as I see it.

    Further, sentiment analysis needs more linguistic grounding to become more accurate. I see and treat blogs and tweets as two different types of discourse.

    My point? You need to think about the function and structure of the text being analyzed.

  • Tom Webster

    The only thing that I would add to your excellent comment, Adam, is that the devil lies at the end of your first paragraph: “…as I see it.” Just so – it is the human element that makes sentiment analysis worth considering. I think my point here was more in reference to automated sentiment analysis measures, which can’t (yet) reliably make the kinds of assumptions about frames and context to which you allude.

  • Pingback: “Unaided Recall” in Social Media Research | BrandSavant

  • Pingback: Paradigm Shift: Trends in Discourse Analysis (aka Text Analytics) | Perspectives on Consumers
