Tom Webster, writing and speaking

Google Zeitgeist, Twitter Trends and Consumer Behavior


Twitter's Chief Scientist recently released a snapshot of the Top Twitter Trends of 2009, and if you are a regular denizen of Twitterville, the names and hashtags will all be fairly familiar to you. I'm already in the habit of regularly consulting Google's Zeitgeist page, which aggregates and codes the billions of search queries Google processes and provides a similar snapshot of search behavior to reflect the "spirit of the times." While there have been a few comparisons made of Twitter search and Google's new real-time search, I find comparing their 2009 trends data far more interesting--not as an historical record of our times, but as a glimpse into the differences in their users' behaviors, and into where those behaviors diverge from other empirical data. Some of the cleanest data for comparative purposes is in the entertainment industry. For 2009, here were the top 10 most-searched movie trailers according to Google:

  1. New Moon
  2. Transformers 2
  3. Bruno
  4. Avatar
  5. Star Trek
  6. Twilight
  7. GI Joe
  8. 2012
  9. Paranormal Activity
  10. Watchmen

Besides the potential duplication of New Moon and Twilight (search can't read minds, after all), a fairly unremarkable list--or is it? There are some interesting omissions, which I'll get to in a moment. First, here's what Twitter says were the top 10 trending movies (and I realize that movie titles and movie trailers are apples and oranges, but even if you don't accept trailer interest as a proxy for movie interest, there are still insights to be had here):

  1. Harry Potter
  2. New Moon
  3. District 9
  4. Paranormal Activity
  5. Star Trek
  6. True Blood (sadly, Twitter didn't catch this one, as TB is a TV show)
  7. Transformers 2
  8. Watchmen
  9. Slumdog Millionaire (a 2008 release, but probably chatter from Oscar night twitterers)
  10. GI Joe

A few obvious similarities (Star Trek, Watchmen, New Moon) and some oddly idiosyncratic differences (Bruno for Google, District 9 for Twitter). Certainly both are heavy on Sci-Fi/Fantasy, which might have been a predictable outcome for Twitter, with a user base that still skews heavily toward early adopters, but less obvious for Google. Here is where you might throw in the confounding variable of movie trailer vs. movie, since the movies on Google's list had some fantastic trailers, but both lists actually look pretty reasonable when you put them side by side.

Except, that is, when you look at what actually happened--at which movies actually were the most popular in 2009, according to box office data. Here's what Nielsen has to say:

  1. Transformers
  2. Harry Potter
  3. Up
  4. The Hangover
  5. Star Trek
  6. Twilight
  7. Monsters Vs. Aliens
  8. Ice Age: Dawn of the Dinosaurs
  9. X-Men Origins
  10. Night At The Museum: Battle of the Smithsonian

The scorecard: Google got three, Twitter four. There are problems with a nitpicker's analysis of this data (Paranormal Activity and District 9 didn't have the distribution that Harry Potter did, for instance), but there are at least two things I'd point out. First, The Hangover was a phenomenon, yet it didn't crack either list from Google or Twitter. Second, and this is the only deductive conclusion I'd venture: analyzing unstructured data online from sources like Google and Twitter has some pretty glaring holes, like kids (and possibly moms). Neither of the online data sources captured the true popularity of Up, Monsters Vs. Aliens, Ice Age or Night At The Museum, all family films that struck box-office gold.
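If you want to check my scorecard yourself, here's a minimal sketch of the overlap count. The alias table that lines up franchises (treating Transformers 2 as Transformers, and New Moon as part of the Twilight franchise) is my own illustrative assumption, not anything Google, Twitter, or Nielsen provide:

```python
# Minimal sketch: count how many of each service's top-10 movies also
# appear in Nielsen's box-office top 10. The alias table is an
# illustrative assumption so sequels/franchises line up across lists.

google = ["New Moon", "Transformers 2", "Bruno", "Avatar", "Star Trek",
          "Twilight", "GI Joe", "2012", "Paranormal Activity", "Watchmen"]
twitter = ["Harry Potter", "New Moon", "District 9", "Paranormal Activity",
           "Star Trek", "True Blood", "Transformers 2", "Watchmen",
           "Slumdog Millionaire", "GI Joe"]
nielsen = ["Transformers", "Harry Potter", "Up", "The Hangover", "Star Trek",
           "Twilight", "Monsters Vs. Aliens", "Ice Age: Dawn of the Dinosaurs",
           "X-Men Origins", "Night At The Museum: Battle of the Smithsonian"]

aliases = {"Transformers 2": "Transformers", "New Moon": "Twilight"}

def normalize(titles):
    # Map each title to its franchise name so the lists are comparable.
    return {aliases.get(title, title) for title in titles}

for name, top10 in (("Google", google), ("Twitter", twitter)):
    hits = normalize(top10) & normalize(nielsen)
    print(f"{name}: {len(hits)} of Nielsen's top 10 -> {sorted(hits)}")
```

Run as written, it reports three Nielsen titles for Google and four for Twitter, which is where the scorecard above comes from.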

Those biases are also evident in a look at the top 10 television shows of 2009. Here's what Google had:

  1. Glee
  2. Bones
  3. Fringe
  4. NCIS
  5. Castle
  6. House
  7. Wipeout (though it's hard to see how Google corrected for false positives/negatives with some of these...)
  8. Medium
  9. Leverage
  10. Kings

An odd list, with some hugely popular shows (Glee, NCIS) and some that missed the mark (Kings). Here's Twitter's take:

  1. American Idol
  2. Glee
  3. Teen Choice Awards
  4. Saturday Night Live
  5. Dollhouse
  6. Grey's Anatomy
  7. Video Music Awards
  8. Battlestar Galactica
  9. BET Awards
  10. Lost

Fascinating differences here! Twitter's popular usage as a play-by-play companion to live events is most obvious, with three awards programs in the top 10. Twitter has developed a thriving use case as a real-time companion to television programming, a very different usage scenario from Google's.

However, before we complete that mental leap, there are some noticeable examples of live programming not covered by either list...

Nielsen's Top 10 (Regularly Scheduled) TV Shows for 2009

  1. American Idol - Wednesday
  2. American Idol - Tuesday
  3. Dancing With The Stars
  4. NBC Sunday Night Football
  5. Dancing With The Stars Results Show
  6. NCIS Los Angeles
  7. NCIS
  8. NFL Regular Season (ESPN)
  9. Sunday Night NFL Pre-Kick
  10. The Good Wife

Like the movie lists above, the online TV lists seem to have missed Mom a bit (The Good Wife, Dancing With The Stars). But they sure as heck missed sports. In fact, to make a potentially better comparison with the Twitter list and its assortment of awards programs, here are the top ten single-telecast ("event") TV shows, according to Nielsen:

  1. Football
  2. Football
  3. Football
  4. Football
  5. Football
  6. Football
  7. Academy Awards
  8. Football
  9. Football
  10. Football

I took a few shortcuts, but you get the picture.

All of this comes down to a recurring theme in this blog: non-response bias. It is tempting to treat the analysis of unstructured online data as a replacement for other, more "flawed" measurement methodologies. In some cases, that's warranted, believe me--and even measurement titans like Nielsen and Arbitron have some work to do to catch up. But, as your financial advisor insists (I hope!), past performance is not indicative of future behavior. In the case of some of the Sci-Fi/Fantasy films, search/tweet behavior was in fact a precursor to mainstream popularity. In the cases of children's programming, some female-targeted shows/movies, and sports, actual behavior differed significantly from the unstructured online data sources I looked at.

Non-response bias is the bias introduced into data not by those who provide data, but by those who don't. It's a stronger source of bias in Twitter-mining, certainly, since we have no idea who is and isn't using Twitter, only that there aren't as many of them as there are Google users. But it's always there, and it always serves as a reminder: unstructured online data mining is an additional input, another filter to enrich our ever-deepening view of the consumer. It augments, but it doesn't replace, the other sources.
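For readers who like to see that in survey-methodology terms, the standard way to write it (my framing here, not anything Google or Twitter publish) is that the bias of a respondents-only average is

\[
\bar{y}_{r} - \bar{Y} \;=\; W_{nr}\,\bigl(\bar{Y}_{r} - \bar{Y}_{nr}\bigr)
\]

where \(W_{nr}\) is the share of the population that never shows up in the data, and \(\bar{Y}_{r}\) and \(\bar{Y}_{nr}\) are the true averages for those who do and don't. The bias grows with both how many people are missing and how different they are from the people you can see--which is exactly the kids, moms, and sports problem above.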