Influence scores, as we know them today, are all based upon algorithms. Algorithms are commonly confused with formulae, but they are surely two different things. The area of a circle is a formula – it’s math. That x number of retweets has y effect on your influence score, however, is an algorithm. There might be some math in there, but I like to think of algorithms as math plus assumptions.
An influence score makes assumptions about the value of your follower count, how many people click on your links, etc., and then bashes those assumed values together with yet another set of assumptions – their supposed relationship to each other. Yes, there are mathematical functions involved, but just as the “likely voter model” many pollsters use for pre-election polls can never predict whether or not a specific individual will actually vote, the influence score will never be able to predict the impact of an individual on the behavior(s) you are trying to influence.
And that’s really the biggest issue with these scores, isn’t it? All of the algorithms being used by these services are amalgamating the behaviors of the many, and attempting to assign a value to the individual. This kind of inductive reasoning is always problematic. Here’s why:
Measure Three Times, Cut Once
There are, broadly, three kinds of measures: descriptive, diagnostic, and predictive (and these aren’t mutually exclusive – the best measures have elements of two or three of these all rolled into one.) Descriptive measures tell us what happened. Diagnostic measures tell us why it happened. And predictive measures help us make good guesses about what might happen in the future. The modern crop of influence scores (and I’m talking specifically about the single, reductive and non-context-specific number from 1-100 most of these sites spit out) are, I would argue, purely descriptive measures.
What Klout scores (or those from PeerIndex, or TweetLevel) can fairly be said to reflect is this: activity. It’s demonstrably true that increased activity on social networks (particularly Twitter) has a correlation with higher scores. Activity is not “influence,” of course, but it is something, and I’m not prepared to dismiss that something out of hand. So my influence score may in fact reflect some measure of my activity online, and my ability to encourage some form of activity in others. Thus, my score is descriptive of that activity level. It is not diagnostic of that level, however.
The scores, as they are presented, are inscrutable. My Klout score has fluctuated a fair amount in the past 60 days. I’m not sure why. I’m sure there are some very defensible assumptions for that fluctuation built into Klout’s algorithm, but the point is that the reasons for that variance are entirely opaque to me. In other words, my score, and even the peripherals around it to which I have access, do not tell me why the fluctuation occurred. Thus, influence scores cannot be used as diagnostic measures. (My topics, however, are right on the money. Klout is nailing this lately.)
A Cosmetic Problem
Similarly, the scores are predictive of nothing, which actually makes them very difficult to use. For example, I’m fond of comparing my Klout score with Snooki’s Klout score. After several months of concentrated effort, I have finally pulled ahead of Snooki (see, Mom? I told you I’d eventually make you proud.) But if you represented a cosmetics company trying to launch a new brand of sub-premium skin bronzer, who would you target – me, or Snooki? The answer is obvious, of course, but consider this: if my Klout is 68, and Snooki’s is 65, how much worse would I be at pushing bronzer? Would Snooki be twice as effective? Three times? A thousand times? There are two answers to this, of course. One is that as I am just one shade darker than an albino, the right answer is probably one million. The other answer is – you cannot possibly tell, and the scores obfuscate this, if anything.
So we have a purely descriptive measure – the influence score – but we lack the diagnostic and predictive measures that would allow us to do what every organization should be doing: learning, optimizing, and getting better. How can your company or brand take a flawed measure – the influence score – and make it better?
What Are We Really Trying To Measure?
Well, since the various influence measures are based upon a series of assumptions, let’s make a few of our own, here. First of all, most popular influence measures are heavily, if not entirely, based upon Twitter activity. Twitter’s asymmetric nature essentially means it functions as a broadcast platform – the few, reaching the many – so let’s start with something we can sink our analytical teeth into: reach and frequency. When an individual tweets out a link to some kind of content or offer, they do so with two hopes: that their followers will click on the link, and that their followers will retweet or otherwise disseminate the link to their networks, thereby increasing the potential reach of the message. So, when someone solicits, either explicitly or craftily, one of the various social media power users to help disseminate a message, the clear hope is that their message will be spread to as many people as possible using network effects.
While the exact relationship between followers and impressions is nearly impossible to calculate using clickstream measures (you have no way of knowing, after all, how many of your followers actually had the opportunity to see your message, let alone read it), it’s safe to say that more is better; in other words, there is undoubtedly a positive correlation between follower count and the number of people who interact with a given message sent to those followers. So, let’s assume that the behavior you are measuring for is retweets: tacit endorsements of your message, and increased exposure. Again, this is a pure reach and frequency game, and far easier to measure than “influence,” per se.
Here is a thing you can know: the average number of retweets per follower on Twitter. If you sifted through all that clickstream data from Twitter and examined tweets that contained links (we’ll exclude “conversational” tweets), you could come up with the number of people who retweeted a given message, and then compare that to the number of followers of the original tweeter. In other words, if I had 5000 followers, and my typical links are retweeted by an average of 20 people, then I have a concrete number to look at: I can generate one retweet for every 250 followers, or 4 for every 1000. This smells suspiciously like a CPM number, doesn’t it? But to be cute, let’s call it “APM,” or actions-per-thousand. If my average link tweet gets retweeted 20 times, and I have 5000 followers, I can generate 4 APM.
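If you’d rather see that arithmetic as a few lines of Python, here is a minimal sketch. The function name `apm` and its parameters are my own labels for illustration, not anything Twitter or Klout provides:

```python
def apm(actions: float, followers: int) -> float:
    """Actions (here, retweets) per thousand followers.

    `actions` is the average number of retweets a link tweet receives;
    `followers` is the follower count of the original tweeter.
    Both names are hypothetical, chosen for this sketch.
    """
    if followers <= 0:
        raise ValueError("followers must be a positive number")
    # Multiply before dividing so round numbers stay exact.
    return actions * 1000 / followers


# The example from the text: 20 retweets on average, 5000 followers.
my_apm = apm(20, 5000)
print(my_apm)  # 4.0
```

The only design choice worth noting is that the calculation is per thousand followers, so accounts of very different sizes can be compared on the same scale.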
With me so far? Now, let’s say that we do this for all Twitter users over a period of time to come up with an “average” APM. It won’t look as linear as the graph below suggests, but roughly let us assume that the average tweeted link is retweeted 10 times for every 1000 followers of the original tweeter. So, as the graph below shows, 20,000 followers would get me 200 retweets, 30,000 would elicit 300, and so on. So, the “Twitter average” APM is 10 (it isn’t, by the way).
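To make the benchmark concrete, here is a rough sketch of the straight-line expectation it implies. The benchmark of 10 is the made-up figure from this post, and the names `BENCHMARK_APM` and `expected_retweets` are hypothetical:

```python
# Assumed "Twitter average" from this post; it is an illustrative
# number, not a measured one.
BENCHMARK_APM = 10


def expected_retweets(followers: int, benchmark_apm: float = BENCHMARK_APM) -> float:
    """Retweets a link tweet would be expected to earn at the benchmark rate,
    assuming the relationship is linear in follower count."""
    return benchmark_apm * followers / 1000


for followers in (20_000, 30_000):
    print(followers, expected_retweets(followers))
# 20000 200.0
# 30000 300.0
```

In real data the relationship will almost certainly not be this linear, so treat the function as the baseline to compare against rather than a forecast.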
So now I have a benchmark by which to measure my influencer campaign. Back to my original example, suppose my sub-premium bronzer brand (Ecruage, by CASPER) used Klout Perks to identify people with Klout scores above 65 to target. Now, since neither Snooki nor I have “Cosmetics” as a topic, this requires a bit of a leap of faith on the part of our brand, but not the worst one I’ve seen. So, Snooki and I each get sent a crate of bronzer, and we go to town on the Twitters. Snooki has a lot more followers than I do, of course, but we can both fairly be graded on the APM scale I’ve outlined above.
So I try this crappy bronzer, and I tweet about it. My followers expect me to talk about social media research, consumer behavior, bad music and gin, so my crappy bronzer message comes off as a bit of a non sequitur, as the graph below illustrates:
So while the average Twitter user might generate an APM of 10 (10 actions per 1,000 followers), on this particular message I only got an APM of 4.2. Not so good, CASPER! Snooki, however, gets all serious about this bronzer, and tweets the crap out of it. On an apples-to-apples, retweets-per-follower basis, her graph might look like this (Snooki is the top line):
So, on the topic of crappy bronzer, Snooki might have initiated an APM of 15. There is a clear delta between Snooki’s effectiveness in disseminating this message (the top line) and mine (the bottom line). Two things about this delta: first, it’s endlessly reassuring to me (this is not a contest I’d care to win.) Second – that delta between the expected value (10 APM, or retweets-per-thousand-followers) and Snooki’s (15 APM) can fairly be described by one word:
This is influence, folks. Whatever magical power Snooki worked on this crappy bronzer message (a likely mixture of the relevance of her message to her audience, her perceived authority on the topic, and the actual logical content of her tweet) she was simply better at disseminating this message than I was – and not by a little. The variance shown between her APM and the expected APM IS influence – it’s the mojo she worked using the same system as everyone else, measured like-for-like, that made her far more effective at getting people to spread her message. More message dissemination = more awareness = more trial = more usage. The circle of marketing life goes ever on and on.
The APM Index
Now, if you’d really like to wow your CMO, you could convert Snooki’s effectiveness and my (in)effectiveness into indices, which allows you to compare all of the “influencers” whom you targeted relative to the average. Here’s a primer on calculating index scores if you need one, but essentially all you do is divide the average for the category into the number you are comparing it to, and multiply by 100. This means that the average for ANY index is 100 (in essence, if you divide the average into the average, you get 1, which multiplied by 100 = 100.) Snooki’s APM of 15 equates to an index of 150 ((15/10) x 100), while my paltry effort comes out to an index of 42.
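As a quick sketch of that index calculation, using the made-up numbers from this post (the function name `apm_index` and the dictionary of observed values are illustrative assumptions, not real campaign data):

```python
def apm_index(observed_apm: float, average_apm: float) -> float:
    """Index an observed APM against the category average (average = 100)."""
    if average_apm <= 0:
        raise ValueError("average_apm must be positive")
    return observed_apm / average_apm * 100


AVERAGE_APM = 10  # the assumed "Twitter average" from earlier in the post

# Hypothetical campaign results for the bronzer message.
observed = {"Snooki": 15, "me": 4.2, "average tweeter": 10}

for name, value in observed.items():
    # Round for display; floating-point division can leave stray digits.
    print(name, round(apm_index(value, AVERAGE_APM), 1))
```

Anything above 100 outperformed the average on this particular message and anything below 100 underperformed it, which is what makes the index easy to read across a whole list of targeted “influencers.”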
So, to close the loop on this, we started with two similar Klout scores:
…and we end up with our own, topic-specific measure of actual, observed influence – as expressed by the differential in message dissemination:
In my example, there is considerable difference between the original descriptive statistic (the Klout score) and this statistic, which moves us much more in the direction of a predictive statistic (at least on the topic of bronzer, and perhaps the category of cosmetics) that the learning organization can use to make the next “influencer” campaign even better. The influence score helped to make the initial cut, perhaps, but the only way for your company or brand to truly gauge influence is to do the work, and determine which individuals outperformed the average, and which underperformed.
Caveats, Carefully Considered
Now, there are a couple of things (at least) that one might take issue with here – both of which could fairly be described as oversimplifications on my part. The first, obviously, is that the mystical force that allowed Snooki to generate an APM of 15 compared to the average of 10 might not wholly be attributable to “influence.” But if it ain’t an answer, I don’t care – it at least serves as a handy heuristic for the nearly unmeasurable constellation of circumstances between the original tweeter and his/her audience that caused the message to mysteriously do better than the average would have predicted. Influence? Yeah, I think so. It’s at least behavioral, relevant, and a lot closer to “influence” than the activity-based scores we currently have – with the bonus of being relevant to your brand.
The other bone you might pick with me here is that my calculation – and reducing the whole model to differential message dissemination – is also overly reductive. I’ve taken what is surely a complex system and turned it into a back-of-the-envelope calculation. You’re right – it is a back-of-the-envelope calculation. That’s why companies might actually do it. You don’t need an analytics whiz on your staff to take this first pass at measuring your influencer campaigns, and until everybody catches up with you, this’ll do. Master this first, then break out the HAL 9000 when it’s time to make finer distinctions. (I also know a really smart social media research company that could help. Just sayin’.)
The bottom line is this – let’s say you actually use influence scores as some kind of crude segmentation – how will you test your work? How will you know, in other words, if your efforts were successful – and more importantly – what you can learn from them to make them better? The answer, I would submit, is to start with the current crop of popular influence measures as a first pass, but remember that they will never be as accurate as your own performance measures, even as crude as the one I’ve suggested here. There is nothing wrong with Klout, PeerIndex or any of these measures. There are only lazy marketers. And if you are reading this far, my friend, at word 2,200, you are surely not that.