Here’s the idea: sometimes you want to optimize for engagement rather than for something more concrete like revenue or conversion rate. Engagement is trickier to measure (it’s a job you should use Mixpanel for), but that just makes it more interesting.
Let’s use Posterous (the service that hosts this blog) as an example startup that might want to optimize for engagement. Perhaps they want to increase the number of comments left on blog posts, so they put together an A/B test comparing their current layout (the control group) with one that puts more emphasis on commenting.
Once it’s been running for a while, we can start to analyze our hypothetical A/B test. There are a few ways to approach it:
Average actions per user
Ratio of active to inactive users
User activity distribution
Comparing average actions per user
The first and most naive approach is to simply compare the average number of actions per user from each group in your A/B test. This is really simple to do, and it looks something like this:
Hypothetical Posterous A/B Test: Average comments per visitor
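Computing this metric takes one line per group. Here is a minimal sketch, assuming each visitor’s comment count is already available as an integer (the function names and the per-visitor counts are hypothetical, invented for illustration). It also computes the relative lift between two averages:

```python
def average_actions_per_user(counts: list[int]) -> float:
    """Mean actions (e.g. comments) per visitor; visitors who did nothing count as 0."""
    return sum(counts) / len(counts) if counts else 0.0


def relative_lift(control_mean: float, experiment_mean: float) -> float:
    """Fractional change of the experiment over the control (0.25 means +25%)."""
    return experiment_mean / control_mean - 1.0


# Hypothetical per-visitor comment counts (invented data, 10 visitors per group).
control = [0, 0, 1, 0, 0, 2, 0, 0, 0, 1]
experiment = [0, 3, 0, 0, 0, 0, 2, 0, 0, 0]
print(average_actions_per_user(control))     # 0.4
print(average_actions_per_user(experiment))  # 0.5

# Applied to the two averages quoted for this test (0.0222 and 0.0296):
print(f"{relative_lift(0.0222, 0.0296):.1%}")  # 33.3%
```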
If we use this metric, the experimental design is clearly winning, with about 33% more comments per user (0.0296 / 0.0222 ≈ 1.33). The trouble with this technique is that it tells us nothing about how those comments are distributed across users. For all we know, every comment in the experimental group could have been posted by a single user, which is hardly the engagement we’re after. That’s an obvious exaggeration, but it leads nicely to our next option:
Comparing active user ratios
We can avoid the issues with the previous method by ignoring the actual number of comments and looking only at our visitors. We classify each visitor as Active (posted at least one comment) or Inactive (posted none). This lets us ignore outliers such as a visitor who posts a thousand times, because that visitor still counts as just one Active visitor. Now our table looks like this:
Hypothetical Posterous A/B Test: Active visitor ratio
When we look at the proportion of visitors who posted at least one comment, we can see that the control group is beating the experimental group by around 30% – the complete reverse of our last conclusion.
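A minimal sketch of the Active/Inactive split, assuming each visitor’s comment count is available as an integer; `classify`, `active_ratio`, `average_actions_per_user` and the data are hypothetical. It contrasts the two metrics on an invented group with one extreme commenter: the outlier inflates the average roughly a hundredfold but leaves the active ratio untouched.

```python
from collections import Counter


def classify(comment_count: int) -> str:
    """Label a visitor 'Active' if they posted at least one comment, else 'Inactive'."""
    return "Active" if comment_count >= 1 else "Inactive"


def active_ratio(counts: list[int]) -> float:
    """Share of visitors labelled Active; how many comments each one posted is ignored."""
    if not counts:
        return 0.0
    labels = Counter(classify(c) for c in counts)
    return labels["Active"] / len(counts)  # Counter returns 0 for a missing label


def average_actions_per_user(counts: list[int]) -> float:
    """Mean comments per visitor, for comparison with the active ratio."""
    return sum(counts) / len(counts) if counts else 0.0


# Invented group of 1,000 visitors: ten commenters, one of whom posts 1,000 times.
with_outlier = [1000] + [1] * 9 + [0] * 990
without_outlier = [1] * 10 + [0] * 990

print(active_ratio(with_outlier), active_ratio(without_outlier))  # 0.01 0.01
print(average_actions_per_user(with_outlier))     # 1.009
print(average_actions_per_user(without_outlier))  # 0.01
```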
It’s interesting that the two metrics we’ve used so far to measure engagement give opposite answers for the same test. That tells us we really need to look at the underlying distribution.
User activity distribution
The most likely outcome of an A/B test like this is two distributions with somewhat different shapes. They will still be quite similar and, in all likelihood, roughly power-law shaped: the vast majority of visitors don’t post at all, and the number of visitors drops off sharply as the comment count rises. So, without further ado, here are our distributions:
This graph may be a bit difficult to interpret. It shows the frequency of visitors with different comment counts – for example, there might be 1,000 visitors who left 2 comments, 345 who left 3 comments, and so on. This means that a point (X, Y) on the curve tells you that there were Y visitors who left X comments. Because this is just an example, the specific numbers don’t really matter. The most important part is the overall shape of the curve.
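Each point on such a curve is just a frequency count over per-visitor comment totals. A minimal sketch, assuming the counts are held in a list; `activity_distribution` is a hypothetical helper name and the data is invented:

```python
from collections import Counter


def activity_distribution(counts: list[int]) -> dict[int, int]:
    """Map X (comments per visitor) -> Y (number of visitors who left exactly X comments)."""
    return dict(sorted(Counter(counts).items()))


comments_per_visitor = [0, 0, 0, 0, 1, 1, 2, 0, 3, 1]  # hypothetical per-visitor counts
print(activity_distribution(comments_per_visitor))  # {0: 5, 1: 3, 2: 1, 3: 1}
```

Plotting these (X, Y) pairs on log-log axes is a common way to eyeball a power-law shape, though the X = 0 bucket has to be left out there because log 0 is undefined.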
We can see that most users don’t comment at all, and that the two groups behave very differently. The control group (green line) has more users who write a small number of comments each, while the experimental group (red line) has fewer active users, each of whom comments frequently.
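One way to see how the average-per-user and active-ratio metrics can disagree: average comments per visitor factors exactly into (share of visitors who are active) × (comments per active visitor), that is, reach times depth. The sketch below assumes each group is summarized as a histogram mapping comment count to number of visitors; `summarize` and both histograms are hypothetical, with counts invented so the per-visitor averages come out to 0.0222 and 0.0296.

```python
def summarize(histogram: dict[int, int]) -> dict[str, float]:
    """Split engagement into reach and depth.

    `histogram` maps a comment count X to the number of visitors Y who left
    exactly X comments. Assumes at least one visitor.
    """
    visitors = sum(histogram.values())
    active = sum(n for comments, n in histogram.items() if comments > 0)
    total_comments = sum(comments * n for comments, n in histogram.items())
    return {
        "active_ratio": active / visitors,  # reach
        "comments_per_active": total_comments / active if active else 0.0,  # depth
        "comments_per_visitor": total_comments / visitors,  # = reach * depth
    }


# Invented histograms for 10,000 visitors per group.
groups = {
    "control": {0: 9_870, 1: 38, 2: 92},
    "experiment": {0: 9_900, 2: 52, 4: 48},
}
for name, histogram in groups.items():
    s = summarize(histogram)
    print(f"{name}: {s['active_ratio']:.3f} active, "
          f"{s['comments_per_active']:.2f} comments per active visitor, "
          f"{s['comments_per_visitor']:.4f} comments per visitor")
# control: 0.013 active, 1.71 comments per active visitor, 0.0222 comments per visitor
# experiment: 0.010 active, 2.96 comments per active visitor, 0.0296 comments per visitor
```

With these made-up numbers, the control wins on reach by 30% and the experiment wins on depth by enough to come out ahead on the per-visitor average, which is the same trade-off the curves show.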
The question becomes ‘do you want a smaller, highly engaged community, or a larger, less engaged community?’ There’s no easy answer here; it’s more of a think-long-and-hard sort of situation that depends heavily on the specifics of your startup.
Ultimately, the distribution method is the most powerful, but it’s also the most difficult to implement and analyze – especially since your results will likely be less clear-cut than the contrived examples I’ve given here. If anyone has some hard data, I’d love to hear about it – it would be great to have a case study. Please email me at email@example.com if you’re interested in sharing.
This post is based on a conversation I had with Jesse Farmer a few weeks ago. If you haven’t read his blog, you should; there are some real gems.