A/B Testing to increase user engagement

Here’s the idea: sometimes you want to optimize for engagement rather than something more concrete like revenue or conversion rate. Measuring engagement is a trickier task (one you should use Mixpanel for), but that just makes it more interesting.

Let’s use Posterous (the service that hosts this blog) as an example startup that might want to optimize for engagement.  Perhaps they want to increase the number of comments received on blog posts, so they drum up an A/B test between their current layout (the control group) and one with a greater emphasis on commenting.

Once it’s been running for a while we can start to analyze our hypothetical A/B test.  We can approach it in a few ways: 

  1. Average actions per user
  2. Ratio of active to inactive users
  3. User activity distribution

Comparing average actions per user

The first and most naive approach is to simply compare the average number of actions per user from each group in your A/B test.  This is really simple to do, and it looks something like this:

Hypothetical Posterous A/B Test: Average comments per visitor

Group          Visitors   Comments   Average
Control        334690     7459       0.0222
Experimental   322784     9567       0.0296

If we use this metric, the experimental design is clearly winning, with roughly a third more comments per visitor (0.0296/0.0222 ≈ 1.33). The trouble with this technique, though, is that it tells us nothing about how those comments are distributed across visitors. For all we know, every comment in the experimental group could have been posted by a single user, and that wouldn’t be optimal. This is an obvious exaggeration, but it leads nicely to our next option:
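To make the arithmetic concrete, here is a minimal Python sketch of the comparison. The figures come from the hypothetical table above; the function and variable names are my own and not part of any Mixpanel API.

```python
# Figures from the hypothetical Posterous table; names are illustrative only.
groups = {
    "control": {"visitors": 334690, "comments": 7459},
    "experimental": {"visitors": 322784, "comments": 9567},
}


def comments_per_visitor(group):
    """Average number of comments per visitor in one test group."""
    return group["comments"] / group["visitors"]


def relative_lift(experimental, control):
    """Fractional change of the experimental value relative to control."""
    return experimental / control - 1


control_avg = comments_per_visitor(groups["control"])
experimental_avg = comments_per_visitor(groups["experimental"])
lift = relative_lift(experimental_avg, control_avg)

print(f"control:      {control_avg:.4f}")
print(f"experimental: {experimental_avg:.4f}")
print(f"lift:         {lift:.1%}")
```

At full precision the control average rounds to 0.0223 rather than the 0.0222 shown in the table, and the lift works out to roughly a third. Note that nothing in this calculation looks at individual visitors, which is exactly the weakness described above.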

Comparing active user ratios

We can avoid the issues with the previous method by ignoring the actual number of comments and just looking at our visitors.  We classify each visitor as Active or Inactive – those who post at least one comment and those who don’t.  This lets us ignore any outliers, such as a visitor who posts a thousand times. Now our table looks like this:

Hypothetical Posterous A/B Test: Active visitor ratio

Group          Unique visitors   Active visitors   Ratio
Control        334690            4533              0.0135
Experimental   322784            3357              0.0104

When we look at the proportion of visitors who posted at least one comment, we can see that the control group is beating the experimental group by around 30% (0.0135/0.0104 ≈ 1.30), the complete reverse of our last conclusion.

It’s interesting that the two metrics we’ve used so far to measure user engagement can give entirely different results – it shows that we really need to look at the underlying distribution.
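If you want to check whether a gap like this is more than noise, one common option is a pooled two-proportion z-test. The post itself doesn’t run one, so treat the sketch below as an assumption about how you might do it. It uses only the Python standard library, and the function names are mine.

```python
import math


def active_ratio(active, visitors):
    """Share of visitors who posted at least one comment."""
    return active / visitors


def two_proportion_z(active_a, n_a, active_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = active_a / n_a
    p_b = active_b / n_b
    # Pooled proportion under the null hypothesis that both groups are equal.
    pooled = (active_a + active_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Figures from the hypothetical active-visitor table.
control_ratio = active_ratio(4533, 334690)
experimental_ratio = active_ratio(3357, 322784)
z, p_value = two_proportion_z(4533, 334690, 3357, 322784)

print(f"control ratio:      {control_ratio:.4f}")
print(f"experimental ratio: {experimental_ratio:.4f}")
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

With samples this large, even a small difference in ratios gives a very small p-value, so significance alone won’t tell you which design to prefer. It only tells you the difference is unlikely to be chance.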

User activity distribution

The most likely outcome of an A/B test like this is a couple of differently shaped distributions.  They will still be quite similar, and in all likelihood will be power-law shaped (as the vast majority of visitors don’t post at all).  So, without further ado, here are our distributions:

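[Figure: number of visitors (Y) by number of comments left (X), with the control group in green and the experimental group in red]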

This graph may be a bit difficult to interpret. It shows the frequency of visitors with different comment counts – for example, there might be 1,000 visitors who left 2 comments, 345 who left 3 comments, and so on.  This means that a point (X, Y) on the curve tells you that there were Y visitors who left X comments.  Because this is just an example, the specific numbers don’t really matter.  The most important part is the overall shape of the curve.

We can see that most users don’t comment at all, and that the two groups behave very differently.  The control group (green line) has more users who write a small number of comments each, while the experimental group (red line) has fewer active users, but those users comment more frequently.
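Here is a rough Python sketch of how you might build this kind of distribution from per-visitor comment counts. The two lists are tiny made-up samples, chosen only to mimic the shapes described above, and the function names are hypothetical.

```python
from collections import Counter


def comment_distribution(comments_per_visitor):
    """Map each comment count X to the number of visitors Y who left X comments."""
    return Counter(comments_per_visitor)


def summarize(comments_per_visitor):
    """Compute the two earlier metrics, plus comments per active visitor."""
    visitors = len(comments_per_visitor)
    active = [count for count in comments_per_visitor if count > 0]
    return {
        "average": sum(comments_per_visitor) / visitors,
        "active_ratio": len(active) / visitors,
        "comments_per_active": sum(active) / len(active) if active else 0.0,
    }


# Made-up toy data: one entry per visitor, giving that visitor's comment count.
control = [0] * 90 + [1] * 6 + [2] * 3 + [3] * 1
experimental = [0] * 94 + [1] * 1 + [5] * 3 + [10] * 2

for name, data in (("control", control), ("experimental", experimental)):
    distribution = comment_distribution(data)
    print(name, sorted(distribution.items()), summarize(data))
```

In this toy data the experimental group has the higher average but the lower active ratio, the same reversal the two tables showed. Looking at the full distribution is what explains it: fewer commenters, each leaving more comments.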

The question becomes ‘do you want a smaller, highly engaged community, or a larger, less engaged community?’ There’s no easy answer here; it’s more of a think-long-and-hard sort of situation that greatly depends on individual aspects of your startup.

Ultimately, the distribution method is the most powerful, but it’s also the most difficult to implement and analyze – especially since your results will likely be less clear-cut than the contrived examples I’ve given here.  If anyone has some hard data, I’d love to hear about it – it would be great to have a case study.  Please email me at tim@mixpanel.com if you’re interested in sharing.

—–

This post is based on a conversation I had with Jesse Farmer a few weeks ago.  If you haven’t read his blog, you should – there are some real gems.

Comments

  1. jessefarmer

    I was going to say, this looks familiar! Glad our conversation was useful. :)

  2. Ben Tilly

    One of the tricks with A/B testing is finding the right question to ask. Let me give a realistically hard example. Suppose you’re running a web 2.0 company whose revenue is advertiser driven. Is it better to get people engaged with your site so you get more high quality content, or get people to click on an ad as fast as possible so you make money? Ideally, of course, you want some balance of both. But the changes that improve the one are likely to make the other one worse.

    There is no easy answer to that question. And furthermore I submit that the expertise of knowing which questions you believe drive your business belongs in house, and not with an external consultant.

    Moving on, when you get results you have the question of how to analyze them for statistical significance. Your first approach with raw counts is amenable to a chi-square test.

    The graph is obviously trickier. What I would suggest is coming up with some function that maps each user to a value. This is called a *metric* because it is something you’ll measure. One reasonable possibility is revenue earned. Another is log(actions + 1). An infinite number of other possibilities exist, and you should make a choice that fits your needs. Then you can apply a significance test to figure out which version drives the most improvement in the average value of the metric you chose.

    http://elem.com/~btilly/effective-ab-testing/ has explanations of how you can set up those significance tests.

    Good luck!

  3. RexDixon

    So what do you think of services such as www. performable. com, or places like www. abtests. com where you can share your tests? Do you think they are worthwhile, and what services have you tried, if any? What do you like or dislike about these A/B testing services? Are there too many?

  5. mickenny

    This is a useful test for website effectiveness. I actually have an essay site with two different designs, and I’m using http://www.google.com/websiteoptimizer for the A/B testing.

  8. lauraklein

    Great post and very nice visualization of the problem with the graph. I’m a huge fan of A/B testing, but I’m an even bigger fan of it when it’s combined intelligently with traditional qualitative testing. For example, I’d say you have one other option beyond choosing between a small, highly involved group of people and a larger, less involved group. Once you have your A/B test results, you could do a few qualitative research sessions where you talk to people from each group about their behavior to try to understand why there are differences. Then you could generate some ideas about how to optimize for both groups (which you would, of course, then A/B test!).

  11. Loft Conversions London

    What was the main difference between the control and experimental designs that was supposed to increase engagement? A simple “please feel free to leave your feedback” text, or something a little bit more passive? I’d really like to be able to increase engagement levels. Some pages get lots of comments per visitor and some do not, but as yet I haven’t tested specific methods to increase them.

    Great info. Thanks

  31. jessiegetts

    I found your analysis very interesting and I think the ratios are important.
