Data Inconsistency on January 16

On January 16th, from 6:00 to 6:20 pm Pacific Time, an error in our production system caused every event sent to Mixpanel to be counted four times. Any customer looking at an hourly view of their data will see an artificial spike for the hour between 6:00 pm and 7:00 pm, similar to the one pictured below:

Spike Image

We are very proud of the accuracy of our data, and are extremely sorry that this error occurred. We know that you make key business decisions based on the data you see in Mixpanel, and even though the miscounting only lasted for twenty minutes, we wanted to make you immediately aware of it. The rest of this blog post will discuss the full details of this error and possible ramifications for your decision-making.

The over-counting affects every report on Mixpanel.com, and every API call, that requests a total count for any event and includes the 6:00 pm hour (Pacific Time) on 1/16/12 within the time period of the query. However, the over-counting does not affect queries for unique event counts. This means that the Funnels report and the Retention report, which are based entirely on uniques, are completely unaffected. Queries in the Segmentation or Trends reports in uniques mode are not affected either.

How can I adjust my data to account for this?
In most cases, we do not recommend trying to adjust your data to account for this error. For daily and monthly reports, the difference will be trivial: for the day of January 16th, total event counts will be roughly 5% higher than they actually were, and for monthly reports they will be only about 0.1% higher. The only case where you might want to adjust the data coming out of Mixpanel is hourly reports. If you are basing a decision on an hourly report, divide the total event count for the 6:00 pm hour by 2. The result will be very close to the true event count for that hour.
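For readers who want to check the arithmetic behind these figures, here is a rough sketch. It assumes events arrive at a constant rate, which real traffic does not, so treat the outputs as estimates. The function names are ours, chosen for illustration, and are not part of any Mixpanel API.

```python
# Assumption: events arrive at a roughly constant rate over the whole period.
MULTIPLIER = 4         # each event was counted four times during the incident
AFFECTED_MINUTES = 20  # 6:00 to 6:20 pm Pacific Time


def inflation_factor(period_minutes: int) -> float:
    """Ratio of reported total to true total for a period containing the incident."""
    # Each affected minute contributes (MULTIPLIER - 1) surplus minutes of events.
    surplus_minutes = (MULTIPLIER - 1) * AFFECTED_MINUTES
    return (period_minutes + surplus_minutes) / period_minutes


def corrected_hourly_count(reported_total: float) -> float:
    """Estimate the true total for the 6:00 pm hour from the reported total."""
    return reported_total / inflation_factor(60)


hourly = inflation_factor(60)              # (60 + 60) / 60 = 2.0
daily = inflation_factor(24 * 60)          # 1500 / 1440, about 4% high
monthly = inflation_factor(31 * 24 * 60)   # about 0.1% high
```

Under the constant-rate assumption, the affected hour is reported at exactly twice its true total, which is where the "divide by 2" rule comes from. The daily estimate comes out near 4%, the same order as the rough 5% quoted above; the gap is expected if the 6:00 pm hour carries more than an average hour's traffic.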

Will I be billed for these data points?
Absolutely not. You will not be charged for any overages to your plan caused by this error.

I’m a geek – tell me what really happened
First, it’s necessary to describe a small part of our infrastructure. When a user sends an event to Mixpanel, we do a small amount of validation (mostly checking for syntactical correctness) and then immediately put the event on a queue. Under normal circumstances, the number of items on the queue stays very close to zero, meaning that an event should show up in your reports within seconds of being sent. Decoupling the receipt of events from their processing also allows us to easily perform server maintenance that would otherwise require significant downtime.

For a long time now, we’ve had multiple queue servers so we aren’t reliant on a single machine, but we haven’t had automated failover. In practical terms, that means that if a queue server goes down in the middle of the day we can do a manual failover within minutes, but if it goes down in the middle of the night, it could be quite a bit longer before we can switch everything over. The change we pushed out Monday was intended to remedy this situation. Basically, when an event comes in, we try each queue server that we currently think is up, one at a time, until we successfully enqueue the event. Unfortunately, the code that checks whether an item was successfully put on a queue was incorrect, so each event was added to every queue server (there are currently four), which is why each event was counted four times. We noticed the problem almost immediately and had the fix within 20 minutes.
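To make the failure mode concrete, here is a minimal sketch of the "try each server until one succeeds" loop, next to a version with a faulty success check. This is not our production code: `FakeQueueServer`, `enqueue_with_failover`, `enqueue_buggy`, and the specific wrong check are all hypothetical, invented purely to illustrate how a bad success test turns failover into fan-out.

```python
class FakeQueueServer:
    """In-memory stand-in for a queue server (hypothetical, for illustration only)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.items = []

    def put(self, event):
        """Return True if the event was stored, False if the server is down."""
        if not self.up:
            return False
        self.items.append(event)
        return True


def enqueue_with_failover(servers, event):
    """Intended behavior: stop at the first server that accepts the event."""
    for server in servers:
        if server.put(event):
            return True
    return False  # every server refused the event


def enqueue_buggy(servers, event):
    """Illustrative bug: the success check never matches, so the loop never stops."""
    for server in servers:
        result = server.put(event)
        if result is None:  # wrong: put() returns True or False, never None
            return True
    return False


def total_copies(servers):
    """Count how many copies of events are stored across all servers."""
    return sum(len(server.items) for server in servers)
```

With four healthy fake servers, `enqueue_buggy` leaves one copy of the event on each of them (four in total), while `enqueue_with_failover` leaves exactly one. With a single server the two functions store the same thing, which is why a one-server test environment cannot tell them apart.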

What we are doing to keep this from happening again
Unfortunately, although we have queueing-related tests, our test environment has only one queue server, so none of our tests caught this particular problem. That particular hole will be fixed in the coming days.

Once again, our most sincere apologies and regrets for this error. If you have any questions please do not hesitate to reach out to us at support@mixpanel.com.
