METRICS: How would choosing an average-based metric vs a percentile-based one impact product development?




The two metrics might look similar. After all, they both try to increase view time. However, they are radically different, and optimizing for one vs the other would lead to totally different product development. It is very easy to come up with tests that would win on one of the two metrics but not on the other.

The main difference is purely statistical. The avg metric is affected by outliers, and here the outliers are power users: users who spend a lot of time on YouTube. If a user who spends 4 hours a day on YouTube goes up to 5 hours, that change moves the avg-based metric. The percentage of users above a given threshold is not affected by outliers. Assuming the threshold is 1 hour per day, whether a user spends 2, 3, or 23 hours per day on YouTube has no impact on the metric. It only moves when a user crosses the threshold.
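
To make the statistical point concrete, here is a minimal sketch in Python. The five users and their daily view times are made up for illustration, and the 60-minute threshold simply mirrors the 1 hour per day assumed above. The function names are hypothetical, not part of any real metrics library.

```python
from statistics import mean

THRESHOLD_MIN = 60  # assumed threshold: 1 hour per day, as in the text


def avg_view_time(minutes):
    """Average-based metric: mean daily view time in minutes."""
    return mean(minutes)


def share_above_threshold(minutes, threshold=THRESHOLD_MIN):
    """Threshold metric: fraction of users strictly above the threshold."""
    return sum(m > threshold for m in minutes) / len(minutes)


# Hypothetical daily view time in minutes; the last user is a power user.
before = [12, 30, 48, 90, 240]
# Same users, but the power user goes from 4 hours to 5 hours per day.
after = [12, 30, 48, 90, 300]

print("avg before/after:  ", avg_view_time(before), avg_view_time(after))
print("share before/after:", share_above_threshold(before), share_above_threshold(after))
```

In this toy data the average moves from 84 to 96 minutes, while the share of users above the threshold stays at 0.4, because nobody crossed the 60-minute line.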

In practice, the avg-based metric will lead to product optimization that focuses on power users. That’s because those are the users who have the most impact on the metric. It is typically easier to get a power user to spend, say, 1 more hour per day than to move someone from 10 minutes per day to 70 minutes. In the second case, you need to dramatically change that user’s behavior. For the power user, you just need to figure out how to make them stick around a bit longer, and you already know they love your product. Often, this means adding new or more complicated features. Power users are already expert users of the product, so confusing them is not that big of a risk. Also, you really don’t want to make a change that makes power user usage drop, because that would make the metric drop dramatically. No team will want to take the risk of upsetting power users. In practice, this means that major redesigns are extremely unlikely to happen.

The percentage of users above a certain threshold will lead to focusing on the users who are right below the threshold, since they are the cheapest to move across it. Usually the threshold is chosen to separate good users from bad users (as described here). So that metric will lead to focusing on users who are not quite happy with the product but have shown some interest. In this case, that could be, for instance, users who come to YouTube weekly but not daily. Optimizing for this metric typically leads to simplifying the product and making sure these users get relevant content right when they land on the site. After all, they spend little time on the site, so it is within that short time frame that their attention needs to be captured.
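
The contrast between the two focus areas can be sketched with two hypothetical tests that add the same total view time. All numbers below are invented for illustration, and `summarize` is a made-up helper. The 60-minute threshold is again the 1 hour per day assumed earlier.

```python
from statistics import mean

THRESHOLD_MIN = 60  # assumed threshold: 1 hour per day


def summarize(minutes, threshold=THRESHOLD_MIN):
    """Return (average view time, share of users strictly above the threshold)."""
    share = sum(m > threshold for m in minutes) / len(minutes)
    return mean(minutes), share


# Hypothetical baseline: three users just below the threshold, one power user.
baseline = [10, 45, 50, 55, 90, 240]

# Hypothetical test A: the power user gains 60 minutes, nobody else changes.
test_power = [10, 45, 50, 55, 90, 300]

# Hypothetical test B: the three near-threshold users gain 20 minutes each,
# which is the same 60 minutes in total.
test_near = [10, 65, 70, 75, 90, 240]

for name, data in [("baseline", baseline), ("power", test_power), ("near", test_near)]:
    avg, share = summarize(data)
    print(f"{name:8s} avg={avg:.1f} min  share_above={share:.2f}")
```

Both tests lift the average by the same amount, so the avg-based metric cannot tell them apart. Only the test aimed at near-threshold users moves the threshold metric, here from 2 of 6 users to 5 of 6.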

As usual, it is not that one metric is always better. It mainly depends on the business model. Is the company trying to make a lot of money from relatively few users, or a little money from a ton of users? In the first case, the avg is a good choice. Gaming companies typically fall into this bucket, as do most luxury/high-end brands. In the second case, the threshold metric is better. This is the case for most consumer tech companies, and especially ad-based ones like YouTube. Businesses with network effects will also tend to choose the threshold metric. If there are network effects, growth comes from increasing the number of good users, who will then attract new users in a virtuous cycle. The benefits of increasing the number of good users compound. Making power users even more addicted has much less of an overall benefit for the business.

Finally, which metric to choose also depends on the stage the company is at. Very early-stage start-ups often try to optimize the average-based metric regardless of the business model. The main goal of a small company is to find a small group of users who like its product a lot, and the average is perfect for that. Growing the user base will come later. In any case, this obviously does not apply to YouTube.



