The new focus of data science - Trade-offs - FB example

The answer below perfectly summarizes a key concept in recent data science, perhaps the most important concept of all.

During the first phase of product data science (say up to 2020), the goal was pretty straightforward. The goal of a data scientist was finding opportunities to increase a key metric. E.g. improve engagement on FB -> people who get at least 7 friends in 10 days have higher engagement -> build a product to incentivize that behavior. People who tweet within one week from creating an account have higher retention -> incentivize that behavior, and so on.

However, after 10 years or so of data science optimization, those clear-cut opportunities are incredibly rare in large tech companies. Instead, over the last few years, most data science work has been about trade-offs.

The answer below summarizes this concept:


Simplifying what's happening:


  • Longer sessions/power users start from text and then move to pics


  • Shorter sessions/casual users just focus on pictures (and focusing on casual users is typically the #1 goal of data science)


On one hand, incentivizing posting pictures seems like a good idea: it would increase engagement from casual users + it might make longer sessions even longer, since power users shift to pics as the session progresses.

On the other hand though, longer sessions start from consuming text. So, if the goal is incentivizing a behavior predictive of long term engagement, the most obvious test here would be making it easier for users to consume text at the beginning of a session. The goal would be to try to make casual users become power users as well.

Personalization also has downsides. If, at the beginning of a session, you give more text to power users and pics to casual users, it becomes very hard to convert casual users into power users. And pretty much all users start as casual users, so you wouldn't have new power users in future.

No matter what you do here, you are taking the risk of losing engagement from either power users or casual users. And these kinds of trade-offs are super common. Youtube pushing shorts vs long videos, Airbnb pushing rooms vs entire houses, etc.

From a data scientist standpoint, this implies that estimating the long term effects of a change has become even more important. Because of the multiple trade-offs, it will be unclear whether a change is positive or negative only based on standard short term metrics (pct of users with at least one action, time watched per user, revenue per user, etc.). Instead, accurately predicting the long term effects is what allows to clearly understand if a change is positive or negative, and that's become the #1 focus of data scientists (e.g. shorts might be losing money in the short term because they take views away from the more profitable long videos, but can we predict the long term impact on revenue?).

Also, these scenarios often mean that a local optimum has been reached and that trying something significantly different could be a way to generate new growth (e.g. Can we look at past data and figure out a different monetization strategy for shorts that will improve long term revenue?). Indeed, data science work in large tech companies has become much more focused on new products/trying radically new things vs how it was just a few years ago, when it used to be almost exclusively about minor, but constant, optimization. This has made the work more complicated, but arguably more exciting.

The Netflix lesson later in this section gives a concrete example of a similar scenario and how it was dealt with.

Complete and Continue