LinkedIn has launched the first version of its People You May Know feature. How would you isolate the impact of the algorithm behind it from the effect of the UI change?
Answer:
Whenever you launch the first version of a data product, i.e. a new product powered by machine learning, you are making a lot of changes to the site. Let's consider the People You May Know feature. The first time it was launched, it meant adding a new box with clickable links to the user's newsfeed. That new box with additional links is, by itself, likely to move the target metric, regardless of how good the algorithm used to suggest people is.
It is therefore hard to tell how much of the metric change was driven by the algorithm behind the new feature and how much by the UI change needed to accommodate it.
In these cases, you need to test each component separately. After all, the whole point of A/B testing is to isolate the effect of just one change. One way to isolate the two components is to run three versions of the site at the same time:
- Version 1 is the old version 
- Version 2 is the site with the People You May Know Feature, where suggestions are based on the machine learning model that was developed 
- Version 3 is the site with the People You May Know Feature, but suggestions are random 
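The difference between versions 3 and 1 will tell you the effect of the UI change alone, since version 3 has the new box but no real model behind it. The difference between versions 2 and 3 will tell you the gains coming only from the model.
This approach is risky, though, because users in version 3 might lose faith in the feature, and once users decide a new feature is bad, it is really hard to make them change their minds.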
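To make the three-version comparison concrete, here is a minimal Python sketch of how the metric change could be split into a UI part and a model part. Everything in it is an assumption for illustration: the arm names, the metric (share of users who added at least one connection), and the counts are made up, and a simple two-proportion z-test stands in for whatever analysis you would actually run.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (rate_b - rate_a, z statistic, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


def decompose_effects(arms):
    """Split the total lift into a UI effect and a model effect."""
    n1, c1 = arms["v1_old_site"]
    n2, c2 = arms["v2_ml_suggestions"]
    n3, c3 = arms["v3_random_suggestions"]
    return {
        "ui_effect": two_proportion_ztest(c1, n1, c3, n3),     # version 3 vs 1
        "model_effect": two_proportion_ztest(c3, n3, c2, n2),  # version 2 vs 3
        "total_effect": two_proportion_ztest(c1, n1, c2, n2),  # version 2 vs 1
    }


# Hypothetical counts, invented for this example: (users, users who converted)
arms = {
    "v1_old_site": (10_000, 500),
    "v2_ml_suggestions": (10_000, 650),
    "v3_random_suggestions": (10_000, 560),
}

for name, (diff, z, p) in decompose_effects(arms).items():
    print(f"{name}: diff={diff:.4f}, z={z:.2f}, p={p:.4f}")
```

By construction, the UI effect and the model effect add up to the total difference between version 2 and version 1, which is exactly the split the question asks for.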
 
A milder approach would be to replace random suggestions with something super basic, which machine learning should easily beat, but that still makes sense. For instance, you can use a history-based model (suggest users whose profiles were visited in the past by that user) or simply suggest users with the highest number of shared connections. These versions will still give a baseline to compare your model to, without giving users in one test group the impression that your new feature is very stupid.
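As an illustration of the shared-connections baseline, here is a minimal Python sketch. The graph representation (a dict mapping each user to a set of connections), the function name, and the toy data are all assumptions made for the example, not a description of how LinkedIn actually builds suggestions.

```python
def suggest_by_shared_connections(graph, user, k=3):
    """Rank users not yet connected to `user` by number of shared connections."""
    friends = graph.get(user, set())
    counts = {}
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and people they are already connected to
            if candidate != user and candidate not in friends:
                counts[candidate] = counts.get(candidate, 0) + 1
    # Most shared connections first; ties broken by name for a stable order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]


# Toy, made-up connection graph
graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "dee", "eli"},
    "cy": {"ana", "dee"},
    "dee": {"bo", "cy"},
    "eli": {"bo"},
}

print(suggest_by_shared_connections(graph, "ana"))
```

A heuristic like this needs no training, so it is cheap to ship as the third test version, and its suggestions still look sensible to the users who see them.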