Airbnb wishlist feature
This is an example of the interview prep approach described in the previous lesson.
That is, browse through the app of a company where you plan to interview and try to reverse-engineer the product data science steps that led to a given feature release. For instance, let's open the Airbnb app: one of the first features we see is the wishlist, the heart icon in the top-right corner of a listing picture. So let's use it as an example.
Wishlist feature
[Insights] Why did that company want to test that feature in the first place? What was the hypothesis? Which data would support that hypothesis?
Airbnb's booking process is time-consuming since it involves checking and comparing a lot of unique properties. Streamlining the booking process would lead to a better user experience, which would then lead to a higher conversion rate. If people can save and organize the listings they like, it will be easier for them to compare them and choose their favorite -> higher conversion rate.
Possible data supporting this hypothesis (just one of them is more than enough to justify the test):
1. Looking at user click-stream data, the same user visits the same listing multiple times, and each revisit requires several steps to find it again (search again, filter, go to the next page of results, etc.). If there is a large number of cross-device sessions (mobile, home laptop, work laptop), the problem is even worse, since it is very hard to find the same listing again on a new device. If the number of steps required to perform a given action is negatively correlated with conversion rate (very likely), fewer steps to do the same thing would lead to a higher conversion rate. A wishlist would make it very easy to revisit the same listing.
2. Users are already essentially doing what the new feature does, but in a convoluted way. Data supporting this hypothesis would be, for instance: the same user visiting the same listings multiple times coming from Gmail as a traffic source (-> they saved the links in their email) or from direct traffic (-> they bookmarked them in Chrome or in some other way); or many tabs open at the same time during their search (-> they are using browser tabs as a way to collect the listings they are interested in). Simplifying something users are already doing is the best proxy for demand, and it almost always leads to higher conversion.
3. Users visiting more than X pages or spending more than X days when looking for a specific trip have a lower conversion rate, and this is particularly true in markets with very large supply. There is a sort of optimal threshold for pages visited/days spent, above which the experience is so poor that the user gives up (-> lower conversion rate). A wishlist could act as a filter that allows users to focus on just a few listings.
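The threshold idea in point 3 is easy to check from click-stream data: bucket search sessions by pages visited and look at the conversion rate per bucket. A minimal sketch, with hypothetical session records (the data and bucket edges are made up for illustration):

```python
from collections import defaultdict

# Hypothetical session records: (pages_visited, converted) per search session.
sessions = [
    (3, True), (5, True), (8, True), (12, False),
    (4, True), (15, False), (22, False), (6, True),
    (30, False), (9, False), (2, True), (18, False),
]

def conversion_by_bucket(sessions, edges=(0, 5, 10, 20)):
    """Bucket sessions by pages visited and compute conversion rate per bucket."""
    stats = defaultdict(lambda: [0, 0])  # bucket lower edge -> [conversions, total]
    for pages, converted in sessions:
        # Assign the session to the highest edge it exceeds (right-open buckets).
        bucket = max(e for e in edges if pages > e)
        stats[bucket][0] += int(converted)
        stats[bucket][1] += 1
    return {f">{b}": conv / total for b, (conv, total) in sorted(stats.items())}

rates = conversion_by_bucket(sessions)
# In this toy data, conversion drops monotonically as pages visited grow,
# which is the pattern that would support the threshold hypothesis.
```

If the real curve shows a clear drop after some bucket, that bucket is a natural candidate for X in the hypothesis above.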
[Metrics] Which metric did they choose for the test?
From the previous points, the number of pages visited before booking (or time to book) should go down. As usual, it is better as a threshold-based metric (percentage of bookings with fewer than X pages visited). And conversion rate should go up.
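The threshold-based version of the metric is a one-liner; a sketch with hypothetical booking data (the threshold value is an assumption):

```python
# Hypothetical records: pages visited before each completed booking.
booking_pages = [3, 7, 12, 4, 9, 25, 6, 5, 11, 8]

def pct_quick_bookings(pages_list, threshold=10):
    """Share of bookings completed in fewer than `threshold` page views.
    A threshold metric is more robust to heavy-tailed outliers than a raw mean."""
    quick = sum(1 for p in pages_list if p < threshold)
    return quick / len(pages_list)

metric = pct_quick_bookings(booking_pages)  # 7 of 10 bookings took < 10 pages
```

Note how the single 25-page booking would drag a mean up, while the threshold metric barely notices it.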
If the test hypothesis turns out to be wrong, we should also be able to see it from those two metrics. If pages visited go up and conversion goes down, it is probably a sign that the new feature is widely used but is unfortunately distracting users from the main goal of the site (conversion) instead of facilitating it. If users don't care about the new feature, the two metrics should simply be flat.
[A/B Testing] How was the test designed?
I skip the usual stuff about sample size here because it is always the same. Besides that, as long as one of the two metrics is significantly better and the other is not worse, we can call this test a winner. That is, reducing pages visited to book with at least the same conversion rate OR a better conversion rate with at least the same pages visited are both positive scenarios for the business.
It is always hard to test at Airbnb by randomly splitting users, because users in both groups would access the same supply. If conversion rate goes up for the test group and/or they book much faster, they also impact the control group in the same market by taking supply away from them -> the two groups are not independent, which violates the most important t-test assumption. So, most likely, this was a test by market.
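One common way to implement a by-market split is deterministic hashing of the market identifier, so the whole market always lands in the same group. A minimal sketch (the salt, market names, and 50/50 split are assumptions for illustration; metrics would then be compared at the market level, not the user level):

```python
import hashlib

def market_group(market_id, salt="wishlist-test"):
    """Assign an entire market to test or control. Users competing for the
    same supply always land in the same group, avoiding interference."""
    digest = hashlib.sha256(f"{salt}:{market_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

markets = ["paris", "rome", "lisbon", "berlin", "tokyo", "austin"]
groups = {m: market_group(m) for m in markets}
```

In practice markets would also be matched or stratified on size and seasonality before assignment, since per-market variance is much higher than per-user variance.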