Meta AI inside WhatsApp is currently implemented as a normal chat (reverse chronological order based on the last interaction). What data would you use to support an A/B test where it's implemented as a pinned chat by default (always at the top, no matter the last interaction timestamp)?


The current implementation of Meta AI inside WhatsApp is a smart one if the goal is to focus first on power users and tech-savvy people. Its default position is not prominent at all, but once a user interacts with it (i.e. self-selects based on interest in AI), it shows up among the normal chats just like a friend's chat, and those are ordered in reverse chronological order.

So, effectively, the more you use it, the more often it sits near the top of the screen. As discussed extensively in the previous questions, this is meant to maximize questions per user from a small subset of power/tech-savvy users.

Implementing it as a pinned chat would remove most of the current friction and shift the focus towards maximizing adoption from casual users. Obviously, there are many things to consider before doing something of that magnitude (e.g. AI quality metrics, management willingness to push AI, PR impact on very rare but very bad answers, etc.). However, here we are looking at it from a purely DS perspective: is there at least demand already from the broader user base to attempt such a bold test?

This test would make sense if the current data contained proxies suggesting that our AI product is already good enough to convert casual AI users into repeat AI users. If that's true, then any change that makes adoption easier for casual users could make sense.

This is not going to be easy because, based on the current UI design, there won't be many casual users interacting with the feature in the first place. However, this is not causal inference and we don't need to be statistically sound. This is the insight step, where we just need to find enough evidence to build a hypothesis that will then be validated via a statistically sound test.

Firstly, find casual users. A way to identify them could be by looking at the way they interact with Meta AI at first. That is, casual users tend to ask simple and short questions, they ask questions about many different topics, and in general their behaviour is more a proxy for curiosity with the new product vs actually trying to get the best out of it.
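As a rough illustration of that first step, here is a minimal sketch of a rule-based flag for casual users. Everything in it is an assumption made for the example: the `FirstWeekStats` fields, the idea that questions already carry topic labels and follow-up flags, and the three thresholds, which would have to be chosen by exploring the real data.

```python
from dataclasses import dataclass


@dataclass
class FirstWeekStats:
    """Hypothetical per-user summary of the first week of Meta AI usage."""
    user_id: str
    avg_question_words: float  # mean number of words per question
    distinct_topics: int       # number of distinct topic labels asked about
    n_questions: int           # total questions in the window
    follow_up_share: float     # share of questions that follow up on an answer


# Illustrative thresholds only, not values taken from any real analysis.
MAX_AVG_WORDS = 8.0
MIN_TOPIC_SPREAD = 0.5
MAX_FOLLOW_UP_SHARE = 0.2


def is_casual(stats: FirstWeekStats) -> bool:
    """Flag users whose early questions are short, scattered and one-off."""
    if stats.n_questions == 0:
        return False  # never interacted, so neither casual nor power user
    topic_spread = stats.distinct_topics / stats.n_questions
    return (
        stats.avg_question_words <= MAX_AVG_WORDS
        and topic_spread >= MIN_TOPIC_SPREAD
        and stats.follow_up_share <= MAX_FOLLOW_UP_SHARE
    )


users = [
    FirstWeekStats("u1", 5.0, 4, 5, 0.0),    # short questions, many topics
    FirstWeekStats("u2", 22.0, 1, 10, 0.6),  # long questions, one topic, follow-ups
]
casual_ids = [u.user_id for u in users if is_casual(u)]
```

A hard-coded rule like this is easy to explain and to sanity-check by reading a sample of flagged conversations, which is usually enough at the insight stage.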

Now track them for a couple of weeks and check which percentage of them turn into power users. It's incredibly unlikely that they will keep asking the same sorts of questions for weeks, since by definition they are getting little value out of it. So, see what percentage of them start using AI as a power user after a couple of weeks, e.g. they ask rare or complicated questions, specific topics, follow up questions, etc. If that percentage is high enough, you can make a case that there is going to be value in removing friction since your product is already good enough to turn a relevant percentage of casual users into power users.
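The conversion percentage itself is simple arithmetic once the two groups are defined. The sketch below assumes we already have a list of users flagged as casual at the start and a list of users who look like power users two weeks later; both lists and the function name `conversion_rate` are hypothetical.

```python
def conversion_rate(casual_ids, power_ids_later):
    """Share of the initial casual cohort that later behaves like power users.

    Users who stopped using the feature stay in the denominator, so churned
    casual users correctly count as "did not convert".
    """
    casual = set(casual_ids)
    if not casual:
        return 0.0
    converted = casual & set(power_ids_later)
    return len(converted) / len(casual)


# Hypothetical cohort: four casual users, two of whom look like power users later.
casual_cohort = ["u1", "u2", "u3", "u4"]
power_after_two_weeks = ["u2", "u4", "u9"]  # u9 was never in the casual cohort

rate = conversion_rate(casual_cohort, power_after_two_weeks)
print(f"{rate:.0%}")  # prints "50%"
```

What counts as "high enough" is a judgment call that depends on the business context; the code only produces the number to discuss.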

Also, you want to look at how users currently combine WhatsApp's pinned chat feature with Meta AI. How many users decided to pin Meta AI on their own (after all, they can already do it now if they want to)? What percentage of users, after pinning, decided to unpin? Did their engagement increase even more after pinning? By how much? How many of them had started as casual users? Obviously, if a significant percentage of users started as casual users -> became AI power users -> decided to pin Meta AI, the test is a no-brainer (but this is really unlikely).
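Those pin questions map onto a handful of descriptive ratios. The following is a minimal sketch under the assumption that each user can be summarized in one record; the `PinRecord` fields and the sample values are made up for illustration, and the before/after difference is purely descriptive, not a causal estimate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PinRecord:
    """Hypothetical per-user record for Meta AI users."""
    user_id: str
    started_casual: bool            # flagged as casual in their first week
    pinned: bool                    # pinned the Meta AI chat at some point
    unpinned_later: bool            # removed the pin afterwards
    weekly_questions_before: float  # average weekly questions before pinning
    weekly_questions_after: float   # average weekly questions after pinning


def pin_summary(records):
    """Pin rate, unpin rate, casual share among pinners, and mean uplift."""
    pinners = [r for r in records if r.pinned]
    if not pinners:
        return {"pin_rate": 0.0, "unpin_rate": None,
                "casual_share": None, "mean_uplift": None}
    uplifts = [r.weekly_questions_after - r.weekly_questions_before
               for r in pinners]
    return {
        "pin_rate": len(pinners) / len(records),
        "unpin_rate": sum(r.unpinned_later for r in pinners) / len(pinners),
        "casual_share": sum(r.started_casual for r in pinners) / len(pinners),
        "mean_uplift": mean(uplifts),
    }


records = [
    PinRecord("u1", True, True, False, 2.0, 6.0),
    PinRecord("u2", False, True, True, 10.0, 12.0),
    PinRecord("u3", True, False, False, 1.0, 1.0),
    PinRecord("u4", False, False, False, 8.0, 8.0),
]
summary = pin_summary(records)
```

For non-pinners the before/after fields are ignored, so in practice they could be left out or filled with any placeholder.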

All this pin-related data will likely be very biased. After all, for users to choose to pin Meta AI on their own, they must already be heavy power users. But it can still be very useful data, especially to prove the opposite, e.g. that it is too early for such a test. If even power users aren't pinning Meta AI, it's a strong sign that it might be too early to push AI so much. Or, more extreme still, if you notice that many users are archiving the Meta AI chat, then it's definitely too early.
