Overview of probability coding questions
Probability coding questions
Besides SQL-like questions, you might also be asked probability coding questions, that is, probability puzzles to be solved via coding. They are not frequent, but not impossible either: I'd say there is a 10-15% chance of encountering this kind of question during a data science interview at a large tech company.
These questions are very different from the previous ones. Firstly, they can only be solved in R or Python, not SQL. Most importantly, they essentially check your ability to write code once you have been told what to do. You don't have to figure out how to get the answer: the question itself already contains all the steps, and you need to translate them into code. A classic question could be:
- There are 100 candies and you are playing the following game: you pick a coin and flip it. If you get heads, you eat two candies. If you get tails, you eat one candy. What's the expected number of times you need to flip the coin to eat all the candies?
As you can see, the question itself already tells you how to get to the answer. The challenge is translating it into code. Almost always, these questions involve four things:
- A for loop that repeats the game many times and saves the result of each run. Taking the average of those results gives the final answer.
- A while loop that defines when the game stops.
- A function that samples randomly, which is needed to simulate the randomness in the game.
- A sequence of if-elses. The rules of the game are almost always just a bunch of if-elses: if this happens, do this; else, do that.
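The candy question can be sketched in Python along these lines, with each of the four ingredients marked in the comments. This is a minimal sketch rather than a reference solution, and it assumes a fair coin and that getting heads with only one candy left simply means eating that last candy. The function names `flips_to_finish` and `expected_flips` are illustrative.

```python
import random


def flips_to_finish(n_candies=100, rng=random):
    """Play one game and return how many flips it took to eat every candy."""
    candies = n_candies
    flips = 0
    # (2) While loop: the game stops as soon as no candies are left.
    while candies > 0:
        flips += 1
        # (3) Random sampling: a fair coin, heads with probability 0.5.
        heads = rng.random() < 0.5
        # (4) If-else: the rules of the game.
        if heads:
            candies -= min(2, candies)  # eat two candies, or the last one if only one is left
        else:
            candies -= 1  # eat one candy
    return flips


def expected_flips(n_candies=100, n_sims=100_000, seed=None):
    """Estimate the expected number of flips by averaging many simulated games."""
    rng = random.Random(seed)
    results = []
    # (1) For loop: repeat the game many times and save each result...
    for _ in range(n_sims):
        results.append(flips_to_finish(n_candies, rng))
    # ...then take the average, which is the final answer.
    return sum(results) / len(results)
```

Calling `expected_flips(100, 100_000)` should give an estimate of roughly 66.9 flips. More simulated games make the average less noisy at the cost of runtime, and passing a `seed` makes the result reproducible.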
So, even though these are probability questions, they require essentially no probability knowledge. One way to look at them: they almost give you pseudo-code, and you need to translate it into a given programming language. By the way, actual probability questions to be solved mathematically are even rarer and almost completely out of fashion at large tech companies (as of 2022).
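For contrast, the candy question can also be answered exactly, which is what a mathematical probability question would expect. The sketch below assumes a fair coin and that heads with one candy left eats only that candy; the recurrence is an illustrative derivation, not part of the original question. With k candies left you make one flip, after which k-1 candies (tails) or k-2 candies (heads, floored at 0) remain. It is mainly handy for sanity-checking a simulation. The name `exact_expected_flips` is hypothetical.

```python
from fractions import Fraction


def exact_expected_flips(n_candies=100):
    """Exact E[n] from E[n] = 1 + 1/2 * E[n-1] + 1/2 * E[max(n-2, 0)], E[0] = 0."""
    half = Fraction(1, 2)
    e = [Fraction(0)] * (n_candies + 1)  # e[k] = expected flips with k candies left
    for k in range(1, n_candies + 1):
        # One flip now, plus the expected flips for whatever is left:
        # tails leaves k-1 candies, heads leaves k-2 (or 0 if only one was left).
        e[k] = 1 + half * e[k - 1] + half * e[max(k - 2, 0)]
    return e[n_candies]
```

Here `float(exact_expected_flips(100))` is about 66.89. This matches the closed form 2n/3 + 2/9 - (2/9)(-1/2)^n at n = 100, and it is the value a correct simulation of the game should approach.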
Whether you should spend time preparing for these questions beforehand depends on several factors. Personally, I think a sound approach is to go through this section of the course and get a sense of how they work, without spending too much time on them. Then, when you get an interview at a given company, ask HR what to expect, which you should always do anyway. If HR says these questions are part of the recruiting process, spend some time preparing for them specifically.
Note that from a coding-syntax perspective, these questions are completely different from the common data science coding questions. There is no dplyr/groupby; they are mostly about very foundational coding concepts, again mainly loops and if-else. So preparing for the standard data science questions won't necessarily prepare you for these ones. At the same time, these questions only check very basic CS knowledge, so if you are already familiar with basic CS concepts, they won't be too hard to prepare for. Again, no matter what, coding questions in data science interviews will never be too hard.