Recurring themes - Growth metrics

Growth metrics and questions about the previous value in general




By far, the most common coding questions involve LAG (R/SQL) or Shift (Python). Nowadays, it is almost impossible to go through a job interview at a top tech company and never been asked a question that requires LAG/Shift.


Growth Metrics


Growth metrics are typically defined as

(given month metric value - previous month metric value) / previous month metric value

or some small variation of that (such as different time window, making it a percentage, etc.). All year_over_year (or month_over_month) metrics also fall in this category since they are typically defined as: given month metric value / previous month metric value.

As soon as you hear a growth metric, you should know that at some point you will have to use the LAG window function (in SQL and R) and shift() in Python. The goal being having the value of that metric for the current time range (month or year typically) in one column and in another column the corresponding lagged value. Once you have these, you can create any growth metric.

There are obviously many exercises on this in the course, but just to give an idea, in its simplest version it would be something like below.


Example Growth Metrics




The subquery creates the lagged value via a window function. So then we have our metric (number of users here) for both the current month and the previous month in two different columns. At that point, we can calculate any growth metric in the outer query. If the lag had to be calculated by group, as usual you would need to add partition by inside the window function.



SELECT month, 
       100*(total_unique_users-total_unique_users_previous_month)/total_unique_users_previous_month AS growth_rate_pct
FROM (
      SELECT month, 
             total_unique_users, 
             LAG(total_unique_users, 1) OVER (ORDER BY month) AS total_unique_users_previous_month 
      FROM data
);





The first mutate creates the lagged value via the lag function. So then we have our metric (number of users here) for both the current month and the previous month in two different columns. At that point, we can calculate any growth metric in the following mutate. If the lag had to be calculated by group, as usual you would just need to group by before applying the lag function inside mutate.



#this calculates the lagged value by month
mutate(total_unique_users_previous_month = lag(total_unique_users, order_by = month)) %>%

#this calculates the actual growth metric according to the question, using current and lagged value
mutate(growth_rate_pct = 100*(total_unique_users-total_unique_users_previous_month)/total_unique_users_previous_month) 





Shift() creates the lagged value. So then we have our metric (number of users here) for both the current month and the previous month in two different columns. At that point, we can calculate any growth metric in the following step. If the shift had to be calculated by group, as usual you would just need to group by before applying the shift function.



#this calculates the lagged value by month
data = data.sort_values(by=['month'])
data['total_unique_users_previous_month'] = data['total_unique_users'].shift(1)

#this calculates the actual growth metric according to the question, using current and lagged value
data['growth_rate_pct']=100*(data['total_unique_users']-data['total_unique_users_previous_month'])/data['total_unique_users_previous_month']



Time difference by group



A variation of the growth metric question is when they ask for the time difference between two given actions by group (usually by user). It really could be the difference between anything, but the time difference is by far the most popular version of this. It is really the same as the growth metric question. You still need to create the lagged version of that metric (in this case time) and then the final result will be given by the difference between current and lagged column.



Keywords for this category of questions



  • Calculate Airbnb monthly revenue growth rate

  • Calculate Amazon month-over-month percentage change in revenue

  • Calculate the time difference between when a user signs-up up and buys for the first time

  • Calculate the time difference between when a user buys the first product and the tenth




By far, the most common coding questions involve LAG/Shift. Nowadays, it is almost impossible to go through a job interview at a top tech company and never been asked a question that requires LAG/Shift.




Complete and Continue