Solution: Ads Analysis
Data
Let’s read the dataset and look at the variables it contains.
#load the packages used below (library() stops with an error if a package is missing)
library(dplyr)
library(ggplot2)
#read the file
data = read.csv("https://drive.google.com/uc?export=download&id=1wB3x3bPIX7C19Rmse1bpZtYBmVcrPDGa")
dim(data)
[1] 2115 7
head(data)
        date shown clicked converted avg_cost_per_click total_revenue         ad
1 2015-10-01 65877    2339        43               0.90        641.62 ad_group_1
2 2015-10-02 65100    2498        38               0.94        756.37 ad_group_1
3 2015-10-03 70658    2313        49               0.86        970.90 ad_group_1
4 2015-10-04 69809    2833        51               1.01        907.39 ad_group_1
5 2015-10-05 68186    2696        41               1.00        879.45 ad_group_1
6 2015-10-06 66864    2617        46               0.98        746.48 ad_group_1
#make it a date
data$date=as.Date(data$date)
summary(data)
date                shown           clicked        converted       avg_cost_per_click  total_revenue    ad
Min.   :2015-10-01  Min.   :     0  Min.   :    0  Min.   :   0.0  Min.   :0.000       Min.   : -200.2  ad_group_1 :  53
1st Qu.:2015-10-14  1st Qu.: 28030  1st Qu.:  744  1st Qu.:  18.0  1st Qu.:0.760       1st Qu.:  235.5  ad_group_11:  53
Median :2015-10-27  Median : 54029  Median : 1392  Median :  41.0  Median :1.400       Median :  553.3  ad_group_12:  53
Mean   :2015-10-27  Mean   : 68300  Mean   : 3056  Mean   : 126.5  Mean   :1.374       Mean   : 1966.5  ad_group_13:  53
3rd Qu.:2015-11-09  3rd Qu.: 97314  3rd Qu.: 3366  3rd Qu.: 103.0  3rd Qu.:1.920       3rd Qu.: 1611.5  ad_group_15:  53
Max.   :2015-11-22  Max.   :192507  Max.   :20848  Max.   :1578.0  Max.   :4.190       Max.   :39623.7  ad_group_16:  53
                                                                                                        (Other)    :1797
Some of the data looks wrong. For instance, there is negative revenue, which doesn’t make sense. Let’s clean the data a bit by removing impossible values. In a real-world situation, we would also try to find out where the bad data is coming from.
#Revenue cannot be negative
paste("There are", nrow(subset(data, total_revenue<0)), "events with negative revenue")
[1] "There are 4 events with negative revenue"
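Before dropping rows like these, it can help to see where they cluster, since that is often the first clue about where bad data comes from. The following is only a minimal sketch on a toy data frame: the first three rows reuse values from the head() output above, the fourth row is made up for illustration, and the names `toy` and `negative_by_ad` are hypothetical.

```r
library(dplyr)

# Toy data: rows 1-3 copy date/revenue values from the head() output above;
# row 4 is a made-up bad record, added only for illustration.
toy <- data.frame(
  date = as.Date(c("2015-10-01", "2015-10-02", "2015-10-03", "2015-10-04")),
  total_revenue = c(641.62, 756.37, 970.90, -50.00),
  ad = c("ad_group_1", "ad_group_1", "ad_group_1", "ad_group_2"),
  stringsAsFactors = FALSE
)

# Count negative-revenue rows per ad group and the date range they span
negative_by_ad <- toy %>%
  filter(total_revenue < 0) %>%
  group_by(ad) %>%
  summarise(
    n_rows = n(),
    first_date = min(date),
    last_date = max(date),
    .groups = "drop"
  )

print(negative_by_ad)
```

Running the same pipeline on `data` instead of `toy` would show whether the negative values are concentrated in a few ad groups or dates.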
#Remove the rows with negative revenue
data = subset(data, total_revenue >= 0)
#Also, shown should be >= clicked and clicked should be >= converted. Let's see:
paste("There are", nrow(subset(data, shown<clicked | clicked<converted)), "events where the funnel doesn't make any sense")
[1] "There are 0 events where the funnel doesn't make any sense"
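The funnel check above combines two conditions into one count. When the count is not zero, it can be useful to know which condition failed. Here is a minimal sketch that counts each violation separately, assuming a toy data frame: the first two rows reuse values from the head() output above, the last two rows are made up to break the funnel, and `toy` and `violation_counts` are hypothetical names.

```r
# Toy data: rows 1-2 copy values from the head() output above;
# rows 3-4 are made-up records that violate the funnel.
toy <- data.frame(
  shown     = c(65877, 65100, 100, 500),
  clicked   = c(2339, 2498, 150, 20),
  converted = c(43, 38, 5, 30)
)

# Count the rows failing each funnel condition on its own
violation_counts <- c(
  more_clicks_than_shown       = sum(toy$clicked > toy$shown),
  more_conversions_than_clicks = sum(toy$converted > toy$clicked)
)

print(violation_counts)
```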
#Finally, a few zero values look suspicious given how high the averages are. Let's plot and see:
ggplot(data, aes(y=shown, x=date, colour=ad, group=ad)) +
geom_line(show.legend = FALSE) +
ggtitle("Ad impressions")
Those sudden drops to zero definitely look like bad data. Let’s remove them and then check ad clicks.
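Before removing the zero-impression rows, one might want to know how many there are and which ad groups they affect. The sketch below does this on an entirely made-up toy data frame (the names `toy` and `zero_summary` are hypothetical); the same pipeline could be pointed at `data`.

```r
library(dplyr)

# Toy data, made up for illustration: ad_group_1 has one zero-impression day,
# ad_group_2 has two.
toy <- data.frame(
  date  = as.Date(c("2015-10-01", "2015-10-02", "2015-10-03",
                    "2015-10-01", "2015-10-02")),
  shown = c(65877, 0, 70658, 0, 0),
  ad    = c("ad_group_1", "ad_group_1", "ad_group_1",
            "ad_group_2", "ad_group_2"),
  stringsAsFactors = FALSE
)

# Per ad group: number of zero-impression days out of all recorded days,
# keeping only groups with at least one zero, worst first
zero_summary <- toy %>%
  group_by(ad) %>%
  summarise(
    zero_days  = sum(shown == 0),
    total_days = n(),
    .groups = "drop"
  ) %>%
  filter(zero_days > 0) %>%
  arrange(desc(zero_days))

print(zero_summary)
```

If the zeros turned out to make up a large share of an ad group's days, dropping them silently would be harder to justify than removing a handful of isolated rows.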
#remove zero impression data
data = subset(data, shown>0)
#now check clicks
ggplot(data, aes(y=clicked, x=date, colour=ad, group=ad)) +
geom_line(show.legend = FALSE) +
ggtitle("Ad clicks")