R based marketing attribution and basic budget optimization using Markov Chains
R based marketing attribution and campaign budget optimization using Markov Chains
All of the concepts described shortly below have a corresponding links in the reference section. You can find there all articles and papers that I think are doing particularly great job in introducing and explaining this topic.
Marketing attribution is trying to answer one of the key questions in advertisement world: how precisely my campaigns and actions contributed to the results I see? In more formal definition:
attribution is the identification of a set of user actions (“events” or “touchpoints”) that contribute in some manner to a desired outcome, and then the assignment of a value to each of these events. Marketing attribution provides a level of understanding of what combination of events in what particular order influence individuals to engage in a desired behavior, typically referred to as a conversion.
Original, popular approach tries to solve this problem with set of heuristics: attributing all the conversions to the last or first touchpoint (last-touch or first-touch), dividing the glory equally among channels (linear attribution) or giving more credit to more recent one (time decay attribution). Below you can find a short visual summary and a reference in the last section discussing them in detail.
While all of them are relatively easy to understand and implement (and above all better than nothing) they are also too simple to be true. Attribution problem can however be answered with more accurate data-driven models including:
Following repository answers the attribution challenge using the popular Markov Chain.
Markov Chain essentially translate series of events into set of states (events itself) and transition probabilities between them (chance of moving from one event to another or staying in the current event).
In our marketing attribution problem Markov Chain relates to the concept of the customer journey. Each touchpoint (online ad, landing page etc.) represents the state with the conversion or no-conversion being the final outcome of the journey. Based on the cookie level data tracking the customer actions online we can calculate the transition probabilities between each touchpoint. Final outcome of this transition matrix can be represented as a markov graph.
Finally attribution itself is calculated within Markov Chain by removal effect. Explaining it simply:
Removal effect principle says that if we want to find the contribution of each channel in the customer journey, we can do so by removing each channel and see how many conversions are happening without that channel being in place
Greatest article describing implementation of Markov Chain in R is definitely coming from Sergey’s Bryl Analyze Core. My main code in markov_chain_attribution.R is building on his main steps. You can test it out yourself using sample databases - description for them is included in their separate folder.
Markov Chain for attribution is calculated on cookie level dataset containing time information. There are two most important distinct part of the code:
Firstly after loading the campaign data we have to create a paths (touchpoints journey) for each distinct cookie that have ended with conversion or not - which is important information to include. The end results will create a string of touchpoints (ex. Channel1 > Channel5 > Channel7 > Conversion).
### Prepare the files - Split Paths ----
df_split = campaign_data %>%
group_by(cookie) %>%
arrange(time) %>%
mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>%
ungroup() %>%
mutate(path_id = paste0(cookie, path_no))
### Prepare the file - Create the paths ----
df_paths = df_split %>%
group_by(path_id) %>%
arrange(time) %>%
summarise(path = paste(channel, collapse = ">"),
total_conversions = sum(conversion)) %>%
ungroup() %>%
mutate(null_conversion = ifelse(total_conversions == 1, 0, 1)) # path that have not led to conversion
Then we can calculate the actual Markov Chain results using Channel Attribution package available in R. There are several information that we need to indicate within a function.
We can also calculate heuristics based results in order to compare them with Markov Chain attribution.
### Markov Chain and Heuristic Models ----
markov_attribution <- markov_model(df_paths,
var_path = "path",
var_conv = "total_conversions",
var_value = NULL,
order = 2, # higher order markov chain
var_null = null_conversion,
out_more = TRUE)
heuristic_attribution <- heuristic_models(df_paths,
var_path = "path",
var_conv = "total_conversions")
Following graphs show conversion attributed to channels by Markov model and comparison of its results with heuristics approach. All visualizations code can be found in results_visualization together with dataset necessary to run them.
Based on Markov Chain attribution results we can calculate not only attributed conversion but also more better allocation of the campaign budget. In order to do so we will calculate Return on Ad Spend using the following formula
ROAS = Channel Conversion Weight / Channel Budget Weight
Channel Conversion Weight and Channel Budget Weight are ration of Channel’s Conversion to Total Conversions and Channel Cost to total Campaigns Cost. ROAS > 100% indicates that channel is undervalued. Having this metric in place we can simply move to campaign budget recommendation through:
Proposed budget = Current budget x ROAS
All of this we calculate by merging aggregated budget data with the results of markov chain modelling:
# Aggregate budget
campaign_budget_total = as.data.table(
campaign_budget_daily %>%
group_by(channel) %>%
summarise(total_cost = round(sum(cost), 1))
)
# Join into final results
campaign_attribution = merge(all_model_results, campaign_budget_total,
by.x = "channel_name", by.y = "channel")
#### Calculate ROAS and CPA
campaign_attribution =
campaign_attribution %>%
mutate(chanel_weight = (total_conversions / sum(total_conversions)),
cost_weight = (total_cost / sum(total_cost)),
roas = chanel_weight / cost_weight,
optimal_budget = total_cost * roas,
CPA = total_cost / total_conversions)
Graph below will show us the comparison of current versus recommended budget based on the markov-driven allocation. Instagram seems to be undervalued as a channel contrary to Online Video and Paid Search.
Customer journey not ending with conversion can last for dozens of days (limited by cookie lifetime) and have many different touchpoint. We may want to decide to break it after X touchpoints or days following specific business logic.
You may want try to validate the Markov results through accuracy measures based on prediction results to compare it with other allocation measurement methods.
As Sergey’s mention in his post the unique channel paths are undervalued by default in current calculation method. You may want to double check this impact and calculate markov results separately for one and multi-channel path.
Above all - if you should read only one thing it would be two posts from Analyze Core blog by Sergey Bryl
Part 1 - Introduction to the topic in digital marketing context
Part 2 - Great R implementation of Markov Chain
R Libraries that can be applier to this problem
Additional Resources
Whitepapers