- Nov 2020
We then estimate the relative weight each touch played in leading to a conversion. This estimation is done by allocating “points” to touches: each conversion is worth exactly one point, and that point is divvied up between the customer’s touches. There are four main ways to divvy up this point:
- First touch: Attribute the entire conversion to the first touch
- Last touch: Attribute the entire conversion to the last touch
- Forty-twenty-forty: Attribute 40% (0.4 points) of the attribution to the first touch, 40% to the last touch, and divide the remaining 20% between all touches in between
- Linear: Divide the point equally among all touches
[[positional attribution]] works by identifying the touch points in the lifecycle, and dividing up the points across those touches.
There are four main ways to divvy up this point
[[question]] What are the four main ways to divvy up the point in positional attribution?
- [[first touch]]
- [[last touch]]
- [[forty-twenty-forty]]
- [[linear]]
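The four ways of splitting the single conversion point can be sketched in a few lines of Python. This is only an illustration: the function name `allocate_points` and the method labels are made up for these notes, and the handling of one or two touches under forty-twenty-forty is an assumption, since the excerpt does not say what happens when there are no touches "in between".

```python
def allocate_points(total_touches: int, method: str) -> list[float]:
    """Return the share of one conversion point given to each touch, in order."""
    if total_touches < 1:
        raise ValueError("a conversion needs at least one touch")
    n = total_touches
    if method == "first_touch":
        # The whole point goes to the first touch
        return [1.0] + [0.0] * (n - 1)
    if method == "last_touch":
        # The whole point goes to the last touch
        return [0.0] * (n - 1) + [1.0]
    if method == "linear":
        # Every touch gets an equal share
        return [1.0 / n] * n
    if method == "forty_twenty_forty":
        # Assumption: with 1 touch it gets everything, with 2 touches
        # they split evenly (the source does not define these cases)
        if n == 1:
            return [1.0]
        if n == 2:
            return [0.5, 0.5]
        # 40% first, 40% last, remaining 20% shared by the middle touches
        middle = 0.2 / (n - 2)
        return [0.4] + [middle] * (n - 2) + [0.4]
    raise ValueError(f"unknown method: {method}")
```

For a customer with four touches, `allocate_points(4, "forty_twenty_forty")` gives the first and last touch 0.4 each and the two middle touches 0.1 each, so the shares still add up to one point.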
Once you have pageviews in your warehouse, you’ll need to do two things:
- Sessionization: Aggregate these pageviews into sessions (or “sessionization”), writing logic to identify gaps of 30 minutes or more.
- User stitching: If a user first visits your site without any identifying information (typically a `customer_id` or `email`), and then converts at a later date, their previous (anonymous) sessions should be updated to include their information. Your web tracking system should have a way to link these sessions together.
This modeling is pretty complex, especially for companies with thousands of pageviews a day (thank goodness for incremental models 🙌). Fortunately, some very smart coworkers have written packages to do the heavy lifting for you, whether your page views are tracked with Snowplow, Segment or Heap. Leverage their work by installing the right package to transform the data for you.
[[1. Gather your required data sources]] - once we have the data, we need to do two things: [[sessionization]] - the aggregation of pageviews etc. into a session
and [[user stitching]] - when a user first shows up without any identifying information and then converts later, we tie their earlier anonymous sessions back to them - kind of like the anonymous users / signups, and trying to tie them back to a source
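A rough sketch of what user stitching does, in plain Python rather than the dbt packages the excerpt recommends. The field names `anonymous_id` and `customer_id` and the function `stitch_users` are hypothetical; real trackers expose their own identifiers, and this sketch assumes each anonymous id belongs to at most one customer.

```python
def stitch_users(sessions: list[dict]) -> list[dict]:
    """Backfill customer_id onto anonymous sessions that share an anonymous_id
    with a later (or earlier) identified session."""
    # Map each anonymous_id to the customer_id it was seen with.
    # Assumption: if several customers share an id, the last one seen wins.
    known: dict = {}
    for s in sessions:
        if s.get("customer_id") is not None:
            known[s["anonymous_id"]] = s["customer_id"]

    stitched = []
    for s in sessions:
        customer_id = s.get("customer_id")
        if customer_id is None:
            # Still None if this visitor was never identified
            customer_id = known.get(s["anonymous_id"])
        # Copy the row so the input list is left untouched
        stitched.append({**s, "customer_id": customer_id})
    return stitched
```

Sessions from visitors who never identify themselves keep `customer_id` as `None`, so they can be filtered out or reported separately later on.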
1. Gather your required data sources
Sessions:
Required dbt techniques: packages
We want to use a table that represents every time a customer interacts with our brand. For ecommerce companies, the closest thing we can get to this is sessions. (If you’re instead working for a B2B organization, you should consider using a table of interactions between your sales team and a potential customer from your CRM.)
Sessions are discrete periods of activity by a user on a website. The industry standard is to define a session as a series of activities followed by a 30-minute window without any activity.
[[1. Gather your required data sources]]
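To make the 30-minute rule concrete, here is a minimal sessionization sketch in Python. It is not how the dbt packages implement it; the function name `sessionize` and the `(user_id, timestamp)` input shape are assumptions made for these notes.

```python
from datetime import datetime, timedelta

# A gap of 30 minutes or more between pageviews starts a new session
SESSION_GAP = timedelta(minutes=30)


def sessionize(pageviews: list[tuple[str, datetime]]) -> list[tuple[str, datetime, int]]:
    """Take (user_id, timestamp) pageviews and return
    (user_id, timestamp, session_number) rows, numbered per user from 1."""
    last_seen: dict[str, datetime] = {}
    session_number: dict[str, int] = {}
    result = []
    # Sorting by (user_id, timestamp) puts each user's pageviews in time order
    for user_id, ts in sorted(pageviews):
        previous = last_seen.get(user_id)
        if previous is None or ts - previous >= SESSION_GAP:
            session_number[user_id] = session_number.get(user_id, 0) + 1
        last_seen[user_id] = ts
        result.append((user_id, ts, session_number[user_id]))
    return result
```

Pageviews at 9:00 and 9:10 from the same user land in session 1, while a third pageview at 9:40 starts session 2, because the gap since 9:10 is exactly 30 minutes.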
How to build an attribution model
[[How to build an attribution model]]
- [[1. Gather your required data sources]]
- [[2. Find all sessions before conversion]]
- [[3. Calculate the total sessions and the session index]]
- [[3. Allocate points]]
- [[4. Bonus Join in revenue value]]
- [[5. Bonus Join with ad spend data]]
- [[6. Ship it!]]
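Steps 2 and 3 in the outline above can be sketched as follows. The outline only gives the step titles, so everything else here is an assumption: the function name `find_touches`, the field names `customer_id`, `started_at` and `channel`, and the choice to count a session that starts exactly at the conversion time as a touch.

```python
from datetime import datetime


def find_touches(sessions: list[dict], conversions: dict[str, datetime]) -> list[dict]:
    """Keep each converting customer's sessions up to their conversion, and
    add session_index (1-based position) and total_sessions to every row.

    conversions maps customer_id -> conversion timestamp.
    """
    by_customer: dict[str, list[dict]] = {}
    for s in sessions:
        converted_at = conversions.get(s["customer_id"])
        # Step 2: drop non-converters and sessions after the conversion
        # (assumption: the boundary is inclusive)
        if converted_at is not None and s["started_at"] <= converted_at:
            by_customer.setdefault(s["customer_id"], []).append(s)

    touches = []
    for rows in by_customer.values():
        # Step 3: order each customer's sessions and number them
        rows.sort(key=lambda r: r["started_at"])
        for index, row in enumerate(rows, start=1):
            touches.append({**row, "session_index": index, "total_sessions": len(rows)})
    return touches
```

With `session_index` and `total_sessions` on every row, the point allocation step only needs to compare the two numbers: a row is the first touch when `session_index` is 1 and the last touch when it equals `total_sessions`.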
The attribution data model
In reality, it’s impossible to know exactly why someone converted to being a customer. The best thing that we can do as analysts is provide a pretty good guess. In order to do that, we’re going to use an approach called positional attribution. This means, essentially, that we’re going to weight the importance of various touches (customer interactions with a brand) based on their position (the order they occur in within the customer’s lifetime).
To do this, we’re going to build a table that represents every “touch” that someone had before becoming a customer, and the channel that led to that touch.
One of the goals of an [[attribution data model]] is to understand why someone [[converted]] to being a customer. This is impossible to do accurately, but this is where analysis comes in.
There are some [[approaches to attribution]]; one of those is [[positional attribution]]
[[positional attribution]] means weighting the importance of touch points (customer interactions) based on their position within the customer lifetime.
The most transparent attribution model. You’re not relying on vendor logic. If your sales team feels like your attribution is off, show them dbt docs, walk them through the logic of your model, and make modifications with a single line of SQL.
[[transparent attribution model]]
The most flexible attribution model. You own the business logic and you can extend it however you want, and change it easily when your business changes.
[[flexible attribution model]]
That’s it. Really! By writing SQL on top of raw data you get:
The cheapest attribution model. This playbook assumes you’re operating within a modern data stack, so you already have the infrastructure that you need in place:
- You’re collecting events data with a tool like Snowplow or Segment (though Segment might get a little pricey)
- You’re extracting data from ad platforms using Stitch or Fivetran
- You’re loading data into a modern, cloud data warehouse like Snowflake, BigQuery, or Redshift
- And you’re using dbt so your analysts can model data in SQL
[[cheapest attribution model]]
So what do you actually need to build an attribution model?
- Raw data in your warehouse that represents customer interactions with your brand. For ecommerce companies, this is website visits. For B2B customers, it might be conversations with sales teams.
- SQL
to build an [[attribution model]] we need the raw data - this raw data should capture the [[customer interactions]], and in our case - also partner interactions, or people working with the partner?