Add unique media identifier (close #59)
georgewoodhead committed Nov 9, 2023
1 parent dfb1e8b commit ef4b3f0
Showing 25 changed files with 536 additions and 499 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -57,7 +57,7 @@ The package contains multiple staging models however the mart models are as foll
|------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| snowplow_media_player_base | A table summarizing media player events by media and pageview including impressions. |
| snowplow_media_player_plays_by_pageview | A view summarizing media plays by media on a pageview level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_id level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_identifier level. |
| snowplow_media_player_media_ad_views | A view summarizing each ad viewed within a media playback (only for v2 schemas, see above). |
| snowplow_media_player_media_ads | An aggregated table of ad metrics for each ad played within each media content (only for v2 schemas, see above). |

6 changes: 5 additions & 1 deletion docs/markdown/snowplow_media_player_common_cols.md
@@ -2,12 +2,16 @@
A UUID for each event e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
{% enddocs %}

{% docs col_media_identifier %}
The surrogate key generated from `media_id`, `media_label`, `media_type` and `media_player_type` to create a unique media element identifier.
{% enddocs %}

{% docs col_media_id %}
The unique identifier of a specific media element. It is the `player_id` in case of YouTube and `html_id` in case of HTML5.
{% enddocs %}

{% docs col_play_id %}
The surrogate key generated from `page_view_id` and `media_id `to create a unique play event identifier.
The surrogate key generated from `page_view_id`, `media_id`, `media_label`, `media_type` and `media_player_type` to create a unique play event identifier.
{% enddocs %}

{% docs col_page_view_id %}
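
Note: the `col_media_identifier` doc above describes a surrogate key over four columns. A minimal Python sketch of that idea follows. It assumes the commonly documented behaviour of `dbt_utils.generate_surrogate_key` in dbt_utils 1.x (cast each field to a string, replace NULLs with a placeholder, join with `-`, then MD5). The function names `surrogate_key` and `media_identifier` are illustrative only, and string casting differs between warehouses, so the hashes are not guaranteed to match the seed values in this commit.

```python
import hashlib

# Assumption: placeholder used by dbt_utils 1.x when a field is NULL.
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(*fields):
    """Hash the fields the way the macro is assumed to: join with '-' and MD5."""
    parts = [NULL_PLACEHOLDER if f is None else str(f) for f in fields]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


def media_identifier(media_id, media_label, media_type, media_player_type):
    """Illustrative equivalent of the new media_identifier column."""
    return surrogate_key(media_id, media_label, media_type, media_player_type)


if __name__ == "__main__":
    # Hypothetical values: same media_id, two different player types.
    print(media_identifier("player-1", "Intro", "video", "html5"))
    print(media_identifier("player-1", "Intro", "video", "com.youtube-youtube"))
```

Two elements that share a `media_id` but differ in label, type or player type get different identifiers, which appears to be the point of the change.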

@@ -1,5 +1,5 @@
"media_ad_id","platform","media_id","media_label","ad_id","name","creative_id","duration_secs","skippable","pod_position","views","clicked","skipped","percent_reached_25","percent_reached_50","percent_reached_75","percent_reached_100","views_unique","clicked_unique","skipped_unique","percent_reached_25_unique","percent_reached_50_unique","percent_reached_75_unique","percent_reached_100_unique","first_view","last_view"
"a81d5e4d9d7a690ae0fc5290d1ec9e47","web","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","a825644f-3882-8dcc-5e20-c38007bbf5b4","Ad 1","ad-1",4,TRUE,1,45,4,15,44,37,34,29,12,4,7,12,12,12,12,"2022-10-08 11:09:04.093","2023-08-04 13:47:32.374"
"4798308e0b04b09b20d5aacda0e8bb05","mob","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","a825644f-3882-8dcc-5e20-c38007bbf5b4","Ad 1","ad-1",4,TRUE,1,45,4,15,44,37,34,29,12,4,7,12,12,12,12,"2022-10-08 11:09:04.093","2023-08-04 13:47:32.374"
"0010be8c76d6c38c7848ed734fb35c2e","mob","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","0c446fce-b41a-f88d-81b8-e73774bec5fe","Ad 2","ad-2",4,TRUE,2,44,4,26,34,21,18,18,12,4,10,11,9,8,8,"2022-10-08 11:09:05.102","2023-08-04 13:47:33.383"
"4c219b7a1d4ef3402d139434b0acee3d","web","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","0c446fce-b41a-f88d-81b8-e73774bec5fe","Ad 2","ad-2",4,TRUE,2,44,4,26,34,21,18,18,12,4,10,11,9,8,8,"2022-10-08 11:09:05.102","2023-08-04 13:47:33.383"
"media_ad_id","platform","media_identifier","media_id","media_label","ad_id","name","creative_id","duration_secs","skippable","pod_position","views","clicked","skipped","percent_reached_25","percent_reached_50","percent_reached_75","percent_reached_100","views_unique","clicked_unique","skipped_unique","percent_reached_25_unique","percent_reached_50_unique","percent_reached_75_unique","percent_reached_100_unique","first_view","last_view"
"9f908a0c2b3bb2fd51cdc1a520fba611","web","27b2f50383eb0e8faf585690ace0183d","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","a825644f-3882-8dcc-5e20-c38007bbf5b4","Ad 1","ad-1",4,TRUE,1,45,4,15,44,37,34,29,12,4,7,12,12,12,12,"2022-10-08 11:09:04.093","2023-08-04 13:47:32.374"
"332af56fd8310d89ece56017630f2f94","mob","27b2f50383eb0e8faf585690ace0183d","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","a825644f-3882-8dcc-5e20-c38007bbf5b4","Ad 1","ad-1",4,TRUE,1,45,4,15,44,37,34,29,12,4,7,12,12,12,12,"2022-10-08 11:09:04.093","2023-08-04 13:47:32.374"
"023a782dee3ff73e552d7db33b884e7f","mob","27b2f50383eb0e8faf585690ace0183d","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","0c446fce-b41a-f88d-81b8-e73774bec5fe","Ad 2","ad-2",4,TRUE,2,44,4,26,34,21,18,18,12,4,10,11,9,8,8,"2022-10-08 11:09:05.102","2023-08-04 13:47:33.383"
"adbe6f518cd4a97a770bc0ec5f34b1fd","web","27b2f50383eb0e8faf585690ace0183d","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun","0c446fce-b41a-f88d-81b8-e73774bec5fe","Ad 2","ad-2",4,TRUE,2,44,4,26,34,21,18,18,12,4,10,11,9,8,8,"2022-10-08 11:09:05.102","2023-08-04 13:47:33.383"
@@ -1,5 +1,5 @@
"media_id","media_label","duration_secs","media_type","media_player_type","first_play","last_play","plays","valid_plays","complete_plays","impressions","last_base_tstamp","percent_reached_10","percent_reached_25","percent_reached_50","percent_reached_75","percent_reached_100","play_time_mins","avg_play_time_mins","avg_content_watched_mins","avg_playback_rate","avg_percent_played","play_rate","completion_rate_by_plays","avg_retention_rate"
"html-dbt","dbt Coalesce 2021 - Data modeling at Scale",1887,"video","org.whatwg-media_element","2022-01-18 11:56:27.815","2022-01-18 12:11:12.69",4,2,1,4,"2022-01-20 19:17:13.125",1,1,1,1,2,39.333,9.833,,1,0.313,1,0.25,0.25
"yt-dbt-coalesce-2022","dbt Coalesce 2022 - Data modeling at Scale 2",1889,"video","com.youtube-youtube",,,0,0,0,1,"2022-01-20 19:17:13.125",0,0,0,0,0,0,,,,,0,,
"yt-dbt-coalesce-2021","dbt Coalesce 2021 - Data modeling at Scale",1887,"video","com.youtube-youtube","2022-01-18 21:23:57.381","2022-01-20 19:13:21.293",37,5,0,39,"2022-01-20 19:17:13.125",2,2,1,0,4,55.067,1.488,,1.027,0.047,0.949,0,0.003
"79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun",60,"video","html5","2022-10-08 11:09:03.425","2023-08-04 13:47:32.066",40,18,14,25,"2023-08-04 13:47:32.066",34,24,16,14,14,24.043,0.601,0.488,1.047,0.488,1.6,0.35,0.355
"media_identifier","media_id","media_label","duration_secs","media_type","media_player_type","first_play","last_play","plays","valid_plays","complete_plays","impressions","last_base_tstamp","percent_reached_10","percent_reached_25","percent_reached_50","percent_reached_75","percent_reached_100","play_time_mins","avg_play_time_mins","avg_content_watched_mins","avg_playback_rate","avg_percent_played","play_rate","completion_rate_by_plays","avg_retention_rate"
"ae9ad2ba4e69068fa1807c9f41d9235a","html-dbt","dbt Coalesce 2021 - Data modeling at Scale",1887,"video","org.whatwg-media_element","2022-01-18 11:56:27.815","2022-01-18 12:11:12.69",4,2,1,4,"2022-01-20 19:17:13.125",1,1,1,1,2,39.333,9.833,,1,0.313,1,0.25,0.25
"9b71f6dbf74a346e7da60a5a744b362d","yt-dbt-coalesce-2022","dbt Coalesce 2022 - Data modeling at Scale 2",1889,"video","com.youtube-youtube",,,0,0,0,1,"2022-01-20 19:17:13.125",0,0,0,0,0,0,,,,,0,,
"68efef79c990d2f2542070769fbda051","yt-dbt-coalesce-2021","dbt Coalesce 2021 - Data modeling at Scale",1887,"video","com.youtube-youtube","2022-01-18 21:23:57.381","2022-01-20 19:13:21.293",37,5,0,39,"2022-01-20 19:17:13.125",2,2,1,0,4,55.067,1.488,,1.027,0.047,0.949,0,0.003
"27b2f50383eb0e8faf585690ace0183d","79f2746f6bdb98f5a33f8085fc4f0eb1","For bigger fun",60,"video","html5","2022-10-08 11:09:03.425","2023-08-04 13:47:32.066",40,18,14,25,"2023-08-04 13:47:32.066",34,24,16,14,14,24.043,0.601,0.488,1.047,0.488,1.6,0.35,0.355
3 changes: 3 additions & 0 deletions integration_tests/dbt_project.yml
@@ -221,6 +221,7 @@ seeds:
+column_types:
play_id: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
page_view_id: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_identifier: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_id: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_label: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
session_identifier: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
@@ -261,6 +262,7 @@ seeds:

snowplow_media_player_media_stats_expected:
+column_types:
media_identifier: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_id: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_label: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
duration_secs: float
@@ -291,6 +293,7 @@ seeds:
+column_types:
media_ad_id: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
platform: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_identifier: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_id: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
media_label: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
domain_userid: '{{ "string" if target.type in ["bigquery", "databricks", "spark"] else "varchar" }}'
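
Note: each `+column_types` entry above uses the same inline Jinja conditional to pick a string type per warehouse. The rule can be restated as a small Python function. The name `seed_string_type` is hypothetical, and the target names are the ones listed in the expressions above.

```python
# Targets that the Jinja expression maps to "string"; everything else gets "varchar".
STRING_TARGETS = {"bigquery", "databricks", "spark"}


def seed_string_type(target_type: str) -> str:
    """Mirror of: "string" if target.type in [...] else "varchar"."""
    return "string" if target_type in STRING_TARGETS else "varchar"


if __name__ == "__main__":
    for target in ("bigquery", "snowflake", "postgres"):
        print(target, "->", seed_string_type(target))
```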
2 changes: 2 additions & 0 deletions models/base/scratch/base_scratch.yml
@@ -42,6 +42,8 @@ models:
description: '{{ doc("col_session_identifier") }}'
- name: domain_userid
description: '{{ doc("col_domain_userid") }}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
@@ -170,8 +170,9 @@ with prep as (
select
coalesce(
p.media_session_id,
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id' ]) }}
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }}
) as play_id,
{{ dbt_utils.generate_surrogate_key(['p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }} as media_identifier,
p.*,

coalesce(cast(piv.weight_rate * p.duration_secs / 100 as {{ type_int() }}), 0) as play_time_secs,
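
Note: the hunk above keeps `media_session_id` as the preferred `play_id` and only falls back to a hash when it is NULL. The fallback now covers five columns instead of two. A rough Python sketch of that `coalesce` logic, using a simplified stand-in for the dbt_utils macro (the helper names are hypothetical and NULL handling inside the hash is omitted):

```python
import hashlib


def surrogate_key(*fields):
    # Assumption: simplified stand-in for dbt_utils.generate_surrogate_key.
    return hashlib.md5("-".join(str(f) for f in fields).encode("utf-8")).hexdigest()


def play_id(media_session_id, page_view_id, media_id, media_label,
            media_type, media_player_type):
    """coalesce(media_session_id, <five-column surrogate key>)."""
    if media_session_id is not None:
        return media_session_id
    return surrogate_key(page_view_id, media_id, media_label,
                         media_type, media_player_type)


if __name__ == "__main__":
    # With a media session the session id wins; without one the hash is used.
    print(play_id("session-1", "pv-1", "player-1", "Intro", "video", "html5"))
    print(play_id(None, "pv-1", "player-1", "Intro", "video", "html5"))
```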
@@ -156,8 +156,9 @@ with prep AS (
select
coalesce(
p.media_session_id,
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id' ]) }}
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }}
) as play_id,
{{ dbt_utils.generate_surrogate_key(['p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }} as media_identifier,
p.* except (percent_progress),

cast(p.percent_progress as integer) as percent_progress,
@@ -363,10 +363,11 @@ where
)

select
coalesce(
coalesce(
p.media_session_id,
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id' ]) }}
) play_id,
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }}
) as play_id,
{{ dbt_utils.generate_surrogate_key(['p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }} as media_identifier,
p.*,
coalesce(cast(round(piv.weight_rate * p.duration_secs / 100) as {{ type_int() }}), 0) as play_time_secs,
coalesce(cast(case when p.is_muted then round(piv.weight_rate * p.duration_secs / 100) end as {{ type_int() }}), 0) as play_time_muted_secs
@@ -157,8 +157,9 @@ with prep as (
select
coalesce(
p.media_session_id,
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id' ]) }}
{{ dbt_utils.generate_surrogate_key(['p.page_view_id', 'p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }}
) as play_id,
{{ dbt_utils.generate_surrogate_key(['p.media_id', 'p.media_label', 'p.media_type', 'p.media_player_type']) }} as media_identifier,
p.* exclude (percent_progress),

cast(p.percent_progress as integer) as percent_progress,
4 changes: 2 additions & 2 deletions models/custom/snowplow_media_player_session_stats.sql
@@ -25,8 +25,8 @@ with prep as (
session_identifier,
domain_userid,
count(*) as impressions,
count(distinct case when media_type = 'video' and is_played then media_id end) as videos_played,
count(distinct case when media_type = 'audio' and is_played then media_id end) as audio_played,
count(distinct case when media_type = 'video' and is_played then media_identifier end) as videos_played,
count(distinct case when media_type = 'audio' and is_played then media_identifier end) as audio_played,
sum(case when media_type = 'video' and is_played then 1 else 0 end) as video_plays,
sum(case when media_type = 'audio' and is_played then 1 else 0 end) as audio_plays,
sum(case when media_type = 'video' and is_valid_play then 1 else 0 end) as valid_video_plays,
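
Note: switching the `count(distinct ...)` from `media_id` to `media_identifier` changes the result whenever two media elements share a `media_id`. The sketch below reproduces the two expressions on SQLite with hypothetical rows. The table name `base` and the key values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table base (media_identifier text, media_id text, "
    "media_type text, is_played integer)"
)
conn.executemany(
    "insert into base values (?, ?, ?, ?)",
    [
        ("key-a", "player-1", "video", 1),  # hypothetical played video
        ("key-b", "player-1", "video", 1),  # same media_id, different identifier
        ("key-c", "player-2", "video", 0),  # impression only, never played
    ],
)

# Old expression (by media_id) next to the new one (by media_identifier).
by_id, by_identifier = conn.execute(
    """
    select
        count(distinct case when media_type = 'video' and is_played then media_id end),
        count(distinct case when media_type = 'video' and is_played then media_identifier end)
    from base
    """
).fetchone()

if __name__ == "__main__":
    print("videos_played by media_id:", by_id)
    print("videos_played by media_identifier:", by_identifier)
```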
2 changes: 2 additions & 0 deletions models/media_ad_views/media_ad_views.yml
@@ -11,6 +11,8 @@ models:
- not_null
- name: platform
description: '{{ doc("col_platform")}}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
2 changes: 2 additions & 0 deletions models/media_ad_views/scratch/base_scratch.yml
@@ -10,6 +10,8 @@ models:
- not_null
- name: platform
description: '{{ doc("col_platform")}}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
@@ -26,16 +26,17 @@ with
events_this_run as (

select * from {{ ref('snowplow_media_player_base_events_this_run') }}
where ad_id is not null and media_id is not null
where ad_id is not null and media_identifier is not null

)

, prep as (

select
{{ dbt_utils.generate_surrogate_key(['ev.platform', 'ev.media_id', 'ev.ad_id']) }} as media_ad_id,
{{ dbt_utils.generate_surrogate_key(['ev.platform', 'ev.media_identifier', 'ev.ad_id']) }} as media_ad_id,

ev.platform,
ev.media_identifier,
ev.media_id,
max(ev.media_label) as media_label,
ev.domain_userid,
@@ -66,7 +67,7 @@ events_this_run as (

from events_this_run as ev

group by 1, 2, 3, 5, 6, 7, 8, 9, 12
group by 1, 2, 3, 4, 6, 7, 8, 9, 10, 13

)
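
Note: because these models group by ordinal position, inserting `media_identifier` as the third selected column shifts every later ordinal by one and adds the new position to the list. The arithmetic can be checked with a short Python sketch. The function name `shift_group_by` is hypothetical.

```python
def shift_group_by(ordinals, inserted_at):
    """Return the positional group-by list after a grouped column is
    inserted at position `inserted_at` (1-indexed) in the select list."""
    shifted = [n + 1 if n >= inserted_at else n for n in ordinals]
    return sorted(shifted + [inserted_at])


if __name__ == "__main__":
    old = [1, 2, 3, 5, 6, 7, 8, 9, 12]
    print(", ".join(str(n) for n in shift_group_by(old, 3)))
```

Applied to the old list `1, 2, 3, 5, 6, 7, 8, 9, 12` with the new column at position 3, this yields the new list shown in the hunk above.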

2 changes: 2 additions & 0 deletions models/media_ads/media_ads.yml
@@ -13,6 +13,8 @@ models:
- not_null
- name: platform
description: '{{ doc("col_platform")}}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id")}}'
- name: media_label
7 changes: 5 additions & 2 deletions models/media_ads/snowplow_media_player_media_ads.sql
@@ -52,6 +52,7 @@ new_media_ad_views as (
a.media_ad_id,

a.platform,
a.media_identifier,
a.media_id,
max(a.media_label) as media_label,

@@ -93,7 +94,7 @@ new_media_ad_views as (

from new_media_ad_views a

group by 1, 2, 3, 5
group by 1, 2, 3, 4, 6

)

@@ -139,6 +140,7 @@ new_media_ad_views as (
a.media_ad_id,

a.platform,
a.media_identifier,
a.media_id,
max(a.media_label) as media_label,

@@ -170,7 +172,7 @@ new_media_ad_views as (

from all_data a

group by 1, 2, 3, 5
group by 1, 2, 3, 4, 6

)

@@ -180,6 +182,7 @@ new_media_ad_views as (
a.media_ad_id,

a.platform,
a.media_identifier,
a.media_id,
a.media_label,

2 changes: 2 additions & 0 deletions models/media_base/media_base.yml
@@ -14,6 +14,8 @@ models:
- not_null
- name: page_view_id
description: '{{ doc("col_page_view_id") }}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
2 changes: 2 additions & 0 deletions models/media_base/scratch/base_scratch.yml
@@ -13,6 +13,8 @@ models:
- not_null
- name: page_view_id
description: '{{ doc("col_page_view_id") }}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
@@ -13,7 +13,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_utils.get_value_by_target_type(bigquery_val=["media_id"]),
cluster_by=snowplow_utils.get_value_by_target_type(bigquery_val=["media_identifier"]),
sort = 'start_tstamp',
dist = 'play_id',
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
@@ -35,6 +35,7 @@ events_this_run as (
select
i.play_id,
i.page_view_id,
i.media_identifier,
i.media_id,
i.media_label,
i.session_identifier,
@@ -67,7 +68,7 @@ events_this_run as (

from events_this_run as i

group by 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19
group by 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 20

)

@@ -140,7 +141,7 @@ events_this_run as (
, duration_fix as (

select
f.media_id,
f.media_identifier,
max(f.duration_secs) as duration_secs

from events_this_run as f
@@ -156,6 +157,7 @@ events_this_run as (
select
d.play_id,
d.page_view_id,
d.media_identifier,
d.media_id,
d.media_label,
d.session_identifier,
@@ -237,7 +239,7 @@ left join retention_rate as r
on r.play_id = d.play_id

left join duration_fix as f
on f.media_id = d.media_id
on f.media_identifier = d.media_identifier

left join media_sessions as s
on s.media_session_id = d.play_id
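
Note: `duration_fix` takes `max(duration_secs)` per key and is joined back on that key. Keying it by `media_id` would merge elements that share an id, whereas `media_identifier` keeps them apart. A small SQLite sketch with hypothetical rows shows the difference. The `events` table and its values are made up, and any filter the real CTE applies is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table events (media_identifier text, media_id text, duration_secs real);
    insert into events values
        ('key-a', 'player-1', 60),
        ('key-a', 'player-1', null),   -- event missing its duration
        ('key-b', 'player-1', 1887);   -- different element, same media_id
    """
)


def duration_fix(key):
    """Run the max(duration_secs) aggregation grouped by the given column."""
    if key not in ("media_id", "media_identifier"):
        raise ValueError(f"unexpected grouping column: {key}")
    sql = f"select {key}, max(duration_secs) from events group by {key} order by {key}"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(duration_fix("media_id"))          # one row: both elements collapse
    print(duration_fix("media_identifier"))  # one row per element
```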
2 changes: 1 addition & 1 deletion models/media_base/snowplow_media_player_base.sql
@@ -17,7 +17,7 @@ You may obtain a copy of the Snowplow Personal and Academic License Version 1.0
"field": "start_tstamp",
"data_type": "timestamp"
}, databricks_val='start_tstamp_date'),
cluster_by=snowplow_utils.get_value_by_target_type(bigquery_val=["media_id"]),
cluster_by=snowplow_utils.get_value_by_target_type(bigquery_val=["media_identifier"]),
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt')),
tblproperties={
'delta.autoOptimize.optimizeWrite' : 'true',
2 changes: 2 additions & 0 deletions models/media_plays/media_plays.yml
@@ -14,6 +14,8 @@ models:
- not_null
- name: page_view_id
description: '{{ doc("col_page_view_id") }}'
- name: media_identifier
description: '{{ doc("col_media_identifier") }}'
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
4 changes: 3 additions & 1 deletion models/media_stats/media_stats.yml
@@ -4,13 +4,15 @@ models:
- name: snowplow_media_player_media_stats
description: '{{ doc("table_media_stats") }}'
columns:
- name: media_id
- name: media_identifier
description: The primary key of this table
tags:
- primary-key
tests:
- unique
- not_null
- name: media_id
description: '{{ doc("col_media_id") }}'
- name: media_label
description: '{{ doc("col_media_label") }}'
- name: duration_secs
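
Note: the `unique` and `not_null` tests now sit on `media_identifier`. As a rough sanity check outside dbt, the same two conditions can be applied to the first two columns of the new expected `media_stats` seed rows from this commit. The helper name `check_primary_key` is hypothetical, and this is only an approximation of what the dbt tests do.

```python
import csv
import io

# First two columns of the new expected media_stats seed rows in this commit.
SEED = '''"media_identifier","media_id"
"ae9ad2ba4e69068fa1807c9f41d9235a","html-dbt"
"9b71f6dbf74a346e7da60a5a744b362d","yt-dbt-coalesce-2022"
"68efef79c990d2f2542070769fbda051","yt-dbt-coalesce-2021"
"27b2f50383eb0e8faf585690ace0183d","79f2746f6bdb98f5a33f8085fc4f0eb1"
'''


def check_primary_key(csv_text, column):
    """Return True when every value in `column` is non-empty and unique."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [row[column] for row in rows]
    not_null = all(value not in (None, "") for value in values)
    unique = len(set(values)) == len(values)
    return not_null and unique


if __name__ == "__main__":
    print(check_primary_key(SEED, "media_identifier"))
```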