Commit

Add support for new Snowplow media event and entity schemas on Web and mobile (close #49)
matus-tomlein committed Aug 21, 2023
1 parent c854310 commit b553a65
Showing 74 changed files with 15,567 additions and 1,954 deletions.
34 changes: 33 additions & 1 deletion dbt_project.yml
@@ -56,15 +56,30 @@ vars:
snowplow__enable_whatwg_media: false
# set to true if the HTML5 video element context schema is enabled
snowplow__enable_whatwg_video: false
snowplow__enable_media_player_v1: false
snowplow__enable_media_player_v2: true
snowplow__enable_media_session: true
snowplow__enable_media_ad: false
snowplow__enable_media_ad_break: false
snowplow__enable_web_events: true
snowplow__enable_mobile_events: true
snowplow__enable_ad_quartile_event: true
snowplow__app_id: []

# Variables - Warehouse Specific
snowplow__media_player_event_context: 'com_snowplowanalytics_snowplow_media_player_event_1'
snowplow__media_player_context: 'com_snowplowanalytics_snowplow_media_player_1'
snowplow__media_player_v2_context: 'com_snowplowanalytics_snowplow_media_player_2'
snowplow__media_session_context: 'com_snowplowanalytics_snowplow_media_session_1'
snowplow__media_ad_context: 'com_snowplowanalytics_snowplow_media_ad_1'
snowplow__media_ad_break_context: 'com_snowplowanalytics_snowplow_media_ad_break_1'
snowplow__media_ad_quartile_event: 'com_snowplowanalytics_snowplow_media_ad_quartile_event_1'
snowplow__youtube_context: 'com_youtube_youtube_1'
snowplow__html5_media_element_context: 'org_whatwg_media_element_1'
snowplow__html5_video_element_context: 'org_whatwg_video_element_1'
snowplow__context_web_page: 'com_snowplowanalytics_snowplow_web_page_1'
snowplow__context_screen: 'com_snowplowanalytics_mobile_screen_1'
snowplow__context_mobile_session: 'com_snowplowanalytics_snowplow_client_session_1'
snowplow__derived_tstamp_partitioned: true
snowplow__query_tag: 'snowplow_dbt'
snowplow__enable_load_tstamp: true
@@ -86,7 +101,15 @@ models:
+materialized: view
base:
manifest:
+schema: 'snowplow_manifest'
+schema: "snowplow_manifest"
bigquery:
+enabled: "{{ target.type == 'bigquery' | as_bool() }}"
databricks:
+enabled: "{{ target.type in ['databricks', 'spark'] | as_bool() }}"
default:
+enabled: "{{ target.type in ['redshift', 'postgres'] | as_bool() }}"
snowflake:
+enabled: "{{ target.type == 'snowflake' | as_bool() }}"
scratch:
+schema: 'scratch'
+tags: 'scratch'
@@ -114,3 +137,12 @@ models:
+schema: 'scratch'
+tags: 'snowplow_media_player_incremental'
+enabled: false
media_ad_views:
+schema: 'derived'
+tags: 'snowplow_media_player_incremental'
scratch:
+schema: 'scratch'
+tags: 'scratch'
media_ads:
+schema: 'derived'
+tags: 'snowplow_media_player_incremental'
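
The warehouse-specific `+enabled` expressions added above switch on one folder of models per `target.type`. A minimal Python sketch of that mapping follows. The dictionary and the `enabled_folders` helper are hypothetical names introduced for illustration; dbt evaluates the Jinja expressions itself, so this only mirrors their intent.

```python
# Hypothetical mirror of the `+enabled` expressions in dbt_project.yml.
# Keys are the model folders, values are the target types that enable them.
ENABLED_FOLDERS = {
    "bigquery": {"bigquery"},
    "databricks": {"databricks", "spark"},
    "default": {"redshift", "postgres"},
    "snowflake": {"snowflake"},
}


def enabled_folders(target_type: str) -> list[str]:
    """Return the folders whose models would be enabled for a target type."""
    return sorted(
        folder
        for folder, target_types in ENABLED_FOLDERS.items()
        if target_type in target_types
    )


if __name__ == "__main__":
    for target in ("bigquery", "spark", "postgres", "snowflake"):
        print(target, "->", enabled_folders(target))
```

Under this reading, each supported target enables exactly one folder, and an unsupported target enables none.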
24 changes: 24 additions & 0 deletions docs/markdown/snowplow_media_player_atomic_docs.md
@@ -2,6 +2,14 @@
This context table contains the `page_view_id` associated with an event.
{% enddocs %}

{% docs table_screen_context %}
This context table contains the screen view ID associated with a mobile event.
{% enddocs %}

{% docs table_client_session_context %}
This context table contains user and session identifiers associated with mobile events.
{% enddocs %}

{% docs table_media_player_event %}
The table specifying the media player event type (e.g. playing, seek) and the label given to the media for user-friendly identification.
{% enddocs %}
@@ -10,6 +18,22 @@ The table specifying the media player event type (e.g. playing, seek) and the la
This context table contains a set of entities that are common between media events across platforms.
{% enddocs %}

{% docs table_media_session_context %}
This context table contains context entities for media player events that track sessions of media player usage (a media session is one video playback).
{% enddocs %}

{% docs table_media_ad_context %}
This context table contains context entities with information about the currently played ad.
{% enddocs %}

{% docs table_media_ad_break_context %}
This context table contains context entities that are added to all ad events belonging to an ad break.
{% enddocs %}

{% docs table_media_ad_quartile_event %}
This table contains self-describing event data fired when a quartile of an ad is reached after continuous ad playback at normal speed.
{% enddocs %}

{% docs table_youtube_context %}
The context table with data specific to embedded YouTube videos.
{% enddocs %}
212 changes: 202 additions & 10 deletions docs/markdown/snowplow_media_player_common_cols.md
@@ -66,7 +66,7 @@ Name of operating system e.g. `Android`.
Client operating system timezone e.g. `Europe/London`.
{% enddocs %}

{% docs col_duration %}
{% docs col_duration_secs %}
Total length of the media in seconds, e.g. a 5:32 YouTube video has a duration of 332 seconds.
{% enddocs %}

@@ -98,16 +98,82 @@ The `derived_tstamp` denoting the time when the event started.
The `derived_tstamp` denoting the time when the last media player event belonging to the specific level of aggregation (e.g.: page_view by media) started.
{% enddocs %}

{% docs col_play_time_sec %}
Estimated duration of play in seconds. It is calculated using the percent_progress events that are fired during play. In case such an event is fired, it is assumed that the total section of the media in between the previous and current percent_progress is played through, even if the user seeks to another point in time within the audio / video. The more often these events are tracked (e.g. every 5% of the media's length) the more accurate the calculation becomes.
{% docs col_play_time_secs %}
Total seconds user spent playing content (excluding linear ads). If the media session entity is tracked with media events, the information is read from there. This is an accurate measurement provided by the tracker.

If the media session entity is not tracked, the value is estimated. It is calculated using the percent_progress events that are fired during play. In case such an event is fired, it is assumed that the total section of the media in between the previous and current percent_progress is played through, even if the user seeks to another point in time within the audio / video. The more often these events are tracked (e.g. every 5% of the media's length) the more accurate the calculation becomes.
{% enddocs %}
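
The fallback estimate described for `play_time_secs` can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's SQL: the function name, the `boundaries` argument (the configured percent_progress boundaries) and the `reached` argument (the boundaries for which an event fired) are hypothetical. Each reached boundary is credited with the whole segment since the preceding configured boundary.

```python
def estimate_play_time_secs(duration_secs, boundaries, reached):
    """Estimate play time from percent_progress events.

    duration_secs: total length of the media in seconds.
    boundaries:    configured percent_progress boundaries, e.g. [25, 50, 75, 100].
    reached:       boundaries for which a percent_progress event was fired.
    """
    reached = set(reached)
    total = 0.0
    previous = 0
    for boundary in sorted(set(boundaries)):
        if boundary in reached:
            # Assume the whole segment before this boundary was played through.
            total += (boundary - previous) * duration_secs / 100
        previous = boundary
    return total


if __name__ == "__main__":
    # A 200-second video tracked every 25%; the user reached 25% and 50%.
    print(estimate_play_time_secs(200, [25, 50, 75, 100], [25, 50]))
```

Denser boundaries shrink the segment credited per event, which is why more frequent percent_progress tracking gives a more accurate estimate.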

{% docs col_paused_time_secs %}
Total seconds user spent with paused content (excluding linear ads).

This information is provided by the tracker in the media session context entity. If the entity is not available, it will not be computed.
{% enddocs %}

{% docs col_buffering_time_secs %}
Total seconds that playback was buffering.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_ads_time_secs %}
Total seconds that ads played.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_ads %}
Number of ads played.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_ads_clicked %}
Number of ads that the user clicked on.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_ads_skipped %}
Number of ads that the user skipped.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_ad_breaks %}
Number of ad breaks played.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_content_watched_secs %}
Total seconds of the content played. Each part of the content played is counted once (i.e., counts rewinding or rewatching the same content only once).

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}
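
Counting each part of the content once amounts to measuring the union of the played time ranges. The Python sketch below illustrates that idea only. The tracker computes the real value and its implementation is not shown in this commit, so the function name and the `(start, end)` input format are assumptions.

```python
def content_watched_secs(played_ranges):
    """Total seconds of content covered by (start, end) playback ranges.

    Overlapping ranges (rewinds, rewatches) are merged so that each part
    of the content is counted only once.
    """
    total = 0.0
    current_start = None
    current_end = None
    for start, end in sorted(played_ranges):
        if current_end is None or start > current_end:
            # Close the previous merged range before opening a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # The range overlaps the current one, so extend it if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


if __name__ == "__main__":
    # The user watched 0-30s, rewound to 20s and watched to 50s, then skipped to 100-110s.
    print(content_watched_secs([(0, 30), (20, 50), (100, 110)]))
```

Play time, by contrast, would count the rewatched 20-30s segment twice.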

{% docs col_content_watched_percent %}
Percentage of the content played.

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_avg_play_time_min %}
{% docs col_avg_play_time_mins %}
Estimated average duration of plays in minutes.

If the media session context entity is tracked with events, the information is taken from there, which makes it more accurate because it is calculated on the tracker.
{% enddocs %}

{% docs col_avg_content_watched_mins %}
Average duration of the content played in minutes. Each part of the content played is counted once (i.e., counts rewinding or rewatching the same content only once).

Calculated on the tracker and provided through the media session context entity.
{% enddocs %}

{% docs col_avg_play_time_sec %}
Estimated average duration of plays in seconds.

If the media session context entity is tracked with events, the information is taken from there, which makes it more accurate because it is calculated on the tracker.
{% enddocs %}

{% docs col_first_play %}
@@ -166,16 +232,18 @@ Time stamp for the event recorded by the collector e.g. `2013-11-26 00:02:05`.
The weight given to each percent progress boundary reached, used to calculate the play_time_sec_estimated field. It is based on the difference between the current and preceding percent_progress rates.
{% enddocs %}

{% docs col_play_time_sec_muted %}
Calculated duration of muted play in seconds. It is based on the percent_progress event and whether the user played it on mute during this event or not.
{% docs col_play_time_muted_secs %}
Total seconds user spent playing content on mute (excluding linear ads). If the media session entity is tracked with media events, the information is read from there. This is an accurate measurement provided by the tracker.

If the media session entity is not tracked, the value is estimated. It is based on the percent_progress event and whether the user played it on mute during this event or not.
{% enddocs %}

{% docs col_is_played %}
Pageviews with at least one play event.
{% enddocs %}

{% docs col_is_valid_play %}
A boolean value to show whether the duration of the play (`play_time_sec`) is bigger than or equal to the variable given in `snowplow__valid_play_sec` (defaulted to 30).
A boolean value to show whether the duration of the play (`play_time_secs`) is bigger than or equal to the variable given in `snowplow__valid_play_sec` (defaulted to 30).
{% enddocs %}

{% docs col_is_complete_play %}
@@ -202,11 +270,11 @@ The sum of all video plays that exceed the limit set within the variable `snowpl
The sum of all audio plays that exceeded the limit set within the variable `snowplow__valid_play_sec`, it is defaulted to 30 seconds.
{% enddocs %}

{% docs col_play_time_min %}
{% docs col_play_time_mins %}
Calculated duration of play in minutes.
{% enddocs %}

{% docs col_play_time_min_muted %}
{% docs col_play_time_muted_mins %}
Calculated duration of muted play in minutes. It is based on the percent_progress event and whether the user played it on mute during this event or not.
{% enddocs %}

@@ -238,7 +306,7 @@ The playback position of a specific media in seconds whenever a media player eve
The optional, human readable name given to tracked media content.
{% enddocs %}

{% docs col_avg_session_play_time_min %}
{% docs col_avg_session_play_time_mins %}
Estimated average duration of plays in minutes within a session.
{% enddocs %}

@@ -833,3 +901,127 @@ User-set “true timestamp” for the event e.g. ‘2013-11-26 00:02:04’
{% docs col_event_in_session_index %}
The index of the event in the corresponding session.
{% enddocs %}

{% docs col_media_ad_id %}
Generated identifier that identifies an ad (identified using the ad_id) played with a specific media (identified using the media_id) and on a specific platform (based on the platform property).
{% enddocs %}
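
One way to picture the generated `media_ad_id` is as a deterministic hash over the three values it combines. The Python sketch below assumes an MD5 surrogate key over `platform`, `media_id` and `ad_id` joined with a separator. The hashing scheme, field order and separator are assumptions for illustration, not taken from this commit.

```python
import hashlib


def media_ad_id(platform: str, media_id: str, ad_id: str) -> str:
    """Build a hypothetical surrogate key for an ad played with a given media on a platform."""
    # Assumed scheme: join the parts with a separator and hash the result.
    key = "|".join([platform, media_id, ad_id])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(media_ad_id("web", "video-123", "ad-9"))
    print(media_ad_id("mob", "video-123", "ad-9"))
```

The same ad shown with the same media on a different platform gets a different identifier, so per-platform rows stay separate.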

{% docs col_ad_id %}
Unique identifier for the ad taken from the ad context entity.
{% enddocs %}

{% docs col_name %}
Friendly name of the ad taken from the ad context entity.
{% enddocs %}

{% docs col_creative_id %}
The ID of the ad creative taken from the ad context entity.
{% enddocs %}

{% docs col_ad_duration_secs %}
Length of the video ad in seconds as reported in the ad context entity.
{% enddocs %}

{% docs col_skippable %}
Indicates whether skip controls are made available to the end user (reported in the ad context entity).
{% enddocs %}

{% docs col_avg_pod_position %}
The average position of the ad within the ad break, starting with 1 (reported in the ad context entity).
{% enddocs %}

{% docs col_views %}
Number of total views on the ad (repeated views are counted as well).
{% enddocs %}

{% docs col_clicks %}
Number of total clicks on the ad (repeated clicks are counted as well).
{% enddocs %}

{% docs col_skips %}
Number of total times users skipped the ad (repeated skips are counted as well).
{% enddocs %}

{% docs col_25_percent_reached %}
Number of times users reached 25% of the ad playback (repeated views are counted as well).
{% enddocs %}

{% docs col_50_percent_reached %}
Number of times users reached 50% of the ad playback (repeated views are counted as well).
{% enddocs %}

{% docs col_75_percent_reached %}
Number of times users reached 75% of the ad playback (repeated views are counted as well).
{% enddocs %}

{% docs col_100_percent_reached %}
Number of times users watched the whole ad (repeated views are counted as well).
{% enddocs %}

{% docs col_first_view %}
Datetime of the first ad view.
{% enddocs %}

{% docs col_last_view %}
Datetime of the last ad view.
{% enddocs %}

{% docs col_ad_break_id %}
An identifier for the ad break (reported in the ad_break context entity).
{% enddocs %}

{% docs col_ad_break_name %}
Ad break name (e.g., pre-roll, mid-roll, and post-roll), reported in the ad_break context entity.
{% enddocs %}

{% docs col_ad_break_type %}
Type of ads within the break: linear (take full control of the video for a period of time), nonlinear (run concurrently to the video), companion (accompany the video but placed outside the player). Reported in the ad_break context entity.
{% enddocs %}

{% docs col_pod_position %}
The position of the ad within the ad break, starting with 1 (reported in the ad context entity).
{% enddocs %}

{% docs col_clicked %}
Whether the ad was clicked during this ad view.
{% enddocs %}

{% docs col_skipped %}
Whether the ad was skipped during this ad view.
{% enddocs %}

{% docs col_viewed_at %}
Datetime when the ad was viewed.
{% enddocs %}

{% docs col_last_event %}
Datetime of the last event.
{% enddocs %}

{% docs col_views_unique %}
Number of users that viewed the ad (identified by their domain_userid).
{% enddocs %}

{% docs col_clicked_unique %}
Number of users that clicked on the ad (identified by their domain_userid).
{% enddocs %}

{% docs col_skipped_unique %}
Number of users that skipped the ad (identified by their domain_userid).
{% enddocs %}

{% docs col_25_percent_reached_unique %}
Number of users that watched 25% of the ad (identified by their domain_userid).
{% enddocs %}

{% docs col_50_percent_reached_unique %}
Number of users that watched 50% of the ad (identified by their domain_userid).
{% enddocs %}

{% docs col_75_percent_reached_unique %}
Number of users that watched 75% of the ad (identified by their domain_userid).
{% enddocs %}

{% docs col_100_percent_reached_unique %}
Number of users that watched 100% of the ad (identified by their domain_userid).
{% enddocs %}
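
The difference between the total counts (`views`, `clicks`, `skips`) and their `_unique` counterparts can be sketched in Python as follows. The `AdView` record and the `summarize` helper are hypothetical names, and the package computes these columns in SQL. The sketch only shows that totals count every ad view while unique columns count distinct `domain_userid` values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdView:
    """One ad view, reduced to the fields needed for this illustration."""
    domain_userid: str
    clicked: bool
    skipped: bool


def summarize(ad_views):
    """Aggregate total and unique-user counts for a single ad."""
    return {
        # Totals: repeated views by the same user are counted again.
        "views": len(ad_views),
        "clicks": sum(1 for v in ad_views if v.clicked),
        "skips": sum(1 for v in ad_views if v.skipped),
        # Unique: each domain_userid is counted at most once.
        "views_unique": len({v.domain_userid for v in ad_views}),
        "clicked_unique": len({v.domain_userid for v in ad_views if v.clicked}),
        "skipped_unique": len({v.domain_userid for v in ad_views if v.skipped}),
    }


if __name__ == "__main__":
    views = [
        AdView("user-a", clicked=True, skipped=False),
        AdView("user-a", clicked=True, skipped=False),
        AdView("user-b", clicked=False, skipped=True),
    ]
    print(summarize(views))
```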