-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace snowplow_web with a base that can be compatible with mobile events (close #45) #44
Replace snowplow_web with a base that can be compatible with mobile events (close #45) #44
Conversation
dce41d8
to
5641277
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just skimmed a few things for normal gotchas, can you also make sure to update the changelog as part of this PR please?
models/base/manifest/snowplow_media_player_base_quarantined_sessions.sql
Outdated
Show resolved
Hide resolved
models/base/manifest/snowplow_media_player_base_sessions_lifecycle_manifest.sql
Show resolved
Hide resolved
models/base/scratch/bigquery/snowplow_media_player_base_events_this_run.sql
Outdated
Show resolved
Hide resolved
@@ -1,4 +1,4 @@ | |||
"media_id","media_label","duration","media_type","media_player_type","play_time_min","avg_play_time_min","first_play","last_play","plays","valid_plays","complete_plays","impressions","avg_playback_rate","play_rate","completion_rate_by_plays","avg_percent_played","avg_retention_rate","last_base_tstamp","_10_percent_reached","_25_percent_reached","_50_percent_reached","_75_percent_reached","_100_percent_reached" | |||
html-dbt,dbt Coalesce 2021 - Data modeling at Scale,1887.0,video,org.whatwg-media_element,39.333333333333336,9.833333333333332,2022-01-18 11:56:27.815,2022-01-18 12:11:12.690,4,2,1,4,1.0,1.0,0.25,0.3126656067832538,0.25,2022-01-20 19:13:21.293,1,1,1,1,2 | |||
yt-dbt-coalesce-2021,dbt Coalesce 2021 - Data modeling at Scale,1887.0,video,com.youtube-youtube,55.06666666666667,1.4882882882882882,2022-01-18 21:23:57.381,2022-01-20 19:13:21.293,37,5,0,39,1.0270270270270272,0.9487179487179487,0.0,0.04732236210773571,0.002702702702702702,2022-01-20 19:13:21.293,2,2,1,0,4 | |||
yt-dbt-coalesce-2022,dbt Coalesce 2022 - Data modeling at Scale 2,1889.0,video,com.youtube-youtube,,,2999-01-01 00:00:00.000,2000-01-01 00:00:00.000,0,0,0,1,2.0,,,,,2022-01-20 19:17:13.125,0,0,0,0,0 | |||
yt-dbt-coalesce-2022,dbt Coalesce 2022 - Data modeling at Scale 2,1889.0,video,com.youtube-youtube,,,,,0,0,0,1,2.0,,,,,2022-01-20 19:17:13.125,0,0,0,0,0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has the expected value changed here? If this is a replacement for the existing method should it not result in the same output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a mystery to me, I don't understand why the expected output was different for the different warehouses. Now all the warehouses have the same expected data so I removed the distinction between expected outputs by warehouse.
It must have been a side effect of using the snowplow_web
as the base before compare to the custom base now. But I don't know what specifically could have caused it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The different output is usually down to just slight differences between the processing approaches by warehouse (particularly when it comes to rounding or formatting). The loss of these values feels like something we should dive into and understand though, I suspect they are default values from somewhere (either a min/max default based on the warehouse, or we provided some defaults).
I'd be wary to release with these removed as many people may be filtering on these already and rely on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have solved this mystery on Slack, it was a problem with the least
and greatest
functions giving different output (null or the max/min value) depending on whether null was present in the incremental run. I have added a nullif
around the least
and greatest
functions to always return null in case there are no matched events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, mostly just a few nitpicky things, and some possible options to simplify/unify some stuff. Let me know if anything I've said is unclear because it might only make sense to me!
macros/snowplow_media_player_base_events_this_run/playback_quality_col.sql
Show resolved
Hide resolved
macros/snowplow_media_player_base_events_this_run/media_id_col.sql
Outdated
Show resolved
Hide resolved
models/base/scratch/snowplow_media_player_base_new_event_limits.sql
Outdated
Show resolved
Hide resolved
models/base/scratch/snowflake/snowplow_media_player_base_events_this_run.sql
Outdated
Show resolved
Hide resolved
models/base/scratch/default/snowplow_media_player_base_events_this_run.sql
Show resolved
Hide resolved
models/base/scratch/default/snowplow_media_player_base_events_this_run.sql
Outdated
Show resolved
Hide resolved
models/base/scratch/bigquery/snowplow_media_player_base_events_this_run.sql
Outdated
Show resolved
Hide resolved
models/base/scratch/databricks/snowplow_media_player_base_events_this_run.sql
Outdated
Show resolved
Hide resolved
macros/snowplow_media_player_base_events_this_run/source_url_col.sql
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty much there!
Thanks @rlh1994, these were super useful comments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Just make sure to squash all the commit please |
c9a08c7
to
c35de0a
Compare
Should be good to merge this @matus-tomlein |
c35de0a
to
c854310
Compare
Issure #45
Description & motivation
This PR takes out the snowplow_web package as the base and uses a custom one taken from the ecommerce package. The motivation for this is to be able to process mobile events as well.
We do not yet support mobile events in this PR. We can only generate mobile events from the new media schemas which are not yet supported by the package, so I will add support for mobile later, when adding support for the new media schemas. I just wanted to prepare for this in advance.
Along the way, I refactored the
base_events_this_run
models to reuse some logic across warehouses using macros.Checklist