Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace snowplow_web with a base that can be compatible with mobile events (close #45) #44

Merged

Conversation

matus-tomlein
Copy link
Contributor

@matus-tomlein matus-tomlein commented Jul 5, 2023

Issure #45

Description & motivation

This PR takes out the snowplow_web package as the base and uses a custom one taken from the ecommerce package. The motivation for this is to be able to process mobile events as well.

We do not yet support mobile events in this PR. We can only generate mobile events from the new media schemas which are not yet supported by the package, so I will add support for mobile later, when adding support for the new media schemas. I just wanted to prepare for this in advance.

Along the way, I refactored the base_events_this_run models to reuse some logic across warehouses using macros.

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have raised a documentation PR if applicable (Link here if required)

Copy link
Contributor

@rlh1994 rlh1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed a few things for normal gotchas, can you also make sure to update the changelog as part of this PR please?

dbt_project.yml Outdated Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
docs/markdown/snowplow_media_player_base_docs.md Outdated Show resolved Hide resolved
docs/markdown/snowplow_media_player_base_docs.md Outdated Show resolved Hide resolved
docs/markdown/snowplow_media_player_base_docs.md Outdated Show resolved Hide resolved
docs/markdown/snowplow_media_player_common_cols.md Outdated Show resolved Hide resolved
@rlh1994 rlh1994 changed the base branch from main to release/snowplow-media-player/0.6.0 July 5, 2023 12:38
@@ -1,4 +1,4 @@
"media_id","media_label","duration","media_type","media_player_type","play_time_min","avg_play_time_min","first_play","last_play","plays","valid_plays","complete_plays","impressions","avg_playback_rate","play_rate","completion_rate_by_plays","avg_percent_played","avg_retention_rate","last_base_tstamp","_10_percent_reached","_25_percent_reached","_50_percent_reached","_75_percent_reached","_100_percent_reached"
html-dbt,dbt Coalesce 2021 - Data modeling at Scale,1887.0,video,org.whatwg-media_element,39.333333333333336,9.833333333333332,2022-01-18 11:56:27.815,2022-01-18 12:11:12.690,4,2,1,4,1.0,1.0,0.25,0.3126656067832538,0.25,2022-01-20 19:13:21.293,1,1,1,1,2
yt-dbt-coalesce-2021,dbt Coalesce 2021 - Data modeling at Scale,1887.0,video,com.youtube-youtube,55.06666666666667,1.4882882882882882,2022-01-18 21:23:57.381,2022-01-20 19:13:21.293,37,5,0,39,1.0270270270270272,0.9487179487179487,0.0,0.04732236210773571,0.002702702702702702,2022-01-20 19:13:21.293,2,2,1,0,4
yt-dbt-coalesce-2022,dbt Coalesce 2022 - Data modeling at Scale 2,1889.0,video,com.youtube-youtube,,,2999-01-01 00:00:00.000,2000-01-01 00:00:00.000,0,0,0,1,2.0,,,,,2022-01-20 19:17:13.125,0,0,0,0,0
yt-dbt-coalesce-2022,dbt Coalesce 2022 - Data modeling at Scale 2,1889.0,video,com.youtube-youtube,,,,,0,0,0,1,2.0,,,,,2022-01-20 19:17:13.125,0,0,0,0,0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the expected value changed here? If this is a replacement for the existing method should it not result in the same output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a mystery to me, I don't understand why the expected output was different for the different warehouses. Now all the warehouses have the same expected data so I removed the distinction between expected outputs by warehouse.

It must have been a side effect of using the snowplow_web as the base before compare to the custom base now. But I don't know what specifically could have caused it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different output is usually down to just slight differences between the processing approaches by warehouse (particularly when it comes to rounding or formatting). The loss of these values feels like something we should dive into and understand though, I suspect they are default values from somewhere (either a min/max default based on the warehouse, or we provided some defaults).

I'd be wary to release with these removed as many people may be filtering on these already and rely on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have solved this mystery on Slack, it was a problem with the least and greatest functions giving different output (null or the max/min value) depending on whether null was present in the incremental run. I have added a nullif around the least and greatest functions to always return null in case there are no matched events.

@matus-tomlein matus-tomlein changed the title Replace snowplow_web with a base that is compatible with mobile events [WIP] Replace snowplow_web with a base that can be compatible with mobile events Jul 6, 2023
@matus-tomlein matus-tomlein marked this pull request as ready for review July 6, 2023 07:37
@matus-tomlein matus-tomlein changed the title Replace snowplow_web with a base that can be compatible with mobile events Replace snowplow_web with a base that can be compatible with mobile events (close #45) Jul 6, 2023
Copy link
Contributor

@rlh1994 rlh1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, mostly just a few nitpicky things, and some possible options to simplify/unify some stuff. Let me know if anything I've said is unclear because it might only make sense to me!

Copy link
Contributor

@rlh1994 rlh1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much there!

dbt_project.yml Outdated Show resolved Hide resolved
models/media_stats/snowplow_media_player_media_stats.sql Outdated Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
@matus-tomlein
Copy link
Contributor Author

Thanks @rlh1994, these were super useful comments!

@matus-tomlein matus-tomlein requested a review from rlh1994 July 13, 2023 12:22
Copy link
Contributor

@rlh1994 rlh1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rlh1994
Copy link
Contributor

rlh1994 commented Jul 18, 2023

Just make sure to squash all the commit please

@rlh1994
Copy link
Contributor

rlh1994 commented Aug 18, 2023

Should be good to merge this @matus-tomlein

@matus-tomlein matus-tomlein merged commit 6865627 into release/snowplow-media-player/0.6.0 Aug 18, 2023
@matus-tomlein matus-tomlein deleted the issue/change_base branch August 18, 2023 16:27
matus-tomlein added a commit that referenced this pull request Sep 20, 2023
matus-tomlein added a commit that referenced this pull request Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants