Add unique media identifier (close #59) #61
Can you add a changelog entry for this, please? We'll build it up as we go. This release is likely to have a lot of changes, including breaking changes, so it would be good to keep track as we go.
4ddb2be to 0b53ccf
The only thing that might be worth doing, although I appreciate it would be a pain, is to add an event to the test data (or alter an existing one) with the same `media_id` but a different value in another field, e.g. `media_label`. This should have caused a failure in the old version but would pass with these fixes. That ensures we've actually fixed the issue and stops us making the same error in the future.
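To illustrate the suggested test case, here is a minimal Python sketch. It assumes the fix keys each media element on a hash of `media_id`, `media_label`, `media_type` and `media_player_type`, as the PR description states. The `media_identifier` helper, the `_null_` placeholder, the `-` separator, the choice of MD5 and the sample events are all hypothetical and are not taken from the package's macro.

```python
import hashlib


def media_identifier(media_id, media_label, media_type, media_player_type):
    """Hypothetical helper: hash the four fields into one surrogate key."""
    parts = [media_id, media_label, media_type, media_player_type]
    # Use a placeholder for missing values so None and "" hash differently.
    normalised = ["_null_" if p is None else str(p) for p in parts]
    return hashlib.md5("-".join(normalised).encode("utf-8")).hexdigest()


# Made-up test events: two different media elements share one media_id.
events = [
    {"media_id": "abc123", "media_label": "Intro",
     "media_type": "video", "media_player_type": "youtube"},
    {"media_id": "abc123", "media_label": "Outro",
     "media_type": "video", "media_player_type": "youtube"},
]

# Keying on media_id alone collapses both elements into one key.
keys_old = {e["media_id"] for e in events}
# Keying on the combined hash keeps them apart.
keys_new = {media_identifier(**e) for e in events}
```

With these sample rows, `keys_old` holds a single key while `keys_new` holds two, which is the situation the extra test event is meant to cover.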
This looks great, but it's quite confusing to have both the `media_id` and `media_identifier` in the derived tables. It's not obvious what the difference between them is. How do we explain to users which one to use?
My suggestion would be to replace the `media_id` with the `media_identifier` (basically update and rename this macro).
If people are interested in the original `player_id` value from the YouTube or HTML5 context entity, maybe we can provide that as a separate field? I don't like the name `player_id` because it doesn't really identify the player, more the content. Perhaps `content_id` would be better? Or can we let users take the value directly from the context entities if they want?
Yeah I get it is confusing. I think it is important we continue to expose the player ids from YouTube and HTML5 events in some way as these may be used as filters or join keys (I like your idea of naming it Another option that came to mind, using the web package as an example, when we made the session identifier configurable we kept this called So to me we have the options:
WDYT?
I think option 2, but maybe we call it `player_id` as that is more closely aligned to the original variable name in most cases from the context?
Agreed, I like option 2!
Thanks both, I've made the changes for option 2! I've also changed the integration tests to cover the cases of content sharing the same media label across different media types or media player types. I fully expect the PR tests to not pass first time, as I've only tested locally with Snowflake 🤞 This change adds
Need to add a unique primary key to the
cd925c5 to d396916
LGTM! Great to see both issues fixed!
d396916 to 6363007
87ddb6b into release/snowplow-media-player/0.7.0
Add unique media identifier.
Description & motivation
Generate a more robust unique media element identifier using a surrogate key generated from `media_id`, `media_label`, `media_type` and `media_player_type`. This removes the chance of duplicate media ids in the media stats table (close #59).

Checklist