Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Data Team Request form under SQL Migrations section #7

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jack-mcquown
Copy link

@jack-mcquown jack-mcquown commented Apr 5, 2024

Dependencies

Documentation

Description

Adds a new comment about submitting a Data Team Request for having a schema migration reflected in Snowflake.

This change is based on the conversation in this data team request related to getting a new column into a funnel Snowflake table.

Testing Considerations

SQL Migrations

Deployment

Versioning

Copy link
Contributor

@jonathansharman jonathansharman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple questions about this process change - curious to hear your thoughts, Jack, @ParryChen, and/or anyone else on @nicheinc/data-enablement!

@@ -33,12 +33,13 @@
<!--
Migration Script: <link to migration script file>
Description: <a description of the specific schema or data changes made by this migration script>

Keep this tag - the data enablement team will be notified via email of any changes: @nicheinc/data-enablement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓️ Wasn't the purpose of this tag to notify @nicheinc/data-enablement (which includes @ParryChen) that there's been a schema migration that might need to be reflected in Snowflake? I'm not necessarily opposed to submitting a Data Team Request in these cases, but it would be slightly more cumbersome for the PR author, vs. just linking the migration and uncommenting a tag.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey so I think the structure of the data-enablement team has changed a lot, where I'm not part of the data enablement team anymore and these changes are now being shifted to the DEs(whom get all their work tracked through the data team request form). So I think these make sense!

Comment on lines +40 to +41
<!-- Does this migration affect tables in Snowflake? If yes, submit a Data Team Request to have the schema migration reflected in Snowflake.
https://form.asana.com/?k=uNnfTyZZVsxifMCOIPdwiw&d=684757491145461 -->
Copy link
Contributor

@jonathansharman jonathansharman Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓️/🔧 Is the expectation that not all schema migrations that add/remove tables need to be reflected in Snowflake - and that it's up to the dev to decide on a per-migration basis? Whether we're tagging a GH team or opening an Asana ticket, IMO it should still be the Data Team's call, and in that case, I think it might make more sense for this callout to be grouped with the schema migration comment above. 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so schema changes don't necessarily need to happen in Snowflake if the affected columns/schemas only serve backend services and not Snowflake stakeholders.

I think we could get more eyes on this from backend and data leadership so they can decide what's needed and what's not

Copy link

@wmarshall wmarshall Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a strong reason not to say "all schema migrations should be reflected in snowflake, even if they're trivial"? I feel like I'm missing something here.

It seems like we're setting ourselves up for future problems if we decide there are migrations that don't need to be reflected in snowflake because:

  1. We can always be wrong when we make this decision - regardless of who is involved or how many people or automated systems we use
  2. Adding more layers of review will slow down velocity, either of updates to snowflake (best case) or actual deployments (will make everyone sad).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so the schema migration process in snowflake often times involves re-snapshotting a table to backfill data from added columns. This re-snapshotting process needs to happen manually if the column data is populated with default values, because the Postgres WAL doesn't identify those as DML, and Streamkap won't pick these up as new events. It often takes some time to process/takes snowflake compute resources that cost money.

So depending on how often schema migrations occur, @jack-mcquown said it can happen a lot for certain services. It may be inefficient to add all schema migrations to Snowflake. Unless we add some note in backend schema migrations to use DML to update new columns rather than using default values.

@jonathansharman
Copy link
Contributor

@jack-mcquown @wmarshall @ParryChen, this PR has been hanging out for a while. What are folks thinking here? Do we need to have a wider discussion between BE and DE to hash out these organizational details?

@ParryChen
Copy link

Providing some more context as we've learned more about how schema migrations work for Snowflake under Streamkap.

So basically, there are two instances where manual work is currently necessary on our side.

  • When a new table is added to a schema
  • When a new column is added to a table, and the column is set up with a default value.

We don't really have a way right now to automate the above instances. So that is a potential blocker for having all schema migrations be reflected in Snowflake(assuming there's a lot).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants