Add tag table SQL migration scripts #131
Conversation
Arkatufus commented on Dec 14, 2022 (edited)
- Split scripts into steps:
  - 1_Migration_Setup.sql : sets up the table and functions/procedures if they don't already exist
  - 2_Migration.sql : the actual conversion process
  - 3_Post_Migration_Cleanup.sql : deletes the functions/procedures
  - DROP_Migration.sql : deletes both the table and the functions/procedures
- Removed all transactions inside the scripts; the scripts are idempotent
- All scripts are designed as idempotent UPSERT operations
- A special script with the DROP prefix can be used to restart the migration from scratch

Would these be good enough for migration purposes?
Left some thoughts
scripts/SQLSERVER_Migration.sql (Outdated)
@@ -0,0 +1,46 @@
DROP TABLE IF EXISTS [dbo].[TagTable];
Just a general thought here - doing something destructive like dropping the TagTable... It might be better to fail the operation altogether and force the user to decide what to do in the event that they try to run this script twice. We can move the destructive / cleanup bits to a separate script that they can run explicitly that way. Buuuuuut for the sake of getting something into the repo that we can test now? I think this looks like it checks all the boxes.
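A minimal sketch of that "fail instead of drop" guard, assuming SQL Server 2012+ for THROW; the error number and message are illustrative, not part of this PR:

-- Abort instead of silently dropping TagTable; the user must run the
-- explicit cleanup script first if they really want to re-run the migration.
IF OBJECT_ID(N'[dbo].[TagTable]', N'U') IS NOT NULL
BEGIN
    THROW 50001, N'TagTable already exists. Run the explicit cleanup script first to re-run the migration.', 1;
END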
scripts/SQLSERVER_Migration.sql (Outdated)
GO

BEGIN TRANSACTION
INSERT INTO dbo.TagTable(ordering_id, tag)
The other way we could go about making these scripts safe to run multiple times is, rather than delete / recreate, to try the following idempotent pattern (see the sketch after this list):
- create if not exists (for tables, functions)
- for each item to be added to the tag table, do an upsert (insert if the row doesn't exist, update it if it does)
That way, running the script multiple times would be safe.
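A minimal T-SQL sketch of that pattern, assuming the TagTable shape implied by the INSERT in the diff above (ordering_id, tag); the column types and the #ParsedTags staging table are assumptions for illustration:

-- Create the table only if it doesn't exist yet.
IF OBJECT_ID(N'[dbo].[TagTable]', N'U') IS NULL
BEGIN
    CREATE TABLE [dbo].[TagTable] (
        ordering_id BIGINT NOT NULL,   -- type assumed
        tag NVARCHAR(100) NOT NULL,    -- type assumed
        CONSTRAINT PK_TagTable PRIMARY KEY (ordering_id, tag)
    );
END
GO

-- Upsert each (ordering_id, tag) pair; since the whole row is the key,
-- there is nothing to update, so only the NOT MATCHED branch is needed.
MERGE [dbo].[TagTable] AS target
USING (SELECT ordering_id, tag FROM #ParsedTags) AS source  -- hypothetical staging table
    ON target.ordering_id = source.ordering_id AND target.tag = source.tag
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ordering_id, tag) VALUES (source.ordering_id, source.tag);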
Actually, instead of doing an upsert, we could just skip the insert, since the data is immutable and therefore never changes.
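That variant might look like the following sketch (again, #ParsedTags is a hypothetical staging table for the parsed tags):

-- Insert only rows that aren't already present; existing rows are skipped,
-- which is safe because an (ordering_id, tag) pair never changes once written.
INSERT INTO dbo.TagTable (ordering_id, tag)
SELECT src.ordering_id, src.tag
FROM #ParsedTags AS src
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.TagTable AS t
    WHERE t.ordering_id = src.ordering_id
      AND t.tag = src.tag
);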
IMO these should be grabbing batches of items rather than trying to do the whole table. Otherwise, for large tables, this could run for an unknown amount of time before erroring out with a transaction size issue. (Also, in general, batching will make it easier to run 'on the fly' without locking too much, especially for something like MSSQL.)
Forgive how rusty my SQL experience is here, but you're talking about doing pagination with cursors, right?
Not cursors... the overall flow I was imagining (at least, for a zero-downtime migration) was along the lines of:
- deploy everything in 'hybrid' mode (where we write both old style tags and new tags)
- Once everything is deployed in that mode, run a script in the background, pseudo-SQL:
DECLARE @moveUpTo BIGINT =
    (SELECT MAX(ordering) FROM journal_table
     WHERE tags IS NOT NULL AND LEN(tags) > 0
       AND ordering NOT IN (SELECT ordering FROM journal_tag_table));
DECLARE @startAtOrdering BIGINT = 0;
WHILE (@startAtOrdering < @moveUpTo)
BEGIN
    INSERT INTO journal_tag_table
    SELECT tags FROM journal_table
    WHERE ordering >= @startAtOrdering
      AND ordering < @startAtOrdering + 1000; -- obviously need more transform here
    SET @startAtOrdering = @startAtOrdering + 1000;
END
- Re-deploy in 'tag-table-only' mode
- (optional) run a clean-up script to get rid of the old tags in the rows, but TBH only if you really need the storage back, since in the cloud the storage space gained back may or may not be worth the RUs of re-writing everything and indexes getting kershuffled.
You might need to think about how you do the transactions here... but hopefully this gets the point across.
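One hedged way to scope a transaction per batch, reusing the names from the pseudo-SQL above (the tag-splitting transform is still elided):

WHILE (@startAtOrdering < @moveUpTo)
BEGIN
    BEGIN TRANSACTION;
        -- Each 1000-row slice commits on its own, so a failure only
        -- rolls back the current batch instead of the whole migration.
        INSERT INTO journal_tag_table
        SELECT tags FROM journal_table
        WHERE ordering >= @startAtOrdering
          AND ordering < @startAtOrdering + 1000;
    COMMIT TRANSACTION;
    SET @startAtOrdering = @startAtOrdering + 1000;
END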
"deploy everything in 'hybrid' mode (where we write both old style tags and new tags)"

That makes sense, and I think this can be done easily since we're not getting rid of the old tag column - does that make sense to you, @Arkatufus?
"Once everything is deployed in that mode, run a script in the background, pseudo-SQL:"

I see what you mean: breaking up the inserts into transactions with groups of N "tag rows" at a time. I think that should be doable.
Last changes:
LGTM for now, but we need to implement @to11mtm's feedback in subsequent PRs.