
Add tag table SQL migration scripts #131

Merged · 4 commits · Feb 1, 2023

Conversation

Arkatufus
Contributor

@Arkatufus Arkatufus commented Dec 14, 2022

  • Split scripts into steps:
    • 1_Migration_Setup.sql: sets up the table and functions/procedures if they don't already exist
    • 2_Migration.sql: the actual conversion process
    • 3_Post_Migration_Cleanup.sql: deletes the helper functions/procedures
    • DROP_Migration.sql: deletes both the table and the functions/procedures
  • Removed all transactions inside the scripts; the scripts are idempotent
  • All scripts are designed as idempotent UPSERT operations
  • A special script with the DROP prefix can be used to restart from scratch
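A minimal sketch of a runner for the steps above, using Python's stdlib sqlite3 as a stand-in for the real database tooling; the `run_migration` helper and `script_dir` parameter are illustrative names, not part of this PR. The script filenames come from the description above, and DROP_Migration.sql is deliberately left out of the normal sequence.

```python
import sqlite3

# Ordered migration steps from the PR description; DROP_Migration.sql is
# deliberately excluded -- it is only for an explicit restart from scratch.
MIGRATION_STEPS = [
    "1_Migration_Setup.sql",         # create table/functions if they don't exist
    "2_Migration.sql",               # the actual conversion process
    "3_Post_Migration_Cleanup.sql",  # delete the helper functions/procedures
]

def run_migration(conn: sqlite3.Connection, script_dir: str = ".") -> None:
    # Because each script is idempotent, re-running the whole sequence
    # after a partial failure is safe.
    for step in MIGRATION_STEPS:
        with open(f"{script_dir}/{step}") as f:
            conn.executescript(f.read())
```

Since every step is idempotent, there is no need for a transaction wrapping the whole sequence: a failed run can simply be started again from the top.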

@Arkatufus Arkatufus marked this pull request as draft December 14, 2022 05:12
@Arkatufus
Contributor Author

Would these be good enough for migration purposes?

Member

@Aaronontheweb Aaronontheweb left a comment

Left some thoughts

@@ -0,0 +1,46 @@
DROP TABLE IF EXISTS [dbo].[TagTable];
Member

Just a general thought here - doing something destructive like dropping the TagTable... it might be better to fail the operation altogether and force the user to decide what to do in the event that they try to run this script twice. We can move the destructive/cleanup bits into a separate script that they can run explicitly. Buuuuuut for the sake of getting something into the repo that we can test now? I think this checks all the boxes.

GO

BEGIN TRANSACTION
INSERT INTO dbo.TagTable(ordering_id, tag)
Member

The other way we could go about making these scripts safe to run multiple times is rather than delete / recreate we can try the following idempotent pattern:

  • create if not exists (for tables, functions)
  • for each item to be added to the tag table, do an upsert (insert if it does not exist, update the row if it does)

That way running the script multiple times would be safe.
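A minimal sketch of that idempotent pattern, using Python's sqlite3 standard library as a stand-in for the real database; the table and column names follow the scripts under review, and INSERT OR IGNORE is SQLite's spelling of "insert if not exists" (T-SQL would use MERGE or an IF NOT EXISTS guard instead).

```python
import sqlite3

def ensure_tag_table(conn: sqlite3.Connection) -> None:
    # "create if not exists": safe to run any number of times
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tag_table (
               ordering_id INTEGER NOT NULL,
               tag         TEXT    NOT NULL,
               PRIMARY KEY (ordering_id, tag)
           )"""
    )

def upsert_tag(conn: sqlite3.Connection, ordering_id: int, tag: str) -> None:
    # The tag rows are immutable, so "insert if not exists, otherwise do
    # nothing" is equivalent to a full upsert here.
    conn.execute(
        "INSERT OR IGNORE INTO tag_table (ordering_id, tag) VALUES (?, ?)",
        (ordering_id, tag),
    )
```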

Member

Actually, instead of doing an upsert, we could just skip the insert since the data is immutable - therefore no change.

Member

IMO these should be grabbing batches of items rather than trying to do the whole table. Otherwise for large tables this could run for some unknown time before erroring out with a transaction size issue. (Also in general batching will make it easier to run 'on the fly' without locking too much, especially for something like MSSQL.)

Member

Forgive how rusty my SQL experience is here, but you're talking about doing pagination with cursors, right?

Member

@to11mtm to11mtm Feb 1, 2023

Not cursors... the overall flow I was imagining (at least, for 0 downtime migration) was along the lines of

  1. deploy everything in 'hybrid' mode (where we write both old style tags and new tags)
  2. Once everything is deployed in that mode, run a script in the background, pseudo-SQL:

declare @moveUpTo bigint = (select max(ordering) from journal_table
    where tags is not null and len(tags) > 0
    and ordering not in (select ordering from journal_tag_table));

declare @startAtOrdering bigint = 0;

while (@startAtOrdering < @moveUpTo)
begin
    insert into journal_tag_table (select tags from journal_table
        where ordering >= @startAtOrdering and ordering < @startAtOrdering + 1000); -- Obviously need more transform here
    set @startAtOrdering = @startAtOrdering + 1000;
end

  3. Re-deploy in 'tag-table-only' mode

  4. (optional) run a clean-up script to get rid of the old tags in the rows, but TBH only if you really need the storage back, since in cloud the storage space gained back may or may not be worth the RUs of re-writing everything and the indexes getting kershuffled.

You might need to think about how you do the transactions here... but hopefully this gets the point across.
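The background loop above can be sketched end-to-end as follows - again in Python with stdlib sqlite3 as a stand-in for MSSQL, and with the ';' tag separator, `ordering_id` column name, and `backfill_tags` function name assumed purely for illustration:

```python
import sqlite3

BATCH_SIZE = 1000  # rows per transaction; tune to avoid oversized transactions

def backfill_tags(conn: sqlite3.Connection) -> None:
    # Upper bound: the highest ordering that still has legacy tags
    # but no rows in the new tag table.
    (move_up_to,) = conn.execute(
        """SELECT COALESCE(MAX(ordering), 0) FROM journal_table
           WHERE tags IS NOT NULL AND LENGTH(tags) > 0
             AND ordering NOT IN (SELECT ordering_id FROM journal_tag_table)"""
    ).fetchone()

    start = 0
    while start <= move_up_to:
        with conn:  # one transaction per batch, not one for the whole table
            rows = conn.execute(
                """SELECT ordering, tags FROM journal_table
                   WHERE ordering >= ? AND ordering < ?
                     AND tags IS NOT NULL AND LENGTH(tags) > 0""",
                (start, start + BATCH_SIZE),
            ).fetchall()
            for ordering, tags in rows:
                # Assumption for illustration: legacy column stores
                # ';'-separated tags.
                for tag in tags.split(";"):
                    if tag:
                        conn.execute(
                            "INSERT OR IGNORE INTO journal_tag_table "
                            "(ordering_id, tag) VALUES (?, ?)",
                            (ordering, tag),
                        )
        start += BATCH_SIZE
```

Because each batch commits on its own and the inserts skip rows that already exist, the backfill can be stopped and restarted at any point without corrupting anything - which is the property that makes it safe to run while the hybrid-mode application is live.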

Member

deploy everything in 'hybrid' mode (where we write both old style tags and new tags)

That makes sense, and I think this can be done easily since we're not getting rid of the old tag column - does that make sense to you, @Arkatufus?

Member

Once everything is deployed in that mode, run a script in the background, psuedo-sql:

I see what you mean: breaking up the inserts into transactions with groups of N "tag rows" at a time. I think that should be doable.

@Arkatufus
Contributor Author

Latest changes:

  • Split scripts into steps
  • Dropped all transactions inside the scripts
  • All scripts are designed as idempotent UPSERT operations
  • Special script with the DROP prefix to restart from scratch

@Arkatufus Arkatufus marked this pull request as ready for review December 19, 2022 17:37
Member

@Aaronontheweb Aaronontheweb left a comment

LGTM for now, but we need to implement @to11mtm's feedback in subsequent PRs.

@Aaronontheweb Aaronontheweb merged commit cc87494 into akkadotnet:dev Feb 1, 2023