Import duckdb-replicator #6067
base: main
Conversation
return fmt.Errorf("rename: unable to replicate new table: %w", err)
}

// TODO :: fix this
We need to think of solutions for this issue. A few that come to mind:
- Use a single metadata file for the entire database so that all operations are done atomically, but we would need to think about concurrent write contention.
- Use the `meta.json` to store `renamed_from_table` and `renamed_from_table_version`, so that any table with the given version can be skipped even if the delete fails.

This is an important fix, but it should be okay to address it later IMO. As of now most renames are from a staging table to the main table, and tables are always created with replace, so it should be okay if a staging table is not properly deleted during a rename.
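For illustration, the second option could look like the following sketch. The `tableMeta` struct, the field names, and `isStaleRename` are assumptions based on this thread, not the replicator's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableMeta is a hypothetical sketch of the per-table meta.json discussed
// in this thread; field names mirror the proposal, not the actual code.
type tableMeta struct {
	Version                 string `json:"version"`
	RenamedFromTable        string `json:"renamed_from_table,omitempty"`
	RenamedFromTableVersion string `json:"renamed_from_table_version,omitempty"`
}

// isStaleRename reports whether a leftover table version was superseded by
// a rename recorded in newMeta, and can therefore be skipped and garbage
// collected even though its delete originally failed.
func isStaleRename(newMeta tableMeta, candidateTable, candidateVersion string) bool {
	return newMeta.RenamedFromTable == candidateTable &&
		newMeta.RenamedFromTableVersion == candidateVersion
}

func main() {
	m := tableMeta{
		Version:                 "v2",
		RenamedFromTable:        "__staging_orders",
		RenamedFromTableVersion: "v1",
	}
	b, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(b))
	fmt.Println(isStaleRename(m, "__staging_orders", "v1")) // prints true
}
```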
It would be nice to avoid a single metadata file for the entire database – it would make concurrent writes trickier and add complexity, because the file structure alone would no longer be sufficient metadata.

I agree this doesn't need to be solved now (the worst case is some garbage old tables that are not used). I'm not sure I understand your option 2 correctly – maybe this is what you mean – but IMO the way to solve this problem would be:
- Before the rename, update the old table's `meta.json` to have `renaming_to_table: new_name` and `renaming_to_version: new_version`.
- Copy the table to the `new_name` and `new_version`.
- Delete the old table. If this fails, then during a later garbage collection (or "vacuum" or whatever you want to call it), if we see a table with `renaming_to_table` where that table and version actually exist, we can garbage collect the old table.
Yeah, something similar. I meant I would add `renamed_from_table` and `renamed_from_table_version` to the new table's `meta.json`, and if we encounter a case where that table and version exist, we skip that table and version and garbage collect it. So essentially it keeps the tracking on the new table rather than the old table.

Both are fine IMO. Depending on where we keep it, we need to make sure that tables are processed in a fixed order (either `created_version` asc or `created_version` desc).