Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import duckdb-replicator #6067

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

begelundmuller
Copy link
Contributor

No description provided.

runtime/pkg/duckdbreplicator/README.md Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/backup.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/backup.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/backup.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/backup.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/db.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/db.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/db.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/io.go Outdated Show resolved Hide resolved
runtime/pkg/duckdbreplicator/singledb.go Outdated Show resolved Hide resolved
return fmt.Errorf("rename: unable to replicate new table: %w", err)
}

// TODO :: fix this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think solutions for this issue. Few solutions that are on top of my mind:

  1. Use a single metadata file for entire database so that all operations are atomically done but need to think about concurrent write contentions.
  2. Use the meta.json to store renamed_from_table and renamed_from_table_version so that any table with the given version can be skipped even if delete fails.

This is an important fix but it should be okay to fix this later IMO. As of now most renames are from staging table to main table and table are always created with replace so it should be okay if staging table is not properly deleted during renames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid a single metadata file for the entire database – it would make concurrent writes trickier and add complexity because the file structure is no longer sufficient metadata.

I agree this doesn't need to be solved now (worst case is some garbage old tables that are not used). I'm not sure if I understand your option 2 correctly, maybe this is what you mean, but IMO the way to solve this problem would be:

  1. Before rename, update the old table's meta.json to have renaming_to_table: new_name and renaming_to_version: new_version.
  2. Copy the table to the new_name and new_version
  3. Delete the old table. If this fails, during a later garbage collection (or "vacuum" or whatever you want to call it), if we see a table with renaming_to_table where that table and version actually exists, we can garbage collect the old table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah something similar. I meant I would add renamed_from_table and renamed_from_table_version to the new meta.json and if we encounter a case where that table and version exists we skip that table and version and garbage collect that.
So essentially it is keeping the track at new table vs old table.
Both are fine IMO. Depending upon where we keep it we need to make sure that tables are processed in that order(either created_version asc or created_version desc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants