Import duckdb-replicator #6067

begelundmuller · 2024-11-08T16:19:02Z

No description provided.

runtime/pkg/duckdbreplicator/README.md

runtime/pkg/duckdbreplicator/backup.go

runtime/pkg/duckdbreplicator/db.go

runtime/pkg/duckdbreplicator/io.go

runtime/pkg/duckdbreplicator/singledb.go

runtime/pkg/duckdbreplicator/db.go

k-anshul · 2024-11-18T10:40:49Z

runtime/pkg/rduckdb/db.go

+		return fmt.Errorf("rename: unable to replicate new table: %w", err)
+	}
+
+	// TODO :: fix this


We need to think of solutions for this issue. A few that come to mind:

1. Use a single metadata file for the entire database so that all operations are done atomically. We would need to think about concurrent write contention, though.

2. Use meta.json to store renamed_from_table and renamed_from_table_version, so that any table with the given version can be skipped even if the delete fails.

This is an important fix, but IMO it should be okay to fix it later. As of now, most renames are from a staging table to the main table, and tables are always created with replace, so it should be okay if the staging table is not properly deleted during a rename.

begelundmuller · 2024-11-19

It would be nice to avoid a single metadata file for the entire database: it would make concurrent writes trickier and add complexity, because the file structure would no longer be sufficient metadata.

I agree this doesn't need to be solved now (the worst case is some garbage old tables that are not used). I'm not sure I understand your option 2 correctly, and maybe this is what you mean, but IMO the way to solve this problem would be:

1. Before the rename, update the old table's meta.json to have renaming_to_table: new_name and renaming_to_version: new_version.

2. Copy the table to new_name and new_version.

3. Delete the old table. If this fails, then during a later garbage collection (or "vacuum" or whatever you want to call it), if we see a table with renaming_to_table where that table and version actually exist, we can garbage collect the old table.
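The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the `store` type is a hypothetical in-memory stand-in for the table directories, `tableMeta` stands in for meta.json, and the `failDelete` flag exists only to simulate a delete that fails.

```go
package main

import (
	"errors"
	"fmt"
)

// tableMeta is a hypothetical stand-in for a table's meta.json.
type tableMeta struct {
	Version           string
	RenamingToTable   string // set before the copy starts
	RenamingToVersion string
}

// store is a hypothetical in-memory stand-in for the table directories.
type store struct {
	tables     map[string]*tableMeta // keyed by table name
	failDelete bool                  // simulates a failing delete
}

func (s *store) deleteTable(name string) error {
	if s.failDelete {
		return errors.New("delete failed")
	}
	delete(s.tables, name)
	return nil
}

// rename follows the three steps: mark the old table, copy, delete.
func (s *store) rename(oldName, newName, newVersion string) error {
	old, ok := s.tables[oldName]
	if !ok {
		return fmt.Errorf("rename: table %q not found", oldName)
	}
	// Step 1: record the rename target on the old table's metadata.
	old.RenamingToTable = newName
	old.RenamingToVersion = newVersion
	// Step 2: "copy" the table to the new name and version.
	s.tables[newName] = &tableMeta{Version: newVersion}
	// Step 3: delete the old table. A failure is tolerated here because
	// gc can identify the leftover table from its renaming_to marker.
	_ = s.deleteTable(oldName)
	return nil
}

// gc removes every table whose rename target exists at the recorded
// version, and returns the names of the tables it removed.
func (s *store) gc() []string {
	var removed []string
	for name, m := range s.tables {
		if m.RenamingToTable == "" {
			continue
		}
		target, ok := s.tables[m.RenamingToTable]
		if !ok || target.Version != m.RenamingToVersion {
			continue // the copy never completed, so keep the old table
		}
		if err := s.deleteTable(name); err == nil {
			removed = append(removed, name)
		}
	}
	return removed
}

func main() {
	s := &store{
		tables:     map[string]*tableMeta{"staging": {Version: "1"}},
		failDelete: true,
	}
	_ = s.rename("staging", "main", "2")
	fmt.Println("tables after rename:", len(s.tables))
	s.failDelete = false
	fmt.Println("collected:", s.gc())
}
```

Because the marker is written before the copy, a crash between steps 1 and 2 leaves an old table whose target does not exist, and `gc` keeps it.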

k-anshul · 2024-11-19

Yeah, something similar. I meant I would add renamed_from_table and renamed_from_table_version to the new meta.json, and if we encounter a case where that table and version exist, we skip that table and version and garbage collect it.
So essentially the difference is whether we keep track of the rename on the new table or on the old table.
Both are fine IMO. Depending on where we keep it, we need to make sure that tables are processed in the matching order (either created_version asc or created_version desc).
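A minimal sketch of the variant that keeps the marker on the new table, to make the ordering point concrete. Everything here is an assumption for illustration: the `tableMeta` fields mirror the names used in this thread, versions are modelled as integers, and a rename is assumed to always produce a higher created_version than the table it came from. With the marker on the new table, newer tables are visited first (created_version desc) so the marker is seen before the table it supersedes.

```go
package main

import (
	"fmt"
	"sort"
)

// tableMeta is a hypothetical stand-in for a table's meta.json.
type tableMeta struct {
	Name                    string
	CreatedVersion          int64
	RenamedFromTable        string // empty when the table was not renamed
	RenamedFromTableVersion int64
}

type tableKey struct {
	name    string
	version int64
}

// partition splits tables into those to keep and those to garbage collect.
func partition(metas []tableMeta) (live, garbage []tableMeta) {
	// Work on a copy so the caller's slice keeps its order.
	sorted := append([]tableMeta(nil), metas...)
	// Newest first, so a rename marker is seen before its source table.
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CreatedVersion > sorted[j].CreatedVersion
	})
	superseded := map[tableKey]bool{}
	for _, m := range sorted {
		// Record the marker even for superseded tables so that chained
		// renames (a -> b -> c) are all collected.
		if m.RenamedFromTable != "" {
			superseded[tableKey{m.RenamedFromTable, m.RenamedFromTableVersion}] = true
		}
		if superseded[tableKey{m.Name, m.CreatedVersion}] {
			garbage = append(garbage, m)
			continue
		}
		live = append(live, m)
	}
	return live, garbage
}

func main() {
	metas := []tableMeta{
		{Name: "staging", CreatedVersion: 1},
		{Name: "main", CreatedVersion: 2, RenamedFromTable: "staging", RenamedFromTableVersion: 1},
	}
	live, garbage := partition(metas)
	for _, m := range live {
		fmt.Println("keep:", m.Name)
	}
	for _, m := range garbage {
		fmt.Println("collect:", m.Name)
	}
}
```

If the marker lived on the old table instead (renaming_to_table), no ordering would be needed for a lookup like this one, since each old table can check directly whether its target exists.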

Import

279d207

begelundmuller assigned k-anshul Nov 8, 2024

begelundmuller commented Nov 8, 2024

begelundmuller requested a review from k-anshul November 8, 2024 18:35

Remove go.mod

05a0603

begelundmuller commented Nov 8, 2024

runtime/pkg/duckdbreplicator/db.go (outdated, resolved)

k-anshul added 5 commits November 11, 2024 19:10

use single local directory

13653fd

use metadata.json for each table

e9a8c6c

use semaphore instead of mutex for write locks

0acae1c

local db monitor

9a6f9e6

small fixes

eac6d1b

k-anshul reviewed Nov 18, 2024

k-anshul added 3 commits November 20, 2024 22:05

non blocking read handle updates

09424ba

use tableMeta plus minor fix

3b0eee7

small cleanups

50660ce

k-anshul Nov 18, 2024

begelundmuller Nov 19, 2024

k-anshul Nov 19, 2024
