Use sstable identifier for deduplication instead of sstable generation ID #4069
Comments
cc: @karol-kokoszka @regevran @bhalevy Right now I'm not sure how to implement this. The SM backup structure keeps each node's sstables under a different path.
|
Does it mean that in order to look for duplicates we need to search all paths?
Encryption may also be changed on the very same node the backup was taken from (between the backup and the restore). |
This should never happen.
|
As for encryption keys, we care about the sstable's contents and not its representation. |
This feature is only going to be available in 2025.1 (Enterprise) / 6.3 (OSS), so I'm not sure we should use it until it's available and widely used. |
Do you mean that we'll have a long transition time? |
Tablets migrate between nodes, and during a migration the SSTable name can change. When a tablet migrates, a new SSTable bundle name is generated, and the node it belongs to may change as well. Let me summarize the problems identified so far.
The fact is that deduplication in SM is not going to work efficiently for Scylla Enterprise 2024.2 when tablets are enabled. Encryption at Rest will bother us only if tablets can migrate between datacenters, but I understand this is not the case. |
I think that with dump-scylla-metada. |
|
Recently, Scylla merged scylladb/scylladb#21002.
We should use it for sstable deduplication instead of the currently used generation ID approach, as it has the following benefits:
The second argument is self-explanatory.
As for the first one, we would need to create a design doc specifying how deduplication/upload would handle the case where an sstable is already present in the backup location, but with a different ID and under a different node path.
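To make the design question more concrete, here is a minimal sketch of identifier-based deduplication across node paths. It is not how SM works today. `SSTableRef`, `build_identifier_index` and `plan_upload` are hypothetical names, and the sketch assumes that the sstable identifier can be read for every sstable already in the backup location and that the backup can be listed across all node paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableRef:
    """Hypothetical record describing one sstable (not an SM type)."""
    node_id: str      # node path the sstable is (or would be) stored under
    generation: str   # generation ID, which may differ between nodes
    identifier: str   # sstable identifier, assumed stable across nodes


def build_identifier_index(backed_up):
    """Map sstable identifier -> first backed-up copy, across all node paths."""
    index = {}
    for sst in backed_up:
        index.setdefault(sst.identifier, sst)
    return index


def plan_upload(local, index):
    """Split local sstables into those to upload and those already backed up.

    Returns (to_upload, reusable), where reusable pairs each local sstable
    with the existing copy found in the backup location.
    """
    to_upload, reusable = [], []
    for sst in local:
        existing = index.get(sst.identifier)
        if existing is None:
            to_upload.append(sst)
        else:
            reusable.append((sst, existing))
    return to_upload, reusable


if __name__ == "__main__":
    backed_up = [SSTableRef("node-a", "gen-1", "id-1")]
    local = [
        SSTableRef("node-b", "gen-9", "id-1"),   # same data, new name and node
        SSTableRef("node-b", "gen-10", "id-2"),  # not in the backup yet
    ]
    to_upload, reusable = plan_upload(local, build_identifier_index(backed_up))
    print("upload:", [s.generation for s in to_upload])
    print("reuse:", [(s.generation, e.node_id) for s, e in reusable])
```

The sketch leaves open what the design doc would have to decide: how the snapshot on one node references a file stored under another node's path, and how purging old backups handles such shared files.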