Currently on backup, we preserve both the table metadata and the per-tablet metadata, which includes the individual partition ranges for each tablet.
However, on restore, we recreate the table with the same number of tablets, but we don't reuse the partition info from the backup! Instead, we just create a new table with N tablets, evenly split in the partition space...
Historically, this has been fine because:
- the partition computation code has always been the same -- so the N partitions in the backup would have the same (start, end) as the new N created on restore
- there was no mechanism to create skewed partitions -- the code to generate partitions would create even (at worst off by 1) splits; see the sketch after this list
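For context, a minimal sketch of that even-split behavior (not the actual PartitionSchema code; the 16-bit hash space is an assumption about the hash-partitioning scheme). With integer division the ranges differ in size by at most 1, which is why a backup taken with N tablets and a restore that recreates N tablets used to agree exactly:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the even-split logic: divide the 16-bit hash
// space [0, 0x10000) into num_tablets contiguous ranges.
std::vector<std::pair<uint32_t, uint32_t>> CreateEvenHashPartitions(int num_tablets) {
  constexpr uint32_t kHashSpace = 0x10000;  // 2-byte hash values.
  std::vector<std::pair<uint32_t, uint32_t>> partitions;
  uint32_t start = 0;
  for (int i = 0; i < num_tablets; ++i) {
    const uint32_t end = kHashSpace * (i + 1) / num_tablets;
    partitions.emplace_back(start, end);
    start = end;
  }
  return partitions;
}
```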
In the future though, there are at least two mechanisms that will change these assumptions:
- tablet splitting -- a tablet splitting into two will lead to two smaller partitions (illustrated in the sketch after this list)
- explicit YSQL syntax to specify split points -- this could lead to an ad hoc distribution of split points
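To make that concrete, a hedged illustration (hypothetical helper, not YB code): a split cuts one hash range at a data-driven point, so the resulting boundaries can no longer be reproduced by dividing the space into N even pieces on restore.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: replace the tablet at `idx` with two tablets whose
// ranges meet at `split_point` (chosen from the data, not from even division).
std::vector<std::pair<uint32_t, uint32_t>> SplitTablet(
    std::vector<std::pair<uint32_t, uint32_t>> partitions,
    size_t idx, uint32_t split_point) {
  const auto [start, end] = partitions[idx];
  partitions[idx] = {start, split_point};
  partitions.insert(partitions.begin() + idx + 1, std::make_pair(split_point, end));
  return partitions;
}

// Example: starting from 2 even tablets {0x0000, 0x8000}, {0x8000, 0x10000},
// splitting the first one at 0x1234 yields {0x0000, 0x1234}, {0x1234, 0x8000},
// {0x8000, 0x10000}. Restoring "3 evenly split tablets" would instead produce
// {0x0000, 0x5555}, {0x5555, 0xAAAA}, {0xAAAA, 0x10000}, so the restored
// boundaries no longer match the backed-up per-tablet metadata.
```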
Implementation notes from @OlegLoginov

In a few words: currently CatalogManager::RecreateTable() fills CreateTableRequestPB to recreate the table. It fills the partition schema too:
```cpp
*req.mutable_partition_schema() = meta.partition_schema();
```
but in CatalogManager::CreateTable() it is overwritten by CreatePartitions():
```cpp
  } else {
    s = PartitionSchema::FromPB(req.partition_schema(), schema, &partition_schema);
    RETURN_NOT_OK(partition_schema.CreatePartitions(num_tablets, &partitions));
  }
```
(The partition schema is even loaded from the request here, but the result status `s` is ignored, and the partitioning is then overwritten based on `num_tablets`.)
These lines must be updated to cover the cases where the partition schema can be taken from the request.
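A hedged sketch of that direction, mirroring the fragment quoted above (the `req.partitions()` field and the `Partition::FromPB()` signature are assumptions for illustration, not a confirmed CreateTableRequestPB API): check the status from the request's partition schema, and only regenerate even splits when the request does not carry explicit partition bounds.

```cpp
  } else {
    // Do not silently ignore a malformed partition schema from the request.
    RETURN_NOT_OK(PartitionSchema::FromPB(req.partition_schema(), schema, &partition_schema));
    if (req.partitions_size() > 0) {
      // Restore path: rebuild each partition exactly as it appears in the
      // backed-up per-tablet metadata carried in the request.
      for (const auto& partition_pb : req.partitions()) {
        Partition partition;
        Partition::FromPB(partition_pb, &partition);
        partitions.push_back(partition);
      }
    } else {
      // Fresh-create path: keep the existing even-split behavior.
      RETURN_NOT_OK(partition_schema.CreatePartitions(num_tablets, &partitions));
    }
  }
```

The restore side (CatalogManager::RecreateTable()) would then also need to copy the per-tablet partition bounds from the backup into that same request so this branch can use them.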
The fix should not be complex, but we need to carefully test all cases...
YCQL/YSQL, colocated tables, YCQL/YSQL backup restoring, hash tables, range tables, pre-split tables... maybe something else
Oleg, can you also share a snippet of how to run yb_backup locally against a yb-ctl cluster, so this can be tested easily manually? cc @ttyusupov
mikhpolitov changed the title from "[docdb] Restore should preserve the exact partitioning that the source tablets had" to "[docdb] [YCQL] Restore should preserve the exact partitioning that the source tablets had" on Oct 1, 2021