Make restore AZ aware #4039
Comments
Unfortunately, I don't have a clear idea of how to safely use AZ information in SM restore.
If datacenter.RF == count(datacenter.racks), then each rack gets one full replica. A typical example is RF=3 and nr_racks=3. If this holds, you can take a rack's backup and copy it to just one rack of the restored cluster with nodetool refresh --load-and-stream --keep-rack (the --keep-rack flag doesn't exist yet). This reduces the number of receivers from 3 to 1 and significantly reduces the compaction load.
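As a minimal sketch of that precondition check, assuming the per-DC RF and rack list are already known (the types and field names here are hypothetical, not Scylla Manager's actual data model):

```go
// Sketch only: checks whether a datacenter qualifies for the rack-level
// restore optimization (RF equal to the number of racks), so that each
// rack can be assumed to hold one complete replica of the data.
package topology

// Datacenter is a hypothetical view of a DC's topology for this check.
type Datacenter struct {
	Name  string
	RF    int      // replication factor of the keyspace in this DC
	Racks []string // distinct rack names in this DC
}

// RackHoldsFullReplica reports whether every rack in the DC holds one
// full replica, i.e. the RF matches the number of racks.
func RackHoldsFullReplica(dc Datacenter) bool {
	return dc.RF > 0 && dc.RF == len(dc.Racks)
}
```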
The reduction in receivers is already achieved with --primary-replica-only, but I guess that streaming within the same rack should be faster. Perhaps this would also speed up the post-restore repair, as (depending on data consistency during backup) less data would need to be transferred between the nodes during the repair.
Cc @regevran
This should be a scylladb issue, but as an optimization, not for the general case.
The scenario where the DC replication factor equals the number of racks in a given datacenter requires Scylla Manager to understand the mapping between the source rack and the destination rack. In this case, Scylla Manager could ensure that data from a single rack is restored (downloaded and then load-and-streamed) only by nodes from the corresponding rack, assuming that the --keep-rack flag is implemented in the core, because l&s would then always select a node from the same rack. A repair would not be necessary, since the --keep-rack flag together with the RF being equal to the number of racks guarantees that the entire replica is restored. Currently, l&s duplicates work because the --primary-replica-only flag causes the same replica's data to always be streamed to the same node, which eventually necessitates a post-restore repair.

To clarify, we assume this optimization applies only when the restore occurs on an identical topology.

Currently, the backup manifest does not include rack information. The RF can be determined from the dumped schema by parsing the CREATE KEYSPACE... string. We could extend the manifest to include rack information, but older backups would not contain it, so the optimization would only apply to new backups.

We will not include this in version 3.4. The prerequisite for this optimization is that the --keep-rack flag is implemented on the core side.
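A minimal sketch of extracting per-DC RFs from a dumped CREATE KEYSPACE statement, assuming a NetworkTopologyStrategy replication map; this is naive regex parsing for illustration only, not Scylla Manager's actual schema handling:

```go
// Sketch only: parse per-datacenter replication factors out of a
// CREATE KEYSPACE ... WITH replication = {...} statement.
// Example input:
//   CREATE KEYSPACE ks WITH replication =
//     {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': '3'}
// yields map[dc1:3 dc2:3].
package schema

import (
	"regexp"
	"strconv"
)

// Matches 'key': value pairs where the value is numeric (optionally quoted).
// The 'class' entry never matches because its value is not numeric.
var rfRe = regexp.MustCompile(`'([^']+)'\s*:\s*'?(\d+)'?`)

// ReplicationFactors returns datacenter name -> RF parsed from the
// replication map of a CREATE KEYSPACE statement.
func ReplicationFactors(createKeyspace string) map[string]int {
	out := make(map[string]int)
	for _, m := range rfRe.FindAllStringSubmatch(createKeyspace, -1) {
		if m[1] == "replication_factor" { // DC-agnostic entry, skip it
			continue
		}
		rf, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		out[m[1]] = rf
	}
	return out
}
```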
It’s a bit counterintuitive to combine
FWIW, this can be retrieved from system.local or system.peers if they are backed up.
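To illustrate which columns carry that placement information, here is a minimal gocql sketch querying a live node; in the backup scenario the same data_center/rack columns would instead come from the backed-up system tables. The contact point is a placeholder:

```go
// Sketch only: read DC/rack placement from system.local and system.peers.
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("192.168.1.1") // placeholder contact point
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Local node's placement.
	var dc, rack string
	if err := session.Query(`SELECT data_center, rack FROM system.local`).Scan(&dc, &rack); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("local: dc=%s rack=%s\n", dc, rack)

	// Placement of the remaining nodes.
	var peer string
	iter := session.Query(`SELECT peer, data_center, rack FROM system.peers`).Iter()
	for iter.Scan(&peer, &dc, &rack) {
		fmt.Printf("%s: dc=%s rack=%s\n", peer, dc, rack)
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
}
```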
|
During restore improvement meetings, it was mentioned that making SM AZ aware could speed up the restore process.
We should experiment with that and see the results.