backup: use locality filter to restrict processes to locality #99089
Conversation
Approved for backupccl
Are there some tests we could add to ensure the planning is as expected (I guess we should be able to use EXPLAIN (DISTSQL))? It might be annoying to set up the cluster with all the localities though, and I can't easily recall an example of where we do that.
Reviewed 1 of 2 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @benbardin, @dt, @mgartner, and @msbutler)
Force-pushed from e6c9387 to d714663.
Yep, as mentioned in the PR text, I was still working on tests. They're in now, along with the rebase, so this is RFAL. In the (now) first commit, the existing tests of the physical planner were extended, and the second commit integrating filtering into BACKUP now includes an end-to-end test.
@yuzefovich I don't think we can EXPLAIN DISTSQL a job, so I think the test in the second commit is as close as we can get to an e2e test?
Yeah, I think you're right.
I only have nits, but it'd be good to get an approval from someone from DR too.
Reviewed 1 of 5 files at r5, 6 of 6 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @benbardin, @dt, @mgartner, and @msbutler)
pkg/sql/distsql_physical_planner.go
line 1426 at r6 (raw file):
// If we were able to determine the locality information for at least some
// instances, use the region-aware resolver.
nit: s/region/locality/g.
pkg/sql/distsql_physical_planner.go
line 1455 at r6 (raw file):
// No instances had any locality tiers in common with the node locality so
// just return the a gateway if it is eligible. If it isn't, just pick a
nit: "the a gateway".
pkg/sql/distsql_plan_bulk.go
line 51 at r6 (raw file):
}

// setupAllNodesPlanningSystem creates a planCtx and returns all nodes available
nit: mention the locality filter.
pkg/sql/distsql_plan_bulk.go
line 88 at r6 (raw file):
}

// setupAllNodesPlanningTenant creates a planCtx and returns all nodes available
nit: ditto.
pkg/ccl/backupccl/backup_test.go
line 372 at r7 (raw file):
defer cleanupFn()

// locationToDir converts backup URIs based on localFoo to the temporary
nit: s/locationToDir/dirOf/g.
pkg/ccl/backupccl/backup_test.go
line 530 at r7 (raw file):
// Job exec relocation will return an error while it waits for resumption on
// a matching node, but ths makes for a slow test so just send the job stmt to
nit: s/ths/this/g.
pkg/sql/distsql_physical_planner_test.go
line 1455 at r6 (raw file):
}

type mockAddressResolver map[base.SQLInstanceID]roachpb.Locality
nit: add "interface assertion" (i.e. something like var _ AddressResolver = mockAddressResolver{}).
This adds a locality filter to PlanningCtx, set via an argument to the SetupAllNodesPlanning variant that allows controlling the replica oracle. When set, it restricts the instances on which planning can occur to only those that match the locality filter, and automatically triggers the use of instance-based planning. Currently this option is only available via SetupAllNodesPlanning, but future work could expose it to all physical planning via a session variable.

Release note: none.

Epic: none.
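To make the matching rule concrete, here is a minimal standalone sketch. The types and names below (tier, locality, instance, filterInstances) are hypothetical stand-ins, not the actual PlanningCtx or roachpb.Locality APIs: an instance stays eligible for planning only if every tier in the filter appears in its locality.

```go
package main

import "fmt"

// tier is a single key=value locality component, e.g. region=us-east1.
type tier struct{ Key, Value string }

// locality is an ordered list of tiers, most significant first.
type locality struct{ Tiers []tier }

// matches reports whether every tier in the filter is present in l with the
// same value. An empty filter matches everything.
func (l locality) matches(filter locality) bool {
	for _, f := range filter.Tiers {
		found := false
		for _, t := range l.Tiers {
			if t == f {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// instance pairs a SQL instance ID with its advertised locality.
type instance struct {
	ID       int
	Locality locality
}

// filterInstances keeps only the instances whose locality matches the filter,
// mirroring the idea of restricting physical planning to those instances.
func filterInstances(all []instance, filter locality) []instance {
	var out []instance
	for _, in := range all {
		if in.Locality.matches(filter) {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	all := []instance{
		{1, locality{[]tier{{"region", "us-east1"}, {"az", "a"}}}},
		{2, locality{[]tier{{"region", "us-west1"}, {"az", "b"}}}},
		{3, locality{[]tier{{"region", "us-east1"}, {"az", "c"}}}},
	}
	filter := locality{[]tier{{"region", "us-east1"}}}
	for _, in := range filterInstances(all, filter) {
		fmt.Println("eligible instance:", in.ID) // prints 1 and 3
	}
}
```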
TFTRs!
good to get an approval from someone from DR too.
Indeed, Ben signed off earlier so all set there.
Thanks again for all the help/guidance in figuring this out.
bors r=yuzefovich,benbardin
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @benbardin, @mgartner, and @msbutler)
pkg/sql/distsql_physical_planner.go
line 1565 at r2 (raw file):
Previously, benbardin (Ben Bardin) wrote…
I love the new name!
Done.
I just passed it anyway so there is one fewer call site to update when we add generalized query locality filters.
pkg/sql/distsql_physical_planner.go
line 1426 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/region/locality/g.
Done.
pkg/sql/distsql_physical_planner.go
line 1455 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: "the a gateway".
Done.
pkg/sql/distsql_plan_bulk.go
line 46 at r2 (raw file):
Previously, benbardin (Ben Bardin) wrote…
👍
Done.
pkg/sql/distsql_plan_bulk.go
line 51 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: mention the locality filter.
Done.
pkg/sql/distsql_plan_bulk.go
line 88 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: ditto.
Done.
pkg/ccl/backupccl/backup_test.go
line 372 at r7 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/locationToDir/dirOf/g.
Done.
pkg/ccl/backupccl/backup_test.go
line 530 at r7 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/ths/this/g.
Done.
pkg/sql/distsql_physical_planner_test.go
line 1455 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: add "interface assertion" (i.e. something like var _ AddressResolver = mockAddressResolver{}).
Done.
This extends the previous locality filter option that pinned the backup job to a specific locality filter to also pin the processors which back up the actual row data, allowing a backup job to be assigned to only the instances running in a particular locality. Those instances will attempt to read from the closest replica they can; however, a potential optimization would be to pick an instance closest to a replica that also matches the locality filter, if there is one. Currently it will likely just pick the filter-matching instance closest to the leaseholder, even though there may be an instance closer to a usable replica that is not the leaseholder. To make use of these replicas, a replica oracle that ranks replicas matching the filter would need to be passed when locality-filtered placement is in use; this optimization is left to a follow-up.

Release note (sql change): the nodes involved in the execution of backups, including processing of row data and the job coordination, can now be controlled using an execution-locality filter.

Epic: CRDB-9547.
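As a hedged, standalone sketch of the current behavior the commit describes (sharedPrefix, assign, and the types here are hypothetical, not backupccl or oracle code): each range is handed to the filter-matching instance with the longest shared locality prefix with the leaseholder, which is exactly the spot where the follow-up replica-oracle optimization would consider other usable replicas instead.

```go
package main

import "fmt"

type tier struct{ Key, Value string }
type locality struct{ Tiers []tier }

// sharedPrefix counts leading tiers that two localities have in common; a
// larger count is used here as a rough proxy for "closer".
func sharedPrefix(a, b locality) int {
	n := 0
	for i := 0; i < len(a.Tiers) && i < len(b.Tiers); i++ {
		if a.Tiers[i] != b.Tiers[i] {
			break
		}
		n++
	}
	return n
}

type instance struct {
	ID       int
	Locality locality
}

// assign picks, for a range whose leaseholder lives at lh, the eligible
// (already locality-filtered) instance with the longest shared locality
// prefix. The follow-up mentioned above would instead rank replicas that
// match the filter, rather than always keying off the leaseholder.
func assign(eligible []instance, lh locality) instance {
	best, bestScore := eligible[0], -1
	for _, in := range eligible {
		if s := sharedPrefix(in.Locality, lh); s > bestScore {
			best, bestScore = in, s
		}
	}
	return best
}

func main() {
	eligible := []instance{
		{1, locality{[]tier{{"region", "us-east1"}, {"az", "a"}}}},
		{3, locality{[]tier{{"region", "us-east1"}, {"az", "c"}}}},
	}
	lh := locality{[]tier{{"region", "us-east1"}, {"az", "c"}}}
	fmt.Println("assigned to instance:", assign(eligible, lh).ID) // 3
}
```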
bors r-
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @benbardin, @mgartner, and @msbutler)
Canceled.
Whoops, overlooked that git push had failed. Let's try that again now.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @benbardin, @mgartner, @msbutler, and @yuzefovich)
Build succeeded.