-
Notifications
You must be signed in to change notification settings - Fork 334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new flag - strict-topology #282
Conversation
Welcome @avalluri! |
Hi @avalluri. Thanks for your PR. I'm waiting for a kubernetes-csi or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a README.md section which explains the topology support and add something for the new mode there?
8dea010
to
050a6e5
Compare
050a6e5
to
5b784fb
Compare
@pohly I tried adding a section to ReadMe that explains how |
b6b30b0
to
b970817
Compare
b970817
to
bfda61a
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description looks good to me now. For the code I'll defer to someone who is more familiar with it.
One more thing. Can you add a
support strict topology for volumes with delayed binding
to the PR description?
bfda61a
to
508be1a
Compare
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally lgtm! Thanks for working on this! Can you also add a release note to the initial comment describing the new option?
/approve
@@ -62,6 +62,7 @@ var ( | |||
|
|||
enableLeaderElection = flag.Bool("enable-leader-election", false, "Enables leader election. If leader election is enabled, additional RBAC rules are required. Please refer to the Kubernetes CSI documentation for instructions on setting up these RBAC rules.") | |||
leaderElectionType = flag.String("leader-election-type", "endpoints", "the type of leader election, options are 'endpoints' (default) or 'leases' (strongly recommended). The 'endpoints' option is deprecated in favor of 'leases'.") | |||
strictTopology = flag.Bool("strict-topology", false, "Passes only selected node topology to CreateVolume Request, unlike default behavior of passing all nodes that match with topology keys of the selected node.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match the wording in the README:
"passing all nodes" => "passing aggregated cluster topologies"
pkg/controller/topology.go
Outdated
if err != nil { | ||
return nil, err | ||
if selectedCSINode != nil && strictTopology { | ||
// Make sure that selected node topology is in allowed topologies list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably be more efficient and just assume Kubernetes does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but there is a test "topology from selected node is not in allowedTopologies
" for this, so I added this check to satisfy the test.
pkg/controller/topology.go
Outdated
requisiteTerms, err = aggregateTopologies(kubeClient, driverName, selectedCSINode) | ||
if err != nil { | ||
return nil, err | ||
if selectedCSINode != nil && strictTopology { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be changed to a switch statement with the other 2 conditions, since all 3 are mutually exclusive from each other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are not mutually exclusive. It's possible that both allowedTopologies
and selectedNode
set, and resulted topology depends on strictTopology
value.
I could move this block to inside above if selectedNode != nil {..}
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: avalluri, msau42 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
With the current implementation, In delayed binding case, CSI driver is offered with all nodes topology that are matched with 'selected node' topology keys in CreateVolumeRequest.AccessibilityRequirements. So this allows the driver to select any node from the passed preferred list to create volume. But this results in scheduling failure when the volume created on a node other than Kubernetes selected node. To address this, introduced new flag "--strict-topology', when set, in case of delayed binding, the driver is offered with only selected node topology, so that driver has to create the volume on this node. Modified tests so that now every test is run with and without 'strict topology'.
508be1a
to
5bd554b
Compare
/lgtm |
Our recent change (kubernetes-csi/external-provisioner#282) got merged to master which fixes late binding case. Till it appears in next release(v1.2) we use canary builds which holds this change.
…strict-topology With the current implementation, In delayed binding case, CSI driver is offered with all nodes topology that are matched with 'selected node' topology keys in CreateVolumeRequest.AccessibilityRequirements. So this allows the driver to select any node from the passed preferred list to create volume. But this results in scheduling failure when the volume created on a node other than Kubernetes selected node. To address this, introduced new flag "--strict-topology', when set, in case of delayed binding, the driver is offered with only selected node topology, so that driver has to create the volume on this node. Modified tests so that now every test is run with and without 'strict topology'.
Our recent change (kubernetes-csi/external-provisioner#282) got merged to master which fixes late binding case. Till it appears in next release(v1.2) we use canary builds which holds this change.
Our recent change (kubernetes-csi/external-provisioner#282) got merged to master which fixes late binding case. Till it appears in next release(v1.2) we use canary builds which holds this change.
Bump deps to v0.20.0
With the current implementation, In delayed binding case, CSI driver is offered
with all nodes topology that are matched with 'selected node' topology keys in
CreateVolumeRequest.AccessibilityRequirements
. So this allows the driver toselect any node from the passed
preferred/requisite
list to create volume. But thisresults in scheduling failure when the volume created on a node other than
Kubernetes selected node.
To address this, introduced new flag "--strict-topology', when set, in case of
delayed binding, the driver is offered with only selected node topology, so that
the driver has to create the volume on this node.
This new flag can be used by drivers that support strict topology for volumes with delayed binding.
What type of PR is this?
/kind bug
/kind design
What this PR does / why we need it:
In case of delayed binding, creating volume on the different topology that is not accessed by the Kubernetes selected node for Pod scheduling, leads to unresolvable scheduling failures. So we should not allow the driver to create such volumes. We can avoid this by passing right/strict accessibility topologies instead of 'aggregated topology' to CreateVolume request.
Which issue(s) this PR fixes:
Fixes #221
Does this PR introduce a user-facing change?:
Yes