
Introduce new flag - strict-topology #282

Merged
merged 1 commit into kubernetes-csi:master from avalluri:fix-late-binding
Jun 4, 2019

@avalluri avalluri commented May 21, 2019

With the current implementation, in the delayed binding case, the CSI driver is
offered the topology of all nodes that match the topology keys of the 'selected
node' in CreateVolumeRequest.AccessibilityRequirements. This allows the driver
to select any node from the passed preferred/requisite list to create the
volume, which results in a scheduling failure when the volume is created on a
node other than the one Kubernetes selected.

To address this, introduce a new flag, '--strict-topology'. When it is set, in
the delayed binding case the driver is offered only the selected node's
topology, so the driver has to create the volume on that node.

This new flag can be used by drivers that support strict topology for volumes with delayed binding.

What type of PR is this?
/kind bug
/kind design

What this PR does / why we need it:
In the delayed binding case, creating a volume in a topology that is not accessible from the node Kubernetes selected for Pod scheduling leads to unresolvable scheduling failures, so the driver should not be allowed to create such volumes. We can avoid this by passing the right (strict) accessibility topology to the CreateVolume request instead of the 'aggregated topology'.
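The failure mode can be sketched as an accessibility check: the Pod is already bound to the selected node, so a volume is usable only if that node satisfies one of the volume's accessible topologies. This is a simplified illustration, assuming node labels and topologies are plain string maps; nodeCanAccess is a hypothetical helper, and it ignores volumes that have no topology constraints at all.

```go
package main

import "fmt"

// nodeCanAccess reports whether a node with the given labels satisfies at
// least one accessible topology of a volume. A topology is satisfied only
// if every one of its key/value pairs is present on the node. An empty
// list returns false in this sketch.
func nodeCanAccess(nodeLabels map[string]string, accessible []map[string]string) bool {
	for _, segments := range accessible {
		ok := true
		for k, v := range segments {
			if nodeLabels[k] != v {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	key := "example.com/node" // hypothetical per-node topology key
	selectedNode := map[string]string{key: "node-1"}

	// Offered both nodes, the driver chose node-2.
	volume := []map[string]string{{key: "node-2"}}
	fmt.Println(nodeCanAccess(selectedNode, volume)) // false: the Pod cannot run on node-1

	// With strict topology the driver could only have chosen node-1.
	volume = []map[string]string{{key: "node-1"}}
	fmt.Println(nodeCanAccess(selectedNode, volume)) // true
}
```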

Which issue(s) this PR fixes:
Fixes #221

Does this PR introduce a user-facing change?:
Yes

support strict topology for volumes with delayed binding

@k8s-ci-robot

Welcome @avalluri!

It looks like this is your first PR to kubernetes-csi/external-provisioner 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-provisioner has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. kind/design Categorizes issue or PR as related to design. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 21, 2019
@k8s-ci-robot k8s-ci-robot requested review from davidz627 and lpabon May 21, 2019 12:57
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 21, 2019
@k8s-ci-robot

Hi @avalluri. Thanks for your PR.

I'm waiting for a kubernetes-csi or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly pohly left a comment

Can you add a README.md section which explains the topology support and add something for the new mode there?

pkg/controller/topology_test.go (outdated)
pkg/controller/topology.go (outdated)
@avalluri avalluri force-pushed the fix-late-binding branch from 8dea010 to 050a6e5 May 22, 2019 15:33
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 22, 2019
@avalluri avalluri force-pushed the fix-late-binding branch from 050a6e5 to 5b784fb May 23, 2019 08:43
@avalluri

Can you add a README.md section which explains the topology support and add something for the new mode there?

@pohly I added a section to the README that explains how AccessibilityRequirements are prepared. Can you please check whether it is good enough?

@avalluri avalluri force-pushed the fix-late-binding branch 4 times, most recently from b6b30b0 to b970817 May 24, 2019 08:40
README.md (outdated)
README.md (outdated)
@avalluri avalluri force-pushed the fix-late-binding branch from b970817 to bfda61a May 25, 2019 20:54
@davidz627

/cc @msau42 @verult

@k8s-ci-robot k8s-ci-robot requested review from msau42 and verult May 28, 2019 21:04
@pohly

pohly commented May 29, 2019

/retest

@pohly pohly left a comment

The description looks good to me now. For the code I'll defer to someone who is more familiar with it.

One more thing. Can you add a release note with the text

support strict topology for volumes with delayed binding

to the PR description?

README.md (outdated)
@avalluri avalluri force-pushed the fix-late-binding branch from bfda61a to 508be1a May 29, 2019 12:41
@davidz627

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 29, 2019
@msau42 msau42 left a comment

This generally lgtm! Thanks for working on this! Can you also add a release note to the initial comment describing the new option?

/approve

@@ -62,6 +62,7 @@ var (

enableLeaderElection = flag.Bool("enable-leader-election", false, "Enables leader election. If leader election is enabled, additional RBAC rules are required. Please refer to the Kubernetes CSI documentation for instructions on setting up these RBAC rules.")
leaderElectionType = flag.String("leader-election-type", "endpoints", "the type of leader election, options are 'endpoints' (default) or 'leases' (strongly recommended). The 'endpoints' option is deprecated in favor of 'leases'.")
strictTopology = flag.Bool("strict-topology", false, "Passes only selected node topology to CreateVolume Request, unlike default behavior of passing all nodes that match with topology keys of the selected node.")
Collaborator

To match the wording in the README:

"passing all nodes" => "passing aggregated cluster topologies"

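A sketch of how a boolean flag like --strict-topology is read with Go's standard flag package. It uses its own FlagSet so it can be parsed repeatedly (for example from tests) without touching the global flag.CommandLine. The help text applies the reviewer's suggested wording ("aggregated cluster topologies") and is not necessarily the merged text; parseStrictTopology is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
)

// parseStrictTopology parses args and returns the value of the
// --strict-topology flag, which defaults to false.
func parseStrictTopology(args []string) (bool, error) {
	fs := flag.NewFlagSet("csi-provisioner", flag.ContinueOnError)
	strict := fs.Bool("strict-topology", false,
		"Passes only selected node topology to CreateVolume Request, unlike default behavior of passing aggregated cluster topologies that match with topology keys of the selected node.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *strict, nil
}

func main() {
	// Go's flag package accepts both -name and --name, and a bare boolean
	// flag is treated as true.
	for _, args := range [][]string{nil, {"--strict-topology"}, {"--strict-topology=false"}} {
		v, err := parseStrictTopology(args)
		fmt.Println(args, v, err)
	}
}
```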
if err != nil {
return nil, err
if selectedCSINode != nil && strictTopology {
// Make sure that selected node topology is in allowed topologies list
Collaborator

We could probably be more efficient and just assume Kubernetes does the right thing.

Contributor Author

I agree, but there is a test "topology from selected node is not in allowedTopologies" for this, so I added this check to satisfy the test.
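The check discussed here ("Make sure that selected node topology is in allowed topologies list") can be sketched as a membership test. This assumes the StorageClass allowed topologies have already been flattened into one map per key/value combination, and that both sides use the same set of topology keys, so exact equality is enough; checkSelectedAllowed is a hypothetical helper, not the provisioner's function.

```go
package main

import (
	"fmt"
	"reflect"
)

// checkSelectedAllowed returns an error when allowed topologies are set and
// none of them equals the selected node's topology. An empty allowed list
// means the StorageClass does not restrict topology.
func checkSelectedAllowed(selected map[string]string, allowed []map[string]string) error {
	if len(allowed) == 0 {
		return nil
	}
	for _, t := range allowed {
		if reflect.DeepEqual(t, selected) {
			return nil
		}
	}
	return fmt.Errorf("topology %v of the selected node is not in allowed topologies %v", selected, allowed)
}

func main() {
	key := "example.com/zone" // hypothetical topology key
	allowed := []map[string]string{{key: "zone-a"}, {key: "zone-b"}}
	fmt.Println(checkSelectedAllowed(map[string]string{key: "zone-a"}, allowed))        // <nil>
	fmt.Println(checkSelectedAllowed(map[string]string{key: "zone-c"}, allowed) != nil) // true
}
```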

requisiteTerms, err = aggregateTopologies(kubeClient, driverName, selectedCSINode)
if err != nil {
return nil, err
if selectedCSINode != nil && strictTopology {
Collaborator

Can this be changed to a switch statement with the other 2 conditions, since all 3 are mutually exclusive?

Contributor Author

They are not mutually exclusive. Both allowedTopologies and selectedNode can be set, and the resulting topology then depends on the strictTopology value.

I could move this block inside the if selectedNode != nil {...} block above.
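Overlapping conditions do not rule out a switch: a Go tagless switch evaluates its cases top to bottom, so overlap can be resolved by case order. A sketch, assuming the precedence implied by this thread (strict mode with a selected node takes priority over allowed topologies, which take priority over aggregation); requisiteSource is a hypothetical helper that only names which input determines the requisite list.

```go
package main

import "fmt"

// requisiteSource names the input that determines the requisite topology
// list. The cases overlap (a selected node and allowed topologies can both
// be present), so the switch relies on case order.
func requisiteSource(hasSelectedNode, hasAllowedTopologies, strict bool) string {
	switch {
	case hasSelectedNode && strict:
		// Only the selected node's topology (checked against allowed
		// topologies, if any).
		return "selected-node"
	case hasAllowedTopologies:
		// The StorageClass allowed topologies.
		return "allowed"
	default:
		// Topologies aggregated from the cluster's nodes.
		return "aggregated"
	}
}

func main() {
	for _, sel := range []bool{false, true} {
		for _, allowed := range []bool{false, true} {
			for _, strict := range []bool{false, true} {
				fmt.Printf("selectedNode=%v allowed=%v strict=%v -> %s\n",
					sel, allowed, strict, requisiteSource(sel, allowed, strict))
			}
		}
	}
}
```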

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: avalluri, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2019
@avalluri avalluri changed the title from "RFC: Introduce new flag - strict-topology" to "Introduce new flag - strict-topology" Jun 4, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 4, 2019
With the current implementation, in the delayed binding case, the CSI driver is
offered the topology of all nodes that match the topology keys of the 'selected
node' in CreateVolumeRequest.AccessibilityRequirements. This allows the driver
to select any node from the passed preferred list to create the volume, which
results in a scheduling failure when the volume is created on a node other than
the one Kubernetes selected.

To address this, introduce a new flag, '--strict-topology'. When it is set, in
the delayed binding case the driver is offered only the selected node's
topology, so the driver has to create the volume on that node.

Modified the tests so that every test now runs with and without 'strict topology'.
@avalluri avalluri force-pushed the fix-late-binding branch from 508be1a to 5bd554b June 4, 2019 09:50
@msau42

msau42 commented Jun 4, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit 1730a1e into kubernetes-csi:master Jun 4, 2019
avalluri added a commit to avalluri/pmem-CSI that referenced this pull request Jun 7, 2019
Our recent change
(kubernetes-csi/external-provisioner#282), which fixes the late binding case, got
merged to master. Until it appears in the next release (v1.2), we use canary
builds, which include this change.
pohly pushed a commit to pohly/external-provisioner that referenced this pull request Jun 20, 2019
…strict-topology

pohly pushed a commit to pohly/pmem-CSI that referenced this pull request Jun 26, 2019
Our recent change
(kubernetes-csi/external-provisioner#282), which fixes the late binding case, got
merged to master. Until it appears in the next release (v1.2), we use canary
builds, which include this change.
kbsonlong pushed a commit to kbsonlong/external-provisioner that referenced this pull request Dec 29, 2023
Labels: approved, cncf-cla: yes, kind/bug, kind/design, lgtm, ok-to-test, release-note, size/L

Development: successfully merging this pull request may close the issue "Wrong AccessibilityRequirement passed in CreateVolumeRequest".

5 participants