Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of CSI: failed allocation should not block its own controller unpublish into release/1.3.x #14508

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #14484 to be assessed for backporting due to the inclusion of the label backport/1.3.x.

The below text is copied from the body of the original PR.


A Nomad user reported problems with CSI volumes associated with failed allocations, where the Nomad server did not send a controller unpublish RPC.

The controller unpublish is skipped if other non-terminal allocations on the same node claim the volume. The check has a bug where the allocation belonging to the claim being freed was included in the check incorrectly. During a normal allocation stop for job stop or a new version of the job, the allocation is terminal so that's ok. But allocations that fail are not yet marked terminal at the point in time when the client sends the unpublish RPC to the server.

For CSI plugins that support controller attach/detach, this means that the controller will not be able to detach the volume from the allocation's host and the replacement claim will fail until a GC is run. This changeset fixes the conditional so that the claim's own allocation is not included, and makes the logic easier to read. Include a test case covering this path.

This PR includes two other tiny bug fixes that were going to be a pain if I had to backport 3 different PRs. They're in their own commits:

  • Fix missing copies in the volume unpublish workflow. Entities we get from the state store should always be copied before altering. Ensure that we copy the volume in the top-level unpublish workflow before handing off to the steps.
  • The list stub object for volumes in nomad/structs did not match the stub object in api. The api package also did not include the current readers/writers fields that are expected by the UI. True up the two objects and add the previously undocumented fields to the docs.

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-csi-controller-unpublish/pleasantly-casual-duck branch from 3ad889f to 88cb907 Compare September 8, 2022 17:30
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit a7ce11c into release/1.3.x Sep 8, 2022
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/b-csi-controller-unpublish/pleasantly-casual-duck branch September 8, 2022 17:30
@github-actions
Copy link

github-actions bot commented Jan 7, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants