
CSI: failed allocation should not block its own controller unpublish #14484

Merged
tgross merged 4 commits into main from b-csi-controller-unpublish on Sep 8, 2022

Conversation

@tgross (Member) commented Sep 7, 2022

A Nomad user reported problems with CSI volumes associated with failed allocations, where the Nomad server did not send a controller unpublish RPC.

The controller unpublish is skipped if other non-terminal allocations on the same node claim the volume. That check had a bug: the allocation belonging to the claim being freed was itself incorrectly included in the check. During a normal allocation stop, for a job stop or a new version of the job, the allocation is already terminal by that point, so the check behaves correctly. But allocations that fail are not yet marked terminal at the point in time when the client sends the unpublish RPC to the server.

For CSI plugins that support controller attach/detach, this means the controller cannot detach the volume from the allocation's host, and the replacement claim will fail until a GC is run. This changeset fixes the conditional so that the claim's own allocation is not included, makes the logic easier to read, and includes a test case covering this path.
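
In rough terms, the fixed conditional has the shape sketched below; the wrapper function is hypothetical, while the `nomad/structs` fields and the `TerminalStatus` helper are the real ones, and the actual diff is quoted in the review thread further down.

```go
// Sketch of the corrected check (the wrapper is hypothetical; the real
// code lives in Nomad's CSI volume endpoint and is quoted in the review
// below). Only *other* live allocations on the same node should block
// the controller unpublish, never the claim's own allocation.
func otherAllocBlocksUnpublish(alloc *structs.Allocation, claim *structs.CSIVolumeClaim) bool {
	return alloc != nil &&
		alloc.ID != claim.AllocationID && // the fix: exclude the claim's own allocation
		alloc.NodeID == claim.NodeID &&
		!alloc.TerminalStatus()
}
```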

This PR includes two other tiny bug fixes that were going to be a pain if I had to backport 3 different PRs. They're in their own commits:

  • Fix missing copies in the volume unpublish workflow. Entities we get from the state store should always be copied before altering them. Ensure that we copy the volume in the top-level unpublish workflow before handing off to the steps (see the sketch after this list).
  • The list stub object for volumes in nomad/structs did not match the stub object in api. The api package also did not include the current readers/writers fields that are expected by the UI. True up the two objects and add the previously undocumented fields to the docs.
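
That copy-before-alter convention looks roughly like this (a sketch; the helper name is hypothetical, while `CSIVolume` and its `Copy` method are real `nomad/structs` types):

```go
// Sketch only: state-store objects are shared in memory, so callers must
// mutate a private copy, never the stored object itself. The helper here
// is hypothetical; CSIVolume and Copy come from nomad/structs.
func mutateVolumeCopy(stored *structs.CSIVolume, mutate func(*structs.CSIVolume)) *structs.CSIVolume {
	vol := stored.Copy() // take a private copy first
	mutate(vol)          // `stored` stays untouched
	return vol
}
```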

@tgross tgross force-pushed the b-csi-controller-unpublish branch from d079d66 to 7388241 on September 7, 2022 at 20:55
@tgross tgross added this to the 1.4.0 milestone Sep 7, 2022
@tgross tgross added backport/1.1.x backport to 1.1.x release line, backport/1.2.x backport to 1.2.x release line, backport/1.3.x backport to 1.3.x release line labels Sep 7, 2022
@github-actions bot commented Sep 7, 2022

Ember Asset Size action

As of 7388241

Files that stayed the same size 🤷:

| File | raw | gzip |
| --- | --- | --- |
| nomad-ui.js | 0 B | 0 B |
| vendor.js | 0 B | 0 B |
| nomad-ui.css | 0 B | 0 B |
| vendor.css | 0 B | 0 B |

@tgross tgross changed the title CSI: allocation should not block its own controller unpublish CSI: failed allocation should not block its own controller unpublish Sep 7, 2022
@github-actions bot commented Sep 7, 2022

Ember Test Audit comparison

| | main d079d662e2f0ffbff4e1f63625cc96e32d8e369d | change | diff |
| --- | --- | --- | --- |
| passes | 1418 | 1417 | -1 |
| failures | 0 | 1 | +1 |
| flaky | 0 | 0 | 0 |
| duration | 12m 35s 444ms | 000ms | -12m 35s 444ms |

@tgross tgross force-pushed the b-csi-controller-unpublish branch from 7388241 to 4154123 on September 8, 2022 at 13:31
@lgfa29 (Contributor) left a comment

Just questions, so feel free to ignore them 😄

Comment on lines +662 to +669
```go
	vol = vol.Copy()
	err = v.nodeUnpublishVolume(vol, claim)
	if err != nil {
		return err
	}

NODE_DETACHED:
	vol = vol.Copy()
```
@lgfa29 (Contributor):

Just checking my understanding: are the two `Copy` calls required (instead of, for example, copying it once before the switch statement) because `nodeUnpublishVolume` will eventually call `CSIVolumeDenormalize`, which will read the volume from the state store again?

@tgross (Member, Author) commented Sep 8, 2022:

Right. That isn't guaranteed because `nodeUnpublishVolume` might return before that point if the node has been GC'd, so we can end up copying one extra time uselessly. That's unfortunate but doesn't feel like a big deal as it's a bit of a corner case. The other option would be to try to make it really precise about when we need to copy, but I think we've found that to be really error-prone. (And maybe something we could solve for in the state store itself at some point.)
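
Annotated, the pattern under discussion reads like this (a sketch; the structure is assumed from the diff above and this reply):

```go
// Sketch (assumption based on the discussion): nodeUnpublishVolume may
// return early, before CSIVolumeDenormalize re-reads the volume from the
// state store, e.g. when the node has already been GC'd. Then the second
// Copy duplicates an already-private object: wasted work, never unsafe.
vol = vol.Copy() // always safe to mutate after this point
if err := v.nodeUnpublishVolume(vol, claim); err != nil {
	return err
}
vol = vol.Copy() // possibly redundant if the volume was not re-read; harmless
```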

Comment on lines +807 to 808
```go
	claim.State = structs.CSIVolumeClaimStateReadyToFree
	return nil
```
@lgfa29 (Contributor):
If we skip the call to `checkpointClaim`, does that mean that this claim state change is lost if a leadership transition happens?

@tgross (Member, Author) commented Sep 8, 2022:
This assignment is really just for helping out testing. If we return nil from `controllerUnpublishVolume`, the next step in the caller is to set `claim.State = structs.CSIVolumeClaimStateReadyToFree` and checkpoint.

(Same applies to the one below)
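
Put as code, the caller's flow is roughly the following (a sketch; `controllerUnpublishVolume` and `checkpointClaim` are the names from the discussion, their exact signatures an assumption):

```go
// Sketch of the caller described above: on a nil return the caller itself
// marks the claim ready to free and checkpoints it, so the assignment
// inside controllerUnpublishVolume is only a convenience for tests.
// Signatures here are assumptions.
if err := v.controllerUnpublishVolume(vol, claim); err != nil {
	return err
}
claim.State = structs.CSIVolumeClaimStateReadyToFree
return v.checkpointClaim(vol, claim)
```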

```go
	// allocations
	shouldCancel := func(alloc *structs.Allocation) bool {
		if alloc != nil && alloc.ID != claim.AllocationID &&
			alloc.NodeID == claim.NodeID && !alloc.TerminalStatus() {
			claim.State = structs.CSIVolumeClaimStateReadyToFree
```
@lgfa29 (Contributor):
I'm probably missing something, so just to understand this better: is the claim state updated here because it always needs to be set to `CSIVolumeClaimStateReadyToFree` before returning?

tgross added a commit that referenced this pull request Sep 8, 2022
CSI: failed allocation should not block its own controller unpublish (#14484)
tgross added a commit that referenced this pull request Sep 8, 2022
CSI: failed allocation should not block its own controller unpublish (#14484)
tgross added a commit that referenced this pull request Sep 8, 2022
CSI: failed allocation should not block its own controller unpublish (#14484) (#14507)
tgross added a commit that referenced this pull request Sep 8, 2022
CSI: failed allocation should not block its own controller unpublish (#14484) (#14506)
@github-actions bot commented Jan 7, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2023
Labels
backport/1.1.x backport to 1.1.x release line, backport/1.2.x backport to 1.2.x release line, backport/1.3.x backport to 1.3.x release line, theme/storage, type/bug
2 participants