-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI: unique volume per allocation #10136
Conversation
c257b9b
to
dd628be
Compare
dd628be
to
a90d367
Compare
a90d367
to
3b7e58e
Compare
2d04b6c
to
2d2118b
Compare
2d2118b
to
d3820d6
Compare
d3820d6
to
cebcb23
Compare
c80c665
to
d77fd59
Compare
d77fd59
to
3ff2917
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had a couple of questions after skimming the PR. Will review with closer attention later today.
@@ -92,8 +92,13 @@ func (c *csiHook) Postrun() error { | |||
mode = structs.CSIVolumeClaimWrite | |||
} | |||
|
|||
source := pair.request.Source | |||
if pair.request.PerAlloc { | |||
source = source + structs.AllocSuffix(c.alloc.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall an issue where alloc indexes aren't unique, and we may run into two allocations sharing the same id in cases of canaries where multiple deployment versions are running. Would that cause an issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would! But it's one of our invariants that if you're running a job with PerAlloc
you can't also use canaries. (It doesn't make sense as a concept with volumes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make a note of that and/or add a test for that. I fear one day, we relax or change the requirement a bit and miss this assumption in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll definitely add some more commentary to make that clear though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized I totally missed the docs too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3b1d19c
nomad/state/state_store.go
Outdated
if obj == nil { | ||
return nil, nil | ||
} | ||
vol, ok := obj.(*structs.CSIVolume) | ||
if !ok { | ||
return nil, fmt.Errorf("volume row conversion error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a violated invariant, potentially a data corruption issue. Not sure what should happen here. It may be nice to log and include the obj
type as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the rest of the state store code actually just tosses this conversion error out and lets us panic later. That's probably the right move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4ae9cb8
Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID.
Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.
ac9a9ed
to
3b1d19c
Compare
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #7877
Adds a
PerAlloc
field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex.[0]
), rather than the exact source ID. This suffix is also being added at the client when the volume claim RPCs are sent.Reviewer notes:
This PR currently depends on and includes 97bb912 from CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching #10158. Once that's merged, d44d62e should be dropped from this PR.Done