-
Notifications
You must be signed in to change notification settings - Fork 113
Fix potential major:minor conflicts during device updates #836
Conversation
Codecov Report
@@ Coverage Diff @@
## master #836 +/- ##
==========================================
+ Coverage 60.08% 60.30% +0.22%
==========================================
Files 17 17
Lines 2668 2678 +10
==========================================
+ Hits 1603 1615 +12
+ Misses 902 901 -1
+ Partials 163 162 -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an improvement, but it looks like some of the mapping might come from the spec itself (as indicated by the predicted by the caller comment in the code. How does your approach address that?
idxData, ok := devIdx[device.ContainerPath] | ||
if !ok { | ||
return grpcStatus.Errorf(codes.Internal, | ||
"Should have found a matching device %s in the spec", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a "device matching %s"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? %s is filled in with ContainerPath here. This message hasn't actually changed, just moved a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the grammar. It looks grammatically wrong to me. I would find it clearer if it said something like Should have found a guest device matching host device /dev/foo
rather than Should have found a matching device /dev/foo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - thanks @dgibson !
lgtm
This needs to be ported to 2.0 fwics (https://github.com/kata-containers/kata-containers/blob/2.0-dev/src/agent/src/device.rs#L197) so I've raised kata-containers/kata-containers#703.
/test-ubuntu |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! - lgtm - thanks @dgibson
Sorry, I don't understand the question. |
Hrm. Several of the Jenkins CI tests are reporting failure, but as usual with Jenkins I can't for the life of me figure out where to look for useful information about the failures. |
Only change in the latest push is a rebase. |
/test-ubuntu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me with minor comments.
idxData, ok := devIdx[device.ContainerPath] | ||
if !ok { | ||
return grpcStatus.Errorf(codes.Internal, | ||
"Should have found a matching device %s in the spec", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the grammar. It looks grammatically wrong to me. I would find it clearer if it said something like Should have found a guest device matching host device /dev/foo
rather than Should have found a matching device /dev/foo
.
device.go
Outdated
spec.Linux.Devices[idxData.idx].Minor = minor | ||
|
||
// there is no resource to update | ||
if spec.Linux == nil || spec.Linux.Resources == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a condition on resourceIdx
here, since your loop below depends on it. Given how devIdx
is constructed, I think it is the only necessary condition, but I prefer to replicate the same conditions used in makeDevIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think it makes sense to drop those conditions entirely - they're redundant with the checks in makeDevIndex.
We don't need a condition on resourceIdx: makeDevIndex creates an empty slice (rather than nil) if there is nothing matching. That's deliberate so that we don't need another special case here.
The agent needs to update device entries in the OCI spec so that it has the correct major:minor numbers for the guest, which may differ from the host. Entries in the main device list are looked up by device path, but entries in the device resources list are looked up by (host) major:minor. This is done one device at a time, updating as we go in updateSpecDeviceList(). But since the host and guest have different namespaces, one device might have the same major:minor as a different device on the host. In that case we could update one resource entry to the correct guest values, then mistakenly update it again because it now matches a different host device. To avoid this, rather than looking up and updating one by one, we make all the lookups in advance, creating a map from (host) device path to the indices in the spec where the device and resource entries can be found. Fixes: #834 Signed-off-by: David Gibson <[email protected]>
To update device resource entries from host to guest, we search for the right entry by host major:minor numbers, then later update it. However block and character devices exist in separate major:minor namespaces so we could have one block and one character device with matching major:minor and thus incorrectly update both with the details for whichever device is processed second. Add a check on device type to prevent this. Fixes: #835 Signed-off-by: David Gibson <[email protected]>
/test-ubuntu |
@dgibson we'll need this for 2.0 as well .. |
@bpradipt - yep, I've raised kata-containers/kata-containers#703 for this. Adding labels too... |
I think it needs to be backported to stable too. |
@jodh-intel, for future reference, is there a way I can set the porting tags myself? I couldn't see how to do it. |
@dgibson - labels are only visible if you are a "member" of the repo. I've sent you a few requests so if you accept you should be able to see them on the rhs of the page. |
/test-ubuntu |
Aha! Thanks. |
@jodh-intel, it's not clear to me what needs to be done now to move this forward. Of the two failing tests, the first I'm not sure why it is failing, since the porting labels are applied now. The other one appears to be complaining that the memory footprint has increased - which makes sense since I've added a new data structure. I don't see any easy way to avoid that while fixing the bug though. |
The way the agent updates device and resource entries one by one for guest versus host information can lead to it incorrectly updating devices because of major:minor collisions, either between host and guest numbering (issue #834) or between block and character devices (issue #835).
This PR fixes both of them.