Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Fix potential major:minor conflicts during device updates #836

Merged
merged 2 commits into from
Sep 22, 2020
Merged

Fix potential major:minor conflicts during device updates #836

merged 2 commits into from
Sep 22, 2020

Conversation

dgibson
Copy link
Contributor

@dgibson dgibson commented Sep 10, 2020

The way the agent updates device and resource entries one by one for guest versus host information can lead to it incorrectly updating devices because of major:minor collisions, either between host and guest numbering (issue #834) or between block and character devices (issue #835).

This PR fixes both of them.

@codecov
Copy link

codecov bot commented Sep 10, 2020

Codecov Report

Merging #836 into master will increase coverage by 0.22%.
The diff coverage is 95.74%.

@@            Coverage Diff             @@
##           master     #836      +/-   ##
==========================================
+ Coverage   60.08%   60.30%   +0.22%     
==========================================
  Files          17       17              
  Lines        2668     2678      +10     
==========================================
+ Hits         1603     1615      +12     
+ Misses        902      901       -1     
+ Partials      163      162       -1     

Copy link
Member

@c3d c3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an improvement, but it looks like some of the mapping might come from the spec itself (as indicated by the predicted by the caller comment in the code. How does your approach address that?

idxData, ok := devIdx[device.ContainerPath]
if !ok {
return grpcStatus.Errorf(codes.Internal,
"Should have found a matching device %s in the spec",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a "device matching %s"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? %s is filled in with ContainerPath here. This message hasn't actually changed, just moved a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the grammar. It looks grammatically wrong to me. I would find it clearer if it said something like Should have found a guest device matching host device /dev/foo rather than Should have found a matching device /dev/foo.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel
Copy link
Contributor

/test-ubuntu

@lifupan
Copy link
Member

lifupan commented Sep 10, 2020

/test

Copy link

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! - lgtm - thanks @dgibson

@dgibson
Copy link
Contributor Author

dgibson commented Sep 10, 2020

I think this is an improvement, but it looks like some of the mapping might come from the spec itself (as indicated by the predicted by the caller comment in the code. How does your approach address that?

Sorry, I don't understand the question.

@dgibson
Copy link
Contributor Author

dgibson commented Sep 10, 2020

Hrm. Several of the Jenkins CI tests are reporting failure, but as usual with Jenkins I can't for the life of me figure out where to look for useful information about the failures.

@dgibson
Copy link
Contributor Author

dgibson commented Sep 15, 2020

Only change in the latest push is a rebase.

@bergwolf
Copy link
Member

/test-ubuntu

Copy link
Member

@c3d c3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me with minor comments.

idxData, ok := devIdx[device.ContainerPath]
if !ok {
return grpcStatus.Errorf(codes.Internal,
"Should have found a matching device %s in the spec",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the grammar. It looks grammatically wrong to me. I would find it clearer if it said something like Should have found a guest device matching host device /dev/foo rather than Should have found a matching device /dev/foo.

device.go Outdated
spec.Linux.Devices[idxData.idx].Minor = minor

// there is no resource to update
if spec.Linux == nil || spec.Linux.Resources == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a condition on resourceIdx here, since your loop below depends on it. Given how devIdx is constructed, I think it is the only necessary condition, but I prefer to replicate the same conditions used in makeDevIndex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it makes sense to drop those conditions entirely - they're redundant with the checks in makeDevIndex.

We don't need a condition on resourceIdx: makeDevIndex creates an empty slice (rather than nil) if there is nothing matching. That's deliberate so that we don't need another special case here.

The agent needs to update device entries in the OCI spec so that it has the
correct major:minor numbers for the guest, which may differ from the host.

Entries in the main device list are looked up by device path, but entries
in the device resources list are looked up by (host) major:minor.  This is
done one device at a time, updating as we go in updateSpecDeviceList().

But since the host and guest have different namespaces, one device might
have the same major:minor as a different device on the host.  In that case
we could update one resource entry to the correct guest values, then
mistakenly update it again because it now matches a different host device.

To avoid this, rather than looking up and updating one by one, we make all
the lookups in advance, creating a map from (host) device path to the
indices in the spec where the device and resource entries can be found.

Fixes: #834

Signed-off-by: David Gibson <[email protected]>
To update device resource entries from host to guest, we search for the
right entry by host major:minor numbers, then later update it.  However
block and character devices exist in separate major:minor namespaces so
we could have one block and one character device with matching major:minor
and thus incorrectly update both with the details for whichever device is
processed second.

Add a check on device type to prevent this.

Fixes: #835

Signed-off-by: David Gibson <[email protected]>
@jodh-intel
Copy link
Contributor

/test-ubuntu

@bpradipt
Copy link
Contributor

@dgibson we'll need this for 2.0 as well ..

@jodh-intel
Copy link
Contributor

@bpradipt - yep, I've raised kata-containers/kata-containers#703 for this. Adding labels too...

@jodh-intel jodh-intel added needs-forward-port Changes need to be applied to a newer branch / repository needs-backport Changes need to be applied to an older branch / repository labels Sep 17, 2020
@jodh-intel
Copy link
Contributor

I think it needs to be backported to stable too.

@dgibson
Copy link
Contributor Author

dgibson commented Sep 18, 2020

@jodh-intel, for future reference, is there a way I can set the porting tags myself? I couldn't see how to do it.

@jodh-intel
Copy link
Contributor

@dgibson - labels are only visible if you are a "member" of the repo. I've sent you a few requests so if you accept you should be able to see them on the rhs of the page.

@jodh-intel
Copy link
Contributor

/test-ubuntu

@dgibson dgibson linked an issue Sep 18, 2020 that may be closed by this pull request
@dgibson
Copy link
Contributor Author

dgibson commented Sep 18, 2020

@dgibson - labels are only visible if you are a "member" of the repo. I've sent you a few requests so if you accept you should be able to see them on the rhs of the page.

Aha! Thanks.

@dgibson
Copy link
Contributor Author

dgibson commented Sep 22, 2020

@jodh-intel, it's not clear to me what needs to be done now to move this forward.

Of the two failing tests, the first I'm not sure why it is failing, since the porting labels are applied now.

The other one appears to be complaining that the memory footprint has increased - which makes sense since I've added a new data structure. I don't see any easy way to avoid that while fixing the bug though.

@jodh-intel
Copy link
Contributor

Hi @dgibson - well, we don't see the usual green merge button due to the failing porting check. However, that is failing due to #844 still not being landed. That said, all the other checks have passed (aside from the metrics which is a known issue), so... let's merge this!

@jodh-intel jodh-intel merged commit cfff0bb into kata-containers:master Sep 22, 2020
@dgibson dgibson deleted the bug834and835 branch September 22, 2020 10:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs-backport Changes need to be applied to an older branch / repository needs-forward-port Changes need to be applied to a newer branch / repository
Projects
None yet
8 participants