Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Passing PCI device information from host to VM is limited and messy #854

Closed
dgibson opened this issue Oct 7, 2020 · 5 comments
Closed
Assignees
Labels
bug Incorrect behaviour needs-forward-port Changes need to be applied to a newer branch / repository needs-review Needs to be assessed by the team. no-backport-needed Changed do not need to be applied to an older branch / repository

Comments

@dgibson
Copy link
Contributor

dgibson commented Oct 7, 2020

Get your issue reviewed faster

This is a code structure and extensibility issue, rather than a problem with behaviour (for now).

Description of problem

Implementing some sorts of container devices requires hotplugging PCI devices to the Kata VM, then wiring it up within the VM. This requires the runtime and agent to collaborate, and therefore requires communication to be passed about the devices.

To do that we need a way of referring to devices that is meaningful to both the host and the VM.

  • Normal DDDD:BB:DD.F PCI addresses work for the guest, but are no good in the host, since BB is allocated by guest side software (either firmware or kernel), and doesn't even have to remain the same for the VM's lifetime. DDDD may or may not make sense to the host depending on platform.
  • Hypervisor IDs (e.g. qemu's id property) work for the host, but are meaningless to the guest, in addition to the fact that their format depends on the specific hypervisor in use.
  • PCI vendor:device tuples only describe the device type, not the specific instance of the device.

So, none of those options work. The only thing we can really use is what I'm going to call a "PCI path", that is we give the slot & function numbers of the bridge on the root bus under which the device lives, followed by those for the next bridge to traverse and so on down to the slot & function of the device itself. Slot & function numbers have meaning in the PCI spec, and so are stable and well defined from both host and guest perspectives.

We already effectively use this in several places, however it's just described as bridgeSlot/deviceSlot so we don't handle:

  • Devices plugged directly into the root bus (no bridge)
  • Devices accessed via more than one bridge
  • Devices that occupy a non-zero function, or are accessed via any bridge using a non-zero function.

Expected result

Uniform handling of PCI paths, which will can address any PCI device with any bridge toplogy. Consistent naming in the code to make it clear what we're dealing with (calling paths "PCI address" is very misleading, since that usually refers to a DDDD:BB:DD.F style address).

Actual result

A bunch of ad-hoc cases.

Further information

For the purposes here, plain PCI to PCI bridges, PCI-E to PCI bridges, PCI-E root ports and PCI-E switches can all be handled uniformly as varieties of logical PCI to PCI bridges (or a collection of P2P bridges in the case of a PCI-E switch).

cloud-hypervisor purports to return the guest PCI address (BDF) for devices that are hotplugged. As far as I can tell, this is a design error in the clh API, and only works by accident because clh doesn't use any PCI to PCI bridges (including root ports).

@dgibson dgibson added bug Incorrect behaviour needs-review Needs to be assessed by the team. needs-forward-port Changes need to be applied to a newer branch / repository no-backport-needed Changed do not need to be applied to an older branch / repository labels Oct 7, 2020
@dgibson dgibson self-assigned this Oct 7, 2020
@c3d
Copy link
Member

c3d commented Oct 7, 2020

This looks good, but I have a question about this:

or a collection of P2P bridges in the case of a PCI-E switch

What bridge name would appear for these "virtual P2P bridges"? Can we guarantee that would be stable? (I assume you might use a pair of ports?

@dgibson
Copy link
Contributor Author

dgibson commented Oct 7, 2020

@c3d I don't know what you mean by "bridge name".

@likebreath
Copy link
Contributor

@dgibson Thank you for the detailed write-up and taking efforts to fix the issue.

/cc @sboeuf @amshinde PTAL.

@c3d
Copy link
Member

c3d commented Oct 9, 2020

@c3d I don't know what you mean by "bridge name".

Sorry, I mean the bridgeSlot part in the code.

@dgibson
Copy link
Contributor Author

dgibson commented Oct 10, 2020

The bridgeSlot is just a PCI slot number (aka device number). Virtual bridges have them, just the same as physical ones, and since they're used in "over the wire" transactions, yes, they're stable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Incorrect behaviour needs-forward-port Changes need to be applied to a newer branch / repository needs-review Needs to be assessed by the team. no-backport-needed Changed do not need to be applied to an older branch / repository
Projects
None yet
Development

No branches or pull requests

3 participants