Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up PCI path handling #1190

Merged
merged 13 commits into from
Feb 19, 2021
Merged

Clean up PCI path handling #1190

merged 13 commits into from
Feb 19, 2021

Conversation

dgibson
Copy link
Contributor

@dgibson dgibson commented Dec 11, 2020

Consistent and safer handling, forward port of kata-containers/agent#855 and kata-containers/runtime#3003.

@dgibson dgibson added wip Work in Progress (PR incomplete - needs more work or rework) no-backport-needed forward-port Change from an older branch / repository labels Dec 11, 2020
@dgibson dgibson self-assigned this Dec 11, 2020
@dgibson dgibson marked this pull request as draft December 11, 2020 04:45
@dgibson
Copy link
Contributor Author

dgibson commented Dec 15, 2020

/test-vfio

@dgibson dgibson force-pushed the pcipath branch 2 times, most recently from 254580d to 771a3ea Compare December 17, 2020 04:44
@dgibson
Copy link
Contributor Author

dgibson commented Dec 17, 2020

/test

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dgibson! A few comments...

src/agent/src/device.rs Outdated Show resolved Hide resolved
src/agent/src/device.rs Outdated Show resolved Hide resolved
src/agent/src/device.rs Outdated Show resolved Hide resolved
src/agent/src/pci.rs Outdated Show resolved Hide resolved
@dgibson dgibson linked an issue Jan 14, 2021 that may be closed by this pull request
@dgibson
Copy link
Contributor Author

dgibson commented Jan 14, 2021

/test

@dgibson
Copy link
Contributor Author

dgibson commented Jan 14, 2021

/test

@dgibson
Copy link
Contributor Author

dgibson commented Jan 14, 2021

/retest

1 similar comment
@dgibson
Copy link
Contributor Author

dgibson commented Jan 15, 2021

/retest

@dgibson
Copy link
Contributor Author

dgibson commented Jan 15, 2021

/test

@dgibson dgibson force-pushed the pcipath branch 3 times, most recently from 3ca2b1b to 7f06293 Compare January 20, 2021 04:46
@dgibson
Copy link
Contributor Author

dgibson commented Jan 20, 2021

/test

@dgibson
Copy link
Contributor Author

dgibson commented Jan 20, 2021

/retest

@dgibson
Copy link
Contributor Author

dgibson commented Jan 21, 2021

/test

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dgibson.

src/agent/src/pci.rs Outdated Show resolved Hide resolved
src/runtime/virtcontainers/pkg/types/pcipath.go Outdated Show resolved Hide resolved
get_pci_device_address() has pretty confusing semantics.  Both its input
and output are in other parts of the code described as a "PCI address", but
neither is *actually* a PCI address (in the standard DDDD:BB:DD.F format).

What it's really about is resolving a "PCI path" - that is way to locate a
PCI device by using it's slot number and the slot number of the bridge
leading to it - into a sysfs path.

Rename the function, and change a bunch of variable names to make those
semantics clearer.

Forward port of
kata-containers/agent@0eb612f06484

Signed-off-by: David Gibson <[email protected]>
Currently pcipath_to_sysfs(), which translates PCI paths into sysfs paths
accepts only pci paths with exactly 2 components; which represents PCI
devices separated from the root bus by exactly one PCI to PCI bridge (which
could be a virtual P2P bridge, such as a PCI-E root port).

There are cases we might reasonably want to support which have devices
either plugged directly into the root bus (zero bridges), or under
multiple layers of P2P bridge (a PCI-E switch would require at least 2
layers).

So, generalize pcipath_to_sysfs to support any number of components in the
PCI path.  We also make it use the new type for PCI paths internally rather
than plain strings.

This is a loose forward port of
kata-containers/agent@9804b1e

fixes kata-containers#1040

Signed-off-by: David Gibson <[email protected]>
pcipath_to_sysfs takes a PCI path, with a particular format.  A number of
places implicitly need strings in that format, many of them repeat the
description.  To make things safer and briefer use the pci::Path type for
the purpose more widely, and just describe the string formatting of it at
the type definition.

Then, update variable names and comments throughout to call things in
this format "PCI path", rather than "PCI identifier", which is vague,
or "PCI address" which is just plain wrong.  Likewise we change names and
comments which incorrectly refer to sysfs paths as a "PCI address".

This changes the grpc proto definitions, but because it's just
changing the name of a field without changing the field number, it
shouldn't change the actual protocol.

A loose forward port of
kata-containers/agent@da4bc1d

Signed-off-by: David Gibson <[email protected]>
Currently pcipath_to_sysfs() generates the path to the root bus node in
sysfs via create_pci_root_bus_path().  This is inconvenient for testing,
though, so instead make it take this as a parameter and generate the path
in the (single) caller.  As a bonus this will make life a bit easier when
we want to support machines with multiple PCI roots.

Signed-off-by: David Gibson <[email protected]>
Port this test from the Kata 1 Go agent to the Kata 2 Rust agent.

Signed-off-by: David Gibson <[email protected]>
The "PCI address" returned by Endpoint::PciPath() isn't actually a PCI
address (DDDD:BB:DD.F), but rather a PCI path.  Rename and use the
PciPath type to clean this up and the various parts of the network
code connected to it.

Forward port of
kata-containers/runtime@3e58971

fixes kata-containers#1040

Signed-off-by: David Gibson <[email protected]>
BlockDrive::PCIAddr doesn't actually store a PCI address
(DDDD:BB:DD.F) but a PCI path.  Use the PciPath type and rename things
to make that clearer.

TestHandleBlockVolume() previously used a bizarre value "0002:01" for
the "PCI address" which was neither an actual PCI address, nor a PCI
path.  Update it to use a PCI path - the actual value appears not to
matter in this test, as long as its consistent throughout.

Forward port of
kata-containers/runtime@64751f3

fixes kata-containers#1040

Signed-off-by: David Gibson <[email protected]>
VhostUserDeviceAttrs::PCIAddr didn't actually store a PCI address
(DDDD:BB:DD.F), but rather a PCI path.  Use the PciPath type and
rename things to make that clearer.

TestHandleBlockVolume previously used the bizarre value "0001:01"
which is neither a PCI address nor a PCI path for this value.  Change
it to a valid PCI path - it appears the actual value didn't matter for
that test, as long as it was consistent.

Forward port of
kata-containers/runtime@3596058

fixes kata-containers#1040

Signed-off-by: David Gibson <[email protected]>
@dgibson
Copy link
Contributor Author

dgibson commented Feb 18, 2021

/test

@dgibson
Copy link
Contributor Author

dgibson commented Feb 18, 2021

/test-vfio

@dgibson dgibson merged commit a060b9a into kata-containers:main Feb 19, 2021
@dgibson dgibson deleted the pcipath branch February 19, 2021 01:23
dgibson added a commit to dgibson/kata-containers that referenced this pull request Mar 12, 2021
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until kata-containers#1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With kata-containers#1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes kata-containers#1431

Signed-off-by: David Gibson <[email protected]>
dgibson added a commit to dgibson/kata-containers that referenced this pull request Mar 24, 2021
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until kata-containers#1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With kata-containers#1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes kata-containers#1431

Signed-off-by: David Gibson <[email protected]>
dgibson added a commit to dgibson/kata-containers that referenced this pull request Apr 9, 2021
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until kata-containers#1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With kata-containers#1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes kata-containers#1431

Signed-off-by: David Gibson <[email protected]>
dgibson added a commit to dgibson/kata-containers that referenced this pull request Apr 13, 2021
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until kata-containers#1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With kata-containers#1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes kata-containers#1431

Signed-off-by: David Gibson <[email protected]>
dgibson added a commit to dgibson/kata-containers that referenced this pull request Apr 13, 2021
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until kata-containers#1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With kata-containers#1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes kata-containers#1431

Signed-off-by: David Gibson <[email protected]>
Jakob-Naucke added a commit to Jakob-Naucke/kata-containers that referenced this pull request Mar 8, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and information about the enabled APQNs is
included.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
Jakob-Naucke added a commit to Jakob-Naucke/kata-containers that referenced this pull request Mar 11, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and information about the enabled APQNs is
included.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
Jakob-Naucke added a commit to Jakob-Naucke/kata-containers that referenced this pull request Mar 17, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and information about the enabled APQNs is
included.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
Jakob-Naucke added a commit to Jakob-Naucke/kata-containers that referenced this pull request Mar 24, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
Jakob-Naucke added a commit to Jakob-Naucke/kata-containers that referenced this pull request Mar 28, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
Jakob-Naucke added a commit to Jakob-Naucke/kata-containers that referenced this pull request Mar 29, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
BbolroC pushed a commit to BbolroC/kata-containers that referenced this pull request Jul 4, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
BbolroC pushed a commit to BbolroC/kata-containers that referenced this pull request Jul 4, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
BbolroC pushed a commit to BbolroC/kata-containers that referenced this pull request Oct 20, 2022
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
BbolroC pushed a commit to BbolroC/kata-containers that referenced this pull request Feb 24, 2023
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
BbolroC pushed a commit to BbolroC/kata-containers that referenced this pull request Mar 16, 2023
Initial VFIO-AP support (kata-containers#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (kata-containers#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: kata-containers#3678
Signed-off-by: Jakob Naucke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
forward-port Change from an older branch / repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up handling of PCI paths
6 participants