Exploring different offload mechanisms

Netlink mirroring

All Linux user space applications use netlink messages, either directly or indirectly, to program the kernel networking stack. Netlink provides fine-grained control over all aspects of kernel networking and covers the scope of features required by OPI.

Netlink is the configuration interface of the kernel networking stack: applications send messages over AF_NETLINK sockets to configure kernel networking state, and subscribe to multicast groups to receive notifications of changes made by other applications.
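
As a concrete sketch, a listener can be a plain AF_NETLINK socket subscribed to one or more rtnetlink multicast groups. The example below (error handling trimmed for brevity) subscribes to link change notifications:

/* Minimal sketch: receive rtnetlink link notifications over an
 * AF_NETLINK socket. Error handling omitted for brevity. */
#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_groups = RTMGRP_LINK,   /* link change notifications */
    };
    char buf[8192];
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    bind(fd, (struct sockaddr *)&sa, sizeof(sa));

    for (;;) {
        int len = recv(fd, buf, sizeof(buf), 0);
        struct nlmsghdr *nh;

        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len))
            if (nh->nlmsg_type == RTM_NEWLINK)
                printf("link state changed\n");
    }
}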

images/netlink.png

Classic hardware offload

In the classical hardware offload approach, the driver must implement all offloads via the ndo_setup_tc hook in struct net_device_ops, defined in include/linux/netdevice.h.
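
A rough sketch of the driver side is shown below, using the in-kernel flow_block_cb_setup_simple() helper to register a block callback; the mydrv_* names are hypothetical:

/* Hypothetical driver sketch: dispatch tc offload requests arriving
 * through the ndo_setup_tc hook. Only the hook wiring is shown. */
#include <linux/netdevice.h>
#include <net/pkt_cls.h>
#include <net/flow_offload.h>

static LIST_HEAD(mydrv_block_cb_list);

static int mydrv_setup_tc_block_cb(enum tc_setup_type type,
                                   void *type_data, void *cb_priv)
{
    switch (type) {
    case TC_SETUP_CLSFLOWER:
        /* Translate the flower rule into hardware state here. */
        return 0;
    default:
        return -EOPNOTSUPP;
    }
}

static int mydrv_setup_tc(struct net_device *dev,
                          enum tc_setup_type type, void *type_data)
{
    switch (type) {
    case TC_SETUP_BLOCK:
        return flow_block_cb_setup_simple(type_data,
                                          &mydrv_block_cb_list,
                                          mydrv_setup_tc_block_cb,
                                          dev, dev, true);
    default:
        return -EOPNOTSUPP;
    }
}

static const struct net_device_ops mydrv_netdev_ops = {
    /* ...other hooks... */
    .ndo_setup_tc = mydrv_setup_tc,
};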

images/classic_offload.png

Mirroring kernel networking state

DPU hardware could offload more than the capabilities exposed by the ndo_setup_tc hook. For example, routing tables could be mirrored to the DPU to enable hardware-accelerated, route-based forwarding. This could be done with a netlink listener that consumes netlink notifications and uses an out-of-band mechanism to program the equivalent state on the DPU hardware.
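
Driven by the same receive loop as the earlier example, but subscribed to RTMGRP_IPV4_ROUTE instead of RTMGRP_LINK, a mirroring daemon could replay route changes to the DPU. The dpu_route_add()/dpu_route_del() calls below stand in for a vendor- or OPI-specific out-of-band API and are hypothetical:

/* Sketch of a user space mirroring daemon handler: decode IPv4
 * route notifications and replay them to the DPU. Error handling
 * and attribute validation are trimmed. */
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

extern int dpu_route_add(uint32_t dst, int plen); /* hypothetical */
extern int dpu_route_del(uint32_t dst, int plen); /* hypothetical */

static void handle_route(struct nlmsghdr *nh)
{
    struct rtmsg *rtm = NLMSG_DATA(nh);
    struct rtattr *rta = RTM_RTA(rtm);
    int len = RTM_PAYLOAD(nh);
    uint32_t dst = 0;

    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if (rta->rta_type == RTA_DST)
            memcpy(&dst, RTA_DATA(rta), sizeof(dst));

    if (nh->nlmsg_type == RTM_NEWROUTE)
        dpu_route_add(dst, rtm->rtm_dst_len);
    else if (nh->nlmsg_type == RTM_DELROUTE)
        dpu_route_del(dst, rtm->rtm_dst_len);
}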

images/mirror_state.png

Extending offloads with BPF

It might be possible to use BPF struct_ops to extend offload capabilities without requiring driver development work.
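
For illustration only, the BPF side of such an extension could look like the sketch below. Note the heavy caveat: offload_ops is not an existing kernel struct_ops type; the sketch assumes the kernel gained one for offload dispatch, analogous to how tcp_congestion_ops is exposed today.

/* Hypothetical sketch: "offload_ops" is NOT an existing kernel
 * struct_ops type; it is assumed here purely for illustration. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("struct_ops/dpu_setup_flower")
int BPF_PROG(dpu_setup_flower, struct net_device *dev, void *type_data)
{
    /* Translate the classifier request and relay it out of band
     * to the DPU, instead of a driver implementing it. */
    return 0;
}

SEC(".struct_ops")
struct offload_ops dpu_offload_ops = {  /* hypothetical type */
    .setup_flower = (void *)dpu_setup_flower,
};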

images/bpf_enablement.png

Switchdev offloads with BPF

The diagram below shows roughly what Anton has prototyped for switchdev notifier offloads.

Each of the switchdev notifier types could be handled by:

  1. A driver dispatching them to a registered BPF program
  2. The BPF program sending them on a ringbuf to user space (see the sketch after the enum below)

enum switchdev_notifier_type {
	SWITCHDEV_FDB_ADD_TO_BRIDGE = 1,
	SWITCHDEV_FDB_DEL_TO_BRIDGE,
	SWITCHDEV_FDB_ADD_TO_DEVICE,
	SWITCHDEV_FDB_DEL_TO_DEVICE,
	SWITCHDEV_FDB_OFFLOADED,
	SWITCHDEV_FDB_FLUSH_TO_BRIDGE,

	SWITCHDEV_PORT_OBJ_ADD, /* Blocking. */
	SWITCHDEV_PORT_OBJ_DEL, /* Blocking. */
	SWITCHDEV_PORT_ATTR_SET, /* May be blocking . */

	SWITCHDEV_VXLAN_FDB_ADD_TO_BRIDGE,
	SWITCHDEV_VXLAN_FDB_DEL_TO_BRIDGE,
	SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE,
	SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE,
	SWITCHDEV_VXLAN_FDB_OFFLOADED,

	SWITCHDEV_BRPORT_OFFLOADED,
	SWITCHDEV_BRPORT_UNOFFLOADED,
};
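
A minimal sketch of steps 1 and 2 on the BPF side follows. The attachment point is the open question: here an fentry probe on a hypothetical driver notifier handler stands in for whatever dispatch mechanism the driver would provide; the ringbuf plumbing itself is standard.

/* Sketch: forward switchdev FDB events to user space over a BPF
 * ringbuf. The attach target is a hypothetical driver handler. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct fdb_event {
    __u32 type;     /* enum switchdev_notifier_type */
    __u16 vid;      /* VLAN id */
    __u8 addr[6];   /* MAC address */
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

SEC("fentry/mydrv_switchdev_event") /* hypothetical driver handler */
int BPF_PROG(fdb_notify, struct notifier_block *nb, unsigned long event,
             void *ptr)
{
    struct switchdev_notifier_fdb_info info = {};
    struct fdb_event *e;

    bpf_probe_read_kernel(&info, sizeof(info), ptr);

    e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    e->type = event;
    e->vid = info.vid;
    bpf_probe_read_kernel(e->addr, sizeof(e->addr), info.addr);
    bpf_ringbuf_submit(e, 0);
    return 0;
}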

images/switchdev_offload.png

FIB offloads

The same mechanism used for switchdev FDB offloads could also be used for routing FIB offloads.

enum fib_event_type {
	FIB_EVENT_ENTRY_REPLACE,
	FIB_EVENT_ENTRY_APPEND,
	FIB_EVENT_ENTRY_ADD,
	FIB_EVENT_ENTRY_DEL,
	FIB_EVENT_RULE_ADD,
	FIB_EVENT_RULE_DEL,
	FIB_EVENT_NH_ADD,
	FIB_EVENT_NH_DEL,
	FIB_EVENT_VIF_ADD,
	FIB_EVENT_VIF_DEL,
};
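
Whichever notifier feeds the ringbuf, the user space side looks the same: a standard libbpf consumer drains the buffer and programs the DPU. A sketch, assuming the object file name offload.bpf.o and the events map from the earlier sketch, with the DPU programming left as a stub:

/* Sketch: user space consumer draining offload events from the BPF
 * ringbuf and replaying them to the DPU out of band. */
#include <stdio.h>
#include <bpf/libbpf.h>

static int handle_event(void *ctx, void *data, size_t len)
{
    /* Decode the event (FDB or FIB) and program the DPU here. */
    return 0;
}

int main(void)
{
    struct bpf_object *obj = bpf_object__open_file("offload.bpf.o", NULL);
    struct ring_buffer *rb;
    int map_fd;

    bpf_object__load(obj);
    map_fd = bpf_object__find_map_fd_by_name(obj, "events");
    rb = ring_buffer__new(map_fd, handle_event, NULL, NULL);

    while (ring_buffer__poll(rb, -1 /* block */) >= 0)
        ;

    ring_buffer__free(rb);
    bpf_object__close(obj);
    return 0;
}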

Stats collection with BPF

Statistics must be collected for offloaded flows. Open vSwitch (OVS), for example, relies on them to ensure that active flows do not get deleted. A driver typically provides stats on demand via the flow offload infrastructure, which issues the `FLOW_CLS_STATS` command to request the latest hardware stats from the driver.

BPF could provide the “glue” code between stats requests and out-of-band stats collection. A user space application periodically collects stats from the hardware and writes the latest values into BPF maps. When stats are requested, a BPF program populates a preallocated stats struct from the data in those maps.
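
A sketch of the BPF glue, assuming a hash map keyed by flow cookie that user space keeps current; the attachment point on the FLOW_CLS_STATS path is hypothetical, and writing the result back would need a writable hook or kfunc that does not exist today:

/* Sketch: answer FLOW_CLS_STATS requests from a BPF map that user
 * space keeps up to date with hardware counters. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct hw_flow_stats {
    __u64 packets;
    __u64 bytes;
    __u64 lastused;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u64);                 /* flow cookie */
    __type(value, struct hw_flow_stats);
} flow_stats SEC(".maps");

SEC("fentry/mydrv_flow_stats") /* hypothetical stats request hook */
int BPF_PROG(fill_stats, unsigned long cookie, struct flow_stats *out)
{
    struct hw_flow_stats *s = bpf_map_lookup_elem(&flow_stats, &cookie);

    if (!s)
        return 0;

    /* In a real hook, "out" would be filled via a kfunc or a
     * writable tracing context; lookup shown for illustration. */
    return 0;
}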

images/stats_offload.png