Netlink is the configuration interface of the Linux kernel networking stack. User space applications use netlink messages, either directly or indirectly through libraries and tools, to program the kernel networking stack.
Applications send messages over `AF_NETLINK` sockets to configure kernel networking state, and receive notifications on the same kind of socket when other applications change that state. Because netlink provides fine-grained control over all aspects of kernel networking, it represents the scope of features required by OPI.
In the classical hardware offload approach, the driver must implement all offloads via the `ndo_setup_tc` hook in `struct net_device_ops`, defined in `include/linux/netdevice.h`.
DPU hardware could offload more than the capabilities currently supported by the `ndo_setup_tc` hook. For example, routing tables could be mirrored to the DPU to allow hardware-accelerated, route-based forwarding. This could be done with a netlink listener that consumes netlink notifications and uses an out-of-band mechanism to program the equivalent state on the DPU hardware.
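Such a listener can be sketched with a plain `NETLINK_ROUTE` socket joined to the IPv4 route multicast group. This is a minimal sketch under stated assumptions: it only handles IPv4 routes in the main table, and `dpu_program_route()` is a hypothetical placeholder for the out-of-band mechanism (here it just prints the route). `parse_route_msg()` is kept separate from the socket loop so it can be exercised without live notifications.

```c
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Route state extracted from an RTM_NEWROUTE / RTM_DELROUTE message. */
struct route_event {
	int is_add;            /* 1 for RTM_NEWROUTE, 0 for RTM_DELROUTE */
	unsigned char dst_len; /* prefix length */
	struct in_addr dst;    /* destination prefix, 0.0.0.0 if RTA_DST absent */
	int oif;               /* output ifindex, 0 if RTA_OIF absent */
};

/* Parse one IPv4 main-table route message. Returns 0 on success, -1 if ignored. */
int parse_route_msg(const struct nlmsghdr *nlh, struct route_event *ev)
{
	if (nlh->nlmsg_type != RTM_NEWROUTE && nlh->nlmsg_type != RTM_DELROUTE)
		return -1;
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct rtmsg)))
		return -1;

	const struct rtmsg *rtm = NLMSG_DATA(nlh);
	if (rtm->rtm_family != AF_INET || rtm->rtm_table != RT_TABLE_MAIN)
		return -1;

	memset(ev, 0, sizeof(*ev));
	ev->is_add = nlh->nlmsg_type == RTM_NEWROUTE;
	ev->dst_len = rtm->rtm_dst_len;

	/* Walk the route attributes that follow struct rtmsg. */
	int len = RTM_PAYLOAD(nlh);
	for (const struct rtattr *rta = RTM_RTA(rtm); RTA_OK(rta, len);
	     rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == RTA_DST && RTA_PAYLOAD(rta) == sizeof(ev->dst))
			memcpy(&ev->dst, RTA_DATA(rta), sizeof(ev->dst));
		else if (rta->rta_type == RTA_OIF && RTA_PAYLOAD(rta) == sizeof(ev->oif))
			memcpy(&ev->oif, RTA_DATA(rta), sizeof(ev->oif));
	}
	return 0;
}

/* Hypothetical out-of-band hook: a real deployment would program the DPU here. */
void dpu_program_route(const struct route_event *ev)
{
	char dst[INET_ADDRSTRLEN];

	inet_ntop(AF_INET, &ev->dst, dst, sizeof(dst));
	printf("%s %s/%u oif %d\n", ev->is_add ? "add" : "del", dst,
	       (unsigned)ev->dst_len, ev->oif);
}

/* Join the IPv4 route multicast group and mirror every change.
 * Only returns on error. */
int run_route_listener(void)
{
	struct sockaddr_nl sa = { .nl_family = AF_NETLINK, .nl_groups = RTMGRP_IPV4_ROUTE };
	char buf[8192] __attribute__((aligned(__alignof__(struct nlmsghdr))));
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);
		if (n < 0)
			break; /* e.g. ENOBUFS: notifications were lost */
		for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, n);
		     nlh = NLMSG_NEXT(nlh, n)) {
			struct route_event ev;
			if (parse_route_msg(nlh, &ev) == 0)
				dpu_program_route(&ev);
		}
	}
	close(fd);
	return -1;
}
```

Netlink multicast delivery is not reliable: if the socket buffer overflows, `recv()` fails with `ENOBUFS` and events are lost. A production listener would therefore need to resynchronize, for example by requesting a full route dump, rather than simply exiting as this sketch does.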
It might be possible to use BPF `struct_ops` to extend offload capabilities without driver development work.
This diagram shows roughly what Anton has prototyped for switchdev notifier offloads.
Each switchdev notifier type could be handled in two steps:
- The driver dispatches the notification to a registered BPF program.
- The BPF program sends it to user space over a BPF ring buffer (ringbuf).
```c
enum switchdev_notifier_type {
	SWITCHDEV_FDB_ADD_TO_BRIDGE = 1,
	SWITCHDEV_FDB_DEL_TO_BRIDGE,
	SWITCHDEV_FDB_ADD_TO_DEVICE,
	SWITCHDEV_FDB_DEL_TO_DEVICE,
	SWITCHDEV_FDB_OFFLOADED,
	SWITCHDEV_FDB_FLUSH_TO_BRIDGE,
	SWITCHDEV_PORT_OBJ_ADD, /* Blocking. */
	SWITCHDEV_PORT_OBJ_DEL, /* Blocking. */
	SWITCHDEV_PORT_ATTR_SET, /* May be blocking. */
	SWITCHDEV_VXLAN_FDB_ADD_TO_BRIDGE,
	SWITCHDEV_VXLAN_FDB_DEL_TO_BRIDGE,
	SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE,
	SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE,
	SWITCHDEV_VXLAN_FDB_OFFLOADED,
	SWITCHDEV_BRPORT_OFFLOADED,
	SWITCHDEV_BRPORT_UNOFFLOADED,
};
```
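The kernel side of the notifier pipeline described above cannot be exercised outside the kernel, so the following is only a single-threaded user-space model of its data flow. `struct fdb_event`, `dispatch_fdb_event()`, `drain_fdb_events()` and `log_fdb_event()` are hypothetical: `dispatch_fdb_event()` plays the role of the driver plus BPF program, a fixed-slot array stands in for `BPF_MAP_TYPE_RINGBUF` (the real ring buffer holds variable-length records and uses proper memory ordering between producer and consumer), and `drain_fdb_events()` is the user-space consumer that would program the DPU. Only the first four enum values are copied from the kernel's `enum switchdev_notifier_type`.

```c
#include <stdint.h>
#include <stdio.h>

/* First entries of the kernel's enum switchdev_notifier_type. */
enum switchdev_notifier_type {
	SWITCHDEV_FDB_ADD_TO_BRIDGE = 1,
	SWITCHDEV_FDB_DEL_TO_BRIDGE,
	SWITCHDEV_FDB_ADD_TO_DEVICE,
	SWITCHDEV_FDB_DEL_TO_DEVICE,
};

/* Hypothetical record the BPF program would copy into the ringbuf. */
struct fdb_event {
	uint32_t type;    /* enum switchdev_notifier_type */
	uint32_t ifindex; /* bridge port */
	uint16_t vid;     /* VLAN ID */
	uint8_t mac[6];
};

#define RING_SLOTS 8 /* must be a power of two */

/* Fixed-slot ring standing in for BPF_MAP_TYPE_RINGBUF (single-threaded model). */
struct event_ring {
	struct fdb_event slots[RING_SLOTS];
	unsigned int head; /* total records produced */
	unsigned int tail; /* total records consumed */
};

/* "Kernel side": drop the event when the ring is full, much like a BPF
 * program has to when bpf_ringbuf_reserve() returns NULL. */
int dispatch_fdb_event(struct event_ring *r, const struct fdb_event *ev)
{
	if (r->head - r->tail == RING_SLOTS)
		return -1;
	r->slots[r->head % RING_SLOTS] = *ev;
	r->head++;
	return 0;
}

/* Stand-in for out-of-band DPU programming: just log the event. */
void log_fdb_event(const struct fdb_event *ev)
{
	printf("type %u ifindex %u vid %u mac %02x:%02x:%02x:%02x:%02x:%02x\n",
	       ev->type, ev->ifindex, (unsigned)ev->vid, ev->mac[0], ev->mac[1],
	       ev->mac[2], ev->mac[3], ev->mac[4], ev->mac[5]);
}

/* "User-space side": drain all pending events, returning how many were handled. */
int drain_fdb_events(struct event_ring *r, void (*program)(const struct fdb_event *))
{
	int n = 0;

	while (r->tail != r->head) {
		program(&r->slots[r->tail % RING_SLOTS]);
		r->tail++;
		n++;
	}
	return n;
}
```

Dropping on overflow mirrors the constraint a BPF program faces in notifier context: it cannot wait for the consumer, so a lost event has to be detected and repaired later by user space, for example by re-reading the bridge FDB.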
The same mechanism used for switchdev FDB offloads could also be used for routing FIB offloads.
```c
enum fib_event_type {
	FIB_EVENT_ENTRY_REPLACE,
	FIB_EVENT_ENTRY_APPEND,
	FIB_EVENT_ENTRY_ADD,
	FIB_EVENT_ENTRY_DEL,
	FIB_EVENT_RULE_ADD,
	FIB_EVENT_RULE_DEL,
	FIB_EVENT_NH_ADD,
	FIB_EVENT_NH_DEL,
	FIB_EVENT_VIF_ADD,
	FIB_EVENT_VIF_DEL,
};
```
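On the user-space side, FIB events forwarded this way would need to be applied to a mirror of the DPU route table. The sketch below is a user-space model only: `struct fib4_route`, `struct dpu_table` and `dpu_apply_fib_event()` are hypothetical, and the semantics are deliberately simplified. REPLACE and ADD both insert or overwrite a prefix, DEL removes it, and APPEND, rule, nexthop and VIF events are reported as unhandled because multipath and policy routing are not modeled.

```c
#include <stdint.h>

/* Copy of the kernel's enum fib_event_type listed above. */
enum fib_event_type {
	FIB_EVENT_ENTRY_REPLACE,
	FIB_EVENT_ENTRY_APPEND,
	FIB_EVENT_ENTRY_ADD,
	FIB_EVENT_ENTRY_DEL,
	FIB_EVENT_RULE_ADD,
	FIB_EVENT_RULE_DEL,
	FIB_EVENT_NH_ADD,
	FIB_EVENT_NH_DEL,
	FIB_EVENT_VIF_ADD,
	FIB_EVENT_VIF_DEL,
};

/* Hypothetical, simplified IPv4 route record forwarded to user space. */
struct fib4_route {
	uint32_t dst;    /* prefix, host byte order */
	uint8_t dst_len; /* prefix length */
	uint32_t gw;     /* single next hop; multipath is not modeled */
};

#define DPU_ROUTES 16

/* Hypothetical user-space mirror of the DPU route table. */
struct dpu_table {
	struct fib4_route routes[DPU_ROUTES];
	int used[DPU_ROUTES];
};

/* Find the slot holding the same prefix, or -1. */
static int find_route(const struct dpu_table *t, const struct fib4_route *r)
{
	for (int i = 0; i < DPU_ROUTES; i++)
		if (t->used[i] && t->routes[i].dst == r->dst &&
		    t->routes[i].dst_len == r->dst_len)
			return i;
	return -1;
}

/* Apply one FIB event. Returns 0 if applied, 1 if the event type is
 * not handled by this sketch, -1 on failure (table full / unknown route). */
int dpu_apply_fib_event(struct dpu_table *t, enum fib_event_type ev,
			const struct fib4_route *r)
{
	int i = find_route(t, r);

	switch (ev) {
	case FIB_EVENT_ENTRY_REPLACE:
	case FIB_EVENT_ENTRY_ADD:
		if (i < 0) {
			for (i = 0; i < DPU_ROUTES && t->used[i]; i++)
				;
			if (i == DPU_ROUTES)
				return -1;
			t->used[i] = 1;
		}
		t->routes[i] = *r; /* insert or overwrite the prefix */
		return 0;
	case FIB_EVENT_ENTRY_DEL:
		if (i < 0)
			return -1;
		t->used[i] = 0;
		return 0;
	default:
		return 1;
	}
}

/* Number of routes currently mirrored. */
int dpu_route_count(const struct dpu_table *t)
{
	int n = 0;

	for (int i = 0; i < DPU_ROUTES; i++)
		n += t->used[i];
	return n;
}
```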
It must be possible to collect statistics for offloaded flows. For example, Open vSwitch (OVS) relies on them to ensure that active flows do not get deleted. A driver typically provides stats on demand via the flow offload infrastructure, which uses the `FLOW_CLS_STATS` command to request the latest hardware stats from the driver.
BPF could provide the “glue” code between stats requests and out-of-band stats collection. A user space application periodically collects stats from the hardware and writes the latest values into BPF maps. When stats are requested, a BPF program populates a preallocated stats struct from the data in those maps.
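A minimal user-space model of this glue is sketched below. A fixed array stands in for the BPF map (a real BPF program would use `bpf_map_lookup_elem()`), `stats_collector_update()` plays the periodic collector, and `fill_flow_stats()` plays the BPF program answering a `FLOW_CLS_STATS` request. All names and struct layouts are hypothetical, and flows are assumed to be keyed by a non-zero cookie. The sketch reports deltas since the previous request rather than cumulative totals, on the assumption that the kernel side accumulates whatever the driver reports, as `flow_stats_update()` in `include/net/flow_offload.h` does.

```c
#include <stdint.h>
#include <string.h>

#define MAX_FLOWS 64

/* Hypothetical stats map value: written by the collector, read by the program. */
struct hw_counters {
	uint64_t cookie;    /* flow key; 0 marks an empty slot */
	uint64_t pkts;      /* latest cumulative hardware packet count */
	uint64_t bytes;     /* latest cumulative hardware byte count */
	uint64_t lastused;  /* timestamp of the last hit */
	uint64_t rep_pkts;  /* packets already reported to the stack */
	uint64_t rep_bytes; /* bytes already reported to the stack */
};

/* Hypothetical preallocated stats struct filled on a stats request. */
struct hw_flow_stats {
	uint64_t pkts;
	uint64_t bytes;
	uint64_t lastused;
};

static struct hw_counters stats_map[MAX_FLOWS]; /* stands in for a BPF hash map */

/* Look up a flow by cookie; optionally claim a free slot for it. */
static struct hw_counters *stats_lookup(uint64_t cookie, int create)
{
	struct hw_counters *free_slot = NULL;

	if (cookie == 0)
		return NULL;
	for (int i = 0; i < MAX_FLOWS; i++) {
		if (stats_map[i].cookie == cookie)
			return &stats_map[i];
		if (!free_slot && stats_map[i].cookie == 0)
			free_slot = &stats_map[i];
	}
	if (create && free_slot) {
		memset(free_slot, 0, sizeof(*free_slot));
		free_slot->cookie = cookie;
		return free_slot;
	}
	return NULL;
}

/* Collector: store the latest cumulative counters read from the hardware. */
int stats_collector_update(uint64_t cookie, uint64_t pkts, uint64_t bytes,
			   uint64_t lastused)
{
	struct hw_counters *c = stats_lookup(cookie, 1);

	if (!c)
		return -1;
	c->pkts = pkts;
	c->bytes = bytes;
	c->lastused = lastused;
	return 0;
}

/* Stats request: fill the preallocated struct with the delta since the
 * previous request, then remember what has been reported. */
int fill_flow_stats(uint64_t cookie, struct hw_flow_stats *out)
{
	struct hw_counters *c = stats_lookup(cookie, 0);

	memset(out, 0, sizeof(*out));
	if (!c)
		return -1;
	out->pkts = c->pkts - c->rep_pkts;
	out->bytes = c->bytes - c->rep_bytes;
	out->lastused = c->lastused;
	c->rep_pkts = c->pkts;
	c->rep_bytes = c->bytes;
	return 0;
}
```

Because the request path only reads values the collector has already written, answering `FLOW_CLS_STATS` never waits on the out-of-band channel. The trade-off is that the reported stats are at most one collection interval old.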