DEFAULT_RCVBUF too small when querying interfaces with more than 108 VFs #813

Closed
gibizer opened this issue Jun 3, 2021 · 0 comments · Fixed by #814

gibizer commented Jun 3, 2021

Hi,

Similar to #751: if the number of VFs is increased above 108, the receive buffer in pyroute2 (DEFAULT_RCVBUF) becomes too small:

<180>2021-05-31T16:35:00.707793+02:00 compute-0-4 neutron-sriov-nic-agent[33389]: 2021-05-31 16:35:00.706 455 WARNING pyroute2.netlink [-] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pyroute2/netlink/__init__.py", line 1345, in _ft_decode_generic
    self.decode_nlas(offset)
  File "/usr/lib/python3.6/site-packages/pyroute2/netlink/__init__.py", line 1470, in decode_nlas
    offset)
struct.error: unpack_from requires a buffer of at least 4 bytes
<179>2021-05-31T16:35:00.731008+02:00 compute-0-4 neutron-sriov-nic-agent[33389]: 2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-34c81c8f-7a7a-4a75-8ce9-c2b91cfdba21] Error in agent loop. Devices info: {}: TypeError: can not serialize 'error' object
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 472, in daemon_loop
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy)
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs)
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 242, in scan_devices
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info():
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot)
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs()
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace)
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs)
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2])
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: can not serialize 'error' object
2021-05-31 16:35:00.707 6 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent

Doubling DEFAULT_RCVBUF in pyroute2 solves the issue in our environment, which has 127 VFs.
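The struct.error in the log is consistent with a reply that was cut off at the receive-buffer size. When a netlink datagram is longer than the buffer passed to recv, the kernel discards the excess, so an attribute decoder can end up trying to read a 4-byte attribute header (struct nlattr: 16-bit length, 16-bit type) from fewer than 4 remaining bytes. The sketch below uses a simplified, hypothetical attribute walker (`walk_nlas` and `build_nla` are illustrative names, not pyroute2's decoder) to reproduce the same exception on a truncated buffer.

```python
import struct

# struct nlattr header: nla_len (u16) and nla_type (u16), native byte order
NLA_HDR = struct.Struct("=HH")
NLA_ALIGNTO = 4


def nla_align(length: int) -> int:
    """Round length up to the 4-byte netlink attribute alignment."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def build_nla(nla_type: int, payload: bytes) -> bytes:
    """Encode one attribute: header, payload, then padding to alignment."""
    nla_len = NLA_HDR.size + len(payload)
    padding = b"\0" * (nla_align(nla_len) - nla_len)
    return NLA_HDR.pack(nla_len, nla_type) + payload + padding


def walk_nlas(buf: bytes, offset: int = 0):
    """Hypothetical decoder: yield (type, payload) for each attribute."""
    while offset < len(buf):
        # Raises struct.error if fewer than 4 bytes remain at offset.
        nla_len, nla_type = NLA_HDR.unpack_from(buf, offset)
        if nla_len < NLA_HDR.size:
            raise ValueError("malformed attribute length %d" % nla_len)
        yield nla_type, buf[offset + NLA_HDR.size:offset + nla_len]
        offset += nla_align(nla_len)


# Three 8-byte attributes (4-byte header + 4-byte payload) = 24 bytes.
message = b"".join(build_nla(t, struct.pack("=I", t * 10)) for t in (1, 2, 3))
print([t for t, _ in walk_nlas(message)])

# Simulate a receive buffer that was too small: only 18 of 24 bytes arrived,
# so the third header would start with just 2 bytes left.
try:
    list(walk_nlas(message[:18]))
except struct.error as exc:
    print("struct.error:", exc)
```

With this walker, a cut that happens to land exactly on an attribute boundary raises nothing and silently yields a shorter list, which is why sizing the receive buffer correctly matters more than catching the exception.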

elajkat added a commit to elajkat/pyroute2 that referenced this issue Oct 18, 2022
NetlinkSocketBase.recv uses DEFAULT_RCVBUF to fetch things like ip links
via netlink sockets. In recent cycles, new NICs with higher VF counts (or
similar changes) have repeatedly forced us to increase the buffer size
(see svinota#751 or svinota#813).
A better solution (thanks to one of our experts) is to check the
necessary buffer size before receiving from the socket.

Fixes svinota#1044

Change-Id: I87c75e4d424653e5a29408b3ac2ba8504cb2db49
elajkat added a commit to elajkat/pyroute2 that referenced this issue Oct 19, 2022
(same commit message as the Oct 18 commit above; Change-Id: Ide711f27c99e4dfb75fb579f10f005cb8e1b9b37)
svinota added a commit that referenced this issue Oct 19, 2022
Add _peek_bufsize to NetlinkSocketBase so recv can use it

Bug-Url: #1045
Bug-Url: #1044
Bug-Url: #751
Bug-Url: #813