meta-lxatac-bsp: lxatac-net-switch-hacks: workarounds for connection losses #216
I've marked @marckleinebudde, who investigated the issue and came up with the workarounds, and @jluebbe, who has Yocto expertise, as reviewers for the content (@marckleinebudde) and the implementation (@jluebbe).
meta-lxatac-bsp/recipes-core/lxatac-net-switch-hacks/files/spi-irq-prio-44009000.service
meta-lxatac-bsp/recipes-core/lxatac-net-switch-hacks/lxatac-net-switch-hacks.bb
lgtm
When the system is under load, the kernel's default min_free_kbytes (around 2M) seems to be too little. Hot paths can run out of memory. Increasing the limit to 8M seems to mitigate the problem. The problem manifests as issues communicating with the Ethernet switch under high load, resulting in network connection losses. This only fights symptoms of an underlying issue, which is why it is marked as a hack. Signed-off-by: Leonard Göhrs <[email protected]>
…hread When the system is under high load, some SPI transfers with the Ethernet switch time out before they are handled. Increase the priority of the kernel thread that handles the SPI transfer to work around the issue. It does not make a lot of sense for an SPI transfer that is 100% under the host's control (it does not and cannot wait for the device, for example) to time out in the first place. This means we are only fighting symptoms here, which is why this change is also marked as a hack. Signed-off-by: Leonard Göhrs <[email protected]>
For completeness' sake, here is the kernel log of the issue this PR aims to work around:
This PR adds two workarounds that should make the network connection more reliable under high loads:
Increase atomic memory pool size
When the system is under load, the kernel's default `min_free_kbytes` (around 2M) seems to be too little. Hot paths can run out of memory. Increasing the limit to 8M seems to mitigate the problem.
The problem manifests as issues communicating with the Ethernet switch under high load, resulting in network connection losses.
This only fights symptoms of an underlying issue, which is why it is marked as a hack.
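
For illustration only, here is a minimal Python sketch of what raising the reserve amounts to at runtime. It is not the mechanism used by the recipe in this PR. The helper name `raise_min_free_kbytes` is hypothetical, and 8192 KiB is an assumed reading of the "8M" mentioned above. The path is the standard procfs location of the `vm.min_free_kbytes` sysctl; writing to it requires root.

```python
from pathlib import Path

# Standard procfs location of the vm.min_free_kbytes sysctl.
MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def raise_min_free_kbytes(target_kib: int, path: Path = MIN_FREE_KBYTES) -> int:
    """Raise min_free_kbytes to target_kib if it is currently lower.

    Hypothetical helper. Returns the value in effect afterwards.
    Writing to the real procfs file needs root privileges.
    """
    current = int(path.read_text().strip())
    if current >= target_kib:
        # Never lower a reserve that is already large enough.
        return current
    path.write_text(f"{target_kib}\n")
    return target_kib
```

Calling `raise_min_free_kbytes(8192)` as root would apply the assumed 8M value. A setting made this way does not survive a reboot, so a persistent configuration needs to be applied at boot.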
Increase SPI kernel thread priority
When the system is under high load, some SPI transfers with the Ethernet switch time out before they are handled.
Increase the priority of the kernel thread that handles the SPI transfer to work around the issue.
It does not make a lot of sense for an SPI transfer that is 100% under the host's control (it does not and cannot wait for the device, for example) to time out in the first place.
This means we are only fighting symptoms here, which is why this change is also marked as a hack.
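
As a rough sketch of the idea, and not the implementation in this PR, the following Python snippet looks up kernel threads by name in procfs and moves one to a real-time FIFO priority. The helper names `find_threads` and `set_fifo_priority` are hypothetical, and the actual thread name and priority value are not stated here, so any values passed in are assumptions. `os.sched_setscheduler` is Linux-only and needs root.

```python
import os
from pathlib import Path


def find_threads(name_part: str, proc_root: Path = Path("/proc")) -> list[int]:
    """Return the sorted PIDs of tasks whose comm contains name_part.

    Hypothetical helper. Kernel threads show up in /proc like
    ordinary processes, so their names can be read from comm.
    """
    pids = []
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-task entries such as /proc/sys
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the task exited while we were scanning
        if name_part in comm:
            pids.append(int(entry.name))
    return sorted(pids)


def set_fifo_priority(pid: int, priority: int) -> None:
    """Switch a task to SCHED_FIFO with the given real-time priority."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
```

A caller would pass each PID returned by `find_threads` for the relevant SPI thread name to `set_fifo_priority`. Which name fragment and which priority are appropriate depends on the board and kernel, and both are left open here.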