-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Security: Remove the ErrorSlot from the heartbeat and connection tasks #3263
Comments
redistributing issues that can be separately worked from Epic #2311 |
Hey team! Please add your planning poker estimate with ZenHub @conradoplg @dconnolly @gustavovalverde @jvff @oxarbitrage @teor2345 @upbqdn |
This cleanup isn't needed any time soon. |
I got a panic in this code, so I'm re-opening this ticket. I'll add the panic info to the ticket, and the details in a comment. |
Here is the panic I saw recently:
It looks like it might have been triggered by switching to another chain fork. Here are the detailed logs:
|
ticket to be split and re-estimated. |
I think I might have found another instance of this bug, but it results in a syncer hang, not a panic. The logs look like:
Here are the full logs from the start of the hang:
|
I saw another instance of the same panic today. There was no chain fork, so it seems like it is just caused by inbound service overload. |
I'm seeing panics in this code every few days now. |
Motivation
The error handling for the
Connection
and heartbeat tasks is really messy. It has been really buggy in the past, and it's hard to understand or modify. The way the code is written can cause hangs or panics.At the moment, it's causing a "failed servers must set their error slot" panic when Zebra is overloaded.
Scheduling
This is a denial of service risk which might be triggerable remotely.
This is high-risk hang code with multiple mutexes, so it should be prioritised over other potential hang or panic bugs.
Sub-Tasks
High-priority ongoing connection panics:
Client
#4733Connection
methods to theClient
#4734ErrorSlot
type #4735Other panics that are very rare:
MustUseOneshotSender
, send an error instead #4736Cleanups:
This will give us a much stronger guarantee that the connection task will exit when there's an error.
Related Work
The text was updated successfully, but these errors were encountered: