Describe the bug
If a node receives a transaction on its RPC port for a synced subnet that currently has no validators, it enters a faulty state because there are no validators to finalize a block. The node starts failing its health check with:
snowman consensus is not healthy reason: block processing time 11h5m29.98576991s > 30s
It also warns about the cause of the issue:
validator set is empty
However, once validators have been re-added and the subnet starts processing blocks again, the node doesn't recover. Instead, it stays stuck indefinitely with the same failing health check:
snowman consensus is not healthy reason: block processing time 12h44m13.985749723s > 30s
The only workaround to resume normal operation is to restart the node, after which it correctly syncs to the chain and resumes processing transactions. The stuck block presumably gets dropped.
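Until the node can recover on its own, an external watchdog could detect the stuck state from the health message and trigger the restart. Below is a minimal Python sketch, assuming the message format quoted above. The helper names (`parse_go_duration`, `stalled_for`, `needs_restart`) and the 10-minute restart threshold are hypothetical, and how the message is obtained from the node and how the restart is performed are left out.

```python
import re
from datetime import timedelta

# Message format taken from the health check output quoted in this report, e.g.
# "block processing time 11h5m29.98576991s > 30s"
_MESSAGE = re.compile(r"block processing time ([\w.]+) > ([\w.]+)")

# One "<number><unit>" component of a Go-style duration string.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|h|m|s)")
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}


def parse_go_duration(text: str) -> timedelta:
    """Convert a duration such as '11h5m29.98576991s' to a timedelta."""
    if not re.fullmatch(rf"(?:{_PART.pattern})+", text):
        raise ValueError(f"unrecognized duration: {text!r}")
    seconds = sum(float(value) * _UNIT_SECONDS[unit]
                  for value, unit in _PART.findall(text))
    return timedelta(seconds=seconds)


def stalled_for(message: str):
    """Return (processing_time, limit), or None if the message does not match."""
    match = _MESSAGE.search(message)
    if match is None:
        return None
    return parse_go_duration(match.group(1)), parse_go_duration(match.group(2))


def needs_restart(message: str,
                  restart_after: timedelta = timedelta(minutes=10)) -> bool:
    """Hypothetical policy: restart once a block has been stuck longer than restart_after."""
    parsed = stalled_for(message)
    return parsed is not None and parsed[0] > restart_after
```

A watchdog like this only automates the workaround. It does not address the underlying issue that the node fails to recover once validators return.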
To Reproduce
Spin up a subnet without validators and sync one non-validating node. Post a transaction to the node's RPC port. The node starts failing its health check. Add a validator to the subnet. Observe that the failing node doesn't recover. Restart the failing node. The node comes up healthy.
Expected behavior
Once validators have been added (or re-added) to the subnet, the node should recover to a healthy state and resume normal operation without any intervention. I'm not sure what the correct or desirable behaviour would be for transactions that were submitted while validators were absent. They could be dropped, or possibly re-entered into the mempool to be included in a future block. At a minimum, the node could reproduce what happens when a stuck node is restarted. The main idea is for nodes to be able to auto-recover without external intervention (a restart).