Clearing packets apparently blocks Hermes from properly working #4071
Comments
Might be related: #2612 |
A first issue I have noticed with the RPC endpoint is that it doesn't seem to persist ABCI responses.
This is usually due to the CometBFT configuration. Have you also tried using a node with more historical data? The one being used doesn't seem to have the packet data anymore: |
@ljoss17 most likely, will check. Unfortunately I do not have an Osmosis archive node, so I cannot check it at older heights, but I will check the |
It is a valid point, this log message can be confusing, thanks for the feedback! |
@ljoss17 did you also check the point about the clear packets routine blocking the rest? This seems to be the biggest issue here, for me at least. |
I will look into that right now. Would you be able to share logs at |
@ljoss17 here you go, let me know if you need more.
|
Ok so it seems the |
@ljoss17 can packet clearing interfere with Hermes functioning properly? (My thought was that it might submit transactions and mess with the account sequence, but I'm not sure if that's the case or if there are more caveats.) |
Clear on start will first finish trying to clear all packets. Due to the high number of packets in your case, the instance is "stuck" in the clearing phase, and that is why it isn't relaying. Could you try adding the pending packets to the
For interval clearing, if it happens concurrently with the relaying, one of the two attempts to relay might result in a redundant packet error (depending on whether the relaying or the clearing is faster) if there is a new packet when clearing is triggered. |
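To make the two behaviours described above concrete, here is a minimal toy model in Python. It is not Hermes code: `run_clear_on_start` and `submit` are hypothetical names, packets are reduced to bare sequence numbers, and the "chain" is just a set of already delivered sequences. It only illustrates the ordering (backlog first, live events afterwards) and why a second attempt at the same packet comes back as redundant.

```python
from collections import deque


def run_clear_on_start(backlog, live_events):
    """Toy model: drain the whole backlog before handling any live event."""
    order = []
    pending = deque(backlog)
    while pending:
        # Clearing phase: nothing else is processed until this loop ends.
        order.append(("clear", pending.popleft()))
    for seq in live_events:
        # Relaying phase: only reached once the backlog is empty.
        order.append(("relay", seq))
    return order


def submit(delivered, seq):
    """Return 'ok' for the first delivery of seq, 'redundant' for any repeat."""
    if seq in delivered:
        return "redundant"
    delivered.add(seq)
    return "ok"


# A live packet (4) arrives while three backlog packets are still pending.
order = run_clear_on_start(backlog=[1, 2, 3], live_events=[4])

# Clearing and relaying both try to deliver packet 4: the slower one loses.
delivered = set()
results = [submit(delivered, 4), submit(delivered, 4)]
```

In this model the live packet is handled last, so the delay before normal relaying grows with the size of the backlog, which matches the "stuck" behaviour described for a very large number of pending packets.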
@ljoss17 one more thing I just discovered: so I used the same config as above, with trace logging, but without clear_on_start (but with clear_interval), and it seems to also not relay anything since the interval clearing was executed. Here's the same metric for the last 6 hours:
Here are the trace logs for the period since 16:30, I hope they are of some help: For |
Interesting, thank you very much for the additional information! |
@ljoss17 just to clarify: my biggest concern here is not packets not being relayed correctly (this is likely either the node being misconfigured or blocks being pruned), but rather Hermes not doing anything while the packets are being cleared. I think that could be a problem if, say, there really are a lot of pending packets: clearing would take some time, and during that time Hermes won't function properly. This specific case is likely something I can resolve; the problem is that once this happens again, it'll (apparently) block Hermes from doing anything else. |
I see. For clear on start, I will look into whether there is a way to cleanly split the clearing part from the relaying part. If clearing and standard relaying are clearly separated, it will be easier to run them concurrently without interference. |
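One possible shape for such a split, sketched in Python purely as an illustration: the clearing worker and the relaying worker each get their own queue and thread, and both go through a single shared submitter that records what was already delivered. This is an assumption about how a separation could look, not a description of Hermes internals; `Submitter`, `worker` and the queue names are all hypothetical.

```python
import queue
import threading


class Submitter:
    """Hypothetical single submission point shared by both workers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._delivered = set()
        self.log = []

    def submit(self, source, seq):
        # The lock serialises submissions, so a duplicate is detected
        # instead of being sent twice.
        with self._lock:
            outcome = "redundant" if seq in self._delivered else "ok"
            self._delivered.add(seq)
            self.log.append((source, seq, outcome))


def worker(name, jobs, submitter):
    """Consume sequence numbers from jobs until a None sentinel arrives."""
    while True:
        seq = jobs.get()
        if seq is None:
            break
        submitter.submit(name, seq)


submitter = Submitter()
clear_q, relay_q = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=worker, args=("clear", clear_q, submitter)),
    threading.Thread(target=worker, args=("relay", relay_q, submitter)),
]
for t in threads:
    t.start()

for seq in range(1, 6):   # backlog found by the clearing routine
    clear_q.put(seq)
for seq in (5, 6):        # live events; 5 overlaps with the backlog
    relay_q.put(seq)
clear_q.put(None)
relay_q.put(None)
for t in threads:
    t.join()
```

With this layout the relaying worker never waits for the backlog to drain, and the overlap on packet 5 produces exactly one redundant attempt regardless of which thread gets there first.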
@ljoss17 okay so, the |
Summary of Bug
There might be multiple issues at once here. I'm not sure whether it would be better to report these as separate ones, but here we go.
So, after something happened (not sure what), Hermes started to think there are more than 20k packets pending:
and it tries to clear these. Looking at the logs, this might take a really long time (more than 24h for sure), and during all this time, Hermes somehow isn't broadcasting any transactions (I can see it both from logs and from metrics). And apparently trying to clear the packets (both manually via
hermes clear packets --chain osmosis-1 --channel channel-0 --port transfer
and by waiting for the packet clear routine of the Hermes main process to finish) doesn't work: once it's done, Hermes runs the clear packets routine again, and everything is blocked again.
Basically there are 3 issues here:
I am not sure what can cause it and what was changed so it started behaving that way, but it seems wrong.
I fixed it by disabling the check routine, but that seems like a workaround.
Here are Hermes logs:
And here's the minimal config I was able to reproduce it with:
Version
Steps to Reproduce
See above for the minimal config example and for logs.
Acceptance Criteria
For Admin Use