-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for incremental packet clearing #2167
Conversation
hi @adizere I read your PR, and the main change is that you divide the build packet data into multiple batches, each with a size of 50. I use my setup for testing. 10000 packets are generated at one tx. It can be seen from the log that the average cost of building packet data for 50 events is 2.5s(that is take much time) |
I try to add events store(for improve performance query_txs) to my branch , it clear 10000packet, only take 2min. |
Thanks for the feedback!
It's not clear if this is the expected behavior or it's unexpected (i.e., a problem). Could you clarify?
This is impressive! Do you use this approach in production? Also: how much space and time does it take exactly?
|
The process of clear packet is as follows: Events store takes 20MB to store 20000 packets(10000send_packet + 10000write_ack) your can use my branch for testing setup chain config is: max_msg_num=100
# enable event store
enable_event_store=true
# rpc poll interval for event store fetching events
event_store_rpc_poll_interval="500ms"
# wallet balance decimal place , use for wallet balances metrics
decimal_place=6 |
|
||
let (commit_sequences, _) = commitments_on_chain( | ||
path: &PathIdentifiers, | ||
) -> Result<(Vec<u64>, Height), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the doc comment for this method to detail the new return type? It would also be helpful if the doc comment explained why the Height
is returned in the Ok
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps even introduce a struct in place of the tuple, and a newtype for the u64
while we are at it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions, thanks! I'm on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the doc comment for this method to detail the new return type? It would also be helpful if the doc comment explained why the
Height
is returned in theOk
case.
fixed
Perhaps even introduce a struct in place of the tuple, and a newtype for the
u64
while we are at it?
Maybe we do it in a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree let's do it in a separate PR.
I started working on this but it wasn't as easy as it sounds. Two complications are that:
-
A struct already exists that could do the job, but that needs to be moved from
relayer-cli
torelayer
crate and some adaptations are necessary
https://github.com/informalsystems/ibc-rs/blob/f7cbe1cfea7f4b7aeb1bf0e99a1791f46abb2e89/relayer-cli/src/commands/query/packet/acks.rs#L16-L19 -
the method
unreceived_packets
has multiple uses and we have adapt all callers if we change its signature
Currently, the "generate" and the "refresh" phases are completely decoupled. This means there may be several minutes that elapse between the two phases. For this reason, the "refresh" phase double-checks that something was not relayed (by another relayer instance) or the packet did not time-out (since it was generated). Does this make sense? This is a place where we can optimize things and I can imagine we can skip the "refresh" in some cases. But we have not experimented with this idea yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some review comments and addressed them in 2775ed0 if you want to take a look. I tested quite a bit with this and looks good, encountering still issues with seq mismatch during simulation but not made worse by these changes.
@ancazamfir the branch is now even with master and I undid the foreign_client.rs changes. |
Co-authored-by: Anca Zamfir <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, great stuff @adizere !!!
Tested all scenarios I could think of and things look good for this issue. There are still things to improve but not here.
I opened #2211 for the redundant client updates that cause not only the warning message and more fees to pe paid by hermes.
Also there is the issue of fragmenting in batches even earlier, on packet commitment query as this might also take a long time (i couldn't find the issue for this but i believe there is one?)
most concerns will be addressed in a separate PR
* Added docs for unreceived_acknowledgements and unreceived_packets * Flipped order of sentences * First iteration for informalsystems#2087; packet-recv works, packet-ack left * Incremental processing of packet-ack relaying * Added consensus_state_height_bounded with optimistic path * Nits * Simplified progress tracking * Comment cleanup * fmt and better logs * Document the alternative path in consensus_state_height_bounded * renamed method to make it more explicit h/t SeanC * better var name * Added workaround to tonic::code inconsistencies. Ref: informalsystems#1971 (comment) * refactored the packet query methods * Refactored CLIs for incremental relaying * Experiment * Fix for packet send for non-zero delay * Using app status instead of network status * fmt * Undoing unnecessary chenges on foreign_client.rs * Removed outdated comments * Apply suggestions from code review Co-authored-by: Anca Zamfir <[email protected]> * Small refactor in cli.rs * Address review comments * Remove duplicate code * Undo one-chain change * Fix documentation * Changelog Co-authored-by: Anca Zamfir <[email protected]> Co-authored-by: Anca Zamfir <[email protected]>
Closes: #2087
Basic manual tests
This PR fixes an issue with Hermes taking a very long time, being unresponsive or being unable to clear packets in production. We cannot reproduce the production conditions, so the manual tests will be more rudimental.
Setup
gm
to create and start two networkscreate channel ibc-0 -c ibc-1 --port-a transfer --port-b transfer --new-client-connection
gm hermes config
andgm hermes keys
Test with this branch
Create 2000 ft token transfers
hermes tx raw ft-transfer ibc-0 ibc-1 transfer channel-0 9999 -o 1000 -n 2000
Clear the outstanding packets
hermes clear packets ibc-1 transfer channel-0
After all the event data is pulled from the chain, Hermes proceeds to submit the transactions one by one
Test with
master
branchCreate 2000 ft token transfers
hermes tx raw ft-transfer ibc-0 ibc-1 transfer channel-0 9999 -o 1000 -n 2000
hermes clear packets ibc-1 transfer channel-0
PR author checklist:
unclog
.docs/
).Reviewer checklist:
Files changed
in the GitHub PR explorer.