Insert ClientUpdate message into every transaction when splitting a message batch #2191

Closed
lyqingye opened this issue May 9, 2022 · 11 comments
Labels
A: bug (Admin: something isn't working) · A: low-priority (Admin: low priority / non-urgent issue, expect longer wait time for PR reviews) · I: logic (Internal: related to the relaying logic)

@lyqingye commented May 9, 2022

Summary of Bug

error logs:

prepending Destination client update at height 0-89532
assembled batch of 301 message(s)
estimate_gas: failed to simulate tx. propagating error to caller: gRPC call failed with status: status: Unknown, message: "failed to execute message; message index: 0: receive packet verification failed: couldn't verify counterparty packet commitment: failed packet commitment verification for client (07-tendermint-3): client state height < proof height ({0 89531} < {0 89532}): invalid height", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "67626"} }

Chain config: max_msg_num = 50; the receiving chain's average block time is 6s.

The process of relaying packets in Hermes can be divided into the following steps:

  1. generate TransitMessages from events
  2. assemble the msgs: build the UpdateClientMsg and prepend it to the TransitMessages
  3. send the msgs in batches

Suppose we assemble a batch of 300 msgs, then build an UpdateClientMsg at the event height and prepend it to the msgs, so the batch looks like this:

[UpdateClientMsg, ReceivePacketMsg, ReceivePacketMsg, ReceivePacketMsg, ...]  len = 301

Next, we start sending the msgs, splitting the 301 msgs into multiple transactions with at most max_msg_num msgs each. So we get seven transactions, call them Transaction1, Transaction2, ..., Transaction7.
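To make the split concrete, here is a minimal Rust sketch of it (illustrative only; `Msg` is a hypothetical placeholder, not the actual Hermes message type):

```rust
// Illustrative only: how a 301-message batch splits under max_msg_num = 50.
// `Msg` is a hypothetical placeholder, not the actual Hermes type.
#[derive(Clone)]
enum Msg {
    UpdateClient,   // client update at the event height
    ReceivePacket,  // packet message proven at the event height
}

fn main() {
    let max_msg_num = 50;
    let mut batch = vec![Msg::UpdateClient];
    batch.extend(std::iter::repeat_with(|| Msg::ReceivePacket).take(300));

    let txs: Vec<&[Msg]> = batch.chunks(max_msg_num).collect();
    assert_eq!(txs.len(), 7); // ceil(301 / 50) = 7 transactions

    // Only the first chunk carries the UpdateClient message; Transaction2
    // through Transaction7 contain nothing but packet messages.
    assert!(matches!(txs[0][0], Msg::UpdateClient));
    assert!(txs[1..]
        .iter()
        .all(|tx| tx.iter().all(|m| matches!(m, Msg::ReceivePacket))));
}
```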

Assuming the client state's latest height on the receiving chain is EventHeight - 1 (below the proof height), Transaction1 passes estimate_gas, because Transaction1 contains the UpdateClientMsg (client height = EventHeight).

At this point, Transaction1 is not yet confirmed (block time is 6s). We continue by sending Transaction2. Because Transaction2 does not contain an UpdateClientMsg, it is bound to fail, with an error like this:

  receive packet verification failed: couldn't verify counterparty packet commitment: failed packet commitment verification for client (07-tendermint-3): client state height < proof height ({0 89531} < {0 89532})

Solution: insert the same UpdateClientMsg into every transaction.

https://github.com/informalsystems/ibc-rs/blob/817aaa36a35914ec1370f62f9e5873ab8d65a53d/relayer/src/link/operational_data.rs#L190-L197
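A minimal sketch of the proposed fix, reusing the hypothetical `Msg` type from the sketch above (this is not the actual assemble_msgs code): instead of prepending the update once, clone it into every chunk.

```rust
// Sketch of the proposed fix: reserve one slot per transaction for the
// client update and prepend a clone of it to every chunk, so that each
// transaction can verify its packet proofs on its own.
fn split_with_update(update: Msg, packets: Vec<Msg>, max_msg_num: usize) -> Vec<Vec<Msg>> {
    packets
        .chunks(max_msg_num - 1) // one slot per tx is taken by the update
        .map(|chunk| {
            let mut tx = Vec::with_capacity(chunk.len() + 1);
            tx.push(update.clone());
            tx.extend_from_slice(chunk);
            tx
        })
        .collect()
}
```

With max_msg_num = 50 and 300 packet messages, this yields 7 transactions of at most 50 messages each, every one carrying its own UpdateClientMsg.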

@adizere (Member) commented May 9, 2022

Note that there is a simple fix for this, which we included in PR #2167. The problem is also discussed here: #1971 (comment)

The fix is:

7fd032b#diff-d18b8c452b8118832b4112538d034e9490cf2c30ab35c42d2e19d2f842131e14R518-R520

@lyqingye (Author) commented May 9, 2022

> Note that there is a simple fix for this, which we included in PR #2167. The problem is also discussed here: #1971 (comment)
>
> The fix is:
>
> 7fd032b#diff-d18b8c452b8118832b4112538d034e9490cf2c30ab35c42d2e19d2f842131e14R518-R520

Catching this exception and retrying means that subsequent batches need to wait for the first transaction to be confirmed.

that is my solution

@adizere (Member) commented May 10, 2022

> that is my solution

I see! You're including a ClientUpdate message multiple times in the batch, so that each individual transaction has a ClientUpdate with it.

> Catching this exception and retrying means that subsequent batches need to wait for the first transaction to be confirmed.

Hmm, not sure I understand what you mean. This exception is indeed caught, but there is no retrying involved in this solution. What happens is that the tx simulation fails (because there is no ClientUpdate message installed yet) and we proceed to submit the transaction anyway by using the default gas. Ref:

https://github.com/informalsystems/ibc-rs/blob/37dbe8c42ae09403432ebfcfdb2a8a34d86909e9/relayer/src/chain/cosmos/estimate.rs#L123-L132
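A hedged sketch of that fallback pattern (hypothetical types, not the actual Hermes code; the real logic lives in the estimate.rs lines linked above):

```rust
// Illustrative only: simulate the tx; if simulation fails with an error we
// deem recoverable, fall back to the configured default gas rather than
// dropping the transaction. The types here are made up for the sketch.
struct SimError {
    recoverable: bool,
}

impl SimError {
    fn is_recoverable(&self) -> bool {
        self.recoverable
    }
}

fn estimated_gas_or_default(
    simulate: impl Fn() -> Result<u64, SimError>,
    default_gas: u64,
) -> Result<u64, SimError> {
    match simulate() {
        Ok(gas) => Ok(gas),
        // e.g. "client state height < proof height": the ClientUpdate from
        // the batch's first tx is not yet committed, so simulation cannot
        // succeed, but the tx will be valid by the time it is executed.
        Err(e) if e.is_recoverable() => Ok(default_gas),
        // Unrecoverable simulation errors are propagated to the caller.
        Err(e) => Err(e),
    }
}
```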

Do you see any problems with this approach? It works fine in practice, and we avoid adding additional ClientUpdate messages.

@adizere adizere added this to the v0.15.0 milestone May 10, 2022
@adizere adizere self-assigned this May 10, 2022
@adizere adizere added the A: bug, I: logic, A: low-priority, and duplicate labels May 10, 2022
@adizere adizere moved this to Backlog in IBC-rs: the road to v1 May 10, 2022
@lyqingye (Author) commented May 10, 2022

I think I understand what you mean. The first batch has been submitted to the mempool; subsequent batches can ignore the exception and be submitted directly to the mempool as well. Once the first transaction is confirmed, the subsequent batches can use the latest client state to verify their proofs!

Thank you for your answer!

Another question: if the default transaction fee is used, will it be regarded as a failure, or as waste?

Repository owner moved this from Backlog to Closed in IBC-rs: the road to v1 May 10, 2022
@lyqingye lyqingye reopened this May 10, 2022
Repository owner moved this from Closed to Blocked in IBC-rs: the road to v1 May 10, 2022
@adizere (Member) commented May 10, 2022

> I think I understand what you mean. The first batch has been submitted to the mempool; subsequent batches can ignore the exception and be submitted directly to the mempool as well. Once the first transaction is confirmed, the subsequent batches can use the latest client state to verify their proofs!
>
> Thank you for your answer!
>
> Another question: if the default transaction fee is used, will it be regarded as a failure, or as waste?

This is a good point. It is not regarded as a failure. It can be regarded as waste, however. Let's think about the two solutions we have and how they affect gas + fees:

  • solution falcons-x@8e3d2ae

    • this approach has the advantage that it uses accurate fees, but the transactions are larger (because the ClientUpdate message is in every tx); so the fees are more accurate, but potentially larger than the default fee
  • solution 7fd032b#diff-d18b8c452b8118832b4112538d034e9490cf2c30ab35c42d2e19d2f842131e14R518-R520

    • the disadvantage here is that we use a basic default fee, which can be larger than the estimated one; the advantage is that each transaction (except for the first one in a batch) is smaller and costs less (ClientUpdate messages tend to be expensive due to the crypto operations within them), so even a default fee is typically enough

@lyqingye (Author) commented May 11, 2022

> I think I understand what you mean. The first batch has been submitted to the mempool; subsequent batches can ignore the exception and be submitted directly to the mempool as well. Once the first transaction is confirmed, the subsequent batches can use the latest client state to verify their proofs!
> Thank you for your answer!
> Another question: if the default transaction fee is used, will it be regarded as a failure, or as waste?

> This is a good point. It is not regarded as a failure. It can be regarded as waste, however. Let's think about the two solutions we have and how they affect gas + fees:
>
> • solution falcons-x@8e3d2ae
>   • this approach has the advantage that it uses accurate fees, but the transactions are larger (because the ClientUpdate message is in every tx); so the fees are more accurate, but potentially larger than the default fee
> • solution 7fd032b#diff-d18b8c452b8118832b4112538d034e9490cf2c30ab35c42d2e19d2f842131e14R518-R520
>   • the disadvantage here is that we use a basic default fee, which can be larger than the estimated one; the advantage is that each transaction (except for the first one in a batch) is smaller and costs less (ClientUpdate messages tend to be expensive due to the crypto operations within them), so even a default fee is typically enough

Assuming there are only two Msgs in the second batch, using the default gas at that point will inevitably cause waste; and the default gas cannot be set too small either, since the second batch may contain as many as (max_msg_num - 1) Msgs.
I insert an UpdateClientMsg into every batch; the UpdateClientMsgs in the subsequent batches will not involve crypto operations, because the first batch already stored the consensus state they carry. The code is as follows:

```go
// From ibc-go's 07-tendermint client (update.go), abridged:
func (cs ClientState) CheckHeaderAndUpdateState(
	ctx sdk.Context, cdc codec.BinaryCodec, clientStore sdk.KVStore,
	header exported.Header,
) (exported.ClientState, exported.ConsensusState, error) {
	tmHeader, ok := header.(*Header)
	if !ok {
		return nil, nil, sdkerrors.Wrapf(
			clienttypes.ErrInvalidHeader, "expected type %T, got %T", &Header{}, header,
		)
	}

	var conflictingHeader bool
	prevConsState, _ := GetConsensusState(clientStore, cdc, header.GetHeight())
	if prevConsState != nil {
		// For a duplicate UpdateClientMsg the stored consensus state always
		// equals the submitted one, so we return early without verifying
		// the header (no expensive crypto operations).
		if reflect.DeepEqual(prevConsState, tmHeader.ConsensusState()) {
			return &cs, prevConsState, nil
		}
		conflictingHeader = true
	}
	// ... (remainder of the function elided)
}
```
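In other words: when a later transaction in the batch carries the same header that the first transaction already installed, GetConsensusState finds the stored consensus state, the reflect.DeepEqual check passes, and the handler returns early without any signature verification, which is why the duplicated UpdateClientMsgs are comparatively cheap.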
@adizere (Member) commented May 11, 2022

> I insert an UpdateClientMsg into every batch; the UpdateClientMsgs in the subsequent batches will not involve crypto operations, because the first batch already stored the consensus state they carry. The code is as follows:

Interesting, thanks for highlighting that, I wasn't aware of it.

Let's keep this issue open; we'll have a closer look and try to integrate falcons-x@8e3d2ae, so that we make Hermes more accurate w.r.t. fee usage. I'll go ahead and groom the issue description to reflect the discussion here.

Before we can adopt falcons-x@8e3d2ae, we will likely want to refactor the transaction partitioning mechanism, as the current approach is very fragile.

@adizere adizere changed the title Receive packet verification failed Hermes can be wasteful with fees May 11, 2022
@adizere adizere modified the milestones: v0.15.0, v1.0.0 May 11, 2022
@ancazamfir (Collaborator) commented

> that is my solution

The current design is to fill a Tx with messages until either max_msg_num or max_tx_size is reached, whichever occurs first. The logic in your solution should be changed to follow batch_messages, i.e.:
https://github.com/informalsystems/ibc-rs/blob/eae0afd90d020af6a2e5feaa681dacf1294c7b8d/relayer/src/chain/cosmos/batch.rs#L119-L151

If we go with fixing this in the worker, we should also refactor the code to do the actual fragmenting in the packet worker's assemble_msgs() (i.e. return Result<Vec<TrackedMsgs>, LinkError>) and remove the current fragmentation from the chain runtime.
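For illustration, a hedged sketch of batching under both limits with a per-batch client update (hypothetical `Msg` type and `encoded_len` helper; not the actual batch_messages code):

```rust
// Illustrative only: fill each tx until max_msg_num or max_tx_size is
// reached, whichever comes first, and give every tx its own client update.
// `Msg` and `encoded_len` are hypothetical placeholders for this sketch.
#[derive(Clone)]
struct Msg;

fn batch_with_update(
    update: Msg,
    msgs: Vec<Msg>,
    max_msg_num: usize,
    max_tx_size: usize,
    encoded_len: impl Fn(&Msg) -> usize,
) -> Vec<Vec<Msg>> {
    let update_size = encoded_len(&update);
    let mut batches = Vec::new();
    let mut current = vec![update.clone()];
    let mut current_size = update_size;

    for msg in msgs {
        let size = encoded_len(&msg);
        // Close the current batch if adding this msg would exceed a limit.
        if current.len() + 1 > max_msg_num || current_size + size > max_tx_size {
            batches.push(std::mem::replace(&mut current, vec![update.clone()]));
            current_size = update_size;
        }
        current.push(msg);
        current_size += size;
    }
    batches.push(current);
    batches
}
```

Returning these batches from the packet worker (e.g. as a Vec of tracked message sets) instead of splitting in the chain runtime would match the refactor suggested above.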

@lyqingye (Author) commented

> > that is my solution
>
> The current design is to fill a Tx with messages until either max_msg_num or max_tx_size is reached, whichever occurs first. The logic in your solution should be changed to follow batch_messages, i.e.:
>
> https://github.com/informalsystems/ibc-rs/blob/eae0afd90d020af6a2e5feaa681dacf1294c7b8d/relayer/src/chain/cosmos/batch.rs#L119-L151
>
> If we go with fixing this in the worker, we should also refactor the code to do the actual fragmenting in the packet worker's assemble_msgs() (i.e. return Result<Vec<TrackedMsgs>, LinkError>) and remove the current fragmentation from the chain runtime.

Thank you for the correction; I agree with you.

@adizere adizere modified the milestones: v1.0.0, v1.1 Jun 28, 2022
@mzabaluev mzabaluev changed the title Hermes can be wasteful with fees Insert ClientUpdate message into every transaction when splitting a message batch Aug 15, 2022
@mzabaluev (Contributor) commented

This issue should be dedicated to equipping every tx with a ClientUpdate. Sizing and gas issues should be tracked elsewhere: #2560 and #2422.

@adizere adizere removed the duplicate label Aug 15, 2022
@adizere adizere removed their assignment Aug 15, 2022