Validate the gas limit for messages to determine the maximum size of a cross-net message batch #202

Closed
adlrocha opened this issue Sep 7, 2023 · 4 comments

Comments

@adlrocha
Contributor

adlrocha commented Sep 7, 2023

  • Look at the gas used by cross-net messages between two bottom-up checkpoints or top-down cross-net finalities.
  • Check that the maximum size of the block is not exceeded to determine the maximum batch permitted.
  • Use this for the execution.

Happy to follow up if this is not clear; I am writing while syncing with the team :)

@aakoshh
Contributor

aakoshh commented Sep 11, 2023

Okay, so the problem is that we have top-down finality in the block, and together with the other transactions (and bottom-up checkpoints) in that block it might be too much to execute: it may require more gas than the block gas limit.

In the case of top-down we have to remember that the way it's constructed is that any validator can propose that we finalize a given height which it sees as final, and if the others also see it as final, they all execute the messages. However, this can involve an arbitrarily large number of cross messages, so we discussed that the proposal should only be voted on if the volume of cross-message execution it implies is within limits.

The (theoretical) problem is, what if even a single block on the parent contains too many messages to execute? We can say: the parent should not create so many? But the contracts emitting them might not know about each other and it would be bad UX to fail because of this kind of throttling. We also maintain cross-messages by block height, so finalizing a height means all the messages in it are ready. It would be a pain to put outgoing messages into "future buckets".

My suggestion would be that if this is a problem, we handle it on the consumer side (i.e. the subnet):

  1. The current final block height on the parent chain is A
  2. Validators finalize block height B = A + n
  3. During execution, validators add all cross messages from the n blocks between A and B to the gateway but do not execute them. They simply append them to an execution queue, where any number of messages can go in a block (within reason).
  4. In subsequent blocks, they execute messages from this queue based on how much free gas limit they have in their blocks. For example, if the block gas limit is BGL, they can execute the next k cross messages such that BGL >= sum(tx.gas_limit for tx in block.transactions) + sum(cross_msg.gas_limit for cross_msg in gateway.cross_msgs[:k])
  5. After execution, they move the high-water-mark of executed top-down messages by k.
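The queue-and-drain steps above can be sketched roughly as follows. This is only an illustration of the idea, not gateway or Fendermint code: `CrossMsg`, `TopDownQueue`, and `take_for_block` are hypothetical names, and gas limits are plain integers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrossMsg:
    nonce: int
    gas_limit: int


@dataclass
class TopDownQueue:
    """Hypothetical execution queue kept by the subnet (step 3)."""
    pending: List[CrossMsg] = field(default_factory=list)
    executed: int = 0  # high-water mark of executed messages (step 5)

    def enqueue(self, msgs: List[CrossMsg]) -> None:
        # Finalizing height B only appends; nothing is executed here.
        self.pending.extend(msgs)

    def take_for_block(self, block_gas_limit: int,
                       tx_gas_limits: List[int]) -> List[CrossMsg]:
        # Step 4: take the longest prefix of the queue that fits into
        # the gas left over after the block's ordinary transactions.
        budget = block_gas_limit - sum(tx_gas_limits)
        k = 0
        for msg in self.pending:
            if msg.gas_limit > budget:
                break
            budget -= msg.gas_limit
            k += 1
        batch, self.pending = self.pending[:k], self.pending[k:]
        self.executed += k
        return batch
```

One thing the sketch makes visible: messages are taken strictly in order, so a single message whose gas limit exceeds the block gas limit would stall the queue. Some cap on the per-message gas limit would be needed to rule that out.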

A similar strategy can be applied to each and every bottom-up subnet checkpoint as well.

So the idea is to dump all messages into queues, and instead of executing all of them, take as much as can be safely done in each block, until they are all gone.

@cryptoAtwill
Contributor

@aakoshh We already have a top-down nonce that chains the cross messages. If we include the nonce as part of the finality, we might not need to store everything in the gateway, as that incurs extra gas cost.

@aakoshh
Contributor

aakoshh commented Sep 18, 2023

@cryptoAtwill so you are saying that instead of adding the cross messages to the destination gateway and then automatically executing some in each block depending on how much gas there is, we add just the nonce saying what the highest finalized nonce currently is. It is then the job of Fendermint to fetch new messages up to this nonce in each block and execute them up to the gas limit.

That should work too I suppose, with top-down we should always have the parent to go to.
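A rough sketch of that nonce-based variant, under some assumptions: the finality commits the highest finalized nonce (inclusive), each node tracks the next nonce to execute, and `gas_limit_of` is a hypothetical stand-in for fetching the message with a given nonce from the parent and reading its gas limit.

```python
from typing import Callable, List, Tuple


def plan_block(next_nonce: int,
               finalized_nonce: int,
               free_gas: int,
               gas_limit_of: Callable[[int], int]) -> Tuple[List[int], int]:
    """Pick the nonces to execute in this block.

    Returns the nonces that fit into `free_gas`, in order, together
    with the next nonce to start from in the following block.
    """
    executed: List[int] = []
    nonce = next_nonce
    while nonce <= finalized_nonce:
        gas = gas_limit_of(nonce)  # assumed lookup against the parent
        if gas > free_gas:
            break
        free_gas -= gas
        executed.append(nonce)
        nonce += 1
    return executed, nonce
```

Nothing is stored in the destination gateway here apart from the two nonces; the messages themselves are read from the parent when needed, which only works where the parent is always available to query.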

With bottom-up, using the IPLD resolver, the data should be in the IPLD block store, although I have to say it's in a completely different format, and correlating a nonce to the CID of a data structure which contains an array of messages can be troublesome.

For bottom-up with Lotus, it wouldn't work, as Lotus gives you no support for any shenanigans: your only option is to pass the messages into the contract, and then you are left with the decision of whether to execute them straight away or store them.

Since these messages are stored in the source subnet gateway already, the cost of storing them in the destination as well I believe is not outrageous.

NB in IBC the messages are not stored in the contracts, only their hashes are. The message itself is emitted as an event, and the relayer must reconstruct it from there, and provide proof that the hash was indeed included in the outbox.
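To make the hash-only idea concrete, here is a much simplified sketch. It is not IBC's actual API, encoding, or proof format: `Outbox`, `commit`, and `relayer_check` are hypothetical, the message is a plain dict, and the "proof" is reduced to comparing against the stored hash rather than proving inclusion against a state root.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def commit(msg: dict) -> str:
    # Canonical JSON encoding is an assumption made for this sketch.
    encoded = json.dumps(msg, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


class Outbox:
    """Source side: keeps only hashes in state, emits messages as events."""

    def __init__(self) -> None:
        self.commitments: Dict[int, str] = {}       # nonce -> hash (stored)
        self.events: List[Tuple[int, dict]] = []    # emitted, not stored

    def send(self, nonce: int, msg: dict) -> None:
        self.commitments[nonce] = commit(msg)
        self.events.append((nonce, msg))


def relayer_check(commitments: Dict[int, str], nonce: int, msg: dict) -> bool:
    # The relayer rebuilds the message from the event; the destination
    # accepts it only if it matches the committed hash.
    return commitments.get(nonce) == commit(msg)
```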

So you are right to worry about the cost, but just wanted to say that we lost that saving opportunity already when the outgoing messages were stored in the gateway.

@adlrocha
Contributor Author

> Since these messages are stored in the source subnet gateway already, the cost of storing them in the destination as well I believe is not outrageous.

I think I may have missed something. Why would you need to store them also in the destination?

I feel the issue with running out of gas is only concerning in the bottom-up case. I don't know if we have already agreed on how we are going to execute top-down messages, whether implicitly or explicitly, but for top-down we can commit the finality, and then Fendermint can determine the execution policy as long as it is shared by all nodes (executing them one by one as if they belonged to independent blocks, in small batches, etc.). Running out of gas in this case is not critical, as the cross-net messages are available in the parent, and it is just a matter of checking that they are considered final by the child subnet and executing them accordingly.

For bottom-up, though, this can be messy, as we propagate a bottom-up checkpoint whose messages need to be executed entirely while competing for block space with other transactions. The case of Lotus as a parent may be even worse, as everything has to be sent in a single transaction to the gateway, including all the information needed for both checkpoint commitment and execution.

I guess here what we can do is either:

  • Use a two-step process, where relayers first commit the checkpoint, and then they can trigger the execution of messages included in the checkpoint one by one, so all this execution doesn't need to fit atomically in the same block. This may require the use of Merkle inclusion proofs or some other more naive scheme to link cross-net messages to a committed checkpoint.
  • Try to be conservative in the way cross-net messages are included in checkpoints to ensure that they don't run out of gas, increasing the checkpoint period and letting the relayer be "smart" about the best time to commit the checkpoint. This one is easier to implement but it could lead to twisted incentives :(
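For the first option, a minimal sketch of how a Merkle inclusion proof could link a single cross-net message to a committed checkpoint. This assumes the checkpoint carries a Merkle root over its serialized messages; the tree layout (SHA-256, last node duplicated on odd levels, no domain separation) is a naive choice made for illustration, not something specified in this thread.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the left)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _build(leaves: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the root and the inclusion proof for leaves[index]."""
    if not leaves:
        raise ValueError("a checkpoint needs at least one message")
    level = [h(leaf) for leaf in leaves]
    proof: Proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def merkle_root(leaves: List[bytes]) -> bytes:
    # Step 1: the relayer commits only this root with the checkpoint.
    return _build(leaves, 0)[0]


def merkle_proof(leaves: List[bytes], index: int) -> Proof:
    return _build(leaves, index)[1]


def verify(root: bytes, leaf: bytes, proof: Proof) -> bool:
    # Step 2: before executing one message, the gateway checks that it
    # belongs to the previously committed checkpoint.
    acc = h(leaf)
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root
```

With this shape, committing the checkpoint costs one hash in storage regardless of how many messages it contains, and each message can then be executed in its own transaction with a proof of logarithmic size.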

@jsoares jsoares transferred this issue from consensus-shipyard/fendermint Dec 19, 2023
@jsoares jsoares closed this as not planned Mar 13, 2024
4 participants