-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate ways to impose partial ordering of messages sent to a chain #11
Comments
Another complication to consider is that multiple partial ordering subsystems may affect each other and introduce deadlock. Consider a case where a worker task for sending an ordered channel packet with sequence 2 across a solomachine channel, followed by a worker task of the same ordered channel sending packet sequence 1. This would result in a deadlock where the first worker acquired the lock for solomachine, the second work acquired the lock for the ordered channel, but none could acquire both locks to proceed. |
A possible way to solve this is to take advantage of context-generic programming and employ message polymorphism to differentiate different messages. Recall that similar thing has already done for the In the new design, we will introduce new abstract message types such as With this new design, we will be able to provide different concrete message types and concrete message senders depending on the situation. At the same time, if there is no need to handle special case, it is also possible to default all abstract messages back to the same concrete message type. For instance, in a Cosmos-to-Cosmos implementation, both abstract types On the other hand, we will also able special case the implementation of solomachine-to-cosmos relaying, such that Similarly, for ordered packet relaying, there will be special traits like Lastly, the constraint composition allows us to take care of doubly ordered messages such as All in all, the relayer implementation is going to be more complicated because of this. However, this may be an unavoidable complexity, as regardless of which programming technique we use, we would still need to write special cases when handling the sending of different kinds of messages. Nevertheless, the use of context-generic programming provides the flexibility for the implementation to either use the same concrete type for all abstract message types when there is no special case, or to use as many concrete type for each abstract message type as the number of special cases needed. A key downside of this approach is that it requires the full generality as provided by context-generic programming. This means that it is no longer possible to use the all-in-one traits to represent everything uniformly. The all-in-one traits do not allow customization or handling of special cases. If we modify the all-in-one traits to support the special cases, it would no longer be a special case, and instead become a general case that needs to be handled by all implementations, even if they do not have a special case. However, this may be a good opportunity for us to abandon the all-in-one traits, which was originally intended to ease development, but may be proven to be too restrictive. |
Summary
Find out if it is possible for a chain context to determine if an opaque message requires waiting for another message to be sent first before it can be sent.
Problem Definition
In relayer-next informalsystems/hermes#2478, we employ massively parallel relaying by spawning many async tasks for each relaying operation. This concurrency architecture is based on the assumption that each relaying operation can be executed in parallel independent of each other.
However, there are several cases of relaying operations depending on each others, and that makes it difficult to parallelize such operations. Following are some of the use cases:
Ordered Channels
When relaying packets from ordered channels, a packet needs to wait for all other packets with a lower sequence to be relayed first. If multiple async tasks are generating IBC messages for the ordered channel at the same time, they must coordinate to ensure that IBC messages for the packets with lower sequence must be sent first.
Solomachine Relaying #2924
In solomachine relaying, the Cosmos chain rely on using a sequence stored in the client state to verify the IBC messages. In order for a proof to be valid, the signed data must have an exact match of the sequence stored in the client state. If multiple async tasks are generating IBC messages from a solomachine to a Cosmos chain, they must coordinate to ensure that the IBC messages with a lower sequence must be sent first.
Naive Workaround
A naive workaround for the above problem is to limit the parallelism, and allow only one async task at a time for relaying operations that are dependent on each other. However, this can significantly limit the performance of the relaying, as only one operation can be performed per chain height.
Note that even with the naive workaround, it is still possible to have concurrent relaying on unrelated operations. For example, there can be 3 async tasks that perform relaying of unordered packets, ordered packets, and solomachine packets, all to the same chain at the same time. And that is fine, because these operations are not dependent of each other.
Action Item
This issue would track the work to find a general way to resolve the task dependency problem. At a high level, we would need to design synchronization primitives such that the async tasks can wait for other tasks to complete some operation. Furthermore, the synchronization cannot just end after the previous message is sent as a transaction, as that would be too late to achieve any concurrency. Instead, we would need dedicated tasks to sequentialize the other tasks for generating messages, but then collect the ordered messages to be sent in the same batch.
For Admin Use
The text was updated successfully, but these errors were encountered: