Don't persist the network messages and their acknowledgements #1417
Comments
I think we should re-open #1079 and maybe reconsider the persisted queue too to mitigate messages that were transmitted, but not handled in the head logic (i.e. on the receiving side).
You mean also adding the option to persist whatever was in the queue at a certain point in time? At which point would we persist the data?
If you remove the network messages then you cannot re-transmit messages anymore. On the other hand, removing the acknowledgements would result in lots of messages over the network. However, it might work and lead to a head getting unstuck.
Why do we need to persist messages at all? Why can we not do what the cardano-node is doing? A cardano-node only needs to ask whoever is available whether they have a longer chain. It does not rely on people storing outbound messages. Why should our situation be different?
This is a good idea and something which was considered in the past (as mentioned in this ADR; there are other PRs and logbook items we could dig up). It was called a "pull-based approach", and I think this is what you mean in essence: have the network participants pull data from each other rather than send it. However, that approach ultimately leads to similar questions about persistence. How much data would we keep on the other end so that network participants can pull it?
Nothing on the network layer. You only ask another node what it believes to be the case regarding the actual chain state and then you verify it. That state is the chain (history of snapshots) and the working area (signatures from unconfirmed snapshots).
We just stumbled over the We concluded that having the
Exploration: #1593 Closing as "not planned"
Why
We have already seen a couple of times that resending lost/missed messages was not very robust: the `Head` still got stuck, and it was not clear which values we would need to set for the acks counter in order to fix the problem and resend the lost messages.
The initial idea of having the reliability layer store the index of sent/seen messages and replay them does not seem to work well, for a couple of reasons:
- The node can crash after sending a message but before storing it on disk.
- The saved acks do not correspond directly to the stored network messages: we store them separately, and since this is not an atomic process it can lead to failures.
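The two failure modes above can be illustrated with a small toy model. This is only a sketch under assumptions, not the hydra-node implementation: the `Node` class, its fields and the crash flags are hypothetical names, and the "disk" is simulated with plain Python values.

```python
class SimulatedCrash(Exception):
    """Stands in for the node process dying at a given point."""


class Node:
    def __init__(self):
        self.wire = []             # messages that actually left the node
        self.stored_messages = []  # hypothetical on-disk message log
        self.stored_acks = 0       # hypothetical on-disk counter, written separately

    def send(self, msg, crash_after_send=False, crash_between_writes=False):
        self.wire.append(msg)             # 1. the message goes out
        if crash_after_send:
            raise SimulatedCrash()        # nothing was written to disk
        self.stored_messages.append(msg)  # 2. first write: the message
        if crash_between_writes:
            raise SimulatedCrash()        # message stored, counter not updated
        self.stored_acks += 1             # 3. second write: the counter


node = Node()
node.send("m1")
for msg, flags in [("m2", {"crash_after_send": True}),
                   ("m3", {"crash_between_writes": True})]:
    try:
        node.send(msg, **flags)
    except SimulatedCrash:
        pass  # the node "restarts" and keeps whatever reached the disk

# Messages peers may have received but that are missing from disk.
sent_but_not_stored = len(node.wire) - len(node.stored_messages)
# Stored messages that the separately stored counter does not account for.
stored_but_not_counted = len(node.stored_messages) - node.stored_acks
```

After this run one message was sent but never stored, and one stored message is not reflected in the counter, so neither file alone tells the restarted node what to replay.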
In general, storing the network messages and acks does not seem beneficial enough to justify it. I therefore propose removing this persistence completely and keeping them in memory as before.
By doing this we rely on the other nodes not crashing at the same time, so that the node that crashed can catch up by re-receiving the lost network messages from the other node(s).
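A minimal sketch of that trade-off, assuming a much simplified model in which every peer keeps its sent messages only in memory and resends whatever another peer has not seen. The `Peer` class and its methods are hypothetical and do not mirror the actual reliability layer.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.sent = []   # in-memory log of messages this peer broadcast
        self.inbox = {}  # sender name -> messages received from that sender

    def broadcast(self, msg, others):
        self.sent.append(msg)
        for other in others:
            other.receive(self.name, msg)

    def receive(self, sender, msg):
        self.inbox.setdefault(sender, []).append(msg)

    def restart(self):
        # A crash loses everything, since nothing is persisted.
        self.sent = []
        self.inbox = {}

    def resend_missing(self, other):
        # Resend every message of ours that `other` has not seen yet.
        already_seen = len(other.inbox.get(self.name, []))
        for msg in self.sent[already_seen:]:
            other.receive(self.name, msg)


alice, bob = Peer("alice"), Peer("bob")
for msg in ["m1", "m2", "m3"]:
    alice.broadcast(msg, [bob])

# Only bob crashes: alice still holds the messages and bob catches up.
bob.restart()
alice.resend_missing(bob)
recovered = list(bob.inbox.get("alice", []))

# Both crash at the same time: there is nothing left to resend.
alice.restart()
bob.restart()
alice.resend_missing(bob)
after_joint_crash = list(bob.inbox.get("alice", []))
```

In this model recovery works as long as at least one peer that sent the messages stays up, which is exactly the assumption the proposal relies on.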
There are a couple of issues related to the implementation of the reliability layer of hydra-node, such as #1079, and a bug item that could be related: #1202
What
Remove the persistence handle from the networking part of hydra-node.
How
Completely remove the `MessagePersistence` argument to the `withNetwork` function, which eliminates storing messages to and reading them from disk.