Introduce Reliability network layer #1074
Commits on Oct 4, 2023
Commit c20956d
Commit deeaf05
Commit 1f4252b
Commit 7eaced9
Commit a1e42de
Commit b8c0aa7
Commit 48ae585
Commit 7e925a1
Commit 58a9915
Commit 6cfcbea
Commit d6392dd
Commit 81a1b86
Commit 0eee857
Implement resending of unacked messages
This code is just an experiment and would require a lot of massaging to get right...
Commit df2d6de
Commit 633ded7
Commit b61ecb0
Commit 46d083a
Commit f864299
Commit 993ad96
Commit 4fd49ae
Use one map to track all ack messages
- Add a test case to make sure the acks are updated on each received or broadcast message
Commit 16cb6f1
Commit 098763c
Commit e287859
Commit 9f55f93
To be more performant let's use Vectors in the Msg type as well as trace messages.
Commit 83b3811
Add helper for constructing new acks
Rename some binders to make more sense
Commit 3f7017a
Use Vector as the main data structure
- We used Sequence to capture messages and List to hold parties. Our benchmarks show worse performance than current master (they are not able to complete at all). By using Vector for everything I am noticing better performance, but benchmarks for hydra-cluster still don't finish.
- Also introduce custom exception types for Reliability
Commit 4e5eefe
After a possible callback, local acks might change, so we need to re-read them from a TVar. Add a party to the BroadcastCounter to improve logging.
Commit 35a9363
Commit a761822
Ensure changes to vector clock are atomic
We had an issue whereby concurrent calls to the callback function from the lower layer led to the vector clock losing sync and blocking any further progress, because observation and changes were not atomic. The resending logic does not seem to make much sense even though it seems to work; we need to analyse the behaviour of the system a bit more...
Commit 092208b
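To illustrate the kind of race described in this commit, here is a minimal sketch, assuming the vector clock lives in a `TVar`. The names `Party`, `VectorClock`, `newClock`, `bumpRacy` and `bumpAtomic` are hypothetical and not taken from the PR; the point is only the difference between reading and writing in separate steps versus inside one STM transaction.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

-- Hypothetical types: one counter per party.
type Party = String
type VectorClock = Map.Map Party Int

-- Start every party at zero.
newClock :: [Party] -> IO (TVar VectorClock)
newClock parties = newTVarIO (Map.fromList [(p, 0) | p <- parties])

-- Racy: another thread may update the TVar between the read and the
-- write, in which case its increment is silently overwritten.
bumpRacy :: TVar VectorClock -> Party -> IO ()
bumpRacy var p = do
  clock <- readTVarIO var
  atomically (writeTVar var (Map.insertWith (+) p 1 clock))

-- Atomic: the observation and the update happen in one transaction,
-- and the new counter is returned from that same transaction.
bumpAtomic :: TVar VectorClock -> Party -> IO Int
bumpAtomic var p = atomically $ do
  clock <- readTVar var
  let clock' = Map.insertWith (+) p 1 clock
  writeTVar var clock'
  pure (Map.findWithDefault 0 p clock')
```

With `bumpAtomic`, two concurrent callbacks each observe a distinct counter value, which is the property the racy variant cannot guarantee.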
Simplify authentication layer to not require passing a wrapped message
The Authenticate layer has the signing key to sign messages, so there is no need to require senders to do the work themselves. This is possible because we have decoupled incoming from outgoing messages in the definition of a NetworkComponent.
Commit 452954c
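A rough sketch of the idea, under stated assumptions: the `Network`, `Signed`, `SigningKey` and `withSigning` definitions below are simplified stand-ins invented for illustration, and the "signature" is a placeholder rather than real cryptography. It only shows how a layer that owns the key can accept plain messages from callers and hand wrapped messages to the layer below.

```haskell
-- Hypothetical key type, for illustration only.
newtype SigningKey = SigningKey String
  deriving (Eq, Show)

-- Placeholder "signature": records which key signed the payload.
data Signed msg = Signed {payload :: msg, signer :: SigningKey}
  deriving (Eq, Show)

-- Simplified outgoing side of a network layer.
newtype Network m outbound = Network {broadcast :: outbound -> m ()}

-- The layer holds the key, so callers broadcast plain messages and the
-- layer wraps them before passing them to the lower layer.
withSigning :: SigningKey -> Network m (Signed msg) -> Network m msg
withSigning key lower = Network (\msg -> broadcast lower (Signed msg key))
```

Because the outgoing type (`msg`) differs from what the lower layer carries (`Signed msg`), senders never have to build the wrapped value themselves.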
Wire heartbeat into reliability
The idea is that the reliability layer relies on heartbeats to send acks: when the acks do not change upon a ping, this denotes that the node cannot make progress by sending new messages, and it gives peers the opportunity to resend messages that were not acknowledged because of transient network failures. This makes Reliability highly dependent on Heartbeat, which raises the question of whether we should merge the two.
Commit 25d264e
Commit 4a25209
Commit b542279
Randomly drop messages to simulate network failures
This is of course not meant to stay, but it is an interesting small technique that I thought would be nice to introduce. A similar technique could be used to simulate message shuffling, delays, etc. The next commit will revert this.
Commit 35fb956
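For readers curious what such a technique can look like, here is a small sketch, not the code from this commit. It wraps a hypothetical `Network` handle so that a given percentage of messages is dropped. To stay within `base` it uses a simple linear congruential generator instead of a random library; `withLossy` and `nextSeed` are names made up for the example.

```haskell
import Data.Bits (shiftR)
import Data.IORef
import Data.Word (Word32)

-- Hypothetical stand-in for a network handle: just a broadcast action.
newtype Network msg = Network {broadcast :: msg -> IO ()}

-- One step of a linear congruential generator. Word32 arithmetic wraps
-- around, which provides the modulus for free.
nextSeed :: Word32 -> Word32
nextSeed s = s * 1664525 + 1013904223

-- Wrap a network so that roughly `percent` out of 100 messages are
-- dropped instead of being passed to the inner network.
withLossy :: Word32 -> Word32 -> Network msg -> IO (Network msg)
withLossy seed percent inner = do
  gen <- newIORef seed
  pure $ Network $ \msg -> do
    -- Advance the generator and use its higher bits as the random draw.
    r <- atomicModifyIORef' gen (\s -> let s' = nextSeed s in (s', s' `shiftR` 16))
    if r `mod` 100 < percent
      then pure () -- simulate a network failure: drop the message
      else broadcast inner msg
```

The same wrapper shape could reorder or delay messages by buffering them before calling the inner `broadcast`.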
Revert "Randomly drop messages to simulate network failures"
This reverts commit 86dbd743a3735804bea0220bd140dd7ca1670908.
Commit 0552225
Commit 880b886
Commit d39056c
Simplify parameters passing to reliability network for parties
And do the sorting inside the function and not require a sorted vector
Commit 7472c7c
Extract dedicated module for hydra-node's networking stack
The actual network stack used by the node is becoming more complex, and requires pulling in dependencies which are irrelevant.
Commit e1e5d71
Commit 240bfa5
Commit 4d340b7
Ensure fixture parties are sorted as expected
We want to ensure alice, bob, carol as Party reflect the sorting of their identifiers
Commit 4488ed7
Commit 23bb6c6
Commit 2f97ead
If we receive a message with an inappropriate number of peers, we just drop the message as it is quite suspicious.
Commit 15d7c47
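A minimal sketch of such a check, assuming each message carries one ack entry per party. The `ReliableMsg` record and the `validate` function are hypothetical names used for illustration only.

```haskell
-- Hypothetical message shape: the sender's acks plus a payload.
data ReliableMsg a = ReliableMsg {acks :: [Int], payload :: a}
  deriving (Eq, Show)

-- Accept a message only when its acks have exactly one entry per known
-- party; anything else is treated as suspicious and dropped.
validate :: Int -> ReliableMsg a -> Maybe (ReliableMsg a)
validate numberOfParties msg
  | length (acks msg) == numberOfParties = Just msg
  | otherwise = Nothing
```

Returning `Nothing` lets the caller skip the callback entirely rather than raise an error for a message that may come from a misconfigured or malicious peer.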
Commit 25dea1d
Organize tests by direction (sending vs receiving)
Can ease test exploration.
Commit 8a482fd
Refactor: rely on abstract aliceReceives for property
Also note that we have an issue with this property that would pass if we drop all the messages.
Commit 4699b0a
Commit 714899f
Commit 80b519d
Introduce a log message for sent messages
- Also add a new test case
Commit ed3a54e
Commit d058ee6
Commit 69b1f3d
Commit 76df095
- We record what each party has seen and remove messages seen by all parties.
- Use IntMap instead of Vector for storing sent messages because we need to be able to remove old/seen indices without re-indexing.
- Use Map to keep track of the last seen message by party
- Introduce a test case to assert the withNetwork logs removal of old messages.
Commit 60c0101
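The garbage collection described in this commit can be sketched as follows. This is an illustration under assumptions, not the PR's code: `Party` and `garbageCollect` are hypothetical names, and message indices are assumed to be the `IntMap` keys.

```haskell
import qualified Data.IntMap.Strict as IntMap
import qualified Data.Map.Strict as Map

type Party = String

-- Remove every sent message whose index is at or below the lowest index
-- seen by all parties. Remaining keys keep their original indices, which
-- is why an IntMap fits better than a Vector here.
garbageCollect :: Map.Map Party Int -> IntMap.IntMap msg -> IntMap.IntMap msg
garbageCollect lastSeen sent
  | Map.null lastSeen = sent -- no known peers: nothing is safe to remove
  | otherwise =
      let seenByAll = minimum (Map.elems lastSeen)
       in IntMap.filterWithKey (\k _ -> k > seenByAll) sent
```

For example, if one peer has seen message 2 and another message 3, only messages with an index above 2 are kept.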
Commit 2cda1ae
Implement a network "stress" test
We wire 2 components for alice and bob, mediated through a lossy channel, and check bob receives all messages from alice
Commit c5183a9
Try fixing "stress test" by reducing list of messages size
It would be better to wait (with a timeout) to receive all broadcast messages
Commit c579933
Commit dc235ef
Commit 4959354
* Remove imports
* Remove unneeded resize
* Tabulate to show length of messages sent
* Increase ratio of dropped messages
Commit a0ef070
Commit d9e4da4
Commit 3073455
Commit 3520c4f
Commit 546522c
Commit 63b2b3f
This test is superseded by the stress test which covers more of it.
Commit 15eae4c
Commit d5950ac
Commit 539962a
Commit 0b432d1
Commit be24bed
Commit c1a5422
Commit d80c4e6
How is it that the test fails with `[0] /= [0, 0]` but the debug trace is `[0, 0] /= [0, 0]`?
Commit ef9b5e0
Fixed wrong reference to type of message
Co-authored-by: Sebastian Nagel <[email protected]>
Commit a5c86f1
Commit 922b3c4
Commit 647a572
Alice and Bob were not writing fast enough
At least not fast enough compared to the time we were giving their messages to arrive. Sending a message once every ten seconds and expecting all the messages to reach the peer in less than 100 seconds does not always work.
Commit 5c1929f
FIX bug: do not increase local vector clock when receiving Ping
Bug was exposed by running:
```
cabal test hydra-node --test-options '-m Reliability --seed 1054015251'
```
The problem was caused by Bob increasing his local view of received messages from Alice from 15 to 16 when receiving a Ping from Alice when, actually, he had never received message 16. As a consequence, Alice would not resend message 16 or, when she did resend message 16, Bob would ignore it anyway as it's expecting :x
Commit ac35296
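The rule behind this fix can be sketched as a small pure function. This is an illustration, not the actual change: `Incoming`, `Payload` and `receive` are hypothetical names, and the map stands in for the local view of the last index received from each party.

```haskell
import qualified Data.Map.Strict as Map

type Party = String

-- Hypothetical message kinds: a Ping carries no new message index.
data Incoming a = Ping | Payload Int a

-- Update our view of the last message index received from `sender`.
-- Only a payload carrying exactly the next expected index advances the
-- view. A Ping leaves it untouched, so a lost message still counts as
-- missing and can be resent.
receive :: Party -> Incoming a -> Map.Map Party Int -> Map.Map Party Int
receive _ Ping seen = seen
receive sender (Payload n _) seen
  | n == Map.findWithDefault 0 sender seen + 1 = Map.insert sender n seen
  | otherwise = seen
```

In the scenario from the commit message, Bob's view of Alice stays at 15 after her Ping and only moves to 16 once message 16 itself arrives.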
Commit 81142ac
Commit e2a09f5
Commit e76361c
Commit 55e5442
WIP: FIX: do not get the head stuck if two peers are lagging behind each other
If Alice is lagging behind Bob and Bob is lagging behind Alice, then nobody would resend any message to its peer. Here we remove one condition to unlock this.
Commit 9efa9d6
FIX: we never receive messages from ourselves
So we should not include ourselves in the `seenMessages` map; otherwise, in real life, we will never garbage collect.
Commit 3489a47
Commit 5a8fc11
Only resend messages upon receiving a Ping
This is meant to ensure we only try to resend messages whenever the peer is quiescent, which was the original intent of using Pings in the first place, in order to avoid resending messages too often. The assumption is that disconnections and message drops should be few and far between in normal operations, and it's therefore fine to rely on the Ping's roundtrip time to check the peers' state.
Commit 32ff306
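As a sketch of this policy, assuming every incoming message carries the sender's ack of our own messages: the `Incoming` type and `toResend` function below are hypothetical and only illustrate that resending is triggered by a Ping and never by a regular message.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Hypothetical incoming messages. The Int is the index of the last of
-- our messages that the sender reports having received.
data Incoming = Ping Int | Payload Int

-- Decide what to resend. Only a Ping triggers resending, and then only
-- the messages with an index above what the peer acknowledged.
toResend :: Incoming -> IntMap.IntMap msg -> [msg]
toResend (Ping peerAck) sent =
  IntMap.elems (IntMap.filterWithKey (\k _ -> k > peerAck) sent)
toResend (Payload _) _ = []
```

A peer that is actively sending messages is therefore never flooded with resends; they only happen once it falls back to Pings.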
Commit 1d7839f
Stress test peers wait for all messages to be received by both peers
Timeouts are inherently unreliable, especially given an arbitrary and unknown list of messages and an arbitrary ordering of actions. Tests might fail because one of the peers stops before the other and therefore fails to send the Pings that would notify its peer it is missing messages, or fails to take into account the peer's Pings. This commit replaces complicated timeout logic with a simple STM-based check that _both_ peers received all the messages.
Commit 798dc2a
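An STM-based check of this kind might look like the sketch below. It assumes each peer appends received messages to a `TVar` in arrival order; `waitForBoth` and `deliver` are hypothetical names, not the helpers used in the test suite.

```haskell
import Control.Concurrent.STM

-- Block until both inboxes contain exactly the expected messages.
-- `check` retries the transaction whenever the condition is false, and
-- the runtime re-runs it only after one of the TVars has changed, so no
-- timeout or polling interval has to be guessed.
waitForBoth :: Eq msg => [msg] -> TVar [msg] -> TVar [msg] -> IO ()
waitForBoth expected aliceInbox bobInbox = atomically $ do
  a <- readTVar aliceInbox
  b <- readTVar bobInbox
  check (a == expected && b == expected)

-- Record a received message at the end of an inbox.
deliver :: TVar [msg] -> msg -> IO ()
deliver inbox msg = atomically (modifyTVar' inbox (++ [msg]))
```

Because both inboxes are read in the same transaction, neither peer can be considered done while the other is still waiting for resends.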
Commit b6f7b6e
Commit ff2b839
Remove map tracking peer's view
This was used to garbage-collect messages and will be rewritten later
Commit 4c819bc
Commit 2360e94
Commit 95e53a6
Make finding our index a pure function
It also throws an error if we do not find ourselves in the list of all parties. This case is absurd, given that we added ourselves to the list before sorting.
Commit a718239
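A minimal sketch of such a pure lookup, using plain lists rather than the Vector mentioned in earlier commits. `Party` and `ourIndex` are hypothetical names chosen for the example.

```haskell
import Data.List (elemIndex, sort)
import Data.Maybe (fromMaybe)

type Party = String

-- Sort all parties (ourselves included) and find our own position.
-- The error branch is unreachable, because `me` is always part of the
-- list being searched.
ourIndex :: Party -> [Party] -> Int
ourIndex me others =
  fromMaybe
    (error "absurd: we are not in our own list of parties")
    (elemIndex me (sort (me : others)))
```

For instance, `ourIndex "bob" ["carol", "alice"]` evaluates to `1`, since the sorted list is alice, bob, carol.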
Commit 07e9c7d
Commit e1b9c82
Commit 262bba4
Commit 3b55f64
Commit 5178b21
Commit ecb9522