refactor(iroh-net): Optimise present nodes in ActiveRelay #2781

flub · 2024-10-03T11:59:58Z

Description

The ActiveRelay actor keeps track of which remote nodes are present on the relay connection so that we can optimise relay connections to remote nodes. This PR makes two main optimisations:

- Two sets of these nodes were kept; they could easily be unified.
- The set is best stored in a BTreeSet, since it only holds simple NodeIds.
- Bonus: rename peer to node to match our naming convention.
- Bonus: identify nodes by NodeId since this is a routing key here.
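
To illustrate the idea of a single set of present nodes keyed by NodeId, here is a minimal sketch. It is not the code from this PR: the `NodeId` newtype is a hypothetical stand-in (assumed here to be a 32-byte key) and `PresentNodes` is an invented name used only for this example.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the real `NodeId` type (assumption: a 32-byte key).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct NodeId([u8; 32]);

/// Hypothetical, simplified state: one set of nodes known to be present
/// on a relay connection, instead of two separately maintained sets.
#[derive(Debug, Default)]
struct PresentNodes {
    nodes: BTreeSet<NodeId>,
}

impl PresentNodes {
    /// Records that `node` is present. Returns `true` if it was not known before.
    fn insert(&mut self, node: NodeId) -> bool {
        self.nodes.insert(node)
    }

    /// Forgets `node`. Returns `true` if it was present.
    fn remove(&mut self, node: &NodeId) -> bool {
        self.nodes.remove(node)
    }

    /// Whether `node` is currently known to be present on this relay.
    fn contains(&self, node: &NodeId) -> bool {
        self.nodes.contains(node)
    }
}
```

Since the set only holds small `Ord` keys, `BTreeSet` needs nothing beyond the derived ordering, and a single set leaves one place to update when a node appears or disappears.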

Breaking Changes

Still none if all is well.

Notes & open questions

This targets #2779 as base.

Change checklist

Self-review.
Documentation updates following the style guide, if relevant.
~~[ ] Tests if relevant.~~
~~[ ] All breaking changes documented.~~

These are two cleanups in the relay client:

- The `relay::Client` hands out a connection object when asked to connect. This `Conn` was imported with rename to `RelayClient` which was a bit confusing as this was already the relay client. It is now renamed to `RelayConn` which makes a lot more sense. The related builder struct etc are renamed to match.
- The `relay::Client` had a counter for the number of connections made to the relay. That seems fun, but was entirely unused. If this is a useful thing to have it should probably be a counter metric instead but let's not add anything that no one is using. Removing this makes a lot of APIs a bit simpler and removes some state tracking.

github-actions · 2024-10-03T12:02:31Z

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2781/docs/iroh/

Last updated: 2024-10-03T17:26:38Z

github-actions · 2024-10-03T12:14:59Z

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: aa7fc95

iroh-net/src/magicsock/relay_actor.rs

Co-authored-by: Divma <[email protected]>

divagant-martian

I checked the original code and while the two sets were not exactly the same, I see no reason why this should work differently to the previous one. From my perspective this lgtm!

## Description

When the connection to the relay server fails, the read channel returns a read error. At this point the ActiveRelay actor passively waits until it has been asked to send something again before it re-establishes a connection. However, if the local node has no reason to send anything to the relay server, the connection is never re-established. This is problematic when the relay has remote nodes trying to send to this node. It is doubly problematic when the connection is to the home relay: the node just sits there thinking everything is healthy and quiet, but no traffic is reaching it.

In a node with active traffic this doesn't really show up, since a send will be triggered quickly for an active connection and the connection with the relay server would be re-established.

The start of the ActiveRelay run loop is the right place to re-establish the connection. A read error triggers the loop to go round, is already logged, and the connection is then re-established.

This does not keep the relay connection open forever. The mechanism that cleans up unused connections to relay servers will still function correctly, since it only takes into account the time something was last sent to a relay server. As long as a connection with a remote node exists there will be a DISCO ping between the two nodes over the relay path, so the connection is correctly kept alive. The home relay is exempted from the relay connection cleanup, so it is also kept connected, leaving this node available to be contacted via the relay server. Which is the entire point of this bugfix.

The relay_client.is_connected() call sends a message to the relay Client actor, and relay_client.connect() does that again. Taking the shortcut of only calling .connect() is not better, however, because the logging becomes messier. In the common case there is one roundtrip message to the relay Client actor, and the shortcut would not improve on this anyway. The two messages for the case where a reconnect is needed do not occur commonly.

## Breaking Changes

None

## Notes & open questions

Fixes fishfolk/bones#428

It is rather difficult to test though.

This targets #2781 as base.

## Change checklist

- [x] Self-review.
- ~~[ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.~~
- ~~[ ] Tests if relevant.~~
- ~~[ ] All breaking changes documented.~~
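
The reconnect-at-the-start-of-the-loop behaviour described above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the actual actor code (which is async and message-based): the `RelayHandle` trait, the `run` function and the `ScriptedClient` test double are all hypothetical names invented for this example. Only the `is_connected()` followed by `connect()` pattern comes from the description.

```rust
use std::collections::VecDeque;

/// Hypothetical, synchronous stand-in for the handle to the relay client.
trait RelayHandle {
    fn is_connected(&self) -> bool;
    fn connect(&mut self) -> Result<(), String>;
    /// `Ok(Some(msg))` is a received message, `Ok(None)` ends the loop,
    /// `Err(_)` is a read error that drops the connection.
    fn recv(&mut self) -> Result<Option<Vec<u8>>, String>;
}

/// Simplified run loop: reconnects at the top of every iteration, so a read
/// error is followed by a reconnect even if nothing is ever sent.
fn run<C: RelayHandle>(client: &mut C) -> Vec<Vec<u8>> {
    let mut received = Vec::new();
    loop {
        if !client.is_connected() {
            if let Err(err) = client.connect() {
                // A real actor would retry with backoff; the sketch just stops.
                eprintln!("failed to reconnect: {err}");
                break;
            }
        }
        match client.recv() {
            Ok(Some(msg)) => received.push(msg),
            Ok(None) => break,
            Err(err) => {
                // Log and go round the loop; the reconnect happens at the top.
                eprintln!("read error: {err}");
                continue;
            }
        }
    }
    received
}

/// Hypothetical test double that replays a fixed script of read results.
struct ScriptedClient {
    connected: bool,
    connects: usize,
    script: VecDeque<Result<Option<Vec<u8>>, String>>,
}

impl RelayHandle for ScriptedClient {
    fn is_connected(&self) -> bool {
        self.connected
    }

    fn connect(&mut self) -> Result<(), String> {
        self.connected = true;
        self.connects += 1;
        Ok(())
    }

    fn recv(&mut self) -> Result<Option<Vec<u8>>, String> {
        let next = self.script.pop_front().unwrap_or(Ok(None));
        if next.is_err() {
            // A read error means the connection is gone.
            self.connected = false;
        }
        next
    }
}
```

With a script that yields a message, a read error, and then another message, the loop reconnects once after the error without any send being requested, which is the behaviour the fix is after.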

flub added 2 commits October 3, 2024 12:23

small doc fixup

b1ea4b2

flub requested a review from ramfox October 3, 2024 12:00

divagant-martian reviewed Oct 3, 2024

flub and others added 4 commits October 3, 2024 17:46

Apply suggestions from code review

62c7421

Co-authored-by: Divma <[email protected]>

fixup some more names

60ebe40

don't rename anything

b38cf2f

docs style

a22c2d3

flub force-pushed the flub/relay-actor-active-nodes branch from fd3545c to 8c594f5 Compare October 3, 2024 16:45

flub requested a review from divagant-martian October 3, 2024 16:50

divagant-martian approved these changes Oct 3, 2024

flub and others added 3 commits October 3, 2024 19:21

Rename more variables to be consistent

b344622

Apply suggestions from code review

275b399

Co-authored-by: Divma <[email protected]>

flub force-pushed the flub/relay-actor-active-nodes branch from 8c594f5 to 275b399 Compare October 3, 2024 17:24

flub mentioned this pull request Oct 3, 2024

fix(iroh-net): Keep the relay connection alive on read errors #2782

Merged

1 task

Base automatically changed from flub/relay-client-cleanup-1 to main October 3, 2024 17:57

flub added this pull request to the merge queue Oct 3, 2024

Merged via the queue into main with commit c7ac982 Oct 3, 2024
27 checks passed

flub deleted the flub/relay-actor-active-nodes branch October 4, 2024 07:08
