separates out routing shreds from establishing connections #33599
Codecov Report

```diff
@@           Coverage Diff           @@
##           master   #33599   +/-   ##
=======================================
  Coverage    81.8%    81.8%
=======================================
  Files         806      806
  Lines      217588   217612     +24
=======================================
+ Hits       178106   178133     +27
+ Misses      39482    39479      -3
```
Instead of the extra routing layer, would it be possible to leave a "tombstone" in the cache that would indicate that we previously tried & failed to establish a connection and shouldn't try again?
turbine/src/quic_endpoint.rs
```rust
};
let receiver = {
    let mut router = router.write().await;
    let bytes = match router.get(&remote_address) {
```
nit: I was going to recommend de-duplicating this block, as it is nearly identical to the block above, but with the `continue` statement controlling loop flow, I don't see a great way to do so, unfortunately.
added a helper function to reduce the amount of duplicate code.
How do we decide to retry the connection in that case, i.e. when and how does the tombstone get cleared? An advantage of this routing layer is that the connection cache also simplifies from `HashMap<(SocketAddr, Option<Pubkey>), Arc<RwLock<Option<Connection>>>>` to `HashMap<Pubkey, Connection>`, which makes the follow-up patch for cache eviction much simpler.
Hypothetically, the tombstone could contain a timestamp and we retry if the tombstone has reached some predefined age.
Fair enough. I'll take another pass at this tomorrow.
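A minimal sketch of the timestamped-tombstone idea discussed above, using only the Rust standard library. `ConnectionCache`, `Entry`, `Connection`, and the `retry_after` age are hypothetical names chosen for illustration, not code from this PR; the real cache would hold QUIC connections rather than the placeholder `Connection` type.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical placeholder for a real QUIC connection handle.
#[derive(Debug)]
struct Connection(u64);

/// A cache entry is either a live connection or a tombstone that records
/// when the last connection attempt to that address failed.
#[derive(Debug)]
enum Entry {
    Connected(Connection),
    Tombstone(Instant),
}

struct ConnectionCache {
    entries: HashMap<SocketAddr, Entry>,
    /// Hypothetical age after which a tombstone expires and a retry is allowed.
    retry_after: Duration,
}

impl ConnectionCache {
    fn new(retry_after: Duration) -> Self {
        ConnectionCache { entries: HashMap::new(), retry_after }
    }

    /// Returns true if the caller should attempt to (re)connect to `addr` at `now`.
    fn should_connect(&self, addr: &SocketAddr, now: Instant) -> bool {
        match self.entries.get(addr) {
            // Never seen: try to connect.
            None => true,
            // Already connected: reuse the connection.
            Some(Entry::Connected(_)) => false,
            // Previously failed: retry only once the tombstone is old enough.
            Some(Entry::Tombstone(failed_at)) => {
                now.duration_since(*failed_at) >= self.retry_after
            }
        }
    }

    /// Leaves a tombstone so that further packets do not trigger new attempts.
    fn record_failure(&mut self, addr: SocketAddr, now: Instant) {
        self.entries.insert(addr, Entry::Tombstone(now));
    }

    /// Replaces any tombstone with the newly established connection.
    fn record_success(&mut self, addr: SocketAddr, conn: Connection) {
        self.entries.insert(addr, Entry::Connected(conn));
    }
}
```

With this shape, the tombstone is "cleared" implicitly: once it reaches `retry_after`, the next packet triggers one new attempt, which either installs a connection or refreshes the tombstone.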
Sorry for the late review, but just one question
```rust
let receiver = {
    let (sender, receiver) = tokio::sync::mpsc::channel(ROUTER_CHANNEL_BUFFER);
    router.write().await.insert(remote_address, sender);
    receiver
};
```
Why does the server task always create a new channel, while the client task reuses one if it exists? Is it possible that the client side has already tried to initiate a connection to that remote address? If so, it looks like the server side would be clobbering the previous channel.
The server side does not initiate a connection, it only accepts incoming connections from remote nodes.
If there is already a connection and, for whatever reason, the remote node initiates a new connection, then yes, it will drop the previous connection and replace it with the new one. This happens both in the router hash-map here and in the cache:
https://github.com/solana-labs/solana/blob/dc3c82729/turbine/src/quic_endpoint.rs#L407-L409
We could possibly allow multiple connections per pubkey by having a `Vec<Connection>` instead of a single `Connection`, but for now I think a single `Connection` per pubkey is simpler.
Ok great, that explains it, thanks! No need to have a `Vec<Connection>`; resetting on a new remote connection makes sense.
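The replacement behavior described in this thread can be sketched with the standard library's bounded `std::sync::mpsc::sync_channel` standing in for `tokio::sync::mpsc::channel`. The `Router` alias, the `on_incoming_connection` function, and the buffer size of 64 are assumptions for illustration (the PR's `ROUTER_CHANNEL_BUFFER` value is not shown here). Because `HashMap::insert` returns the previous sender, dropping it closes the old channel once its already-buffered packets are drained.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical value; the real constant's value is not shown in this thread.
const ROUTER_CHANNEL_BUFFER: usize = 64;

/// One buffering sender per remote address (stand-in for the PR's router map).
type Router = HashMap<SocketAddr, SyncSender<Vec<u8>>>;

/// Called when a remote node opens a new incoming connection.
/// Always installs a fresh channel for `remote_address`.
fn on_incoming_connection(router: &mut Router, remote_address: SocketAddr) -> Receiver<Vec<u8>> {
    let (sender, receiver) = sync_channel(ROUTER_CHANNEL_BUFFER);
    // `insert` returns the replaced sender, if any. Dropping it disconnects the
    // old receiver after its buffered packets are consumed, so the task serving
    // the previous connection sees its channel close and can exit.
    drop(router.insert(remote_address, sender));
    receiver
}
```

In this model the map holds the only sender per address; if other clones of the old sender were still alive, the old receiver would stay open until they were dropped as well.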
…ckport of #33599) (#33772)
separates out routing shreds from establishing connections (#33599)
(cherry picked from commit 8becb72)
Co-authored-by: behzad nouri
Problem
Currently each outgoing shred will attempt to establish a connection if one does not already exist. This is very wasteful and consumes many tokio tasks if the remote node is down or unresponsive.
Summary of Changes
The commit decouples routing packets from establishing connections by adding a buffering channel for each remote address. Outgoing packets are always sent down this channel, to be processed once the connection is established. If the connection attempt fails, all packets already pushed to the channel are dropped at once, reducing the number of connection attempts when the remote node is down or unresponsive.
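A synchronous, standard-library-only sketch of the decoupling described above: one bounded buffering channel per remote address, exactly one connection attempt per channel, and all buffered packets discarded together when that attempt fails. This is not the PR's implementation, which uses tokio tasks and QUIC. `Router`, `send`, `process_pending`, `drain_connected`, the caller-supplied `connect` closure, and the buffer size of 64 are all hypothetical. Removing the router entry on failure, so that a later packet can trigger a fresh attempt, is also an assumption.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical buffer size per remote address.
const ROUTER_CHANNEL_BUFFER: usize = 64;

struct Router {
    /// Routing layer: one buffering sender per remote address.
    senders: HashMap<SocketAddr, SyncSender<Vec<u8>>>,
    /// Channels whose single connection attempt has not run yet
    /// (stands in for the per-address tokio task).
    pending: Vec<(SocketAddr, Receiver<Vec<u8>>)>,
    /// Channels whose connection succeeded.
    connected: HashMap<SocketAddr, Receiver<Vec<u8>>>,
}

impl Router {
    fn new() -> Self {
        Router { senders: HashMap::new(), pending: Vec::new(), connected: HashMap::new() }
    }

    /// Routes a packet. Only the first packet for an unseen address creates a
    /// channel and schedules a connection attempt; later packets are buffered.
    /// Returns false if the packet was dropped (e.g. the buffer is full).
    fn send(&mut self, addr: SocketAddr, bytes: Vec<u8>) -> bool {
        if !self.senders.contains_key(&addr) {
            let (sender, receiver) = sync_channel(ROUTER_CHANNEL_BUFFER);
            self.senders.insert(addr, sender);
            self.pending.push((addr, receiver));
        }
        self.senders[&addr].try_send(bytes).is_ok()
    }

    /// Makes exactly one connection attempt per pending address, using
    /// `connect` as a stand-in for the handshake (true = success).
    fn process_pending<F: FnMut(SocketAddr) -> bool>(&mut self, mut connect: F) {
        for (addr, receiver) in self.pending.drain(..) {
            if connect(addr) {
                self.connected.insert(addr, receiver);
            } else {
                // Assumption: forget the address so a later packet retries.
                // `receiver` is dropped at the end of this iteration, which
                // discards every packet buffered for `addr` at once.
                self.senders.remove(&addr);
            }
        }
    }

    /// Drains packets buffered for a connected address; a real implementation
    /// would write them to the connection instead of returning them.
    fn drain_connected(&mut self, addr: &SocketAddr) -> Vec<Vec<u8>> {
        match self.connected.get(addr) {
            Some(receiver) => receiver.try_iter().collect(),
            None => Vec::new(),
        }
    }
}
```

However many packets are queued for an unreachable node, this model makes a single connection attempt for it instead of one per packet, which is the waste the Problem section describes.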