[RfC] Node-to-node network latency measurement #10084
Comments
You don't think we can measure this from the votes? They are already in CRDS, and every node periodically sets the timestamp.
The gossip network is not necessarily a full mesh (or is it?), and the timestamps are subject to gossip propagation delay, gossip queue congestion, and vote timing. A dedicated echo service wouldn't have any of these confounding factors and would measure only the network latency from each node to every other node. Updated the title to clarify.
Ah, I see. You want to sample the RTT between any two nodes. We probably need a separate message for that. With eager push in gossip, only a subset of the nodes will be one hop away. I think the gossip fanout is 6.
Yes, hence the proposal to have a separate UDP service for the echo server. That way, there'll be a separate queue, and we can better differentiate between network latency and application-layer congestion (like a gossip flood or queue drops).
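A minimal sketch of what such a stateless echo server could look like (the function name, port binding, and packet layout here are illustrative assumptions, not an actual implementation):

```rust
use std::net::UdpSocket;

// Hypothetical stand-alone echo service: reflect every datagram back to the
// sender unchanged. Because it binds its own socket (and thus gets its own
// kernel receive queue), a gossip flood cannot delay its replies.
fn run_echo_server(bind_addr: &str) -> std::io::Result<()> {
    let socket = UdpSocket::bind(bind_addr)?;
    let mut buf = [0u8; 64];
    loop {
        let (len, peer) = socket.recv_from(&mut buf)?;
        socket.send_to(&buf[..len], peer)?;
    }
}
```

Keeping the handler stateless also means replies are the same size as requests, so the service can't be abused for traffic amplification.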
FWIW, I am currently recording ping times from my TdS node to all the others. Not everyone is responding to ping, so the data is incomplete, but it does start to give an indication of which nodes are fast or slow (from my node, currently in NYC). It will be awesome to aggregate similar data from other nodes. At the moment, I am running my Ruby script in a single thread every 10 minutes. I intend to use the data for general curiosity & reporting purposes, so I didn't see the need for a shorter sample period. I expect to see some correlation between a node's average network performance and skipped blocks; I haven't done that analysis yet. I can see a faster sample rate being helpful when Rampy returns. I am not a Rust developer, but I will do what I can to help! -- BL
What about using the health check RPC (#9505)? Users already choose whether or not to expose RPC publicly. One other option is adding a timestamp to gossip pull requests.
Gossip has lots of confounding factors that might distort measurement results (was the network slow, or was the gossip thread busy dealing with a flood?). RPC is TCP and therefore not representative of UDP latency; many ISPs treat UDP differently during congestion. We would also have to gather tcp_rtt and tcp_rttvar data from the kernel rather than measuring at the application layer.
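To illustrate that last point, here is roughly what pulling the kernel's smoothed RTT estimates out of a TCP socket looks like on Linux (a sketch using the libc crate; error handling kept minimal):

```rust
use std::mem;
use std::os::unix::io::AsRawFd;

// Linux-specific sketch: query TCP_INFO for an established connection and
// return the kernel's smoothed RTT and RTT variance, both in microseconds.
fn kernel_tcp_rtt_us(stream: &std::net::TcpStream) -> std::io::Result<(u32, u32)> {
    let mut info: libc::tcp_info = unsafe { mem::zeroed() };
    let mut len = mem::size_of::<libc::tcp_info>() as libc::socklen_t;
    let rc = unsafe {
        libc::getsockopt(
            stream.as_raw_fd(),
            libc::IPPROTO_TCP,
            libc::TCP_INFO,
            &mut info as *mut _ as *mut libc::c_void,
            &mut len,
        )
    };
    if rc != 0 {
        return Err(std::io::Error::last_os_error());
    }
    // tcpi_rtt/tcpi_rttvar are kernel estimates shaped by TCP's own
    // retransmission and congestion behavior, not raw network latency.
    Ok((info.tcpi_rtt, info.tcpi_rttvar))
}
```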
Ok, makes sense. Is this something you want to add? Should be fairly easy since it's largely independent of the core. We would need to propagate this as a startup argument.
We can build the analytics backend that aggregates and makes sense of the data (much of it already exists as part of another project). As for collecting and exposing the data in Solana, we probably won't have the short- to medium-term engineering capacity to build it.
@leoluk what about some rules for enabling ICMP for select validators? Folks that want to do this could run iptables commands to whitelist everyone that has some minimal stake in the network.
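A sketch of what those rules could look like (the address is a placeholder; deriving the allowlist from on-chain stake data is the open part):

```sh
# Hypothetical per-validator allowlist: accept ICMP echo requests from
# staked peers, drop them from everyone else.
iptables -A INPUT -p icmp --icmp-type echo-request -s 203.0.113.7 -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
```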
@aeyakovenko Hmm, we can accurately estimate 1:n latency by measuring last-hop latency; this works even if ICMP is blocked. We could ask validators to deploy an active measurement probe alongside their validators, like @brianlong is doing, which collects traceroutes towards every other node in the network. Should be easy to convince validators to allow ICMP Echo Requests, too. The question is whether this is enough to get an accurate picture of network conditions and detect subclusters: in this example network with one active probe at (a), it would be impossible to measure edges (3) and (4), since the other vertices might be in the same datacenter or continents apart. Having every node in the network measure its respective latency to every other node, however, would allow for a highly accurate picture of cluster topology.
@leoluk By "last-hop latency", are you referring to line 15 in the traceroute below?
[traceroute output omitted]
Yes, the downside is that we're measuring the router's CPU usage as well, which means extra statistical analysis would be necessary (plus, it can be hard to tell whether the latency is at the first or the last hop unless you can measure both directions).
For reference, the ping/pong packets added in #12794 may be utilized for this purpose. We are already maintaining timestamps of pings for rate-limiting purposes.
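For illustration, the timing side of that could look something like this (the types and fields here are hypothetical; the actual #12794 messages are signed and token-keyed, and only the RTT bookkeeping is sketched):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Remember when each ping token was sent; a matching pong becomes an RTT sample.
#[derive(Default)]
struct RttTracker {
    outstanding: HashMap<u64, Instant>, // ping token -> send time
    samples: Vec<(u64, Duration)>,      // ping token -> measured RTT
}

impl RttTracker {
    fn record_ping(&mut self, token: u64) {
        self.outstanding.insert(token, Instant::now());
    }

    fn record_pong(&mut self, token: u64) {
        // Only count pongs that answer a ping we actually sent.
        if let Some(sent) = self.outstanding.remove(&token) {
            self.samples.push((token, sent.elapsed()));
        }
    }
}
```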
As for latency, it should be fairly low (~100-150 ms) across the mainnet-beta/testnet cluster; I got the number from turbine propagation. A more prominent networking condition would be packet drops. I'm planning to look at it more deeply.
@leoluk we (bloXroute) are just starting to expand to Solana, but we have years of experience measuring network performance at very granular levels (it matters a lot for DeFi traders). Happy to jam and maybe collaborate if you're interested!
Problem
End-to-end latency and geographical clustering have a huge impact on cluster performance, but they are impossible to reason about without measuring real end-to-end network latency (scenic routing, asymmetric routes, congested peerings, and other unexpected network topologies).
Proposed Solution
Implement an additional UDP server on its own port, such that it has its own queue (perhaps a good use for the storage port?). This service implements a simple, stateless echo request/reply mechanism.
The node sends echo requests to every other node in the network at a low-but-reasonable interval (50ms? 100ms?), compresses the measurements and makes them available in a yet-to-be-determined fashion.
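A rough sketch of that client-side probing loop, under the assumptions above (peer discovery, token matching, timeout handling, and the export/compression format are all left open by this proposal):

```rust
use std::net::{SocketAddr, UdpSocket};
use std::time::{Duration, Instant};

// Send a datagram to each peer, wait briefly for the echo, and record the
// round-trip time. A real implementation would send probes asynchronously
// and match replies by token instead of blocking on each peer in turn.
fn probe_peers(socket: &UdpSocket, peers: &[SocketAddr]) -> std::io::Result<()> {
    socket.set_read_timeout(Some(Duration::from_millis(200)))?;
    let mut buf = [0u8; 64];
    loop {
        for &peer in peers {
            let sent = Instant::now();
            socket.send_to(b"echo", peer)?;
            match socket.recv_from(&mut buf) {
                // Reply came back from the peer we probed: record the RTT.
                Ok((_, from)) if from == peer => {
                    println!("{peer} rtt={:?}", sent.elapsed());
                }
                // Timeout or stray packet: count as a loss for this round.
                _ => println!("{peer} lost"),
            }
            // Probe interval per the proposal (50 ms? 100 ms?).
            std::thread::sleep(Duration::from_millis(100));
        }
    }
}
```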