
[RfC] Node-to-node network latency measurement #10084

Open
leoluk opened this issue May 16, 2020 · 16 comments

@leoluk
Contributor

leoluk commented May 16, 2020

Problem

End-to-end latency and geographical clustering have a huge impact on cluster performance, but it's impossible to reason about either without measuring real end-to-end network latency (scenic routing, asymmetric routes, congested peerings, and other unexpected network topologies).

Proposed Solution

Implement an additional UDP server on its own port, such that it has its own queue (perhaps a good use for the storage port?). This service implements a simple, stateless echo request/reply mechanism.

The node sends echo requests to every other node in the network at a low-but-reasonable interval (50ms? 100ms?), compresses the measurements and makes them available in a yet-to-be-determined fashion.
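A minimal sketch of what such an echo service could look like, using only std (the 64-byte buffer, 500ms timeout, and nonce scheme are illustrative assumptions, not part of the proposal):

use std::net::UdpSocket;
use std::time::{Duration, Instant};

// Server side: statelessly echo every datagram back to its sender.
fn run_echo_server(bind_addr: &str) -> std::io::Result<()> {
    let socket = UdpSocket::bind(bind_addr)?;
    let mut buf = [0u8; 64];
    loop {
        let (len, peer) = socket.recv_from(&mut buf)?;
        // Reply with the same payload so the client can match it up.
        socket.send_to(&buf[..len], peer)?;
    }
}

// Client side: send one echo request carrying a nonce and time the reply.
fn measure_rtt(socket: &UdpSocket, peer: &str, nonce: u64) -> std::io::Result<Duration> {
    socket.set_read_timeout(Some(Duration::from_millis(500)))?;
    let start = Instant::now();
    socket.send_to(&nonce.to_le_bytes(), peer)?;
    let mut buf = [0u8; 8];
    loop {
        let (len, _) = socket.recv_from(&mut buf)?;
        // Only accept the reply carrying our nonce; drop stray packets.
        if len == 8 && buf == nonce.to_le_bytes() {
            return Ok(start.elapsed());
        }
    }
}

The read timeout bounds the client loop: a lost reply surfaces as a timeout error rather than hanging the measurement thread.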

@aeyakovenko
Member

you don't think we can measure this from the votes?

#[derive(Serialize, Default, Deserialize, Debug, PartialEq, Eq, Clone)]
pub struct Vote {
    /// A stack of votes starting with the oldest vote
    pub slots: Vec<Slot>,
    /// signature of the bank's state at the last slot
    pub hash: Hash,
    /// processing timestamp of last slot
    pub timestamp: Option<UnixTimestamp>,
}

They are already in CRDS. Periodically, every node sets the timestamp.

@leoluk
Contributor Author

leoluk commented May 17, 2020

The gossip network is not necessarily a full mesh (or is it?), and the timestamps are subject to gossip propagation delay, gossip queue congestion, and vote timing. A dedicated echo service wouldn't have any of these confounding factors and would measure only the network latency from each node to every other node.

Updated the title to clarify.

@leoluk leoluk changed the title [RfC] End-to-end latency measurement [RfC] Node-to-node network latency measurement May 17, 2020
@aeyakovenko
Member

Ah, I see. You want to sample the RTT between any two nodes. We probably need a separate message for that. With eager push in gossip, only a subset of the nodes will be 1 hop away. I think the gossip fanout is 6.

@leoluk
Contributor Author

leoluk commented May 17, 2020

Yes, hence the proposal to have a separate UDP service for the echo server. That way there'll be a separate queue and we can better differentiate between network latency and application layer congestion (like a gossip flood/queue drops).

@brianlong
Contributor

FWIW, I am currently recording ping times from my TdS node to all the others. Not everyone is responding to ping, so the data is incomplete, but it does start to give an indication of which nodes are fast or slow (from my node currently in NYC). It will be awesome to aggregate similar data from other nodes.

At the moment, I am running the Ruby script in a single thread every 10 minutes. I intend to use the data for general curiosity & reporting purposes, so I didn't see the need for a shorter sample period. I expect to see some correlation between a node's average network performance and skipped blocks. I haven't done that analysis yet...

I can see a faster sample rate being helpful when Rampy returns.

I am not a Rust developer, but I will do what I can to help!

-- BL

@aeyakovenko
Member

What about using the health check RPC? #9505 Users already choose whether or not to expose the RPC publicly. One other option is adding a timestamp to pull requests.

@leoluk
Contributor Author

leoluk commented May 20, 2020

Gossip has lots of confounding factors that might distort measurement results (was the network slow or the gossip thread busy dealing with a flood?).

RPC is TCP and therefore not representative of UDP latency - many ISPs treat UDP differently during congestion. We would also have to gather tcp_rtt and tcp_rttvar data from the kernel rather than measuring at the application layer.
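For comparison, a Linux-only sketch of what that kernel-side approach would involve, assuming the libc crate (tcpi_rtt and tcpi_rttvar are reported in microseconds):

use std::mem;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

// Read the kernel's smoothed RTT estimate off an established TCP socket.
fn kernel_tcp_rtt(stream: &TcpStream) -> std::io::Result<(u32, u32)> {
    let mut info: libc::tcp_info = unsafe { mem::zeroed() };
    let mut len = mem::size_of::<libc::tcp_info>() as libc::socklen_t;
    let rc = unsafe {
        libc::getsockopt(
            stream.as_raw_fd(),
            libc::IPPROTO_TCP,
            libc::TCP_INFO,
            &mut info as *mut _ as *mut libc::c_void,
            &mut len,
        )
    };
    if rc != 0 {
        return Err(std::io::Error::last_os_error());
    }
    // (tcp_rtt, tcp_rttvar) in microseconds, as maintained by the kernel.
    Ok((info.tcpi_rtt, info.tcpi_rttvar))
}

This only characterizes the TCP path, which is exactly why it wouldn't be representative of the cluster's UDP traffic.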

@aeyakovenko
Member

aeyakovenko commented May 20, 2020

Ok, makes sense. Is this something you want to add? Should be fairly easy since it's largely independent of the core. We would need to propagate this as a startup argument.

@leoluk
Contributor Author

leoluk commented May 20, 2020

We can build the analytics backend that aggregates and makes sense of the data (much of it already exists as part of another project). As for collecting and exposing the data in Solana, we probably won't have the short- to medium-term engineering capacity to build it.

@aeyakovenko
Member

@leoluk What about some rules for enabling ICMP for select validators? Folks that want to do this can run iptables commands to whitelist everyone that has some minimal stake in the network.

@mvines mvines added this to the The Future! milestone May 21, 2020
@leoluk
Contributor Author

leoluk commented May 21, 2020

@aeyakovenko Hmm, we can accurately estimate 1:n latency by measuring last-hop latency - this works even if ICMP is blocked. We could ask validators to deploy an active measurement probe alongside their validators, like @brianlong is doing, which collects traceroutes towards every other node in the network. It should be easy to convince validators to allow ICMP Echo Requests, too.

The question is whether this is enough to get an accurate picture of network conditions and detect subclusters. In this example network with one active probe at (a), it would be impossible to measure edges (3) and (4) - the other vertices might be in the same datacenter or continents apart.

Having every node in the network measure their respective latencies to every other node, however, would allow for a highly accurate picture of cluster topology.

@brianlong
Contributor

@leoluk By "last-hop latency", are you referring to line 15 in the traceroute below?

brianlong@solana-tds:~$ traceroute testnet.solana.com
traceroute to testnet.solana.com (216.24.140.155), 30 hops max, 60 byte packets
 1  165.227.96.253 (165.227.96.253)  8.777 ms  8.775 ms  8.768 ms
 2  138.197.248.8 (138.197.248.8)  0.933 ms 138.197.248.28 (138.197.248.28)  0.200 ms 138.197.248.8 (138.197.248.8)  0.294 ms
 3  nyk-b3-link.telia.net (62.115.45.5)  0.821 ms nyk-b3-link.telia.net (62.115.45.9)  2.461 ms nyk-b3-link.telia.net (62.115.45.5)  0.834 ms
 4  * * *
 5  nyk-b2-link.telia.net (62.115.137.99)  1.327 ms nyk-b2-link.telia.net (213.155.130.28)  1.974 ms nyk-b2-link.telia.net (62.115.137.99)  1.405 ms
 6  viawest-ic-350578-nyk-b2.c.telia.net (62.115.181.147)  2.186 ms  2.150 ms  2.000 ms
 7  be21.bbrt01.ewr01.flexential.net (148.66.237.190)  39.777 ms  39.746 ms  39.746 ms
 8  be110.bbrt02.chi01.flexential.net (66.51.5.149)  40.185 ms  39.959 ms  39.881 ms
 9  be10.bbrt01.chi01.flexential.net (66.51.5.117)  39.778 ms  39.736 ms  39.779 ms
10  be105.bbrt01.den05.flexential.net (66.51.5.106)  40.339 ms  40.383 ms  40.402 ms
11  be155.bbrt01.den02.flexential.net (148.66.236.209)  40.377 ms  40.045 ms  40.046 ms
12  be10.bbrt02.den02.flexential.net (148.66.237.41)  39.963 ms  40.347 ms  40.168 ms
13  po32.crsw02.den02.viawest.net (148.66.237.45)  39.870 ms  39.607 ms  39.714 ms
14  te7-1.aggm02.den02.flexential.net (148.66.236.227)  40.234 ms  40.020 ms  40.077 ms
15  usr3-ppp20.lvdi.net (216.24.140.148)  39.712 ms  39.483 ms  39.447 ms
16  * * *

@leoluk
Contributor Author

leoluk commented May 21, 2020

Yes, the downside is that we're measuring the router's CPU usage as well. This means that extra statistical analysis would be necessary.

(plus it can be hard to tell whether the latency is at the first or the last hop unless you can measure both directions)

@behzadnouri
Contributor

For reference, the ping/pong packets added in #12794 may be utilized for this purpose. We are already maintaining timestamps of pings for rate-limiting purposes:
https://github.com/solana-labs/solana/blob/83799356d/core/src/ping_pong.rs#L33-L35
and may compare them against the instant the pong packet arrives.
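A sketch of how that comparison could yield RTT samples; the PendingPings type and its fields below are illustrative, not the actual ping_pong.rs structures:

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Ping tokens, keyed to the Instant each ping was sent.
type Token = [u8; 32];

#[derive(Default)]
struct PendingPings {
    sent: HashMap<Token, Instant>,
}

impl PendingPings {
    // Record the send time when a ping goes out.
    fn record_ping(&mut self, token: Token) {
        self.sent.insert(token, Instant::now());
    }

    // On pong arrival, return an RTT sample if it matches an outstanding ping.
    fn record_pong(&mut self, token: &Token) -> Option<Duration> {
        self.sent.remove(token).map(|sent_at| sent_at.elapsed())
    }
}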

@ryoqun
Member

ryoqun commented Jul 29, 2021

As for latency, it should be fairly low (~100-150ms) across the mainnet-beta/testnet cluster. I got the number from turbine propagation.

A more prominent networking condition would be packet drops. I'm planning to look at it more deeply.

@uri-bloXroute

@leoluk we (bloXroute) are just starting to expand to Solana, but we have years of experience measuring network performance at very granular levels (it matters a lot for DeFi traders)

Happy to jam and maybe collaborate if you’re interested
