
Weight concurrent streams by stake #25993

Merged — 2 commits merged into solana-labs:master on Jun 21, 2022

Conversation

@lijunwangs (Contributor) commented Jun 15, 2022

Weight concurrent streams by stake for staked nodes

Problem

Summary of Changes

Ported changes from #25056 after addressing merge conflicts and some refactoring

Fixes #26033

sakridge and others added 2 commits June 15, 2022 09:58
Weight concurrent streams by stake for staked nodes

Fixed some compilation issues due to the merge
@lijunwangs lijunwangs requested a review from sakridge June 15, 2022 17:54
@codecov bot commented Jun 15, 2022

Codecov Report

Merging #25993 (9534aaf) into master (8caced6) will decrease coverage by 0.0%.
The diff coverage is 82.5%.

@@            Coverage Diff            @@
##           master   #25993     +/-   ##
=========================================
- Coverage    82.1%    82.1%   -0.1%     
=========================================
  Files         628      631      +3     
  Lines      171471   173734   +2263     
=========================================
+ Hits       140878   142650   +1772     
- Misses      30593    31084    +491     

@sakridge sakridge changed the title Add sender stake to quic packets Weight concurrent streams by stake Jun 17, 2022
@joncinque (Contributor) left a comment:

The changes themselves look great, but I'm very curious about how these numbers were chosen.

@@ -21,6 +25,8 @@ use {
tokio::{task::JoinHandle, time::timeout},
};

const QUIC_TOTAL_STAKED_CONCURRENT_STREAMS: f64 = 100_000f64;
Contributor:

How was this number chosen? With 383.3M SOL staked on mainnet, that means every 3833 SOL staked gives you a stream, which means that any node with less than that amount staked gets 0 streams and counts as unstaked. According to Solana Beach, on mainnet, the 1776th staked node has just about enough stake for one stream, which means that ~90% of validators on the network get a stream (including a long tail of delinquent or totally unstaked validators).

So this number might even be a little too high, unless we know that a validator can handle a total of 100k concurrent streams.

Member:

100k streams * 1280 bytes per packet would be ~128MB and ~81MB for unstaked (128 * 500 * 1280) so that sounds pretty conservative in terms of packet memory use. Each stream and connection has other additional overhead as well though.
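The memory estimate above can be checked with a few lines of integer arithmetic (assuming, as the comment implies, 1280-byte packets, 100k staked streams, and roughly 500 unstaked connections at 128 streams each — the 500-connection figure comes from the comment's own `128 * 500 * 1280` expression):

```rust
fn main() {
    const PACKET_BYTES: u64 = 1280;

    // 100k concurrent staked streams, one packet buffered per stream.
    let staked_bytes = 100_000 * PACKET_BYTES;
    assert_eq!(staked_bytes, 128_000_000); // ~128 MB

    // 128 streams per unstaked connection, ~500 such connections.
    let unstaked_bytes = 128 * 500 * PACKET_BYTES;
    assert_eq!(unstaked_bytes, 81_920_000); // ~82 MB, i.e. the "~81MB" figure above
}
```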

@@ -2,7 +2,7 @@ pub const QUIC_PORT_OFFSET: u16 = 6;
// Empirically found max number of concurrent streams
// that seems to maximize TPS on GCE (higher values don't seem to
// give significant improvement or seem to impact stability)
pub const QUIC_MAX_CONCURRENT_STREAMS: usize = 2048;
pub const QUIC_MAX_UNSTAKED_CONCURRENT_STREAMS: usize = 128;
Contributor:

What impact does this have on TPU clients? Are there some stats about TPS or network performance with and without this change?

@lijunwangs (Author):

I do not see a huge impact with or without this change in my tpu-client test:

With this branch

[2022-06-17T20:47:16.938332781Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-17T20:47:16.938343148Z INFO solana_bench_tps::bench] http://35.247.120.58:8899 | 67938.42 | 1213244
[2022-06-17T20:47:16.938348907Z INFO solana_bench_tps::bench]
Average max TPS: 67938.42, 0 nodes had 0 TPS
[2022-06-17T20:47:16.938360601Z INFO solana_bench_tps::bench]
Highest TPS: 67938.42 sampling period 1s max transactions: 1213244 clients: 1 drop rate: 0.60
[2022-06-17T20:47:16.938367384Z INFO solana_bench_tps::bench] Average TPS: 19827.145

[2022-06-17T20:55:40.669179958Z INFO solana_bench_tps::bench] http://35.247.120.58:8899 | 55039.97 | 1311307
[2022-06-17T20:55:40.669184817Z INFO solana_bench_tps::bench]
Average max TPS: 55039.97, 0 nodes had 0 TPS
[2022-06-17T20:55:40.669194603Z INFO solana_bench_tps::bench]
Highest TPS: 55039.97 sampling period 1s max transactions: 1311307 clients: 1 drop rate: 0.29
[2022-06-17T20:55:40.669198742Z INFO solana_bench_tps::bench] Average TPS: 21661.62

[2022-06-18T01:07:46.093266884Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-18T01:07:46.093280556Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-18T01:07:46.093285691Z INFO solana_bench_tps::bench] http://35.247.120.58:8899 | 58969.85 | 1367411
[2022-06-18T01:07:46.093301930Z INFO solana_bench_tps::bench]
Average max TPS: 58969.85, 0 nodes had 0 TPS
[2022-06-18T01:07:46.093292778Z INFO solana_metrics::metrics] datapoint: bench-tps-lamport_balance balance=14373355930776120i
[2022-06-18T01:07:46.093311329Z INFO solana_bench_tps::bench]
Highest TPS: 58969.85 sampling period 1s max transactions: 1367411 clients: 1 drop rate: 0.76
[2022-06-18T01:07:46.093349226Z INFO solana_bench_tps::bench] Average TPS: 22578.842

Master

[2022-06-17T21:33:02.909510265Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-17T21:33:02.909517556Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-17T21:33:02.909520144Z INFO solana_bench_tps::bench] http://35.247.120.58:8899 | 44006.95 | 1102674
[2022-06-17T21:33:02.909535908Z INFO solana_bench_tps::bench]
Average max TPS: 44006.95, 0 nodes had 0 TPS
[2022-06-17T21:33:02.909548109Z INFO solana_bench_tps::bench]
Highest TPS: 44006.95 sampling period 1s max transactions: 1102674 clients: 1 drop rate: 0.35
[2022-06-17T21:33:02.909526414Z INFO solana_metrics::metrics] datapoint: bench-tps-lamport_balance balance=14373355930776120i
[2022-06-17T21:33:02.909555115Z INFO solana_bench_tps::bench] Average TPS: 18090.404

[2022-06-17T21:38:51.706520412Z INFO solana_bench_tps::bench] Token balance: 14373355930776120
[2022-06-17T21:38:51.706553141Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-17T21:38:51.706559304Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-17T21:38:51.706562073Z INFO solana_bench_tps::bench] http://35.247.120.58:8899 | 65728.62 | 1300790
[2022-06-17T21:38:51.706573521Z INFO solana_bench_tps::bench]
Average max TPS: 65728.62, 0 nodes had 0 TPS
[2022-06-17T21:38:51.706581605Z INFO solana_bench_tps::bench]
Highest TPS: 65728.62 sampling period 1s max transactions: 1300790 clients: 1 drop rate: 0.34
[2022-06-17T21:38:51.706595604Z INFO solana_bench_tps::bench] Average TPS: 21306.563

// Total stake and nodes => stake map
#[derive(Default)]
pub struct StakedNodes {
pub total_stake: f64,
Contributor:

nit: why is this an f64? Seems like a u64 would make more sense

@lijunwangs (Author):

I guess this is mostly due to the float arithmetic, like:

                        VarInt::from_u64(
                            ((stake as f64 / total_stake as f64)
                                * QUIC_TOTAL_STAKED_CONCURRENT_STREAMS)
                                as u64,
                        )
                        .unwrap(),

Contributor:

should just flip the ops and drop the floats. floats are the devil
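A minimal sketch of that suggestion — multiply before dividing so the whole computation stays in integers. The helper name and the widening to u128 are my own assumptions, not code from the PR (widening matters because stake in lamports times 100,000 can overflow u64):

```rust
const QUIC_TOTAL_STAKED_CONCURRENT_STREAMS: u64 = 100_000;

// Multiply before dividing so small stakes don't truncate to zero
// prematurely; widen to u128 since lamport stakes * 100_000 can
// exceed u64::MAX.
fn max_streams_for_stake(stake: u64, total_stake: u64) -> u64 {
    if total_stake == 0 {
        return 0;
    }
    (stake as u128 * QUIC_TOTAL_STAKED_CONCURRENT_STREAMS as u128
        / total_stake as u128) as u64
}

fn main() {
    // 1% of total stake => 1% of the 100k stream budget.
    assert_eq!(max_streams_for_stake(10, 1_000), 1_000);
    // Below ~1/100_000 of total stake, the budget rounds down to 0 streams.
    assert_eq!(max_streams_for_stake(1, 200_000), 0);
}
```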

@lijunwangs (Author) commented:

I have done simultaneous tests using thin-client and rpc-client to target the cluster with bench-tps. The rpc-client triggers txn sends from staked nodes, whereas the thin-client triggers txn sends from non-staked nodes. Here are some results:

Simultaneous thin-client and rpc-client

Thin-client
lijun_solana_com@lijun-dev:~/solana$ ./target/release/solana-bench-tps -u http://35.247.120.58:8899 --identity /home/lijun_solana_com/.config/solana/id.json --tx_count 1000 --thread-batch-sleep-ms 0 -t 20 --duration 120 --tpu-use-quic -n 35.247.120.58:8000 --rpc-addr 35.247.120.58:8899 --tpu-addr 35.247.120.58:8003 --read-client-keys /home/lijun_solana_com/gce-keypairs.yaml

[2022-06-16T18:08:49.544484314Z INFO solana_bench_tps::bench] Token balance: 14373355930776120
[2022-06-16T18:08:49.544526202Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-16T18:08:49.544533154Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-16T18:08:49.544535975Z INFO solana_bench_tps::bench] 35.247.120.58:8003 | 47510.64 | 1476330
[2022-06-16T18:08:49.544543994Z INFO solana_bench_tps::bench]
Average max TPS: 47510.64, 0 nodes had 0 TPS
[2022-06-16T18:08:49.544549895Z INFO solana_bench_tps::bench]
Highest TPS: 47510.64 sampling period 1s max transactions: 1476330 clients: 1 drop rate: 0.76
[2022-06-16T18:08:49.544556860Z INFO solana_bench_tps::bench] Average TPS: 11921.062
[2022-06-16T18:08:49.544563201Z INFO solana_metrics::metrics] datapoint: bench-tps-lamport_balance balance=14373355930776120i

[2022-06-17T18:01:54.980784127Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-17T18:01:54.980791749Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-17T18:01:54.980793942Z INFO solana_bench_tps::bench] 35.247.120.58:8003 | 69254.39 | 1598152
[2022-06-17T18:01:54.980804572Z INFO solana_bench_tps::bench]
Average max TPS: 69254.39, 0 nodes had 0 TPS
[2022-06-17T18:01:54.980811390Z INFO solana_bench_tps::bench]
Highest TPS: 69254.39 sampling period 1s max transactions: 1598152 clients: 1 drop rate: 0.66
[2022-06-17T18:01:54.980829010Z INFO solana_bench_tps::bench] Average TPS: 13276.292
[2022-06-17T18:01:54.980809287Z INFO solana_metrics::metrics] datapoint: bench-tps-lamport_balance balance=13746711861552240i

Rpc-client
lijun_solana_com@lijun-dev:~/solana$ ./target/release/solana-bench-tps -u http://35.247.120.58:8899 --identity /home/lijun_solana_com/.config/solana/id.json --tx_count 1000 --thread-batch-sleep-ms 0 -t 20 --duration 120 --tpu-use-quic -n 35.247.120.58:8000 --read-client-keys /home/lijun_solana_com/gce-keypairs.yaml --use-rpc-client

[2022-06-16T18:08:49.526198869Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-16T18:08:49.526201845Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-16T18:08:49.526210733Z INFO solana_bench_tps::bench] http://35.247.120.58:8899 | 49127.05 | 1458235
[2022-06-16T18:08:49.526219566Z INFO solana_bench_tps::bench]
Average max TPS: 49127.05, 0 nodes had 0 TPS
[2022-06-16T18:08:49.526222784Z INFO solana_bench_tps::bench]
Highest TPS: 49127.05 sampling period 1s max transactions: 1458235 clients: 1 drop rate: 0.00
[2022-06-16T18:08:49.526234852Z INFO solana_bench_tps::bench] Average TPS: 12067.749

[2022-06-17T18:01:57.359446137Z INFO solana_bench_tps::bench] Node address | Max TPS | Total Transactions
[2022-06-17T18:01:57.359457629Z INFO solana_bench_tps::bench] ---------------------+---------------+--------------------
[2022-06-17T18:01:57.359460137Z INFO solana_bench_tps::bench] http://35.230.100.91:8899 | 73725.34 | 1598124
[2022-06-17T18:01:57.359468570Z INFO solana_metrics::metrics] datapoint: bench-tps-lamport_balance balance=13746711861552240i
[2022-06-17T18:01:57.359472947Z INFO solana_bench_tps::bench]
Average max TPS: 73725.34, 0 nodes had 0 TPS
[2022-06-17T18:01:57.359518786Z INFO solana_bench_tps::bench]
Highest TPS: 73725.34 sampling period 1s max transactions: 1598124 clients: 1 drop rate: 0.00
[2022-06-17T18:01:57.359523574Z INFO solana_bench_tps::bench] Average TPS: 13298.548

@lijunwangs lijunwangs merged commit 61946a4 into solana-labs:master Jun 21, 2022
mergify bot pushed a commit that referenced this pull request Jun 21, 2022
Weight concurrent streams by stake for staked nodes
Ported changes from #25056 after addressing merge conflicts and some refactoring

(cherry picked from commit 61946a4)
mergify bot added a commit that referenced this pull request Jun 22, 2022
* Weight concurrent streams by stake (#25993)

Weight concurrent streams by stake for staked nodes
Ported changes from #25056 after address merge conflicts and some refactoring

(cherry picked from commit 61946a4)

* Updated quinn version to fix the comp issue with merge

* Fixed a missed Cargo.lock file

Co-authored-by: Lijun Wang <[email protected]>
@@ -49,7 +49,7 @@ pub(crate) fn configure_server(
let config = Arc::get_mut(&mut server_config.transport).unwrap();

// QUIC_MAX_CONCURRENT_STREAMS doubled, which was found to improve reliability
const MAX_CONCURRENT_UNI_STREAMS: u32 = (QUIC_MAX_CONCURRENT_STREAMS * 2) as u32;
const MAX_CONCURRENT_UNI_STREAMS: u32 = (QUIC_MAX_UNSTAKED_CONCURRENT_STREAMS * 2) as u32;
Contributor:

Unrelated, but I can't help asking why this doubling is here rather than in the SDK where the variable is declared?

Member:

@ryleung-solana can you please remove the doubling now that streams don't have head-of-line blocking? #26086

@ryoqun (Contributor) commented Jun 29, 2022

(seems there are no tests in this PR...? ref: #26312)

@jstarry (Member) commented Jun 30, 2022

I don't think it makes sense for unstaked connections to be able to use 128 concurrent streams each because they should be less prioritized than staked connections. In order for a staked connection to be allowed to use 128 concurrent streams, it would need (128 / 100,000) * 391M = 500,480 SOL staked. Only the top 100 staked validators have that much SOL (source: https://www.validators.app/validators?locale=en&network=mainnet&order=stake&page=4)
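The figure in that comment checks out with plain integer arithmetic (the 391M SOL total mainnet stake is jstarry's own assumption from the comment, not a value in this PR):

```rust
fn main() {
    // Stake required for (stake / total_stake) * 100_000 to reach the
    // 128-stream unstaked limit, assuming ~391M SOL total stake.
    let total_stake_sol: u64 = 391_000_000;
    let needed_sol = 128 * total_stake_sol / 100_000;
    assert_eq!(needed_sol, 500_480); // matches the 500,480 SOL figure above
}
```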

Successfully merging this pull request may close these issues.

QUIC server allows connections to use too many resources
6 participants