Node identity for bench #29929

KirillLykov · 2023-01-26T09:26:45Z

Problem

Currently, node_id is generated randomly; this complicates testing when we want to use a staked connection with bench-tps.
~~I guess it might be possible to get bind_address in the code using some wrapper for getifaddrs, yet it looks like we usually specify this as cli argument.~~

Summary of Changes

Add two new CLI parameters: bind_address and client_node_id
Add code to get the total active stake and the client's stake information from the network
Use this data when constructing ConnectionCache

bench-tps/src/cli.rs

pgarg66 · 2023-01-26T16:30:43Z

bench-tps/src/main.rs

+                        .update_client_certificate(client_node_id, bind_address)
+                        .expect("Failed to update QUIC client certificates");
+
+                    let staked_nodes = Arc::new(RwLock::new(StakedNodes::default()));


We need to add an entry mapping the client's identity to its stake. Otherwise, the client will treat itself as unstaked and won't try to open more streams.
For reference:

solana/client/src/connection_cache.rs

Line 147 in 180ea1e

self.maybe_staked_nodes.as_ref().map_or(
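To make the point above concrete, here is a minimal, std-only sketch of the identity-to-stake lookup. `StakedNodesSketch` and `client_stake` are hypothetical stand-ins, not the solana_client code: they model only the `total_stake` and `pubkey_stake_map` fields mentioned in this thread and use `String` keys instead of `Pubkey`. Without an entry for the client's identity, the lookup yields `None`, which corresponds to the client treating itself as unstaked.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified stand-in for StakedNodes: only the two fields
/// discussed in this thread, with String keys in place of Pubkey.
#[derive(Default)]
pub struct StakedNodesSketch {
    pub total_stake: u64,
    pub pubkey_stake_map: HashMap<String, u64>,
}

/// Returns `Some((stake, total_stake))` when the client identity has an entry.
/// `None` means no entry exists, i.e. the client would consider itself unstaked.
pub fn client_stake(
    staked_nodes: &Arc<RwLock<StakedNodesSketch>>,
    client_pubkey: &str,
) -> Option<(u64, u64)> {
    let nodes = staked_nodes.read().unwrap();
    // `?` returns None early if the identity is missing from the map.
    let stake = *nodes.pubkey_stake_map.get(client_pubkey)?;
    Some((stake, nodes.total_stake))
}
```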

In the unit tests, I see that both total_stake and pubkey_stake_map are set.
If that is the case, shall I pass this information to bench-tps as CLI arguments? Like:

    let staked_nodes = Arc::new(RwLock::new(StakedNodes::default()));
    connection_cache.set_staked_nodes(&staked_nodes, &client_node_id.pubkey());
    staked_nodes.write().unwrap().total_stake = total_stake; // <- here
    staked_nodes
        .write()
        .unwrap()
        .pubkey_stake_map
        .insert(client_node_id.pubkey(), stake); // <- here

Yes, taking these values as CLI arguments would be good.

KirillLykov · 2023-01-27T16:14:23Z

Note to self: don't forget to move ConnectionCache out of the match expression so that these parameters are used regardless of client type.

KirillLykov · 2023-02-01T17:07:59Z

@pgarg66 I've added total_stake and stake to the CLI and also propagated them to the ConnectionCache setup.
Yet I'm not sure now whether we need both stake and total_stake: the StakedNodes data structure we set is aware only of this particular client, and hence total_stake should be equal to stake (?)

pgarg66 · 2023-02-01T17:23:45Z


@KirillLykov, this function is used for computing the number of streams a client can open.

solana/client/src/connection_cache.rs

Line 143 in c06053f

fn compute_max_parallel_streams(&self) -> usize {

It uses the client's stake and the total stake in the computation, so both values are needed for an ideal test condition.
Total stake should equal the amount staked in the cluster. Local stake should be the amount staked on the node whose identity you are using in the client.
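To illustrate why both values matter, the sketch below computes a stake-proportional stream budget. It is a hypothetical stand-in for compute_max_parallel_streams, not its actual logic: `MIN_STREAMS`, `MAX_STREAMS` and the linear interpolation are placeholder assumptions chosen only for illustration.

```rust
/// Placeholder bounds (assumption); NOT the values used by Solana's QUIC client.
pub const MIN_STREAMS: u64 = 128;
pub const MAX_STREAMS: u64 = 512;

/// Hypothetical sketch: interpolate linearly between MIN_STREAMS and
/// MAX_STREAMS by the client's share of the total stake. Zero stake,
/// zero total stake, or stake larger than the total falls back to MIN_STREAMS.
pub fn max_parallel_streams_sketch(stake: u64, total_stake: u64) -> u64 {
    if stake == 0 || total_stake == 0 || stake > total_stake {
        return MIN_STREAMS;
    }
    // u128 arithmetic avoids overflow when multiplying large lamport amounts.
    let extra = (MAX_STREAMS - MIN_STREAMS) as u128 * stake as u128 / total_stake as u128;
    MIN_STREAMS + extra as u64
}
```

With total_stake equal to stake the sketch returns `MAX_STREAMS`, so an understated total stake inflates the client-side budget; this is why the real cluster-wide total matters for a realistic test.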

KirillLykov · 2023-02-02T12:11:00Z


In this case, a client could specify a total_stake smaller than the real value to push the number of concurrent streams on the client side to the maximum (i.e. total_stake == stake). I suppose that in this case the validator would drop the extra streams from this client.

I wonder if there is a way to request the total stake from the network so that the additional argument is not needed.

pgarg66 · 2023-02-02T16:28:59Z


I think the CLI has a few commands that can give you stake information, so the client can use them to find the stake (the total as well as its own).

For example, these commands return stake information:

solana stake-history
solana stakes

KirillLykov · 2023-02-06T09:29:21Z


I thought about requesting this information in the code instead of taking it from the CLI, but it is probably better to go with these changes as they are and add the stake request in the code as a follow-up.

CriesofCarrots

Just a couple small comments on the cli integration. I'll defer to pgarg66 and lijunwangs as to whether the actual StakedNodes handling is correct.

bench-tps/src/main.rs

bench-tps/src/cli.rs

CriesofCarrots · 2023-02-06T18:44:38Z

bench-tps/src/cli.rs

+                .help(
+                    "Stake of the client_node_id",
+                ),
+        )
+        .arg(
+            Arg::with_name("client_node_total_stake")
+                .long("client-node-total-stake")
+                .value_name("LAMPORTS")
+                .takes_value(true)
+                .help(
+                    "Total stake of the client_node_id",


Can you edit these help texts to explain the difference between these two args?

Why do we need these two? Can we obtain them from the network?

I wanted to add that as a follow-up PR, but I can add it here; it may even be simpler.

My general concern is that I don't think we need to set up CC parameters on the client side: the QUIC connections are one-way and the client doesn't receive much, so the receive window should not really matter. In my limited experiments (I haven't dug into it much), I don't observe any effect of setting these parameters on the client.

KirillLykov · 2023-02-13T15:03:59Z

@lijunwangs I've removed the total_stake and stake CLI args and instead added code that gets this information from the network.

bench-tps/src/main.rs

lijunwangs · 2023-02-13T18:14:44Z

bench-tps/src/main.rs

+        .current
+        .iter()
+        .find(|&vote_account| vote_account.node_pubkey == node_id_as_str)
+        .map_or(Err(()), |value| {


Can we log this error if the node is not found, for easier debugging?
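A minimal sketch of the suggestion, assuming a simplified `VoteAccountSketch` type (hypothetical) that keeps only the `node_pubkey` and `activated_stake` fields of a vote-account entry: instead of mapping a miss to `Err(())`, the lookup returns an error message naming the missing node, which the caller can then log, for example with `eprintln!`.

```rust
/// Hypothetical, simplified vote-account entry: only the fields used here.
pub struct VoteAccountSketch {
    pub node_pubkey: String,
    pub activated_stake: u64,
}

/// Finds the activated stake of `node_id` among the given vote accounts.
/// On a miss, the error names the node so the failure is easy to debug.
pub fn find_node_stake(current: &[VoteAccountSketch], node_id: &str) -> Result<u64, String> {
    current
        .iter()
        .find(|vote_account| vote_account.node_pubkey == node_id)
        .map(|vote_account| vote_account.activated_stake)
        .ok_or_else(|| format!("Failed to find vote account for node {node_id}"))
}
```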

lijunwangs · 2023-02-13T18:29:39Z

bench-tps/src/main.rs

+        return ConnectionCache::with_udp(tpu_connection_pool_size);
+    }
+    if client_node_id.is_none() {
+        return ConnectionCache::new_with_client_options(


Probably neater to call the ::new function, as before, in this case.
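The suggested branching can be sketched as follows. `CacheSetup` and `choose_cache_setup` are hypothetical names that only describe which constructor would be picked; they are not part of the solana_client API. When no client identity is supplied, the plain QUIC constructor is chosen, as before this PR.

```rust
/// Hypothetical description of which ConnectionCache constructor to call.
#[derive(Debug, PartialEq)]
pub enum CacheSetup {
    /// UDP transport (corresponds to `ConnectionCache::with_udp`).
    Udp { pool_size: usize },
    /// Plain QUIC constructor, used when no client identity is given.
    QuicDefault { pool_size: usize },
    /// QUIC with the client's identity, needed for a staked connection.
    QuicWithIdentity { pool_size: usize, identity: String },
}

pub fn choose_cache_setup(
    use_quic: bool,
    pool_size: usize,
    client_node_id: Option<String>,
) -> CacheSetup {
    if !use_quic {
        return CacheSetup::Udp { pool_size };
    }
    match client_node_id {
        None => CacheSetup::QuicDefault { pool_size },
        Some(identity) => CacheSetup::QuicWithIdentity { pool_size, identity },
    }
}
```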

lijunwangs · 2023-02-13T18:31:29Z

bench-tps/src/cli.rs

@@ -513,5 +534,55 @@ pub fn extract_args(matches: &ArgMatches) -> Config {
        );
    }

+    if let Some(addr) = matches.value_of("bind_address") {
+        args.bind_address = solana_net_utils::parse_host(addr).unwrap_or_else(|e| {
+            eprintln!("failed to parse bind_address address: {e}");


--> Failed to parse bind_address:
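A std-only sketch of the suggested error message. `parse_bind_address` is a hypothetical stand-in for the `solana_net_utils::parse_host` call in the diff; this sketch only accepts literal IP addresses and does not attempt any hostname resolution.

```rust
use std::net::IpAddr;

/// Hypothetical helper: parses a literal IPv4/IPv6 address and formats
/// the error with the capitalized message suggested in the review.
pub fn parse_bind_address(addr: &str) -> Result<IpAddr, String> {
    addr.parse::<IpAddr>()
        .map_err(|e| format!("Failed to parse bind_address: {e}"))
}
```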

lijunwangs · 2023-02-13T18:33:45Z

bench-tps/src/cli.rs

+    if let Ok(node_id) = read_keypair_file(node_id_path) {
+        args.client_node_id = Some(node_id);
+    } else if matches.is_present("client_node_id") {
+        panic!("could not parse identity path");


-> Could not parse identity path

Can you use match and print the actual error returned from read_keypair_file? There can be multiple reasons for the error, such as permissions, file not found, the file not being protected, etc.

I think I can use a clap validator for this case. I want to completely rewrite this CLI parsing code in the follow-up PR #30307.
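A sketch of both suggestions, assuming the identity file is a JSON array of 64 bytes. `load_keypair_bytes` matches on the I/O error kind so each failure reason gets its own message, and `is_keypair_file_sketch` wraps it in the `Fn(String) -> Result<(), String>` shape that clap 2 validators take. Both names are hypothetical, and the sketch does not perform the full key parsing done by read_keypair_file.

```rust
use std::fs;
use std::io::ErrorKind;

/// Hypothetical, simplified reader for a keypair file stored as a JSON array
/// of 64 bytes. Each failure reason is reported separately instead of panicking.
pub fn load_keypair_bytes(path: &str) -> Result<Vec<u8>, String> {
    let contents = fs::read_to_string(path).map_err(|e| match e.kind() {
        ErrorKind::NotFound => format!("Identity file not found: {path}"),
        ErrorKind::PermissionDenied => format!("Permission denied reading identity file: {path}"),
        _ => format!("Could not read identity file {path}: {e}"),
    })?;
    // Strip the surrounding brackets of the JSON array.
    let inner = contents
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("Identity file {path} is not a JSON byte array"))?;
    // Parse every comma-separated element as a byte.
    let bytes = inner
        .split(',')
        .map(|s| s.trim().parse::<u8>())
        .collect::<Result<Vec<u8>, _>>()
        .map_err(|e| format!("Identity file {path} contains an invalid byte: {e}"))?;
    if bytes.len() != 64 {
        return Err(format!("Identity file {path} has {} bytes, expected 64", bytes.len()));
    }
    Ok(bytes)
}

/// Same check shaped like a clap 2 `validator` callback.
pub fn is_keypair_file_sketch(value: String) -> Result<(), String> {
    load_keypair_bytes(&value).map(|_| ())
}
```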

lijunwangs · 2023-02-13T18:39:40Z

bench-tps/src/cli.rs

+                .value_name("PATH")
+                .takes_value(true)
+                .requires("json_rpc_url")
+                .help("File containing a node id (keypair) of a validator with active stake. This allows communicating with network using staked connection"),


-> "File containing the node identity keypair ..."

lijunwangs · 2023-02-13T18:44:38Z

bench-tps/src/main.rs

+    let total_active_stake: u64 = vote_accounts
+        .current
+        .iter()
+        .chain(vote_accounts.delinquent.iter())


I am not familiar with this business logic; why do we add the delinquent stake here?

Good question. I thought the total active stake includes delinquent nodes' stake (based primarily on experiments). It is also done this way here: https://github.com/solana-labs/solana/blame/5108350710b42f8bc2950395a067dc1d15213ba4/cli/src/cluster_query.rs#L1903

Total stake comes from the StakedNodes structure https://github.com/solana-labs/solana/blob/master/streamer/src/streamer.rs#L30. @lijunwangs, do you think it is supposed to take delinquent nodes' stake into account?

I am not sure; I think the current validator code does not include the delinquent stake. Can delinquent stake be used for communication/voting? If so, it should be included; otherwise we probably should not include it.

If I understand how it works on the TPU side, there is no filter on delinquent nodes: https://github.com/solana-labs/solana/blob/master/core/src/staked_nodes_updater_service.rs#L82
But I've asked on Discord.

So it looks like, although delinquent stakes are currently taken into account, they are going to be unstaked (#24302).
Because of that, I've changed this code to take into account only current (non-delinquent) nodes.
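The difference between the two variants discussed above can be sketched as follows. `VoteAccountStatusSketch` is a hypothetical simplification of the vote-account status returned over RPC that keeps only the activated stakes of current and delinquent accounts.

```rust
/// Hypothetical simplification: activated stakes of current and
/// delinquent vote accounts, all other fields omitted.
pub struct VoteAccountStatusSketch {
    pub current: Vec<u64>,
    pub delinquent: Vec<u64>,
}

/// Total active stake as computed after this discussion: delinquent
/// accounts are excluded.
pub fn total_active_stake(status: &VoteAccountStatusSketch) -> u64 {
    status.current.iter().sum()
}

/// Earlier variant from the diff, which chained delinquent accounts in.
pub fn total_stake_including_delinquent(status: &VoteAccountStatusSketch) -> u64 {
    status.current.iter().chain(status.delinquent.iter()).sum()
}
```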

lijunwangs · 2023-02-16T17:53:08Z

bench-tps/src/cli.rs

+                .takes_value(true)
+                .requires("json_rpc_url")
+                .validator(is_keypair)
+                .help("File containing the node id (keypair) of a validator with active stake. This allows communicating with network using staked connection"),


id --> identity. Normally id means identification in computer programs.

lijunwangs

Looks good other than the id vs identity naming. I look forward to another PR for it.

bench-tps/src/main.rs

* add beind_address and client_node_id to bench cli
* use provided node_id and bind_address in connection cache
* add two cli args client_node_stake and client_node_total_stake
* update connection cache construction after upstream update
* use ConnectionCache without Arc to use BackendConnectionCache
* remove comments
* Extend client_node_od cli arg help message
* address PR comments
* simplified staked_nodes creation
* remove delinquent nodes when computing total stake at bench-tps

KirillLykov requested a review from pgarg66 January 26, 2023 09:34

This was referenced Jan 26, 2023

Pass node_id to the client godmodegalactus/mango_bencher#11

Closed

Test large transactions on private cluster #28063

Open

pgarg66 reviewed Jan 26, 2023


pgarg66 reviewed Jan 26, 2023


KirillLykov force-pushed the node_identity_for_bench branch from c204726 to 79dffd8 February 2, 2023 12:11

KirillLykov requested a review from pgarg66 February 2, 2023 12:12

KirillLykov requested a review from lijunwangs February 6, 2023 09:27

KirillLykov force-pushed the node_identity_for_bench branch 3 times, most recently from b659990 to 270d942 February 6, 2023 15:21

KirillLykov requested a review from CriesofCarrots February 6, 2023 17:15

CriesofCarrots reviewed Feb 6, 2023


KirillLykov force-pushed the node_identity_for_bench branch 3 times, most recently from cf860d3 to c563daf February 13, 2023 14:59

lijunwangs reviewed Feb 13, 2023


KirillLykov force-pushed the node_identity_for_bench branch from fc6625e to 7c39a9f February 14, 2023 09:22

KirillLykov requested a review from lijunwangs February 15, 2023 07:31

lijunwangs reviewed Feb 16, 2023


KirillLykov force-pushed the node_identity_for_bench branch 3 times, most recently from d0bcf7a to 19384e5 February 17, 2023 19:09

Kirill Lykov added 12 commits February 17, 2023 20:52

add beind_address and client_node_id to bench cli

04bdb8e

use provided node_id and bind_address in connection cache

f211470

add two cli args client_node_stake and client_node_total_stake

c3cb525

update connection cache construction after upstream update

9a074e8

use ConnectionCache without Arc to use BackendConnectionCache

2fc2811

save

cdd5555

save

8a6e31a

remove comments

3800869

Extend client_node_od cli arg help message

66ba602

address PR comments

14444eb

simplified staked_nodes creation

ba27243

remove delinquent nodes when computing total stake at bench-tps

ca2c4e3

KirillLykov force-pushed the node_identity_for_bench branch from 19384e5 to ca2c4e3 February 17, 2023 19:52

lijunwangs approved these changes Feb 17, 2023


KirillLykov merged commit 069ebb8 into solana-labs:master Feb 18, 2023
