Node identity for bench #29929
bench-tps/src/main.rs
.update_client_certificate(client_node_id, bind_address)
.expect("Failed to update QUIC client certificates");

let staked_nodes = Arc::new(RwLock::new(StakedNodes::default()));
We need to add an entry to map client's identity <-> stake. Otherwise, the client will treat itself as unstaked, and won't try to open more streams.
For reference, see solana/client/src/connection_cache.rs, line 147 at commit 180ea1e:
self.maybe_staked_nodes.as_ref().map_or(
In the unit tests, I see that both total_stake and pubkey_stake_map are set.
If so, should I pass this information to bench-tps as CLI arguments? For example:
let staked_nodes = Arc::new(RwLock::new(StakedNodes::default()));
connection_cache.set_staked_nodes(&staked_nodes, &client_node_id.pubkey());
staked_nodes.write().unwrap().total_stake = total_stake; // <- here
staked_nodes
.write()
.unwrap()
.pubkey_stake_map
.insert(client_node_id.pubkey(), stake); // <- here
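A minimal sketch of why the map entry matters, assuming a simplified stand-in for StakedNodes: only the field names total_stake and pubkey_stake_map come from the snippet above. The String keys (the real map is keyed by Pubkey) and the client_stake helper are hypothetical, added for illustration. A client with no entry reads back a stake of 0, which is what makes it treat itself as unstaked.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified, hypothetical stand-in for StakedNodes.
/// Keyed by String instead of Pubkey so the sketch stays self-contained.
#[derive(Default)]
struct StakedNodes {
    total_stake: u64,
    pubkey_stake_map: HashMap<String, u64>,
}

/// Hypothetical helper: returns (client stake, total stake) for `identity`.
/// A missing entry yields a stake of 0, i.e. the client is treated as unstaked.
fn client_stake(staked_nodes: &Arc<RwLock<StakedNodes>>, identity: &str) -> (u64, u64) {
    let nodes = staked_nodes.read().unwrap();
    let stake = nodes.pubkey_stake_map.get(identity).copied().unwrap_or(0);
    (stake, nodes.total_stake)
}
```

Until both fields are written (as in the snippet above), the lookup for the client's own identity returns (0, 0).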
Yes, taking these values as CLI arguments would be good.
For myself: don't forget to move
@pgarg66 I've added
@KirillLykov, this function computes the number of streams a client can open (solana/client/src/connection_cache.rs, line 143 at c06053f).
It uses both the client's stake and the total stake in the computation, so both values are needed for an ideal test condition.
In this case, a client can specify a total_stake value smaller than the real one to push the number of concurrent streams on the client side up to the maximum (i.e. …). I wonder if there is a way to request the total stake from the network, so that the additional argument is not needed.
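To illustrate the point about understating total_stake, here is a rough sketch assuming a stake-proportional formula. The constants MIN_STREAMS, MAX_STREAMS and TOTAL_STAKED_STREAMS and the function max_streams are hypothetical; the real computation in connection_cache.rs may use different names, values, and rounding.

```rust
/// Hypothetical constants; the real values in the Solana codebase may differ.
const MIN_STREAMS: u64 = 128;
const MAX_STREAMS: u64 = 2_048;
const TOTAL_STAKED_STREAMS: u64 = 100_000;

/// Hypothetical sketch: the client's stream budget is its share of the total
/// stake applied to a global budget, clamped to [MIN_STREAMS, MAX_STREAMS].
/// An unstaked client (or an unknown total stake) gets MIN_STREAMS.
fn max_streams(stake: u64, total_stake: u64) -> u64 {
    if stake == 0 || total_stake == 0 {
        return MIN_STREAMS;
    }
    // Widen to u128 so stake * budget cannot overflow.
    let share = stake as u128 * TOTAL_STAKED_STREAMS as u128 / total_stake as u128;
    (share.min(MAX_STREAMS as u128) as u64).max(MIN_STREAMS)
}
```

Under these assumptions, a client with stake 10_000 out of a real total of 1_000_000 gets 1_000 streams, but claiming a total of 100_000 pushes it to the 2_048 cap, which is why taking the total from the network is preferable.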
I think the CLI has a few commands that report stake information, so the client can use them to find the stake (the total stake as well as its own). For example, these commands return stake information:
I thought about requesting this information in code instead of taking it from the CLI, but it is probably better to go with these changes as they are and add the stake request in code as a follow-up.
Just a couple of small comments on the CLI integration. I'll defer to pgarg66 and lijunwangs as to whether the actual StakedNodes handling is correct.
bench-tps/src/cli.rs
.help(
    "Stake of the client_node_id",
),
)
.arg(
    Arg::with_name("client_node_total_stake")
        .long("client-node-total-stake")
        .value_name("LAMPORTS")
        .takes_value(true)
        .help(
            "Total stake of the client_node_id",
Can you edit these help texts to explain the difference between these two args?
Why do we need these two? Can we obtain them from the network?
I wanted to add it in a follow-up PR, but I can add it here; it may even be simpler.
My general concern is that I don't think we need to set up CC parameters on the client side: the QUIC connections are one-way and the client doesn't receive much, so the receive window should not really matter. In my limited experiments (I haven't dug into it much), I didn't observe any effect of setting these parameters on the client.
@lijunwangs I've removed the total_stake and stake CLI args and instead added code that gets this information from the network.
bench-tps/src/main.rs
.current
.iter()
.find(|&vote_account| vote_account.node_pubkey == node_id_as_str)
.map_or(Err(()), |value| {
Can we log this error if the node is not found, to make debugging easier?
done
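A minimal sketch of the lookup with the requested logging. VoteAccountInfo and VoteAccountStatus are simplified, hypothetical stand-ins for the RPC vote-account response: current and node_pubkey come from the excerpt above, while activated_stake is an assumed field name. find_node_stake is illustrative, not the code in the PR.

```rust
/// Simplified stand-in for one entry of the vote-account response.
struct VoteAccountInfo {
    node_pubkey: String,
    activated_stake: u64, // assumed field name
}

/// Simplified stand-in for the response; only `current` is needed here.
struct VoteAccountStatus {
    current: Vec<VoteAccountInfo>,
}

/// Hypothetical helper: finds the stake of `node_id` among the current vote
/// accounts and logs an error instead of failing silently if it is absent.
fn find_node_stake(vote_accounts: &VoteAccountStatus, node_id: &str) -> Result<u64, String> {
    vote_accounts
        .current
        .iter()
        .find(|vote_account| vote_account.node_pubkey == node_id)
        .map(|vote_account| vote_account.activated_stake)
        .ok_or_else(|| {
            let msg = format!("Node {node_id} was not found among current vote accounts");
            eprintln!("{msg}");
            msg
        })
}
```

Returning a descriptive String (rather than `Err(())`) lets the caller print or propagate the reason as well.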
bench-tps/src/main.rs
    return ConnectionCache::with_udp(tpu_connection_pool_size);
}
if client_node_id.is_none() {
    return ConnectionCache::new_with_client_options(
It's probably neater to call the ::new function, as before, in this case.
done
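The branching after this change can be modelled as below. This is a sketch of the control flow only: the CacheConstructor enum, choose_constructor, and the use_quic flag are hypothetical, and it is assumed that new_with_client_options remains the constructor used when an identity is supplied.

```rust
/// Hypothetical enum naming which ConnectionCache constructor would be called;
/// it models the decision only and does not touch the real API.
#[derive(Debug, PartialEq)]
enum CacheConstructor {
    WithUdp,
    New,
    NewWithClientOptions,
}

fn choose_constructor(use_quic: bool, client_node_id: Option<&str>) -> CacheConstructor {
    if !use_quic {
        return CacheConstructor::WithUdp;
    }
    match client_node_id {
        // No identity supplied: keep the previous default construction.
        None => CacheConstructor::New,
        // Identity supplied (assumed): pass it and the bind address as client options.
        Some(_) => CacheConstructor::NewWithClientOptions,
    }
}
```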
bench-tps/src/cli.rs
@@ -513,5 +534,55 @@ pub fn extract_args(matches: &ArgMatches) -> Config {
    );
}

if let Some(addr) = matches.value_of("bind_address") {
    args.bind_address = solana_net_utils::parse_host(addr).unwrap_or_else(|e| {
        eprintln!("failed to parse bind_address address: {e}");
--> Failed to parse bind_address:
done
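A small sketch of the bind_address handling with the suggested message wording. It uses std's IpAddr parsing as a stand-in for solana_net_utils::parse_host, which may also accept host names; parse_bind_address is a hypothetical helper that returns the error instead of exiting.

```rust
use std::net::IpAddr;

/// Hypothetical helper: parses an IP literal and formats errors with the
/// suggested "Failed to parse bind_address:" prefix.
/// Stand-in for solana_net_utils::parse_host (IP literals only here).
fn parse_bind_address(addr: &str) -> Result<IpAddr, String> {
    addr.parse::<IpAddr>()
        .map_err(|e| format!("Failed to parse bind_address: {e}"))
}
```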
bench-tps/src/cli.rs
if let Ok(node_id) = read_keypair_file(node_id_path) {
    args.client_node_id = Some(node_id);
} else if matches.is_present("client_node_id") {
    panic!("could not parse identity path");
-> Could not parse identity path
Can you use match and print the actual error returned from read_keypair_file? There can be multiple causes for the error, such as a permission problem, file not found, not protected, etc.
I think I can use a clap validator for this case. I want to completely rewrite this CLI parsing code in the follow-up PR #30307.
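A sketch combining both suggestions: match on the result and keep the real error, wrapped in a function with the shape clap 2 expects from `Arg::validator` (`Fn(String) -> Result<(), String>`). read_identity and is_identity_file are hypothetical; read_identity only reads the file, whereas read_keypair_file also parses it into a Keypair.

```rust
use std::fs;

/// Hypothetical stand-in for read_keypair_file: reads the file contents only.
/// The underlying io::Error (not found, permission denied, ...) is kept in the message.
fn read_identity(path: &str) -> Result<String, String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        Err(err) => Err(format!("Could not parse identity path {path}: {err}")),
    }
}

/// Validator-shaped wrapper, so the error is reported while arguments are parsed.
fn is_identity_file(path: String) -> Result<(), String> {
    read_identity(&path).map(|_| ())
}
```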
bench-tps/src/cli.rs
.value_name("PATH")
.takes_value(true)
.requires("json_rpc_url")
.help("File containing a node id (keypair) of a validator with active stake. This allows communicating with network using staked connection"),
-> "File containing the node identity keypair ..."
done
bench-tps/src/main.rs
let total_active_stake: u64 = vote_accounts
    .current
    .iter()
    .chain(vote_accounts.delinquent.iter())
I am not familiar with this business logic; why do we add the delinquent stake here?
Good question. I thought the total active stake includes the stake of delinquent nodes (based primarily on experiments). It is also done this way here: https://github.com/solana-labs/solana/blame/5108350710b42f8bc2950395a067dc1d15213ba4/cli/src/cluster_query.rs#L1903
The total stake comes from the StakedNodes structure (https://github.com/solana-labs/solana/blob/master/streamer/src/streamer.rs#L30). @lijunwangs, do you think it is supposed to take the stake of delinquent nodes into account?
I am not sure; I think the current validator code does not include the delinquent stake. Can delinquent stake be used for communication or voting? If so, it should be included; otherwise, we probably should not include it.
If I understand how it works on the TPU side, there seems to be no filter on delinquent nodes: https://github.com/solana-labs/solana/blob/master/core/src/staked_nodes_updater_service.rs#L82
But I've asked on Discord.
So it looks like, although delinquent stakes are currently taken into account, they are going to be unstaked (#24302).
Because of that, I've changed this code to take into account only current (non-delinquent) nodes.
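The difference between the two versions of the total-stake computation can be sketched as follows. The structs are simplified, hypothetical stand-ins for the vote-account response (current and delinquent are from the excerpt above; activated_stake is an assumed field name), and both functions are illustrative.

```rust
/// Simplified stand-in for one vote account; only the stake is kept.
struct VoteAccountInfo {
    activated_stake: u64, // assumed field name
}

/// Simplified stand-in for the response with current and delinquent lists.
struct VoteAccountStatus {
    current: Vec<VoteAccountInfo>,
    delinquent: Vec<VoteAccountInfo>,
}

/// Total as first written: current plus delinquent vote accounts.
fn total_stake_with_delinquent(v: &VoteAccountStatus) -> u64 {
    v.current
        .iter()
        .chain(v.delinquent.iter())
        .map(|a| a.activated_stake)
        .sum()
}

/// Total after the change: current (non-delinquent) vote accounts only.
fn total_active_stake(v: &VoteAccountStatus) -> u64 {
    v.current.iter().map(|a| a.activated_stake).sum()
}
```

With current stakes of 100 and 200 and one delinquent stake of 50, the first version yields 350 and the second 300, so excluding delinquent nodes slightly raises the client's computed share.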
bench-tps/src/cli.rs
.takes_value(true)
.requires("json_rpc_url")
.validator(is_keypair)
.help("File containing the node id (keypair) of a validator with active stake. This allows communicating with network using staked connection"),
id --> identity. Normally id means identification in computer programs.
done
Looks good other than the id vs identity naming. I look forward to another PR for it.
* add bind_address and client_node_id to bench cli
* use provided node_id and bind_address in connection cache
* add two cli args client_node_stake and client_node_total_stake
* update connection cache construction after upstream update
* use ConnectionCache without Arc to use BackendConnectionCache
* remove comments
* Extend client_node_id cli arg help message
* address PR comments
* simplified staked_nodes creation
* remove delinquent nodes when computing total stake at bench-tps
Problem
Currently, node_id is generated randomly, which complicates testing if we want to use a staked connection with bench-tps.
I guess it might be possible to get bind_address in code using some wrapper for getifaddrs, yet it looks like we usually specify it as a CLI argument.
Summary of Changes
Add bind_address and client_node_id to the CLI and use them in ConnectionCache