Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node identity for bench #29929

Merged
merged 12 commits into from
Feb 18, 2023

Conversation

KirillLykov
Copy link
Contributor

@KirillLykov KirillLykov commented Jan 26, 2023

Problem

Currently node_id is generated randomly, this complicates testing if we want to use staked connection with bench-tps.
I guess it might be possible to get bind_address in the code using some wrapper for getifaddrs, yet it looks like we usually specify this as cli argument.

Summary of Changes

  • add two additional parameters bind_address and client_node_id to cli
  • add code to get total active stake and stake information from the network
  • use this data when constructing ConnectionCache

.update_client_certificate(client_node_id, bind_address)
.expect("Failed to update QUIC client certificates");

let staked_nodes = Arc::new(RwLock::new(StakedNodes::default()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add an entry to map client's identity <-> stake. Otherwise, the client will treat itself as unstaked, and won't try to open more streams.
For reference:

self.maybe_staked_nodes.as_ref().map_or(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unit tests, I see that both total_stake and pubkey_stake_map are set.
If this is true, shall I pass this information as cli arguments to bench-tps? Like:

                    let staked_nodes = Arc::new(RwLock::new(StakedNodes::default()));
                    connection_cache.set_staked_nodes(&staked_nodes, &client_node_id.pubkey());
                    staked_nodes.write().unwrap().total_stake = total_stake; // <- here

                    staked_nodes
                        .write()
                        .unwrap()
                        .pubkey_stake_map
                        .insert(client_node_id.pubkey(), stake); // <- here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, taking these values as CLI argument would be good.

@KirillLykov
Copy link
Contributor Author

For myself: don't forget to move ConnectionCache outside of the match expression to use these parameters regardless of client type

@KirillLykov
Copy link
Contributor Author

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup.
Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@pgarg66
Copy link
Contributor

pgarg66 commented Feb 1, 2023

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@KirillLykov, this function is used for computing number of streams a client can open.

fn compute_max_parallel_streams(&self) -> usize {

It uses client's stake, and total stake in the computation. So need both values to have an ideal test condition.
Total stake should be equal to the amount that staked in the cluster. Local stake should be how much is staked on the node whose identity you are using in the client.

@KirillLykov
Copy link
Contributor Author

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@KirillLykov, this function is used for computing number of streams a client can open.

fn compute_max_parallel_streams(&self) -> usize {

It uses client's stake, and total stake in the computation. So need both values to have an ideal test condition. Total stake should be equal to the amount that staked in the cluster. Local stake should be how much is staked on the node whose identity you are using in the client.

In this case, a client can specify value of total_stake which is smaller than the real value of this parameter to increase number of concurrent streams on the client side to max (ie total_stale == stake). I suppose that in this case validator will drop extra streams from this client.

I wonder if there is a way to request total stake from network so there is no need in additional argument.

@KirillLykov KirillLykov force-pushed the node_identity_for_bench branch from c204726 to 79dffd8 Compare February 2, 2023 12:11
@KirillLykov KirillLykov requested a review from pgarg66 February 2, 2023 12:12
@pgarg66
Copy link
Contributor

pgarg66 commented Feb 2, 2023

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@KirillLykov, this function is used for computing number of streams a client can open.

fn compute_max_parallel_streams(&self) -> usize {

It uses client's stake, and total stake in the computation. So need both values to have an ideal test condition. Total stake should be equal to the amount that staked in the cluster. Local stake should be how much is staked on the node whose identity you are using in the client.

In this case, a client can specify value of total_stake which is smaller than the real value of this parameter to increase number of concurrent streams on the client side to max (ie total_stale == stake). I suppose that in this case validator will drop extra streams from this client.

I wonder if there is a way to request total stake from network so there is no need in additional argument.

I think CLI has a few commands that can give you stake information. So client can use that to find the stake (total, as well it's own).

For example, these commands return stake information

solana stake-history
solana stakes

@KirillLykov
Copy link
Contributor Author

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@pgarg66 I've added total_stake and stake to the cli and also propagated them to the ConnectionCache setup. Yet I'm not sure now if we need to have stake and total_stake because StackedNodes data structure we set is aware only about this particular client and hence total_stake should be equal to stake (?)

@KirillLykov, this function is used for computing number of streams a client can open.

fn compute_max_parallel_streams(&self) -> usize {

It uses client's stake, and total stake in the computation. So need both values to have an ideal test condition. Total stake should be equal to the amount that staked in the cluster. Local stake should be how much is staked on the node whose identity you are using in the client.

In this case, a client can specify value of total_stake which is smaller than the real value of this parameter to increase number of concurrent streams on the client side to max (ie total_stale == stake). I suppose that in this case validator will drop extra streams from this client.
I wonder if there is a way to request total stake from network so there is no need in additional argument.

I think CLI has a few commands that can give you stake information. So client can use that to find the stake (total, as well it's own).

For example, these commands return stake information

solana stake-history
solana stakes

I thought about requesting this information in the code instead of taking it from cli, but probably better to go with these changes as it is and as a follow up add stake request in the code.

@KirillLykov KirillLykov force-pushed the node_identity_for_bench branch 3 times, most recently from b659990 to 270d942 Compare February 6, 2023 15:21
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small comments on the cli integration. I'll defer to pgarg66 and lijunwangs as to whether the actual StakedNodes handling is correct.

bench-tps/src/main.rs Outdated Show resolved Hide resolved
bench-tps/src/cli.rs Outdated Show resolved Hide resolved
Comment on lines 384 to 394
.help(
"Stake of the client_node_id",
),
)
.arg(
Arg::with_name("client_node_total_stake")
.long("client-node-total-stake")
.value_name("LAMPORTS")
.takes_value(true)
.help(
"Total stake of the client_node_id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you edit these help texts to explain the difference between these two args?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these two, can we obtain from the network?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add it as follow up PR, but can add here, maybe even simpler.

My general concern is that I think that we don't need to set up CC parameters on client side. Because there is one way quic connections and client doesn't receive much so receive window should not really matter. I don't observe effect of setting up these parameter on the client in my limited experiments (haven't dig into that much)

@KirillLykov KirillLykov force-pushed the node_identity_for_bench branch 3 times, most recently from cf860d3 to c563daf Compare February 13, 2023 14:59
@KirillLykov
Copy link
Contributor Author

@lijunwangs I've got rid of total_stake and stake cli args and, instead, added code which gets this information from the network.

bench-tps/src/main.rs Show resolved Hide resolved
.current
.iter()
.find(|&vote_account| vote_account.node_pubkey == node_id_as_str)
.map_or(Err(()), |value| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log this error if node is not found for better debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return ConnectionCache::with_udp(tpu_connection_pool_size);
}
if client_node_id.is_none() {
return ConnectionCache::new_with_client_options(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably neater to call the ::new function like before in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -513,5 +534,55 @@ pub fn extract_args(matches: &ArgMatches) -> Config {
);
}

if let Some(addr) = matches.value_of("bind_address") {
args.bind_address = solana_net_utils::parse_host(addr).unwrap_or_else(|e| {
eprintln!("failed to parse bind_address address: {e}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> Failed to parse bind_address:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if let Ok(node_id) = read_keypair_file(node_id_path) {
args.client_node_id = Some(node_id);
} else if matches.is_present("client_node_id") {
panic!("could not parse identity path");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Could not parse identity path

Can you use match and actually print the real error returned from read_keypair_file as there can be multiple reasons for the errors like permission, file not found , not protected and etc.?

Copy link
Contributor Author

@KirillLykov KirillLykov Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can use clap validator for this case. I want to rewrite this cli parsing code in the follow up PR completely #30307.

.value_name("PATH")
.takes_value(true)
.requires("json_rpc_url")
.help("File containing a node id (keypair) of a validator with active stake. This allows communicating with network using staked connection"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> "File containing the node identity keypair ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

let total_active_stake: u64 = vote_accounts
.current
.iter()
.chain(vote_accounts.delinquent.iter())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with this business logic, why do we add the delinquent here?

Copy link
Contributor Author

@KirillLykov KirillLykov Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I thought total active stake includes delinquent nodes stake (by doing experiments primarily). And also here it is done this way https://github.com/solana-labs/solana/blame/5108350710b42f8bc2950395a067dc1d15213ba4/cli/src/cluster_query.rs#L1903

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total stake is coming from StakedNodes structure https://github.com/solana-labs/solana/blob/master/streamer/src/streamer.rs#L30, @lijunwangs do you think it is supposed to take into account delinquent nodes stake?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, the current validator code does not include the delinquent stake I think. Can you use delinquent stake to do communications/vote? If so, it should be included, otherwise we probably should not include.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the way it works in the TPU side, it looks like there is no filter on delinquent https://github.com/solana-labs/solana/blob/master/core/src/staked_nodes_updater_service.rs#L82
But I've asked on discord.

Copy link
Contributor Author

@KirillLykov KirillLykov Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like although atm delinquent stakes are taken into account, they are going to be unstaked #24302
Due to that, have changed this code to take into account only normal nodes

.takes_value(true)
.requires("json_rpc_url")
.validator(is_keypair)
.help("File containing the node id (keypair) of a validator with active stake. This allows communicating with network using staked connection"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id --> identity. Normally id means identification in computer programs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@KirillLykov KirillLykov force-pushed the node_identity_for_bench branch 3 times, most recently from d0bcf7a to 19384e5 Compare February 17, 2023 19:09
@KirillLykov KirillLykov force-pushed the node_identity_for_bench branch from 19384e5 to ca2c4e3 Compare February 17, 2023 19:52
Copy link
Contributor

@lijunwangs lijunwangs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than the id vs identity naming. I look forward to another PR for it.

bench-tps/src/main.rs Show resolved Hide resolved
@KirillLykov KirillLykov merged commit 069ebb8 into solana-labs:master Feb 18, 2023
nickfrosty pushed a commit to nickfrosty/solana that referenced this pull request Mar 12, 2023
* add beind_address and client_node_id to bench cli

* use provided node_id and bind_address in connection cache

* add two cli args client_node_stake and client_node_total_stake

* update connection cache construction after upstream update

* use ConnectionCache without Arc to use BackendConnectionCache

* remove comments

* Extend client_node_od cli arg help message

* address PR comments

* simplified staked_nodes creation

* remove delinquent nodes when computing total stake at bench-tps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants