fix flaky test #3295

Merged · 4 commits · Oct 24, 2024
Changes from 3 commits
34 changes: 29 additions & 5 deletions local-cluster/tests/local_cluster.rs
@@ -65,7 +65,9 @@ use {
client::AsyncClient,
clock::{self, Slot, DEFAULT_TICKS_PER_SLOT, MAX_PROCESSING_AGE},
commitment_config::CommitmentConfig,
epoch_schedule::{DEFAULT_SLOTS_PER_EPOCH, MINIMUM_SLOTS_PER_EPOCH},
epoch_schedule::{
DEFAULT_SLOTS_PER_EPOCH, MAX_LEADER_SCHEDULE_EPOCH_OFFSET, MINIMUM_SLOTS_PER_EPOCH,
},
genesis_config::ClusterType,
hard_forks::HardForks,
hash::Hash,
@@ -1536,20 +1538,42 @@ fn test_wait_for_max_stake() {
solana_logger::setup_with_default(RUST_LOG_FILTER);
let validator_config = ValidatorConfig::default_for_test();
let slots_per_epoch = MINIMUM_SLOTS_PER_EPOCH;
// Set this large enough to allow for skipped slots but still be able to
// make a root and derive the new leader schedule in time.
let stakers_slot_offset = slots_per_epoch.saturating_mul(MAX_LEADER_SCHEDULE_EPOCH_OFFSET);
// Reduce this so that we can complete the test faster by advancing through
// slots/epochs faster. But don't make it too small because it can make us
// susceptible to skipped slots and the cluster getting stuck.
let ticks_per_slot = 16;
let mut config = ClusterConfig {
cluster_lamports: DEFAULT_CLUSTER_LAMPORTS,
node_stakes: vec![DEFAULT_NODE_STAKE; 4],
validator_configs: make_identical_validator_configs(&validator_config, 4),
slots_per_epoch,
stakers_slot_offset: slots_per_epoch,
stakers_slot_offset,
ticks_per_slot,
..ClusterConfig::default()
};
let cluster = LocalCluster::new(&mut config, SocketAddrSpace::Unspecified);
let client = RpcClient::new_socket(cluster.entry_point_info.rpc().unwrap());

assert!(client
.wait_for_max_stake(CommitmentConfig::default(), 33.0f32)
.is_ok());
// This is based on the percentage of stake that is allowed to be activated
// each epoch.
let num_expected_epochs = 14;
@AshwinSekar commented on Oct 24, 2024:
If you wanna be fancy, you can derive this from the constant in the sdk:

let num_expected_epochs = 3f64.log(1. + NEW_WARMUP_COOLDOWN_RATE).ceil() as u32 + 1;

The +1 on the end is a buffer in case stake doesn't start activating immediately.

@bw-solana (author) commented on Oct 24, 2024:
The 3 here is essentially the 3 non-bootstrap validators, right?

Does this compute the time for the stake to completely activate? Because I think the test just waits for the bootstrap node stake to fall below 1/3, which would be significantly sooner

No, actually I think the 3 is right. It just represents increasing the total amount of stake 3x, right?

@bw-solana (author) added:
Also, I'm not sure we need the +1, because this isn't trying to find the epoch in which we will hit the threshold. It's just computing the number of epochs it will take once we start activating the stake.

@AshwinSekar commented on Oct 24, 2024:
Ya, the 3 is from the 1/3. The original equation is

    S / (S · (1 + R)^X) ≤ 1/3

where:

  • S is DEFAULT_NODE_STAKE, the bootstrap node's stake
  • R is 9%, aka NEW_WARMUP_COOLDOWN_RATE
  • X is num_expected_epochs

Essentially the denominator represents the total stake of the system at each epoch; it increases by 9% per epoch.
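To make the arithmetic concrete, here is a small standalone sketch of that derivation (a sketch only, not part of the diff; it hardcodes 0.09 for the 9% NEW_WARMUP_COOLDOWN_RATE mentioned above rather than importing the sdk constant):

    fn main() {
        // Assumed value of NEW_WARMUP_COOLDOWN_RATE: 9% stake growth per epoch.
        let warmup_cooldown_rate: f64 = 0.09;

        // The bootstrap node's share S / (S * (1 + R)^X) drops below 1/3 once
        // (1 + R)^X >= 3, i.e. X >= log base (1 + R) of 3.
        let epochs_to_threshold = 3f64.log(1.0 + warmup_cooldown_rate).ceil() as u32;

        // One extra epoch as a buffer in case stake doesn't start activating immediately.
        let num_expected_epochs = epochs_to_threshold + 1;

        // Prints: epochs_to_threshold = 13, num_expected_epochs = 14
        println!("epochs_to_threshold = {epochs_to_threshold}, num_expected_epochs = {num_expected_epochs}");
    }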

let expected_test_duration = config.poh_config.target_tick_duration
* ticks_per_slot as u32
* slots_per_epoch as u32
* num_expected_epochs;
// Make the timeout double the expected duration to provide some margin.
// Especially considering tests may be running in parallel.
let timeout = expected_test_duration * 2;
if let Err(err) = client.wait_for_max_stake_below_threshold_with_timeout(
CommitmentConfig::default(),
33.0f32,
timeout,
) {
panic!("wait_for_max_stake failed: {:?}", err);
}
assert!(client.get_slot().unwrap() > 10);
}

39 changes: 38 additions & 1 deletion rpc-client/src/nonblocking/rpc_client.rs
@@ -2179,8 +2179,37 @@ impl RpcClient {
&self,
commitment: CommitmentConfig,
max_stake_percent: f32,
) -> ClientResult<()> {
self.wait_for_max_stake_below_threshold_with_timeout_helper(
commitment,
max_stake_percent,
None,
)
.await
}

pub async fn wait_for_max_stake_below_threshold_with_timeout(
&self,
commitment: CommitmentConfig,
max_stake_percent: f32,
timeout: Duration,
) -> ClientResult<()> {
self.wait_for_max_stake_below_threshold_with_timeout_helper(
commitment,
max_stake_percent,
Some(timeout),
)
.await
}

async fn wait_for_max_stake_below_threshold_with_timeout_helper(
&self,
commitment: CommitmentConfig,
max_stake_percent: f32,
timeout: Option<Duration>,
) -> ClientResult<()> {
let mut current_percent;
let start = Instant::now();
loop {
let vote_accounts = self.get_vote_accounts_with_commitment(commitment).await?;

@@ -2197,12 +2226,20 @@ impl RpcClient {
current_percent = 100f32 * max as f32 / total_active_stake as f32;
if current_percent < max_stake_percent {
break;
@ilya-bobyr commented on Oct 24, 2024:
style

If you are going to be modifying this, I suggest writing it completely in the imperative style or completely in the functional style. A mix of two styles is negatively affecting my ability to read the code, as it does not work fully with either of the existing intuitions that I have.

Imperative style:

use itertools::chain;
/* ... */

            let mut max = 0;
            let mut total = 0;
            for account in chain(&vote_accounts.current, &vote_accounts.delinquent) {
                let activated_stake = account.activated_stake;
                max = std::cmp::max(max, activated_stake);
                total += activated_stake;
            }

Functional style:

use itertools::chain;
/* ... */
            let (max, total) = chain(&vote_accounts.current, &vote_accounts.delinquent)
                .map(|account| account.activated_stake)
                .fold((0, 0), |(max, total), stake| {
                    (std::cmp::max(max, stake), total + stake)
                });

@bw-solana (author) replied:

Is it okay if we do this in a follow-up? Hoping to limit this commit to the functional change.

@ilya-bobyr replied:

Absolutely.
This was a suggestion in case you were going to change this code.

} else if let Some(timeout) = timeout {
if start.elapsed() > timeout {
return Err(ClientErrorKind::Custom(
"timed out waiting for max stake to drop".to_string(),
)
.into());
}
}

info!(
"Waiting for stake to drop below {} current: {:.1}",
max_stake_percent, current_percent
);
sleep(Duration::from_secs(10)).await;
sleep(Duration::from_secs(5)).await;
}
Ok(())
}
15 changes: 15 additions & 0 deletions rpc-client/src/rpc_client.rs
@@ -1900,6 +1900,21 @@ impl RpcClient {
self.invoke((self.rpc_client.as_ref()).wait_for_max_stake(commitment, max_stake_percent))
}

pub fn wait_for_max_stake_below_threshold_with_timeout(
&self,
commitment: CommitmentConfig,
max_stake_percent: f32,
timeout: Duration,
) -> ClientResult<()> {
self.invoke(
(self.rpc_client.as_ref()).wait_for_max_stake_below_threshold_with_timeout(
commitment,
max_stake_percent,
timeout,
),
)
}

/// Returns information about all the nodes participating in the cluster.
///
/// # RPC Reference
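For reference, a minimal sketch of how a caller might drive the new blocking timeout variant added above (not part of the diff; the RPC endpoint and the 10-minute timeout are placeholder values):

    use {
        solana_rpc_client::rpc_client::RpcClient,
        solana_sdk::commitment_config::CommitmentConfig,
        std::time::Duration,
    };

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Assumed local RPC endpoint; adjust for the cluster under test.
        let client = RpcClient::new("http://127.0.0.1:8899".to_string());

        // Block until no single node holds 33% or more of the active stake,
        // or return an error after 10 minutes.
        client.wait_for_max_stake_below_threshold_with_timeout(
            CommitmentConfig::default(),
            33.0f32,
            Duration::from_secs(600),
        )?;
        Ok(())
    }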