compat expansion; forge refactor #13302
Lots of `&Foo` and `&mut Foo` replaced with `Arc<Mutex<Foo>>`.
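The general pattern behind this refactor can be sketched with the standard library alone. A `&mut Foo` parameter gives one caller exclusive access, while an `Arc<Mutex<Foo>>` handle can be cloned and moved into several threads that each lock it briefly. `Stats`, `record_borrowed`, `record_shared` and `run_workers` are hypothetical names for illustration, not forge types.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a piece of shared test state.
#[derive(Debug, Default)]
pub struct Stats {
    pub submitted: u64,
}

/// Before: exclusive access through `&mut`, so only one caller at a time.
pub fn record_borrowed(stats: &mut Stats, n: u64) {
    stats.submitted += n;
}

/// After: shared access through a mutex; the guard is a temporary that is
/// released at the end of the statement.
pub fn record_shared(stats: &Mutex<Stats>, n: u64) {
    stats.lock().unwrap().submitted += n;
}

/// Spawns `workers` threads that each record `per_worker` submissions into
/// one shared `Arc<Mutex<Stats>>`, then returns the total.
pub fn run_workers(workers: u64, per_worker: u64) -> u64 {
    let stats = Arc::new(Mutex::new(Stats::default()));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || record_shared(&stats, per_worker))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Copy the value out before returning so the guard is dropped first.
    let total = stats.lock().unwrap().submitted;
    total
}
```

For example, `run_workers(4, 10)` should return 40 regardless of how the threads interleave, because every increment happens under the lock.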
Arc<std::sync::Mutex<LocalAccount>> -> Arc<LocalAccount>, because LocalAccount contains an atomic counter which hides the mutability and is safe; no additional mutex is needed or desired.
Some back to just &LocalAccount.
Add a tokio Handle to NetworkContextSynchronizer and use it for async-ness inside NetworkTest run() implementations.
NetworkContextSynchronizer: Arc<Mutex<NetworkContext<'t>>> -> Arc<tokio::sync::Mutex<NetworkContext<'t>>>, because tokio contaminates and enblobifies all
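The reasoning for dropping the mutex can be illustrated with a simplified account whose only mutable state is an atomic sequence number. Because `fetch_add` works through `&self`, an `Arc<Account>` can be shared across threads without a `Mutex`. This `Account` is a hypothetical stand-in, not the aptos-sdk `LocalAccount`, and `reserve_concurrently` is an illustrative helper.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical simplified account: the sequence number is the only
/// field that changes after construction.
pub struct Account {
    pub address: String,
    sequence_number: AtomicU64,
}

impl Account {
    pub fn new(address: &str, seq: u64) -> Self {
        Account {
            address: address.to_string(),
            sequence_number: AtomicU64::new(seq),
        }
    }

    /// Reserves the next sequence number through `&self`; the atomic
    /// supplies the interior mutability, so no lock is required.
    pub fn increment_sequence_number(&self) -> u64 {
        self.sequence_number.fetch_add(1, Ordering::SeqCst)
    }

    pub fn sequence_number(&self) -> u64 {
        self.sequence_number.load(Ordering::SeqCst)
    }
}

/// Several threads reserve numbers from one shared `Arc<Account>`.
/// Returns every number handed out (sorted) and the final counter value.
pub fn reserve_concurrently(threads: u64, per_thread: u64) -> (Vec<u64>, u64) {
    let account = Arc::new(Account::new("0xA", 0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let account = Arc::clone(&account);
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| account.increment_sequence_number())
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    let mut seen: Vec<u64> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    seen.sort();
    (seen, account.sequence_number())
}
```

Since `fetch_add` returns the previous value, each reservation is unique and no number is skipped. This only suffices while the counter is the sole mutable field; if several fields had to change together, a lock would be needed again.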
The most likely failure mode (which I encountered several times while getting the tests I have run to pass) is that some function declares a new tokio Runtime and calls … I have probably missed some cases where this will happen, and there will probably be test failures that then take 30-60 minutes each of developer time to fix. I know I caught about a dozen sites where this would have happened; I probably missed some. If there's a tag we can apply to this PR to "run all the forge tests", we should apply it.
This change creates a better signal from the reference vs Arc?
Yeah, there will probably be more of that. This much is just what I needed to get the compat change I wanted. It looks like a big change because it touched some interfaces that were propagated to many parts of the code. If you pretend that changing the signature of a function that's implemented 18 times is "one change", then it's smaller? :-D
Thanks for changing all these; having all these be async is so much cleaner.
// Wrap LocalAccount in Arc+Mutex
// let account_arcs : Vec<Arc<LocalAccount>> = accounts_to_use.into_iter().map(Arc::new).collect();
// get txns
let txns = accounts_to_use
-   .iter_mut()
+   .iter()
    .flat_map(|account| self.generator.generate_transactions(account, 1))
    .collect();
// let txns = accounts_to_use
//     .iter_mut()
//     .flat_map(|account| {
//
//         self.generator.generate_transactions(account, 1)
//     })
//     .collect();

// back to plain LocalAccount, add to accounts
// let accounts_to_use = account_arcs.into_iter().map(|account| {
//     Arc::into_inner(account).unwrap()
// }).collect();
What is all the commented-out code here for? Is it needed, or should it be deleted?
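If the generator only needs a shared borrow of each account, the live lines above are sufficient and the commented-out round trip through `Arc` is not needed. The sketch below shows that shape under stated assumptions: `Account`, `Txn`, `Generator` and `one_txn_per_account` are hypothetical simplifications, not the forge `TransactionGenerator` API, and the sequence number is bumped through an atomic so `&Account` is enough.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical account with an atomic sequence number.
pub struct Account {
    pub name: String,
    seq: AtomicU64,
}

impl Account {
    pub fn new(name: &str) -> Self {
        Account { name: name.to_string(), seq: AtomicU64::new(0) }
    }

    /// Takes `&self`: no `&mut Account` is required to advance the counter.
    pub fn next_seq(&self) -> u64 {
        self.seq.fetch_add(1, Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
pub struct Txn {
    pub sender: String,
    pub seq: u64,
}

/// Hypothetical generator; `&mut self` only tracks how many txns it made.
pub struct Generator {
    pub generated: usize,
}

impl Generator {
    pub fn generate_transactions(&mut self, account: &Account, n: usize) -> Vec<Txn> {
        self.generated += n;
        (0..n)
            .map(|_| Txn { sender: account.name.clone(), seq: account.next_seq() })
            .collect()
    }
}

/// Same shape as the diff: `.iter()` instead of `.iter_mut()`, one txn each.
pub fn one_txn_per_account(generator: &mut Generator, accounts: &[Account]) -> Vec<Txn> {
    accounts
        .iter()
        .flat_map(|account| generator.generate_transactions(account, 1))
        .collect()
}
```

Calling `one_txn_per_account` twice on the same slice should yield sequence number 0 for every account the first time and 1 the second time, without ever taking a mutable borrow of the accounts.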
testsuite/forge-cli/src/main.rs
let duration = if args.suite == "compat" {
    // TODO: if this needs to be more permanent than hacking into this branch, edit
    // .github/workflows/docker-build-test.yaml
    Duration::from_secs(30 * 60)
} else {
    Duration::from_secs(args.duration_secs as u64)
};
I don't understand why we need to do this. Doesn't compat have its own duration specified?
pub fn swarm(&mut self) -> &mut dyn Swarm {
    self.swarm
}
// pub fn swarm(&mut self) -> &mut dyn Swarm {
?
// let validator_clients = {
//     swarm.read().await.get_validator_clients_with_names()
// };
Remove?
@@ -142,7 +146,8 @@ pub async fn test_consensus_fault_tolerance( | |||
} | |||
|
|||
if new_epoch_on_cycle { | |||
swarm.aptos_public_info().reconfig().await; | |||
// swarm.read().await.aptos_public_info().reconfig().await; |
Remove?
// because we are doing failure testing, we should be sending
// traffic to nodes that are alive.
if ctx.swarm().full_nodes().count() > 0 {
let full_nodes_count = { ctx.swarm.read().await.full_nodes().count() };
Why do you have braces here?
To make sure I get exactly the scope I want on the read-lock object.
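The scoping idea can be shown with the synchronous `std::sync::RwLock` as an analogue of the async read lock in the diff. When the read guard is bound to a name inside a block, it is dropped at the closing brace, so a later write lock on the same thread cannot deadlock against it. `Swarm` and `count_then_add_validator` here are hypothetical stand-ins, not the forge `Swarm` trait.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the shared swarm state.
pub struct Swarm {
    pub full_nodes: Vec<String>,
    pub validators: Vec<String>,
}

/// Counts full nodes under a read lock, then adds a validator under a
/// write lock if any full nodes exist. Returns the full node count.
pub fn count_then_add_validator(swarm: &RwLock<Swarm>, name: &str) -> usize {
    // The named read guard lives only inside this block and is dropped
    // at the closing brace, before the write lock below is requested.
    let full_nodes_count = {
        let guard = swarm.read().unwrap();
        guard.full_nodes.len()
    };

    // No read guard is alive on this thread here, so taking the write
    // lock cannot block on ourselves.
    if full_nodes_count > 0 {
        swarm.write().unwrap().validators.push(name.to_string());
    }
    full_nodes_count
}
```

For a one-expression form such as `let n = { lock.read().unwrap().len() };`, the temporary guard would already be dropped at the end of the `let` statement, so the braces there mostly document intent; they start to matter once the guard is bound to a name.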
@igor-aptos found a bunch of things my brain had clearly glossed over after reading the diff too many times. I fixed those, plus some more cleanups I found on another fresh read-through. Changing the …
For compat, you can do it here for testing, but before landing you should change it in the compat config file rather than hardcoding it in the code, correct?
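One way to follow that suggestion is to let each suite carry an optional duration override and fall back to the command-line value, instead of branching on `args.suite == "compat"`. This is a sketch under assumptions: `SuiteConfig`, its `duration_secs` field, and `run_duration` are hypothetical names, and where forge actually stores per-suite settings is not specified here.

```rust
use std::time::Duration;

/// Hypothetical per-suite settings, e.g. loaded from a suite config file.
pub struct SuiteConfig {
    pub name: &'static str,
    /// `None` means "use the command-line default".
    pub duration_secs: Option<u64>,
}

/// A suite-specific override wins; otherwise use the CLI duration.
/// No suite name is hardcoded in this function.
pub fn run_duration(suite: &SuiteConfig, cli_duration_secs: u64) -> Duration {
    Duration::from_secs(suite.duration_secs.unwrap_or(cli_duration_secs))
}
```

With a compat entry of `duration_secs: Some(30 * 60)`, `run_duration` should return 30 minutes for compat and the CLI value for any suite without an override.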
✅ Forge suite
Expand "compat" forge test to simultaneously do traffic generation, gather stats, and run a gradual upgrade of validator nodes. Lots of refactor to get there, replacing lots of `&Foo` and `&mut Foo` with `Arc<Mutex<Foo>>` and hidden internal mutability to make things multithread capable.
Description
Expand "compat" forge test to simultaneously do traffic generation, gather stats, and run a gradual upgrade of validator nodes. Ensure that TPS stays high enough during upgrade.
Lots of refactor to get there, replacing lots of `&Foo` and `&mut Foo` with `Arc<Mutex<Foo>>` to make things multithread capable.
Type of Change
Which Components or Systems Does This Change Impact?
How Has This Been Tested?
These are tests: running this PR's tests in forge clusters is the test.
Key Areas to Review
Is this idiomatic Rust? Is there a better way?
Checklist