Add thread manager crate to agave #3890

Open
wants to merge 9 commits into
base: master


alexpyattaev

Problem

There are a variety of perf issues caused by unorganized thread pools that spawn far more threads than are useful on a given machine. This was identified by firedancer and is relatively easy to fix in agave.

Idea is to eventually address the needs of

Summary of Changes

Added a new agave-thread-manager crate, which is to be gradually hooked into agave itself. It will be used to centralize thread pool creation so that the pools' core allocations can be controlled and the total thread count can gradually be reduced to match the core count as closely as possible. The crate comes with its own benchmarks and tests to establish config policies.
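
To illustrate what "centralizing thread pool creation" could look like, here is a minimal standard-library sketch. It is not the API of agave-thread-manager: `PoolConfig`, `PoolRegistry`, `thread_budget`, and `spawn_pool` are hypothetical names invented for this example, and core pinning is left out because the standard library has no affinity API. The sketch only shows one registry deciding how many named threads each pool may spawn, capped at the machine's core count.

```rust
use std::collections::HashMap;
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical per-pool settings (not the crate's real config type).
#[derive(Clone, Debug)]
pub struct PoolConfig {
    /// Upper bound on threads this pool asks for.
    pub max_threads: usize,
}

/// Hypothetical central registry through which every pool is created.
pub struct PoolRegistry {
    configs: HashMap<String, PoolConfig>,
    core_count: usize,
}

impl PoolRegistry {
    /// Builds a registry that caps pools at the detected core count.
    pub fn new(configs: HashMap<String, PoolConfig>) -> Self {
        let core_count = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1);
        Self::with_core_count(configs, core_count)
    }

    /// Same as `new`, but with an explicit core count (useful in tests).
    pub fn with_core_count(configs: HashMap<String, PoolConfig>, core_count: usize) -> Self {
        Self {
            configs,
            core_count: core_count.max(1),
        }
    }

    /// Threads a pool may use: its request clamped to [1, core_count].
    /// Returns `None` for pools that were never configured.
    pub fn thread_budget(&self, pool: &str) -> Option<usize> {
        self.configs
            .get(pool)
            .map(|config| config.max_threads.clamp(1, self.core_count))
    }

    /// Spawns the pool's worker threads, named "<pool>-<index>".
    pub fn spawn_pool<F>(&self, pool: &str, work: F) -> io::Result<Vec<JoinHandle<()>>>
    where
        F: Fn(usize) + Send + Clone + 'static,
    {
        let budget = self.thread_budget(pool).ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("unknown pool {pool}"))
        })?;
        (0..budget)
            .map(|index| {
                let work = work.clone();
                thread::Builder::new()
                    .name(format!("{pool}-{index}"))
                    .spawn(move || work(index))
            })
            .collect()
    }
}
```

With a registry like this, a pool that asks for 64 threads on a 4-core machine would get 4 workers, and an unconfigured pool name is rejected instead of silently spawning threads.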

@alexpyattaev alexpyattaev force-pushed the thread_manager branch 3 times, most recently from da038b3 to 309350b Compare December 8, 2024 07:50
@alexpyattaev alexpyattaev marked this pull request as ready for review December 8, 2024 22:05
@@ -48,11 +48,12 @@ fn main() -> anyhow::Result<()> {
println!("Running {exp}");
let mut conffile = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
conffile.push(exp);
let conffile = std::fs::File::open(conffile)?;

Typically, we don't specify configurations in files. We do it in code for the example/test cases, or with command-line flags for other cases. Sometimes we use YAML (for accounts information, for example).

I see, you want to read thread-manager/examples/core_contention_contending_set.toml with it.
Could you maybe add a method that reads this file and is used in both the example and, in the future, the production environment?

Author

This file will never be read in prod; it's just for examples, which are also benchmarks/tests. I'm not even sure we will use TOML in prod for this yet, though it is likely.
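
One way to share a single loading path between examples and any later production use, without committing to a file format yet, is sketched below. `load_config` is a hypothetical helper, not something in this PR: it reads the file and hands the text to whatever parser the caller supplies.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

/// Hypothetical helper: read a config file and pass its text to `parse`.
/// The format stays the caller's choice, so the same function could serve
/// examples today and a production loader later.
pub fn load_config<T, E, P>(path: &Path, parse: P) -> Result<T, Box<dyn Error>>
where
    P: FnOnce(&str) -> Result<T, E>,
    E: Error + 'static,
{
    // I/O errors and parse errors both end up in the boxed error.
    let text = fs::read_to_string(path)?;
    Ok(parse(&text)?)
}
```

Assuming the toml crate and a config type that implements serde's `DeserializeOwned`, the call in the example would presumably look like `load_config(&path, |text| toml::from_str::<ThreadManagerConfig>(text))`, where `ThreadManagerConfig` stands in for whatever the config type ends up being called.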

let cfg: RuntimeManagerConfig = serde_json::from_reader(conffile)?;
let mut buf = String::new();
std::fs::File::open(conffile)?.read_to_string(&mut buf)?;
let cfg: RuntimeManagerConfig = toml::from_str(&buf)?;
//println!("Loaded config {}", serde_json::to_string_pretty(&cfg)?);

I would use the log or tracing crate instead of print.

Author

done

//println!("Loaded config {}", serde_json::to_string_pretty(&cfg)?);

let rtm = RuntimeManager::new(cfg).unwrap();
let rtm = ThreadManager::new(cfg).unwrap();

Naming: rtm, cfg, tok are uncommon. We typically use full descriptive names, like rtm -> thread_manager.

Author

done

@KirillLykov

I think you need to explain the proposed changes, in context, to a broader audience, maybe by setting up a meeting with the involved parties.

@alexpyattaev
Author

Yes, a meeting would be necessary eventually, but I also need the code to be reasonably good before a million people look at it=)


2 participants