
Investigate busiest tasks per tokio-console #4583

Closed · 3 tasks · Tracked by #4747
conradoplg opened this issue Jun 8, 2022 · 7 comments
Labels
C-enhancement (Category: This is an improvement), I-slow (Problems with performance or responsiveness)

Comments

@conradoplg
Collaborator

conradoplg commented Jun 8, 2022

Motivation

tokio-console, for which we added support in #4519, is reporting a bunch of busy tasks but it's not clear what they are. We should investigate it.

On a syncing-pre-checkpoint node, it looks like this on my machine:

[tokio-console screenshot]

The target is tokio::task::blocking and the fields are:

fn=tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
kind=blocking
task.id=2
spawn.location=<cargo>/tokio-1.18.2/src/runtime/thread_pool/worker.rs:358:13

CPU usage seems to be at 200% which does not seem to reflect that. There are 24 of those tasks and my CPU has 24 cores, so it feels like CPU usage should be much higher? Not sure.

Specifications

Investigations

Missing tokio-console features

Related Work

@conradoplg added the C-enhancement (Category: This is an improvement), S-needs-triage (Status: A bug report needs triage), P-Medium ⚡, and I-slow (Problems with performance or responsiveness) labels on Jun 8, 2022
@oxarbitrage
Contributor

CPU usage seems to be at 200% which does not seem to reflect that. There are 24 of those tasks and my CPU has 24 cores, so it feels like CPU usage should be much higher? Not sure.

Is this a typo? Should it say 20%?

@conradoplg
Collaborator Author

CPU usage seems to be at 200% which does not seem to reflect that. There are 24 of those tasks and my CPU has 24 cores, so it feels like CPU usage should be much higher? Not sure.

Is this a typo? Should it say 20%?

In top (or htop), 200% means 2 cores at 100%

BTW, this seems to have changed. Now I get ~1000% CPU usage near the tip, and the sync seems to get stuck or take a super long time. Still no idea what's happening.

@teor2345
Contributor

teor2345 commented Jul 13, 2022

Currently we've seen these blocking tasks:

  • state writes (CommitBlock and CommitFinalizedBlock), waiting on file access, which need to be moved into tokio::task::spawn_blocking() (see the sketch after this list)
    • this might also have some CPU-bound work in it
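
A minimal sketch of that change, assuming a synchronous commit_finalized_block() function and a FinalizedBlock type as hypothetical stand-ins for the real zebra-state code:

use tokio::task::spawn_blocking;

type BoxError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Hypothetical wrapper, not zebra-state's actual service code: run the
/// synchronous, file-backed block commit on tokio's blocking thread pool so it
/// can't stall the async worker threads. `FinalizedBlock` and
/// `commit_finalized_block` stand in for the real state types and commit path.
async fn commit_finalized(block: FinalizedBlock) -> Result<(), BoxError> {
    spawn_blocking(move || {
        // Disk I/O and any CPU-bound work run here, off the async executor.
        commit_finalized_block(block)
    })
    .await
    .expect("block commit task should not panic")
}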

@teor2345
Contributor

teor2345 commented Jul 18, 2022

CPU profiles seem to show these functions are the busiest.
(I listed functions with CPU over 10%, including children, in approximate order.)

This CPU-bound work shouldn't be running in tokio futures. It needs to be moved to the rayon CPU thread pool, inside blocking tokio threads.

We mainly saw issues with jubjub cryptography, but the same issue can happen with Orchard-heavy blocks and pallas.

Deserialization (in zebra-network or zebra-state):

Verification (in zebra-consensus):

Note commitment tree updates (in zebra-state, either finalized or non-finalized):

Example code for rayon in tokio blocking threads:

/// Flush the batch using a thread pool, and return the result via the channel.
/// This function returns a future that becomes ready when the batch is completed.
fn flush_spawning(batch: BatchVerifier, tx: Sender) -> impl Future<Output = ()> {
    // Correctness: Do CPU-intensive work on a dedicated thread, to avoid blocking other futures.
    tokio::task::spawn_blocking(|| {
        // TODO:
        // - spawn batches so rayon executes them in FIFO order
        //   possible implementation: return a closure in a Future,
        //   then run it using scope_fifo() in the worker task,
        //   limiting the number of concurrent batches to the number of rayon threads
        rayon::scope_fifo(|s| s.spawn_fifo(|_s| Self::verify(batch, tx)))
    })
    .map(|join_result| join_result.expect("panic in ed25519 batch verifier"))
}

CPU usage data:

[CPU usage screenshots]

@dconnolly
Contributor

CPU profiles seem to show these functions are the busiest. (I listed functions with CPU over 10%, including children, in approximate order.)

This CPU-bound work shouldn't be running in tokio futures. It needs to be moved to the rayon CPU thread pool, inside blocking tokio threads.

Verification (in zebra-consensus):

  • groth16::DescriptionWrapper::try_from()

We can probably wrap this call by running batch.queue() in a rayon thread, since I think that's what calls groth16::DescriptionWrapper::try_from():

self.batch.queue(item);
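
One hedged way that could look, assuming the expensive part is the conversion done while queueing; SpendDescription, Item, and prepare_batch_item() are hypothetical stand-ins, not the real zebra-consensus names:

use tokio::sync::oneshot;

/// Illustrative only: run the CPU-heavy groth16 description conversion on the
/// rayon pool, inside a tokio blocking thread, and hand the prepared batch item
/// back to the async verifier task.
async fn prepare_item(spend: SpendDescription) -> Item {
    let (tx, rx) = oneshot::channel();

    tokio::task::spawn_blocking(move || {
        rayon::scope_fifo(|scope| {
            scope.spawn_fifo(move |_scope| {
                // The expensive try_from-style conversion runs on a rayon thread.
                let _ = tx.send(prepare_batch_item(&spend));
            })
        })
    });

    rx.await.expect("conversion task should send a result")
}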

Note commitment tree updates (in zebra-state, either finalized or non-finalized):

  • Note commitment tree append and root

    • incrementalmerkletree::bridgetree::Frontier::append()

Yep, we can probably wrap all write calls to the Frontier in a blocking rayon thread call.
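
A possible shape for that, sketched with hypothetical NoteCommitmentTree and NoteCommitment wrappers (standing in for the zebra-chain types around bridgetree::Frontier) rather than the real API:

/// Illustrative sketch, not Zebra's actual state code: apply all of a block's
/// note commitment appends in one blocking task, so the CPU-heavy hashing never
/// runs on a tokio worker thread. `NoteCommitmentTree` and `NoteCommitment` are
/// hypothetical stand-ins for the real tree types.
async fn update_note_commitment_tree(
    mut tree: NoteCommitmentTree,
    commitments: Vec<NoteCommitment>,
) -> NoteCommitmentTree {
    tokio::task::spawn_blocking(move || {
        for commitment in commitments {
            // Each append hashes up the tree, which is CPU-bound.
            tree.append(commitment);
        }
        // Computing the root is also expensive, so do it once here.
        let _root = tree.root();
        tree
    })
    .await
    .expect("note commitment tree task should not panic")
}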

@teor2345
Contributor

It's more efficient to wrap the highest-level call that includes all the CPU-heavy code.

And every rayon thread needs to be wrapped in a tokio::task::spawn_blocking() thread, so it's often easier to do them together.
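
For example, a small helper along these lines (hypothetical, not an existing Zebra or tokio API) would let callers wrap the highest-level CPU-heavy call in both layers at once:

use futures::FutureExt;
use std::future::Future;

/// Hypothetical helper: run a CPU-heavy closure on the rayon thread pool from
/// inside a tokio blocking thread, and return a future for its result.
fn spawn_cpu_heavy<F, T>(work: F) -> impl Future<Output = T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // spawn_blocking keeps tokio worker threads free; scope_fifo moves the
    // closure onto the rayon pool and returns its result.
    tokio::task::spawn_blocking(move || rayon::scope_fifo(|_scope| work()))
        .map(|join_result| join_result.expect("CPU-heavy task should not panic"))
}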

@teor2345
Contributor

We've done enough on this for now; it's the note commitment trees.
