
ExecutorService should expose metrics #34998

Closed
ahus1 opened this issue Jul 25, 2023 · 16 comments · Fixed by #35483

Comments

@ahus1
Contributor

ahus1 commented Jul 25, 2023

Description

Quarkus starts an ExecutorService which by default has 200 threads and an unbounded queue. Via configuration options, the maximum number of threads can be changed and a maximum queue length can be defined.

To monitor this thread pool in a production or load test environment, metrics on this thread pool would be helpful.

When running Keycloak which is based on Quarkus, the metrics aren't exposed, while for example Vert.x thread pool metrics are exposed.

Helpful metrics would be:

  • completedTaskCount
  • rejectedTaskCount
  • submittedTaskCount
  • queueSize
  • activeCount
  • maximumPoolSize
  • maximumQueueSize

Although the max values don't change at runtime, it would be helpful to show them as the upper limit in a dashboard (for example in Grafana).

Implementation ideas

All those values are already available as part of the EnhancedQueueExecutor implementation, and they need to be published via Micrometer.

The metrics rejectedTaskCount and submittedTaskCount could be two different label values of one metric, as they can be summed up, but I don't want to make things more complicated if that would hold this feature back.
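
A rough sketch of what publishing these via Micrometer could look like (the metric names are placeholders, and this assumes accessors on EnhancedQueueExecutor matching the values listed above):

    import io.micrometer.core.instrument.FunctionCounter;
    import io.micrometer.core.instrument.Gauge;
    import io.micrometer.core.instrument.MeterRegistry;
    import org.jboss.threads.EnhancedQueueExecutor;

    public class ExecutorMetricsSketch {

        // Placeholder metric names; the real names would need to follow Quarkus/Micrometer conventions.
        public static void bind(MeterRegistry registry, EnhancedQueueExecutor executor) {
            // Monotonically increasing counts fit function counters.
            FunctionCounter.builder("executor.completed", executor, EnhancedQueueExecutor::getCompletedTaskCount)
                    .register(registry);
            FunctionCounter.builder("executor.rejected", executor, EnhancedQueueExecutor::getRejectedTaskCount)
                    .register(registry);
            FunctionCounter.builder("executor.submitted", executor, EnhancedQueueExecutor::getSubmittedTaskCount)
                    .register(registry);

            // Instantaneous values fit gauges; the max values are static but still useful as an upper limit.
            Gauge.builder("executor.queue.size", executor, EnhancedQueueExecutor::getQueueSize).register(registry);
            Gauge.builder("executor.active", executor, EnhancedQueueExecutor::getActiveCount).register(registry);
            Gauge.builder("executor.pool.max", executor, EnhancedQueueExecutor::getMaximumPoolSize).register(registry);
            Gauge.builder("executor.queue.max", executor, EnhancedQueueExecutor::getMaximumQueueSize).register(registry);
        }
    }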

ahus1 added the kind/enhancement (New feature or request) label Jul 25, 2023
@quarkus-bot

quarkus-bot bot commented Jul 25, 2023

/cc @ebullient (metrics), @jmartisk (metrics)

ahus1 changed the title from "ExecutorService shoud expose metrics" to "ExecutorService should expose metrics" Jul 25, 2023
@geoand
Contributor

geoand commented Jul 25, 2023

@ebullient WDYT about this?

It seems reasonable to me, so if you also think it makes sense, I can have a look

@ebullient
Member

I think the micrometer repo has a binder for the executor service. I did not add it because there is an interesting question about which executor pool to monitor.

@geoand
Contributor

geoand commented Jul 31, 2023

According to this issue, we already have the data for the Event Loop but we are missing it for the pool used for blocking tasks

@geoand
Contributor

geoand commented Aug 1, 2023

I think the micrometer repo has a binder for the executor service.

The class is io.micrometer.core.instrument.binder.jvm.ExecutorServiceMetrics
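
For reference, a minimal way to use that binder would be something along these lines (the pool name "quarkus-thread-pool" is just an example):

    import java.util.concurrent.ExecutorService;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Tags;
    import io.micrometer.core.instrument.binder.jvm.ExecutorServiceMetrics;

    public class BinderExample {
        // Returns a wrapped ExecutorService that reports task and pool metrics to the registry.
        static ExecutorService instrument(MeterRegistry registry, ExecutorService executor) {
            // "quarkus-thread-pool" is an arbitrary example name for the instrumented pool.
            return ExecutorServiceMetrics.monitor(registry, executor, "quarkus-thread-pool", Tags.empty());
        }
    }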

@ahus1
Contributor Author

ahus1 commented Aug 1, 2023

For me as a Quarkus user in Keycloak, the "main" executor pool is the one I am interested in, which I can configure for example via QUARKUS_THREAD_POOL_MAX_THREADS.

@geoand
Contributor

geoand commented Aug 1, 2023

Yeah, that's the one @ahus1

@geoand
Contributor

geoand commented Aug 1, 2023

I took a quick look into what we currently provide in the way of Executor metrics, and it seems like Vert.x (and therefore Quarkus) only exposes metrics for the worker pool, not the event loop (which I guess makes sense given the nature of the latter) - see here and here

So it seems like what this ticket asks for is already provided, WDYT?

@sschu
Contributor

sschu commented Aug 8, 2023

@geoand What do you mean by "given the nature of the latter"? Something in the direction of "only does non-blocking stuff, is ideally set to the number of cores, and should never have a waiting queue"? That would be my guess but I am not sure.

Nice avatar btw. ;)

@geoand
Contributor

geoand commented Aug 8, 2023

I am just guessing as to why the event loop doesn't have metrics; my guess is that because there are constantly very short-lived runnables going into it and being executed, it might be too much overhead to track them.
I would also guess that the one metric that would make sense for the event loop is the number of items in the queue.

Nice avatar btw. ;)

Same 😎

@ahus1
Contributor Author

ahus1 commented Aug 13, 2023

@geoand - thank you for the pointer, I had a look and this is what I found:

  • Vert.x creates a decorator to collect the metrics, see WorkerContext
  • Metrics like worker_pool_queue_size, worker_pool_active, worker_pool_queue_delay* are available
  • The metric worker_pool_ratio seems to be off, as it uses the Vert.x configuration value for the worker pool (20) as a base, and not the size of the thread pool injected by Quarkus (which in my case is the default of 200).
  • The executor is available directly from Quarkus as well, and Keycloak is submitting things to the pool without going via the Vert.x facade. This prevents the metrics for those submissions from being recorded.

I can have a look at Keycloak so that it uses the Vert.x facade and the metrics are recorded.
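
As a sketch of what that could look like on the Keycloak side (assuming the managed Vertx instance is available and using the Vert.x 4 executeBlocking API), blocking work would go through the facade instead of the raw ExecutorService:

    import io.vertx.core.Vertx;

    public class BlockingWorkSketch {

        private final Vertx vertx;

        BlockingWorkSketch(Vertx vertx) {
            this.vertx = vertx;
        }

        // Instead of executor.submit(...), route blocking work through the Vert.x facade
        // so that the worker_pool_* metrics see the submission.
        void run(Runnable blockingTask) {
            vertx.<Void>executeBlocking(promise -> {
                blockingTask.run();
                promise.complete();
            }, result -> {
                // completion callback; a failure would be available via result.cause()
            });
        }
    }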

It would be great if the Quarkus team could fill the workerPoolSize in the Vert.x options to match the size of the executor, so that the worker_pool_ratio would be correct.

@geoand
Contributor

geoand commented Aug 22, 2023

It would be great if the Quarkus team could fill the workerPoolSize in the Vert.x options to match the size of the executor, so that the worker_pool_ratio would be correct.

I will have a look

@geoand
Contributor

geoand commented Aug 22, 2023

@ahus1 I assume your analysis is for prod mode, not dev mode, correct?

@ahus1
Contributor Author

ahus1 commented Aug 22, 2023

@geoand - yes, I started it in prod mode, I double-checked just now. For me it is part of Keycloak, and I started it with:

export DEBUG=true
export DEBUG_SUSPEND=y
export KEYCLOAK_ADMIN=admin
export KEYCLOAK_ADMIN_PASSWORD=admin
bin/kc.sh --verbose start --http-enabled true --hostname-strict false  --metrics-enabled true --health-enabled true

I debugged the startup down to

new VertxBuilder(options)
                    .threadFactory(vertxThreadFactory)
                    .executorServiceFactory(new QuarkusExecutorFactory(conf, launchMode))

in VertxBuilder. The options contain a workerPoolSize of 20 by default, while the thread pool handed in there has 200 threads.

Looking at the code, the configuration quarkus.vertx.worker-pool-size doesn't do much (?), and quarkus.thread-pool.max-threads should be the real value here.

In VertxImpl, it then picks the executor, and creates the metrics from the workerPoolSize given in the options:

    ExecutorService workerExec = executorServiceFactory.createExecutor(workerThreadFactory, workerPoolSize, workerPoolSize);
    PoolMetrics workerPoolMetrics = metrics != null ? metrics.createPoolMetrics("worker", "vert.x-worker-thread", options.getWorkerPoolSize()) : null;

geoand added a commit to geoand/quarkus that referenced this issue Aug 23, 2023
This is done by setting the relevant property
in VertxOptions to the size of the Quarkus
ExecutorService that is actually used by Vert.x
(in prod mode).
The reason we update VertxOptions is that
PoolMetrics uses it to calculate the blocking
pool size.

Closes: quarkusio#34998
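
In essence, the change amounts to something like the following (illustrative only, not the actual Quarkus code; how the effective thread-pool size is resolved is left as a placeholder parameter):

    import io.vertx.core.VertxOptions;

    public class WorkerPoolSizeSketch {
        // Illustrative only: align the Vert.x workerPoolSize with the effective maximum size
        // of the Quarkus ExecutorService so that PoolMetrics reports the correct limit.
        static VertxOptions alignWorkerPoolSize(VertxOptions options, int effectiveMaxThreads) {
            return options.setWorkerPoolSize(effectiveMaxThreads);
        }
    }
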
@geoand
Contributor

geoand commented Aug 23, 2023

@ahus1 mind giving #35483 a try?

In my tests, it behaved as it should (and as you describe)

geoand added a commit that referenced this issue Aug 24, 2023
Properly report Vertx worker pool size
@ahus1
Contributor Author

ahus1 commented Aug 24, 2023

Created a follow-up issue to capture the metric of rejected submissions to the pool which is currently not exposed: #35540
