track blocked thread event and report statistics in metrics endpoint #1649
I think that's a good idea.
I have had quite a few outages due to blocked threads. So far the logs indicate that certain inputs cause PlantUML/Ditaa to hang or go into an infinite loop. I'm looking out for those bad inputs, but in the meantime I'm adding this as a first step to improve the reliability of my server.
Are you using the latest version? We are now using single executable binaries (produced by GraalVM) for both PlantUML and Ditaa, so it shouldn't block threads anymore.
Yes, I'm on 0.22.
private final Map<String, Instant> eventLoopStats = new ConcurrentHashMap<>();
private final Map<String, Instant> workerStats = new ConcurrentHashMap<>();
I'm a bit concerned: since we never evict entries, this is basically a memory leak. We could use something like https://github.com/ben-manes/caffeine to get time-based eviction.
Sure, I switched to Caffeine's Cache to store the stats.
I was reluctant to use Caffeine as it was not included in the project and I didn't expect the 2 maps to grow above 100 entries.
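To make the eviction concern concrete, here is a minimal, dependency-free sketch of time-based eviction over a map of last-blocked timestamps. It is not the code from this PR and it is not Caffeine's API (Caffeine offers expiry through its builder, for example `expireAfterWrite`). The class name `ExpiringBlockedThreadStats`, its methods, and the TTL parameter are all hypothetical and exist only for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers when each thread was last reported blocked. */
final class ExpiringBlockedThreadStats {
  private final Map<String, Instant> lastBlockedAt = new ConcurrentHashMap<>();
  private final Duration ttl;

  ExpiringBlockedThreadStats(Duration ttl) {
    this.ttl = ttl;
  }

  /** Records (or refreshes) a blocked-thread event for the given thread name. */
  void record(String threadName, Instant now) {
    lastBlockedAt.put(threadName, now);
  }

  /** Evicts entries older than the TTL, then returns how many threads remain. */
  int activeCount(Instant now) {
    Instant cutoff = now.minus(ttl);
    // Removing through the values view also removes the corresponding keys.
    lastBlockedAt.values().removeIf(seen -> seen.isBefore(cutoff));
    return lastBlockedAt.size();
  }
}
```

Passing `now` explicitly keeps the sketch easy to test. With eviction on read, the map stays bounded by the number of threads blocked within the TTL, which addresses the leak concern even if the maps are expected to stay small.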
if (blockedThreadChecker != null) {
  data.put("blockedWorkerPercentage", blockedThreadChecker.blockedWorkerThreadPercentage());
  data.put("blockedEventLoopPercentage", blockedThreadChecker.blockedEventLoopThreadPercentage());
}
I wonder if we should export data using Prometheus format on /metrics.
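For reference, the Prometheus text exposition format is line-oriented: an optional `# TYPE <name> <type>` line followed by `<name> <value>` samples. A rough sketch of rendering gauges in that format follows. The class `PrometheusTextSketch` and any metric names passed to it are assumptions made for illustration, not names used by Kroki.

```java
import java.util.Map;

/** Hypothetical renderer for gauges in the Prometheus text exposition format. */
final class PrometheusTextSketch {
  private PrometheusTextSketch() {}

  /** Renders each entry as a "# TYPE" line followed by one sample line. */
  static String renderGauges(Map<String, Double> gauges) {
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, Double> gauge : gauges.entrySet()) {
      String name = gauge.getKey();
      out.append("# TYPE ").append(name).append(" gauge\n");
      // Double.toString is locale-independent, so the decimal separator is always '.'.
      out.append(name).append(' ').append(Double.toString(gauge.getValue())).append('\n');
    }
    return out.toString();
  }
}
```

Metric names in this format are limited to letters, digits, underscores, and colons, so the camelCase keys used in the JSON payload would likely be renamed to snake_case.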
I don't use Prometheus on my server, but I'm happy to help. Can you create another issue and assign it to me? I'll work on it when I have the bandwidth.
Thanks, I've created an issue
Hi @ggrossetie, could you please review and merge this PR?
@ggrossetie Any ETA on a release which includes this? ❤️
I will try to push a new release before my vacation (mid January).
Vert.x monitors and reports blocked threads. This PR includes that information in the /healthcheck endpoint so that we can use it to restart Kroki when performance degrades too much because many threads are blocked.
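As an illustration of how a supervisor might act on these numbers, the sketch below derives a blocked-thread percentage from a count and a pool size, then applies a restart threshold. The class `BlockedThreadHealth`, its method names, and the threshold are hypothetical; the PR does not prescribe a restart policy.

```java
/** Hypothetical helper for turning blocked-thread counts into a restart decision. */
final class BlockedThreadHealth {
  private BlockedThreadHealth() {}

  /** Returns the share of blocked threads in the pool, from 0.0 to 100.0. */
  static double percentage(int blockedThreads, int poolSize) {
    if (poolSize <= 0) {
      return 0.0; // no pool configured, nothing can be blocked
    }
    int clamped = Math.max(0, Math.min(blockedThreads, poolSize));
    return 100.0 * clamped / poolSize;
  }

  /** Suggests a restart when either pool reaches the given threshold. */
  static boolean shouldRestart(double workerPct, double eventLoopPct, double thresholdPct) {
    return workerPct >= thresholdPct || eventLoopPct >= thresholdPct;
  }
}
```

An external health probe could read the two percentages from /healthcheck and call something like `shouldRestart` with a threshold tuned to the deployment.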