Too many FileWatcher threads #486
I've just run a simple application in Cloud Foundry that looks like the following:
The full output can be found here but as of version 1.8:
How are you using the `TrustManagerFactory`?
Nope. I'm totally wrong. This is a legitimate bug, and I've reproduced it. The cache scope isn't sufficient.
Even as I say that, I'm not actually sure that it's a bug. If you take a look at the default implementation, you'll see that the following properly caches the `TrustManagerFactory`:

```java
public void test() throws NoSuchAlgorithmException, KeyStoreException {
    // A single TrustManagerFactory, created once and re-initialized in the loop
    TrustManagerFactory factory = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    for (int i = 0; i < 500; i++) {
        System.out.println("Getting TrustStore");
        factory.init((KeyStore) null);
        factory.getTrustManagers();
    }
    System.out.printf("Thread Count: %d%n", ManagementFactory.getThreadMXBean().getThreadCount());
}
```

while the following does not:

```java
public void test() throws NoSuchAlgorithmException, KeyStoreException {
    for (int i = 0; i < 500; i++) {
        System.out.println("Getting TrustStore");
        // A new TrustManagerFactory on every iteration
        TrustManagerFactory factory = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null);
        factory.getTrustManagers();
    }
    System.out.printf("Thread Count: %d%n", ManagementFactory.getThreadMXBean().getThreadCount());
}
```

I'm inclined to keep our code matching the default implementation unless you believe that there is a popular networking library that does not properly cache the `TrustManagerFactory`.
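
For comparison, the kind of caching I'd expect a well-behaved networking client to do looks roughly like this (an illustrative sketch only; the class name and structure are made up, not taken from any particular library):

```java
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Illustrative only: the TrustManagerFactory is initialized a single time and the
// resulting SSLContext is reused, so any provider-side watcher is created once
// rather than once per request.
public final class CachedSslContextHolder {

    private static final SSLContext CONTEXT = createContext();

    private static SSLContext createContext() {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            factory.init((KeyStore) null);  // use the platform default trust store

            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, factory.getTrustManagers(), null);
            return context;
        } catch (Exception e) {
            throw new IllegalStateException("Unable to initialize SSLContext", e);
        }
    }

    // Reused by every request instead of rebuilding the trust machinery each time
    public static SSLSocketFactory socketFactory() {
        return CONTEXT.getSocketFactory();
    }

    private CachedSslContextHolder() {
    }
}
```

Every request would then call `CachedSslContextHolder.socketFactory()` rather than re-initializing the trust store.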
Hmmmm very strange. Now (since Sunday), no matter how many times I try to deploy, I don't see this named thread anymore. Very odd. However, when I did see it, I saw this in the deploy logs:
It stood out because they were ERR logs (in red). But again, I can't seem to see that named thread anymore. I haven't changed any code, just redeployed. Not sure if that helps.
I do see what appears to be two parallel initializations of the `TrustManagerFactory`, though.
I guess I spoke too soon. I am now able to reproduce these threads. They build up only when I see those ERR logs at the beginning of a deployment. On other deployments (of the same app), those ERR logs do not appear at all, and those apps have no longevity issue. I believe this problem didn't show itself until I bumped the ceiling of my thread pool up to 400; maybe it was always there? If X is my pool size, I'm incurring an additional 0.3X threads, so if I lower my pool size to 200, for example, the issue goes away, since the VM can definitely handle 200 + 60 threads. I'll try to find out why these logs sometimes show and sometimes don't. That might give a clue. Thanks.
The …
It would if each of my new threads triggers a thread creation on your end. Something akin to #459.
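
One pattern that would produce exactly that kind of scaling, purely hypothetically, is each worker thread lazily building its own `SSLContext`, for example via a `ThreadLocal`; I haven't confirmed that any of our libraries actually do this:

```java
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical pattern that would tie watcher threads to worker threads: each
// worker lazily builds its own SSLContext, so every worker triggers its own
// TrustManagerFactory.init(...) -- and, with a watcher-per-init provider, its own
// file-watcher thread.
final class PerThreadSslContext {

    private static final ThreadLocal<SSLContext> CONTEXTS = ThreadLocal.withInitial(() -> {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
            return ctx;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    });

    // Each calling thread gets (and keeps) its own context
    static SSLContext current() {
        return CONTEXTS.get();
    }

    private PerThreadSslContext() {
    }
}
```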
We are also facing the same issue. Our server ran into a memory issue, or more precisely a thread-creation issue.
I logged into the server (running on PWS), and a thread dump revealed that there were hundreds (528 to be precise) of stacks like this one:
This causes the server to reach the hard cap of 1000 threads and prevents new thread creation.
I'm attaching the complete thread dump if it can help.
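
For reference, a rough way to count the file-watcher entries in a dump like that one (an illustrative sketch; the dump file name is just a placeholder):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Counts the "file-watcher-..." thread headers in a saved thread dump.
// "threaddump.txt" stands in for wherever the dump was written.
public class ThreadDumpCount {

    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get("threaddump.txt"))) {
            long watcherThreads = lines
                    .filter(line -> line.contains("file-watcher-"))
                    .count();
            System.out.printf("file-watcher threads in dump: %d%n", watcherThreads);
        }
    }
}
```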
@lejeunen Do you have any idea what the lifecycle of your `TrustManagerFactory` is? I'm interested in getting a better understanding of the instances where this occurs. I'm looking to find out whether there are specific frameworks, or specific framework recommendations, that end up in this non-cached situation.
We do not manipulate the `TrustManagerFactory` directly, i.e. in our application code. We do use many libraries, though; see below. The server where we observe the issue is our gateway/edge server (in a microservice sense), so it gets all the incoming REST requests from our web client (a JavaScript app, mostly run from Chrome) and forwards them to other microservices. To achieve this, we use the Spring Cloud stack, where the Zuul reverse proxy is used together with a discovery service (Eureka, also bundled with Spring Cloud) to forward requests to other services. In those other services, also based on Spring Cloud but without the reverse-proxy part, we only see a few threads (3-4) related to the file watcher. In our gateway server log, we see many log entries related to this (as detailed in #459) whenever our web client sends requests to the server.
We have a branch with a more recent version of Spring Boot, and therefore Spring Cloud, and the problem does not seem to happen in that environment! We are running more tests to confirm this.
Which Spring Boot version are you using? I'm using 1.5.2; however, this problem is intermittent. Out of 5 deployments, 2 of them will have this issue. Maybe a certain version guarantees the fix.
We have not seen it happen yet with Boot 1.5.6.
@lejeunen Is it possible that your gateway server (where you see all those logs) is running on a different version of the Java Buildpack from the applications without the reverse proxy? I think at this point I'm going to find some way to cache the "watcher" bit of the security provider no matter what the behavior of the network clients is. I'm just gathering more information to make sure I cache in the proper location.
@nebhale No, we use the same buildpack version as the other services.
@lejeunen Understood. Thanks for the feedback.
I've changed the design of the `FileWatcher`.
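
In broad strokes, the shape of such a change would be to share at most one watcher per watched file; here is a purely hypothetical sketch, not the actual provider code:

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch only -- not the buildpack's actual implementation. It shows
// the general idea: at most one watcher (and so at most one thread) per watched
// file, shared by every initialization that needs it.
final class CachedFileWatchers {

    // Stand-in for whatever type actually owns the watching thread
    interface Watcher {
    }

    private static final Map<Path, Watcher> CACHE = new ConcurrentHashMap<>();

    static Watcher forFile(Path file, Function<Path, Watcher> factory) {
        // computeIfAbsent creates the watcher at most once per path, even when two
        // initializations race (as observed earlier in this thread)
        return CACHE.computeIfAbsent(file, factory);
    }

    private CachedFileWatchers() {
    }
}
```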
Hi,
I'm trying to use my resources as efficiently as possible. Recently, I did some load testing and noticed that I was hitting the system limit on threads, which caused my app to crash. The CPU and memory were fine, however; it's a software/configuration limit rather than an actual physical one.
I dumped all my threads and saw that almost a third (100+) of them were named "file-watcher-/etc/ssl/certs/ca-certificates.crt". Each of these threads' stack traces gives me:
Is there some kind of leak here? Threads seemed to be spawning from `FileWatcher` uncontrollably *gasp*
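
For anyone who wants to check the same thing, a quick way to count the live watcher threads from inside the JVM (illustrative snippet only):

```java
// Illustrative snippet: enumerate live threads and count the file-watcher ones.
public class FileWatcherThreadCheck {

    public static void main(String[] args) {
        long count = Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith("file-watcher-"))
                .count();
        System.out.printf("Live file-watcher threads: %d%n", count);
    }
}
```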
There also seemed to be a similar issue in the recent past: #459
However, the symptom described there, where each request logs a watcher instance, is not what I'm seeing here.