
Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" #722

Closed
donbeave opened this issue Dec 20, 2021 · 14 comments · Fixed by #806

@donbeave

Describe the bug
Some of our applications have been updated from version 6.6 to the latest version, 7.0.1. Later we found out that one of these applications has a memory leak. We took a heap dump, analyzed it with Eclipse Memory Analyzer (https://www.eclipse.org/mat/), and found that the leak was potentially introduced in the latest version of logstash-logback-encoder.

  • logstash-logback-encoder version: 7.0.1
  • logback version: 1.2.3
  • jackson version: 2.12.5
  • java version: 17.0.1

This is our logstash logback configuration:

    <property name="LOG_FILE" value="${LOG_FILE:-${LOG_DIR}/spring.log}"/>

    <appender name="FILE"
              class="ch.qos.logback.core.rolling.RollingFileAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <timeZone>UTC</timeZone>
            <throwableConverter class="net.logstash.logback.stacktrace.ShortenedThrowableConverter">
                <maxDepthPerThrowable>30</maxDepthPerThrowable>
                <exclude>sun\.reflect\..*\.invoke.*</exclude>
                <exclude>net\.sf\.cglib\.proxy\.MethodProxy\.invoke</exclude>
                <rootCauseFirst>true</rootCauseFirst>
                <inlineHash>true</inlineHash>
            </throwableConverter>
        </encoder>
        <file>${LOG_FILE}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <cleanHistoryOnStart>${LOG_FILE_CLEAN_HISTORY_ON_START:-false}</cleanHistoryOnStart>
            <fileNamePattern>${LOG_FILE}.%d{yyyy-MM-dd}.%i.gz</fileNamePattern>
            <maxFileSize>${LOG_FILE_MAX_SIZE:-10MB}</maxFileSize>
            <maxHistory>${LOG_FILE_MAX_HISTORY:-7}</maxHistory>
            <totalSizeCap>${LOG_FILE_TOTAL_SIZE_CAP:-0}</totalSizeCap>
        </rollingPolicy>
    </appender>

And here are some memory leak reports:

[Screenshot: memory leak report, 2021-12-20 at 22:10:51]

[Screenshot: memory leak report, 2021-12-20 at 22:11:27]

Please let me know if I need to provide any additional information.

@brenuart (Collaborator) commented Dec 20, 2021

Hmm... doesn't look good :-(

A few questions to help us investigate:

  • LLE maintains one JsonFormatter per thread until the thread dies, which is fine in typical servlet environments, for instance. The threadValues collection should contain about the same number of elements as you have running threads (possibly a few more, cf. the thread-death detection mechanism described below). How many running threads do you have? Is it always a new one? Do you have an idea of the size of the threadValues collection? Its size should be equal to the number of JsonFormatter instances you have in memory, and you should also have the same number of ThreadLocalReusableByteBuffer instances. Can you confirm?

  • We maintain a WeakReference to each thread that acquired a value. These references are added to a ReferenceQueue by the JVM when the thread is garbage collected. This is how we detect that a thread is dead and that it is now safe to dispose of the resources allocated to it (a minimal sketch of this mechanism follows below). Note that this is a "passive" process, i.e. there is no dedicated thread for this task; instead, the housekeeping takes place every time an event is logged. If there is no GC activity or no logging activity, nothing is collected and resources are not disposed. Does the heap size decrease after a GC?

  • Do you have a minimal application I could use to reproduce the scenario?

As you may have understood, the worst-case scenario is logging from a new thread every time, combined with a large heap that causes very few GCs. In this case you are likely to see heap usage grow until the first GC event. Is this your case?

PS: I have not been able to reproduce the scenario (yet) - hence all my questions...
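
For readers less familiar with this mechanism, here is a minimal, illustrative sketch of the WeakReference/ReferenceQueue housekeeping described in the second bullet above. The class and method names are assumptions made for the example; this is not the actual LLE ThreadLocalHolder code.

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.LongConsumer;

    // Illustrative only: one WeakReference per tracked thread, enqueued by the JVM
    // once the thread has been garbage collected, and drained lazily on every log
    // event (no dedicated cleanup thread).
    class ThreadDeathDetector {

        // Weak reference that remembers the thread id so resources can be released later.
        static final class ThreadRef extends WeakReference<Thread> {
            final long threadId;
            ThreadRef(Thread thread, ReferenceQueue<Thread> queue) {
                super(thread, queue);
                this.threadId = thread.getId();
            }
        }

        private final ReferenceQueue<Thread> deadThreads = new ReferenceQueue<>();
        // Strong references to the ThreadRefs themselves so they stay reachable until enqueued.
        private final Set<ThreadRef> trackedThreads = ConcurrentHashMap.newKeySet();

        // Called the first time a thread acquires a per-thread resource.
        void track(Thread thread) {
            trackedThreads.add(new ThreadRef(thread, deadThreads));
        }

        // Called on every log event: dispose resources owned by threads that were GC'd.
        void releaseDeadThreads(LongConsumer resourceReleaser) {
            ThreadRef ref;
            while ((ref = (ThreadRef) deadThreads.poll()) != null) {
                trackedThreads.remove(ref);
                resourceReleaser.accept(ref.threadId);
            }
        }
    }

Because the queue is only drained when something is logged, a large heap with few GCs and little logging activity can delay the cleanup, which matches the worst-case scenario described above.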

@brenuart added the status/waiting-for-feedback ("Waiting for feedback from the original author; issue will be closed if feedback is not received soon") label Dec 20, 2021
@brenuart (Collaborator)

As far as I can tell from the screenshots, there are about 41,800 items in the threadValues collection. Given that these items have a retained size of roughly 15 KB per thread, we get a total of about 650 MB, which is what the report says.
However, this also means you have 41,800 Thread instances in memory. I doubt they are all alive... The only way to get so many items in the collection is that these thread instances have not yet been garbage collected and are still in memory. Can you confirm?

@brenuart (Collaborator)

@donbeave Any news about this issue?

@donbeave (Author)

@brenuart Sorry for the late response. We didn't run any experiments during December and the beginning of January, as we had blocked the deployment of new features from master. But before that we ran a few and found that the problem is not related to the version of logstash-logback-encoder, so it looks like it is somewhere else on our side. I'll be back here after we do some more experiments.

@brenuart (Collaborator)

@donbeave Do you confirm the issue is outside of logstash-logback-encoder?
Can we close this issue or do you have any additional information?

@brenuart (Collaborator) commented Mar 7, 2022

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@brenuart brenuart closed this as completed Mar 7, 2022
@brenuart removed the status/waiting-for-feedback ("Waiting for feedback from the original author; issue will be closed if feedback is not received soon") label Mar 7, 2022
@tommyulfsparre commented Apr 22, 2022

@brenuart We might have been able to reproduce this issue here: https://github.com/tommyulfsparre/repro-logstash-encoder. It is only reproducible on Java 17 and 18 (not 15 or 11), and only when using the ForkJoinPool common pool, where threadValues grows unbounded.

@klaraward commented Apr 22, 2022

As a comment to @tommyulfsparre above, it is not reproducible with Java 16 either.

And as you can see, the repro uses version 7.1.1

@brenuart brenuart reopened this Apr 23, 2022
@brenuart (Collaborator) commented Apr 24, 2022

Hi @tommyulfsparre, @klaraward,

Thanks for the little test case - I could reproduce the issue!

I was able to trace it down to a change introduced in Java 17 in how the ForkJoinPool behaves, and more specifically the common ForkJoinPool. To make it short, since Java 17 the common ForkJoinPool is configured with innocuous workers. This kind of worker clears its thread locals before executing the submitted task, probably to make the thread appear as if it were a brand new thread instead of a thread reused from the pool. See this change for more information.

Note that the problem disappears if you use a new ForkJoinPool() instead of ForkJoinPool.common()... Only the latter uses innocuous workers...

This of course breaks the assumptions made by our ThreadLocalHolder utility class: we assume that the thread's ThreadLocal map is not cleared until the thread dies, which is no longer true when the thread comes out of the common ForkJoinPool.

I'll have a look at how we can fix this and avoid the memory leak. However, ThreadLocal can no longer be used to keep track of the resources allocated to a thread. Unfortunately we rely on this mechanism to reuse resources like buffers and the JsonGenerator instead of creating new instances for every ILoggingEvent... I'm not yet convinced I'll be able to restore this functionality when the thread comes out of the common ForkJoinPool...

Anyway, stay tuned...
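
To make the difference tangible, here is a small, hedged demo comparing the common pool with a dedicated ForkJoinPool on Java 17. The class name and output format are assumptions made for the example; this is not the linked repro project.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ForkJoinPool;

    public class CommonPoolThreadLocalDemo {

        // Stand-in for a cached per-thread resource (e.g. a JsonFormatter).
        private static final ThreadLocal<Object> CACHED = ThreadLocal.withInitial(Object::new);

        public static void main(String[] args) throws Exception {
            // Common pool (assumes common-pool parallelism > 1): on Java 17+ its
            // innocuous workers wipe their thread locals around each task, so the
            // initializer runs again for every task even though the worker thread
            // itself is reused.
            for (int i = 0; i < 3; i++) {
                CompletableFuture.runAsync(() ->
                        System.out.println("common    " + Thread.currentThread().getName() + " -> " + CACHED.get()))
                        .get();
            }

            // Dedicated pool: its worker keeps its thread locals until it dies,
            // so the same cached instance shows up for every task.
            ForkJoinPool dedicated = new ForkJoinPool(1);
            try {
                for (int i = 0; i < 3; i++) {
                    CompletableFuture.runAsync(() ->
                            System.out.println("dedicated " + Thread.currentThread().getName() + " -> " + CACHED.get()), dedicated)
                            .get();
                }
            } finally {
                dedicated.shutdown();
            }
        }
    }

On Java 17 the first loop typically prints a different object identity for each task while the second loop keeps printing the same one; on Java 11, 15, or 16 both loops print a stable value per worker thread, which is consistent with the repro only failing on 17 and 18.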

@brenuart brenuart changed the title Probably memory leak in 7.0.1 version Memory leak in 7.0.1 version when logging from a thread coming out of a "ForkJoinPool.common()" Apr 24, 2022
@brenuart brenuart changed the title Memory leak in 7.0.1 version when logging from a thread coming out of a "ForkJoinPool.common()" Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" (_innocuous_ worker) Apr 24, 2022
@brenuart brenuart changed the title Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" (_innocuous_ worker) Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" Apr 24, 2022
@tommyulfsparre commented Apr 26, 2022

@brenuart thanks for looking into this!

We have filed an issue with OpenJDK to see whether this is an intended change or not.

@spkrka commented May 3, 2022

I've been thinking a bit about how to make ThreadLocal on Java 17 behave the same way as on older Java versions, and I ended up with a subclass of ThreadLocal that uses a fallback map keyed on thread id and keeps track of thread deaths to clean up that map (both to prevent assigning old values to a new thread and to avoid leaking data).

If you think this seems useful, feel free to use it as is or modify it: spotify/sparkey-java#55
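
For context, a rough sketch of that idea might look like the following. The names are illustrative assumptions, not the actual code; see spotify/sparkey-java#55 for the real implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Illustrative sketch: a ThreadLocal backed by a fallback map keyed on thread id,
    // so a thread whose thread-local map was wiped gets its existing value back
    // instead of a brand new one.
    class FallbackThreadLocal<T> extends ThreadLocal<T> {

        private final Map<Long, T> fallback = new ConcurrentHashMap<>();
        private final Supplier<T> supplier;

        FallbackThreadLocal(Supplier<T> supplier) {
            this.supplier = supplier;
        }

        @Override
        protected T initialValue() {
            // Called again whenever the thread-local map has been cleared; returns the
            // value previously created for this thread id, if any.
            return fallback.computeIfAbsent(Thread.currentThread().getId(), id -> supplier.get());
        }

        // Hook for the owner's thread-death tracking: entries must be removed both to
        // free memory and to avoid handing an old value to a new thread that happens
        // to reuse the same id.
        void onThreadDeath(long threadId) {
            fallback.remove(threadId);
        }
    }

The cleanup hook reflects the point made above: because thread ids can be recycled, dead entries must be removed promptly so an old value is never handed to a new thread.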

@caesar-ralf (Contributor)

From the conversation in the OpenJDK issue, it seems this behaviour is expected and the only change coming is better documentation. The team at Spotify will try to come up with a solution for this problem, but if you already have something, @brenuart, please let us know.

@brenuart (Collaborator) commented May 6, 2022

Hi @caesar-ralf,
I haven't had the time yet to look at your approach and think about the issue. However, I checked the issue you created against OpenJDK and it looks like the observed behaviour of the ForkJoinPool is as expected. This means we definitely have something to do to address this use case: provide a workaround or add something about it to the documentation.
I'll keep updating this issue with our decision when I have the time to come back to it.
Thanks for your investigations and feedback.

caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 9, 2022
In newer versions of java, specifically 16 and over, the `ThreadLocal` can
be cleared before the Thread is killed. This breaks the assumption of
`ThreadLocalHolder` which expected the `initializeThread` method to be
executed only once per `Thread`.

This commit changes the `threadValues` to become a map that holds which
thread the value is associated with so we can avoid recreating new
values when one already exists.

Fixes logfellow#722
@caesar-ralf (Contributor)

I found a potential solution for the problem, @brenuart, if you can take a look! It's based on what @spkrka did in Sparkey.

caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 9, 2022
caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 18, 2022
caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 18, 2022
brenuart pushed a commit that referenced this issue May 18, 2022
@brenuart brenuart added this to the 7.1.2 milestone May 18, 2022