
Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" #722

Closed
donbeave opened this issue Dec 20, 2021 · 14 comments · Fixed by #806

@donbeave

Describe the bug
Some of our applications have been updated from version 6.6 to the latest version, 7.0.1. Later we found out that one of these applications has a memory leak. We took a heap dump, analyzed it with Eclipse Memory Analyzer (https://www.eclipse.org/mat/), and found that the leak was potentially introduced in the latest version of logstash-logback-encoder.

  • logstash-logback-encoder version: 7.0.1
  • logback version: 1.2.3
  • jackson version: 2.12.5
  • java version: 17.0.1

This is our logstash logback configuration:

    <property name="LOG_FILE" value="${LOG_FILE:-${LOG_DIR}/spring.log}"/>

    <appender name="FILE"
              class="ch.qos.logback.core.rolling.RollingFileAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <timeZone>UTC</timeZone>
            <throwableConverter class="net.logstash.logback.stacktrace.ShortenedThrowableConverter">
                <maxDepthPerThrowable>30</maxDepthPerThrowable>
                <exclude>sun\.reflect\..*\.invoke.*</exclude>
                <exclude>net\.sf\.cglib\.proxy\.MethodProxy\.invoke</exclude>
                <rootCauseFirst>true</rootCauseFirst>
                <inlineHash>true</inlineHash>
            </throwableConverter>
        </encoder>
        <file>${LOG_FILE}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <cleanHistoryOnStart>${LOG_FILE_CLEAN_HISTORY_ON_START:-false}</cleanHistoryOnStart>
            <fileNamePattern>${LOG_FILE}.%d{yyyy-MM-dd}.%i.gz</fileNamePattern>
            <maxFileSize>${LOG_FILE_MAX_SIZE:-10MB}</maxFileSize>
            <maxHistory>${LOG_FILE_MAX_HISTORY:-7}</maxHistory>
            <totalSizeCap>${LOG_FILE_TOTAL_SIZE_CAP:-0}</totalSizeCap>
        </rollingPolicy>
    </appender>

And here are some memory leak reports:

[Screenshot: memory leak report, 2021-12-20 at 22:10:51]

[Screenshot: memory leak report, 2021-12-20 at 22:11:27]

Please let me know if I need to provide any additional information.

@brenuart (Collaborator) commented Dec 20, 2021

Hmm... doesn't look good :-(

A few questions to help us investigate:

  • LLE maintains one JsonFormatter per thread until the thread dies, which is fine in typical servlet environments, for instance. The threadValues collection should contain about the same number of elements as you have running threads (possibly a few more, cf. the thread-death detection mechanism described below). How many running threads do you have? Is it always a new one? Do you have an idea of the size of the threadValues collection? Its size should be equal to the number of JsonFormatter instances you have in memory, and you should also have the same number of ThreadLocalReusableByteBuffer instances. Can you confirm?

  • We maintain a WeakReference to each thread that acquired a value. These references are added to a ReferenceQueue by the JVM when the thread is garbage collected. This is how we detect that a thread is dead and that it is now safe to dispose of the resources allocated to it (a minimal sketch of this mechanism follows below). Note that this is a "passive" process, i.e. there is no dedicated thread for this task; instead, the housekeeping takes place every time an event is logged. If there is no GC activity or no logging activity, nothing is collected and resources are not disposed. Does the heap size decrease after a GC?

  • Do you have a minimal application I could use to reproduce the scenario?

As you may have understood, the worst-case scenario is logging from a new thread every time, combined with a large heap that causes very few GCs. In this case you are likely to see heap usage grow until the first GC event. Is this your case?

PS: I have not been able to reproduce the scenario (yet) - hence all my questions...
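
For readers less familiar with this mechanism, here is a minimal, illustrative sketch of the WeakReference/ReferenceQueue housekeeping described in the second bullet above. The class and method names are assumptions made for the example; this is not the actual LLE ThreadLocalHolder code.

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.LongConsumer;

    // Illustrative only: one WeakReference per tracked thread, enqueued by the JVM
    // once the thread has been garbage collected, and drained lazily on every log
    // event (no dedicated cleanup thread).
    class ThreadDeathDetector {

        // Weak reference that remembers the thread id so resources can be released later.
        static final class ThreadRef extends WeakReference<Thread> {
            final long threadId;
            ThreadRef(Thread thread, ReferenceQueue<Thread> queue) {
                super(thread, queue);
                this.threadId = thread.getId();
            }
        }

        private final ReferenceQueue<Thread> deadThreads = new ReferenceQueue<>();
        // Strong references to the ThreadRefs themselves so they stay reachable until enqueued.
        private final Set<ThreadRef> trackedThreads = ConcurrentHashMap.newKeySet();

        // Called the first time a thread acquires a per-thread resource.
        void track(Thread thread) {
            trackedThreads.add(new ThreadRef(thread, deadThreads));
        }

        // Called on every log event: dispose resources owned by threads that were GC'd.
        void releaseDeadThreads(LongConsumer resourceReleaser) {
            ThreadRef ref;
            while ((ref = (ThreadRef) deadThreads.poll()) != null) {
                trackedThreads.remove(ref);
                resourceReleaser.accept(ref.threadId);
            }
        }
    }

Because the queue is only drained when something is logged, a large heap with few GCs and little logging activity can delay the cleanup, which matches the worst-case scenario described above.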

@brenuart added the status/waiting-for-feedback ("Waiting for feedback from the original author; issue will be closed if feedback is not received soon") label Dec 20, 2021
@brenuart (Collaborator)

As far as I can tell from the screenshots, there are about 41,800 items in the threadValues collection. Given that these items have a retained size of roughly 15 KB per thread, we get a total of about 650 MB, which is what the report says.
However, this also means you have 41,800 Thread instances in memory. I doubt they are all alive... The only way to get so many items in the collection is that these thread instances have not yet been garbage collected and are still in memory. Can you confirm?

@brenuart (Collaborator)

@donbeave Any news about this issue?

@donbeave (Author)

@brenuart Sorry for the late response. We didn't run any experiments during December and the beginning of January, as we had blocked the deployment of new features from master. But before that we ran a few and found that the problem is not related to the version of logstash-logback-encoder, so it looks like it is somewhere else on our side. I'll be back here after we do some more experiments.

@brenuart (Collaborator)

@donbeave Do you confirm the issue is outside of logstash-logback-encoder?
Can we close this issue or do you have any additional information?

@brenuart (Collaborator) commented Mar 7, 2022

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@brenuart brenuart closed this as completed Mar 7, 2022
@brenuart removed the status/waiting-for-feedback ("Waiting for feedback from the original author; issue will be closed if feedback is not received soon") label Mar 7, 2022
@tommyulfsparre commented Apr 22, 2022

@brenuart We might have been able to reproduce this issue here: https://github.com/tommyulfsparre/repro-logstash-encoder. It is only reproducible on Java 17 and 18 (not 15 or 11), and only when using the ForkJoinPool common pool, where threadValues grows unbounded.

@klaraward commented Apr 22, 2022

As a comment to @tommyulfsparre above, it is not reproducible with Java 16 either.

And as you can see, the repro uses version 7.1.1

@brenuart brenuart reopened this Apr 23, 2022
@brenuart (Collaborator) commented Apr 24, 2022

Hi @tommyulfsparre, @klaraward,

Thanks for the little test case - I could reproduce the issue!

I was able to trace it down to a change introduced in Java 17 in how the ForkJoinPool behaves, and more specifically the common ForkJoinPool. To make it short, since Java 17 the common ForkJoinPool is configured with innocuous workers. This kind of worker clears its thread locals before executing the submitted task, probably to make the thread appear as if it were a brand new thread instead of a thread reused from the pool. See this change for more information.

Note that the problem disappears if you use a new ForkJoinPool() instead of ForkJoinPool.common()... Only the latter uses innocuous workers...

This of course breaks the assumptions made by our ThreadLocalHolder utility class: we assume that the thread's ThreadLocal map is not cleared until the thread dies, which is no longer true when the thread comes out of the common ForkJoinPool.

I'll have a look at how we can fix this and avoid the memory leak. However, ThreadLocal can no longer be used to keep track of the resources allocated to a thread. Unfortunately we rely on this mechanism to reuse resources like buffers and the JsonGenerator instead of creating new instances for every ILoggingEvent... I'm not yet convinced I'll be able to restore this functionality when the thread comes out of the common ForkJoinPool...

Anyway, stay tuned...
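
To make the difference tangible, here is a small, hedged demo comparing the common pool with a dedicated ForkJoinPool on Java 17. The class name and output format are assumptions made for the example; this is not the linked repro project.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ForkJoinPool;

    public class CommonPoolThreadLocalDemo {

        // Stand-in for a cached per-thread resource (e.g. a JsonFormatter).
        private static final ThreadLocal<Object> CACHED = ThreadLocal.withInitial(Object::new);

        public static void main(String[] args) throws Exception {
            // Common pool (assumes common-pool parallelism > 1): on Java 17+ its
            // innocuous workers wipe their thread locals around each task, so the
            // initializer runs again for every task even though the worker thread
            // itself is reused.
            for (int i = 0; i < 3; i++) {
                CompletableFuture.runAsync(() ->
                        System.out.println("common    " + Thread.currentThread().getName() + " -> " + CACHED.get()))
                        .get();
            }

            // Dedicated pool: its worker keeps its thread locals until it dies,
            // so the same cached instance shows up for every task.
            ForkJoinPool dedicated = new ForkJoinPool(1);
            try {
                for (int i = 0; i < 3; i++) {
                    CompletableFuture.runAsync(() ->
                            System.out.println("dedicated " + Thread.currentThread().getName() + " -> " + CACHED.get()), dedicated)
                            .get();
                }
            } finally {
                dedicated.shutdown();
            }
        }
    }

On Java 17 the first loop typically prints a different object identity for each task while the second loop keeps printing the same one; on Java 11, 15, or 16 both loops print a stable value per worker thread, which is consistent with the repro only failing on 17 and 18.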

@brenuart brenuart changed the title Probably memory leak in 7.0.1 version Memory leak in 7.0.1 version when logging from a thread coming out of a "ForkJoinPool.common()" Apr 24, 2022
@brenuart brenuart changed the title Memory leak in 7.0.1 version when logging from a thread coming out of a "ForkJoinPool.common()" Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" (_innocuous_ worker) Apr 24, 2022
@brenuart brenuart changed the title Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" (_innocuous_ worker) Memory leak in 7.0.1 when logging from a thread coming out of a "ForkJoinPool.common()" Apr 24, 2022
@tommyulfsparre commented Apr 26, 2022

@brenuart thanks for looking into this!

We have filed an issue with OpenJDK to see whether this is an intended change or not.

@spkrka commented May 3, 2022

I've been thinking a bit about how to make ThreadLocal on Java 17 behave the same way as on older Java versions, and I ended up with a subclass of ThreadLocal that uses a fallback map keyed on thread id and keeps track of thread deaths to clean up that map (both to prevent assigning old values to a new thread and to avoid leaking data).

If you think this seems useful, feel free to use it as is or modify it: spotify/sparkey-java#55
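
For context, a rough sketch of that idea might look like the following. The names are illustrative assumptions, not the actual code; see spotify/sparkey-java#55 for the real implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Illustrative sketch: a ThreadLocal backed by a fallback map keyed on thread id,
    // so a thread whose thread-local map was wiped gets its existing value back
    // instead of a brand new one.
    class FallbackThreadLocal<T> extends ThreadLocal<T> {

        private final Map<Long, T> fallback = new ConcurrentHashMap<>();
        private final Supplier<T> supplier;

        FallbackThreadLocal(Supplier<T> supplier) {
            this.supplier = supplier;
        }

        @Override
        protected T initialValue() {
            // Called again whenever the thread-local map has been cleared; returns the
            // value previously created for this thread id, if any.
            return fallback.computeIfAbsent(Thread.currentThread().getId(), id -> supplier.get());
        }

        // Hook for the owner's thread-death tracking: entries must be removed both to
        // free memory and to avoid handing an old value to a new thread that happens
        // to reuse the same id.
        void onThreadDeath(long threadId) {
            fallback.remove(threadId);
        }
    }

The cleanup hook reflects the point made above: because thread ids can be recycled, dead entries must be removed promptly so an old value is never handed to a new thread.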

@caesar-ralf (Contributor)

From the conversation in the OpenJDK issue, it seems this behaviour is expected and the only change coming is better documentation. The team at Spotify will try to come up with a solution for this problem, but if you already have something, @brenuart, please let us know.

@brenuart (Collaborator) commented May 6, 2022

Hi @caesar-ralf,
I haven't had the time yet to look at your approach and think about the issue. However, I checked the issue you created against OpenJDK and it looks like the observed behaviour of the ForkJoinPool is as expected. This means we definitely have something to do to address this use case: provide a workaround or add something about it to the documentation.
I'll keep updating this issue with our decision when I have the time to come back to it.
Thanks for your investigations and feedback.

caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 9, 2022
In newer versions of java, specifically 16 and over, the `ThreadLocal` can
be cleared before the Thread is killed. This breaks the assumption of
`ThreadLocalHolder` which expected the `initializeThread` method to be
executed only once per `Thread`.

This commit changes the `threadValues` to become a map that holds which
thread the value is associated with so we can avoid recreating new
values when one already exists.

Fixes logfellow#722
@caesar-ralf (Contributor)

I found a potential solution for the problem, @brenuart, if you can take a look! It's based on what @spkrka did in Sparkey.

caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 9, 2022
caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 18, 2022
caesar-ralf added a commit to caesar-ralf/logstash-logback-encoder that referenced this issue May 18, 2022
brenuart pushed a commit that referenced this issue May 18, 2022
@brenuart brenuart added this to the 7.1.2 milestone May 18, 2022