PubSubEndpoint.channels and patterns contain duplicate binary channel/pattern names #911
When the Reliability in DefaultEndpoint is set to Reliability.AT_MOST_ONCE, this issue does not occur, which points to something in the lifecycle of the RetryListener.
Thanks a lot for the report. The issue can be related to overall disconnects and Lettuce's retry queue. Commands that are issued while Redis is not available are buffered. The request queue is unbounded by default, which can consume a lot of memory given enough load/Redis downtime. You can try the following two settings: ClientOptions.requestQueueSize and ClientOptions.disconnectedBehavior.
Care to give it a try with the most applicable setting?
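For reference, a minimal sketch of how these two ClientOptions can be configured on a plain Lettuce client (the queue size of 10 is just an illustrative value; the default is effectively unbounded):

```java
import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;

public class ClientOptionsExample {

    public static void main(String[] args) {
        // Bound the command buffer and reject commands while disconnected
        // instead of queueing them for retry.
        ClientOptions options = ClientOptions.builder()
                .requestQueueSize(10) // illustrative value
                .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                .build();

        RedisClient client = RedisClient.create("redis://localhost:6379");
        client.setOptions(options);
    }
}
```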
ClientOptions.requestQueueSize

This does not work. I have verified that the request queue size actually is 10, and it even shows up in an error message.

Test setup:

mvn spring-boot:run -Dspring-boot.run.jvmArguments="-Xmx512m"

And it still results in an out-of-memory crash.
Some thoughts

What I find interesting (but may be completely expected) is that after every reconnect there seems to be an extra SUBSCRIBE and PSUBSCRIBE command executed. I would think a single subscription should be sufficient (after a reconnect). Over time, I see that the following 2 log lines occur more often on every reconnect:
What is interesting is that they are all for the same channel:
Another fun fact: it seems that old-gen heap space increases in steps, exponentially. Every step is significantly larger than the previous one:
ClientOptions.disconnectedBehavior

This is not an option that is appropriate for me, but I have tested it. It gives a very irregular memory profile and generates a lot of objects in old-gen heap space, but it does not seem to crash the JVM with an OOM. While this does not seem to generate an OOM in an unused application, I do find it strange that the amount of old-gen heap space increases so dramatically.

Code

ClientOptions were configured using the spring-data-redis facilities:
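The original code block was not preserved in this thread; a minimal sketch of wiring ClientOptions through spring-data-redis could look like the following (the concrete DisconnectedBehavior value and host/port are assumptions for illustration):

```java
import io.lettuce.core.ClientOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
class RedisConfig {

    @Bean
    LettuceConnectionFactory redisConnectionFactory() {
        // Reject commands while the connection is down instead of buffering them.
        ClientOptions clientOptions = ClientOptions.builder()
                .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                .build();

        // Hand the Lettuce ClientOptions to spring-data-redis.
        LettuceClientConfiguration clientConfiguration = LettuceClientConfiguration.builder()
                .clientOptions(clientOptions)
                .build();

        return new LettuceConnectionFactory(
                new RedisStandaloneConfiguration("localhost", 6379), clientConfiguration);
    }
}
```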
Regarding resubscription: Lettuce re-subscribes to patterns and channels if there have been previous registrations, to restore subscription state. Thanks a lot for testing the options. I'm puzzled about the memory growth you're seeing. Looking at the profiler graph, there's an excessive number of retained command objects.
Checked with a queue size of 3. That stabilizes on 240 MB old-gen used. No crash.

When profiling the memory dump I do see that a significant amount of memory is held by Lettuce Command objects. When checking the paths from GC roots, I find that there are 5 distinct trees: 3 x type = SUBSCRIBE and 2 x type = PSUBSCRIBE.

Command type=SUBSCRIBE

The data in the Command is retained mostly in its key arguments. This is the case for the Command entry points that are strongly reachable, suggesting to me that this is actually the same data. Calculating the paths for these 3 entry points:

It does look like there are, directly or indirectly, multiple paths to the retained commands.

Command type=PSUBSCRIBE

This is very similar, except that the key is 'spring:session:'

I have no clue why I don't see a stack-local entry for this one, as the data is just as stack-local as the SUBSCRIBE.
Maybe the short version: there appear to be over a million rows in PubSubEndpoint.channels.
How many subscriptions are there? It seems to me that you might be subscribed to over a million channels/patterns.
With a connection reset every 15 seconds, after a minute or 2 there are:
And a few minutes later:
And after 7 minutes we're at 300,000 channels.
Okay, so I think I found what's happening here.
As you can see, I'm not exactly sure who is really causing the problem; however, we should place a fix in Lettuce to properly compare channel and pattern names.
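A minimal, stand-alone sketch of the underlying Java behavior (not Lettuce code; the channel name is purely illustrative): storing raw byte[] channel names in a hash-based collection never deduplicates them, because arrays use identity-based equals/hashCode.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class ByteArrayChannelDemo {

    public static void main(String[] args) {
        Set<byte[]> channels = new HashSet<>();

        // Each re-subscription produces a fresh byte[] with the same content.
        // byte[] does not override equals/hashCode, so every array looks unique
        // to the set and the collection keeps growing on every reconnect.
        for (int reconnect = 0; reconnect < 3; reconnect++) {
            channels.add("spring:session:example".getBytes(StandardCharsets.UTF_8));
        }

        System.out.println(channels.size()); // prints 3, not 1
    }
}
```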
Awesome. If you want me to test/verify a fix I'd be happy to.
Thanks a lot for running these kinds of tests. These issues typically go unnoticed.
Lettuce now stores the pattern and channel names to which a connection is subscribed with a comparison wrapper. This allows computing hashCode and checking equality for built-in channel name types that do not support equals/hashCode for their actual content, in particular byte arrays. Using byte[] for channel names prevented a proper equality check of the binary content and caused duplicates in the channel list. With every subscription, channel names were added in a quadratic amount at excessive memory cost.
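A sketch of the comparison-wrapper idea described above (an assumed illustration, not Lettuce's actual implementation): wrap the raw byte[] so that equality and hashCode are computed from the array's content, letting a hash-based channel set deduplicate re-subscribed channels.

```java
import java.util.Arrays;

// Hypothetical wrapper providing content-based equality for byte[] channel names.
final class ChannelName {

    private final byte[] name;

    ChannelName(byte[] name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ChannelName)) {
            return false;
        }
        return Arrays.equals(this.name, ((ChannelName) other).name);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(this.name);
    }
}
```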
I pushed a change and a new build is available.
Works like a charm. Thanks!
Thanks for verifying the fix.
Hi, I ran into this problem too. We use spring-boot-starter-data-redis 2.4.1 to connect to Redis 5.0.7. The heap dump message:

468 instances of "reactor.core.publisher.FluxFlatMap$FlatMapMain", loaded by "sun.misc.Launcher$AppClassLoader @ 0x6f1b4b0f8" occupy 522,742,480 (82.04%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentLinkedDeque", loaded by ""

This leads to old-gen GC and OutOfMemory errors. I found that many reconnect commands are added to the disconnectedBuffer without restriction. I'm not sure whether this is a configuration mistake on my side or a Lettuce issue.
Bug Report
When using spring-boot 2 (we've seen it in 2.0.6 and 2.1.0) with spring-session backed by spring-data-redis backed by lettuce, we see application crashes due to out-of-memory problems. These issues seem to be triggered by reconnection attempts.
Memory profiling does show a huge increase in old-gen space usage in a frequent reconnection scenario.
Current Behavior
We use spring-session with Redis through Lettuce, and our applications die about once a day due to memory issues. They did not when they were still spring-boot 1 / Jedis based.
The issue was tracked down to Lettuce, and there is a very likely link to the reconnects our haproxy causes when the connection is unused for a while. But it does not happen on every reconnect by a long shot; it seems to happen after some number of reconnects.
We have been able to reproduce with a stock-standard spring-boot application and a docker-based haproxy/redis configuration.
Input Code
https://github.com/lwiddershoven/lettuce-memory-issue
Contains the application, the docker-compose file for haproxy/redis, some logging from the error, screenshots from the YourKit profiles, and the pom for a Jedis-based application to compare behaviour.
Expected behavior/code
I expect that memory usage, after gc, is the same before and after a reconnection cycle.
Environment
Possible Solution
n/a
Additional context
It seems that the PubSubEndpoint disconnectedBuffer is pretty large, and there appear to be a lot of AsyncCommands in memory for a system that is, from an end-user perspective, doing nothing.