
Connection leak when there is a master/slave node fail over #342

Closed
long-xuan-nguyen opened this issue Aug 25, 2016 · 4 comments
Labels: for: stackoverflow A question that is better suited to stackoverflow.com


long-xuan-nguyen commented Aug 25, 2016

I encountered an issue when there is a change in topology

Setup: 20 master nodes, each with 2 slaves, running lettuce 4.1.1.Final. ReadFrom is configured to read from slaves.
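For reference, the setup is roughly equivalent to the following (host name and port are placeholders, not our actual values):

    import com.lambdaworks.redis.ReadFrom;
    import com.lambdaworks.redis.RedisURI;
    import com.lambdaworks.redis.cluster.RedisClusterClient;
    import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;

    public class ClusterSetup {

        public static void main(String[] args) {
            // Seed node only; the client discovers the remaining masters and slaves itself.
            RedisClusterClient client = RedisClusterClient.create(
                    RedisURI.create("redis-cluster.example.com", 6379));

            StatefulRedisClusterConnection<String, String> connection = client.connect();
            // Route read commands to slaves; this is the setting that ends up hitting S2 while it syncs.
            connection.setReadFrom(ReadFrom.SLAVE);
        }
    }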

Sequence of events:

  • M1 went down; redis process got killed completely
  • S1 got promoted to be new master
  • S2 now starts syncing from S1, the new master

During the time window when S2 is syncing from S1, many connections are opened from a host to S2, eventually causing Redis to hit its client count limit and reject further connections. I see a bunch of "IOException: Connection reset by peer" entries in the log.

Looking into the code (https://github.com/mp911de/lettuce/blob/4.2.x/src/main/java/com/lambdaworks/redis/cluster/PooledClusterConnectionProvider.java#L520), here is what I think is happening:

  • Cluster topology refresh happens
  • Requests for reading against S2 come in; a new connection is initialized
  • Before this new connection is passed back to the LoadingCache, readonly() is invoked on the sync interface
  • This command fails with the error "LOADING Redis is loading the dataset in memory"
  • The exception is propagated back to the caller, resulting in a failure. The connection created in the second step is now orphaned and not accessible through the cache
  • The loading process takes a few minutes, and with enough TPS, S2 is completely saturated with connections.

Is the fix here to close the connection when readonly() fails?
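Roughly what I have in mind (a hypothetical sketch, not the actual factory code in PooledClusterConnectionProvider):

    import com.lambdaworks.redis.RedisException;
    import com.lambdaworks.redis.api.StatefulRedisConnection;

    // Hypothetical helper: if the READONLY handshake on a freshly created node connection
    // fails (e.g. with "LOADING Redis is loading the dataset in memory"), close that
    // connection before rethrowing so it is not left orphaned outside the cache.
    class ReadOnlyHandshake {

        static <K, V> StatefulRedisConnection<K, V> prepare(StatefulRedisConnection<K, V> connection) {
            try {
                connection.sync().readOnly();
                return connection;
            } catch (RedisException e) {
                connection.close(); // avoid the leak described above
                throw e;
            }
        }
    }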

It would be nice if the Redis "cluster nodes" command provided a flag for this state while Redis is loading the dataset into RAM so we could exclude the node from the topology. An alternative is to somehow trigger an event when this happens (the first time a Redis LOADING error is seen on a connection) and have the topology refresher exclude the node from the topology. These are just wish-list items, though. I would really appreciate feedback on the bug and the proposed fix.

Note that when S2 is saturated, we also see very weird behavior in another set of servers that purely perform writes and should never establish any connection to slaves except for the topology refresh. On these hosts, ConnectionWatchdog throws exceptions like NullPointerException, cannot initialize channel ... Unfortunately, we don't have debug/trace logging enabled, so we can't really debug more than that. I am assuming this is related to https://github.com/mp911de/lettuce/issues/278? After a while, both the writer hosts and the reader hosts got into a state where the following exception is seen everywhere:

21 Aug 2016 08:59:17,579 [WARN]  (lettuce-nioEventLoop-3-152) com.lambdaworks.redis.protocol.CommandHandler: [/10.91.86.192:49672 -> /10.162.234.169:6379] Unexpected exception during request: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 5133828096, max: 5145362432)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 5133828096, max: 5145362432)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:613)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:567)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:699)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:688)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:237)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:213)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:141)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:170)
    at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:131)
    at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:73)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:111)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:571)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:512)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:426)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:398)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:805)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:145)
    at java.lang.Thread.run(Thread.java:745)

Thanks,

Make sure that:

  • You have read the contribution guidelines.

  • You specify the lettuce version and environment so it's obvious which version is affected.

  • You provide a reproducible test case (either descriptive or as a JUnit test) if it's a bug or the expected behavior differs from the actual behavior.

mp911de added the for: stackoverflow label on Aug 25, 2016

mp911de commented Aug 25, 2016

You're tackling a few issues here. Let's untangle these and see what we have:

  1. Clients exceed server capacity: If a node goes down and the remaining nodes are not able to deal with the capacity, then you might have a capacity issue. If you don't strictly require reading from slaves only, you can either switch to ReadFrom.NEAREST or provide your own implementation of ReadFrom (a sketch follows after this list).
  2. Orphaned connections: That's a bug that requires a fix; I created "Close connections in PooledClusterConnectionProvider on connection failures" #343 to address it.
  3. Flag for a slave that synchronizes: I'm not aware of any flags that could be used for that. The output of CLUSTER NODES is limited to nodeId/endpoint/ping/health/slot details and does not provide any details about lag. Synchronization is an ongoing, asynchronous process, so it's not possible to determine with high accuracy why a node is syncing and what should be considered a faulty state.
  4. ConnectionWatchdog throws NullPointerException: That's a bug that was fixed in lettuce 4.2.2.Final with "ConnectionWatchdog should be the last handler" #335 and "CommandHandler notifications called out of order" #315. Sometimes the reconnect is faster than the disconnect notification, which leads to the mentioned behavior. Please upgrade to 4.2.2.Final and create a ticket if the problem persists.
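For reference, an own ReadFrom implementation can look roughly like this (a minimal sketch that prefers slaves and keeps masters as fallback candidates; adjust the selection to your needs):

    import java.util.ArrayList;
    import java.util.List;

    import com.lambdaworks.redis.ReadFrom;
    import com.lambdaworks.redis.models.role.RedisInstance;
    import com.lambdaworks.redis.models.role.RedisNodeDescription;

    class SlavePreferredReadFrom extends ReadFrom {

        @Override
        public List<RedisNodeDescription> select(Nodes nodes) {
            List<RedisNodeDescription> slaves = new ArrayList<>();
            List<RedisNodeDescription> masters = new ArrayList<>();

            for (RedisNodeDescription node : nodes) {
                if (node.getRole() == RedisInstance.Role.SLAVE) {
                    slaves.add(node);
                } else {
                    masters.add(node);
                }
            }

            slaves.addAll(masters); // slaves first, masters as fallback candidates
            return slaves;
        }
    }

Set it on the connection with connection.setReadFrom(new SlavePreferredReadFrom()).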

long-xuan-nguyen (Author) commented

  1. We already have our own ReadFrom implementation that basically prefers slaves over masters.
  2. Thanks for the fix. Is there any other place where this could potentially be an issue?
  3. In the "info" command output, you can tell whether a node is doing a full sync by looking at the flag "master_sync_in_progress" (see the sketch after this list). If only this flag were exposed as part of "cluster nodes", we would be able to exclude the node. Alternatively, we could call "info" on each node, but that seems excessive. It would be nice to have Lettuce detect the "LOADING" error the first time it occurs, have the topology react to this event, and automatically exclude the node from further requests until loading is done. This could be achieved with a background process that detects when the node is done syncing by checking this flag in the "info" output every 10 seconds or so.
  4. I thought so too. We upgraded our library to 4.2.2 and will report back if there is any issue.
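To illustrate point 3, the check I have in mind looks something like this (a rough sketch; the node id is a placeholder and the 10-second scheduling is left out):

    import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;

    class SyncInProgressCheck {

        // Ask a specific node for INFO replication and look for master_sync_in_progress:1.
        static boolean isSyncing(StatefulRedisClusterConnection<String, String> connection, String nodeId) {
            String info = connection.getConnection(nodeId).sync().info("replication");
            return info.contains("master_sync_in_progress:1");
        }
    }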

Thanks for your quick turnaround on this


mp911de commented Aug 25, 2016

1+3. I think these things are related to each other. Removing the node from the topology might fit your case, but not others. Adding further configuration possibilities comes at a complexity cost, and I'm not yet convinced that this is a common requirement. RedisClusterClient publishes a topology changed event; you can listen to these events and query the nodes yourself (see the sketch below). Using the ReadFrom API, you're able to implement a selector that ignores these nodes during candidate selection.
2. I need to check MasterSlave to verify that the bug does not occur there as well, thanks for the hint.
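A rough sketch of listening to the topology changed event on the event bus (lettuce 4.x exposes client events as an RxJava Observable; the event class and package name below are from memory and may differ slightly):

    import com.lambdaworks.redis.cluster.RedisClusterClient;
    import com.lambdaworks.redis.cluster.event.ClusterTopologyChangedEvent;

    class TopologyChangeSubscriber {

        static void subscribe(RedisClusterClient client) {
            client.getResources().eventBus().get()
                    .filter(event -> event instanceof ClusterTopologyChangedEvent)
                    .cast(ClusterTopologyChangedEvent.class)
                    .subscribe(event -> {
                        // The event carries the topology before/after the refresh; query the
                        // affected nodes here (e.g. INFO replication) and feed the result into
                        // your ReadFrom selector to exclude nodes that are still loading.
                    });
        }
    }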


mp911de commented Sep 3, 2016

Is there anything I can assist you with within the scope of this ticket, or can I close it?
