Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lettuce not able to reconnect automatically to SSL+authenticated ElastiCache node #1201

Closed
chadlwilson opened this issue Jan 5, 2020 · 7 comments
Labels
type: feature A new feature
Milestone

Comments

@chadlwilson
Copy link

Bug Report

I raised this initially at spring-projects/spring-boot#19436 however I have managed to get trace logs now, and it seems more suitable to raise here.

Current Behavior

When our AWS ElastiCache primary Redis node is restarted, Lettuce's automatic reconnection doesn't seem to leave connections in an authenticated, usable state. We just get repeated NOAUTH Authentication Required errors on the connections which don't appear to be recoverable.

It seems I can replicate this reliably by restarting the master node from AWS Console, but have not been able to replicate it with a local test, local Redis within Docker Compose.

Relevant snippet is below - full logs with trace information at https://gist.github.com/chadlwilson/a35bd624775c278dc4bdfe7d2347b8c5

{"ts":"2020-01-05T22:57:24.962+08:00","level":"INFO","thread":"lettuce-eventExecutorLoop-1-3","logger":"io.lettuce.core.protocol.ConnectionWatchdog","message":"Reconnecting, last destination was master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379","throwable":{}}
{"ts":"2020-01-05T22:57:24.964+08:00","level":"DEBUG","thread":"lettuce-eventExecutorLoop-1-3","logger":"io.lettuce.core.protocol.ReconnectionHandler","message":"Reconnecting to Redis at master.redis.oiwzdu.apse1.cache.amazonaws.com:6379","throwable":{}}
{"ts":"2020-01-05T22:57:24.993+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, [id: 0x42742e08] (inactive), chid=0x2] channelRegistered()","throwable":{}}
{"ts":"2020-01-05T22:57:25.029+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] channelActive()","throwable":{}}
{"ts":"2020-01-05T22:57:25.029+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] activateEndpointAndExecuteBufferedCommands 0 command(s) buffered","throwable":{}}
{"ts":"2020-01-05T22:57:25.030+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] activating endpoint","throwable":{}}
{"ts":"2020-01-05T22:57:25.030+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] write() writeAndFlush command TracedCommand [type=AUTH, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]","throwable":{}}
{"ts":"2020-01-05T22:57:25.030+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] write(ctx, TracedCommand [type=AUTH, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand], promise)","throwable":{}}
{"ts":"2020-01-05T22:57:25.031+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandEncoder","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] writing command TracedCommand [type=AUTH, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]","throwable":{}}
{"ts":"2020-01-05T22:57:25.031+08:00","level":"TRACE","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandEncoder","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] Sent: *2\r\n$4\r\nAUTH\r\n$16\r\nREDACTED","throwable":{}}
{"ts":"2020-01-05T22:57:25.033+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] write() done","throwable":{}}
{"ts":"2020-01-05T22:57:25.033+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] flushCommands()","throwable":{}}
{"ts":"2020-01-05T22:57:25.033+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] flushCommands() Flushing 0 commands","throwable":{}}
{"ts":"2020-01-05T22:57:25.033+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.ConnectionWatchdog","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, last known addr=master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] channelActive()","throwable":{}}
{"ts":"2020-01-05T22:57:25.034+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] channelActive() done","throwable":{}}
{"ts":"2020-01-05T22:57:25.034+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.ConnectionWatchdog","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, last known addr=master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] userEventTriggered(ctx, SslHandshakeCompletionEvent(SUCCESS))","throwable":{}}
{"ts":"2020-01-05T22:57:25.034+08:00","level":"INFO","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.ReconnectionHandler","message":"Reconnected to master.redis.oiwzdu.apse1.cache.amazonaws.com:6379, Channel channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379","throwable":{}}
{"ts":"2020-01-05T22:57:25.035+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.ConnectionWatchdog","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, last known addr=master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] userEventTriggered(ctx, io.lettuce.core.ConnectionEvents$Activated@1e686cf1)","throwable":{}}
{"ts":"2020-01-05T22:57:25.035+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Received: 47 bytes, 1 commands in the stack","throwable":{}}
{"ts":"2020-01-05T22:57:25.036+08:00","level":"TRACE","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Buffer: -ERR Client sent AUTH, but no password is set","throwable":{}}
{"ts":"2020-01-05T22:57:25.038+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Stack contains: 1 commands","throwable":{}}
{"ts":"2020-01-05T22:57:25.038+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.RedisStateMachine","message":"Decode LatencyMeteredCommand [type=AUTH, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.TracedCommand]","throwable":{}}
{"ts":"2020-01-05T22:57:25.038+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.RedisStateMachine","message":"Decoded LatencyMeteredCommand [type=AUTH, output=StatusOutput [output=null, error='ERR Client sent AUTH, but no password is set'], commandType=io.lettuce.core.protocol.TracedCommand], empty stack: true","throwable":{}}
{"ts":"2020-01-05T22:57:30.654+08:00","level":"DEBUG","thread":"XNIO-2 task-16","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] write() writeAndFlush command TracedCommand [type=INFO, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]","throwable":{}}
{"ts":"2020-01-05T22:57:30.654+08:00","level":"DEBUG","thread":"XNIO-2 task-16","logger":"io.lettuce.core.protocol.DefaultEndpoint","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, epid=0x1] write() done","throwable":{}}
{"ts":"2020-01-05T22:57:30.654+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] write(ctx, TracedCommand [type=INFO, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand], promise)","throwable":{}}
{"ts":"2020-01-05T22:57:30.655+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandEncoder","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] writing command TracedCommand [type=INFO, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]","throwable":{}}
{"ts":"2020-01-05T22:57:30.655+08:00","level":"TRACE","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandEncoder","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379] Sent: *1\r\n$4\r\nINFO","throwable":{}}
{"ts":"2020-01-05T22:57:30.658+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Received: 34 bytes, 1 commands in the stack","throwable":{}}
{"ts":"2020-01-05T22:57:30.658+08:00","level":"TRACE","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Buffer: -NOAUTH Authentication required.","throwable":{}}
{"ts":"2020-01-05T22:57:30.658+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Stack contains: 1 commands","throwable":{}}
{"ts":"2020-01-05T22:57:30.658+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.RedisStateMachine","message":"Decode LatencyMeteredCommand [type=INFO, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.TracedCommand]","throwable":{}}
{"ts":"2020-01-05T22:57:30.658+08:00","level":"DEBUG","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.RedisStateMachine","message":"Decoded LatencyMeteredCommand [type=INFO, output=StatusOutput [output=null, error='NOAUTH Authentication required.'], commandType=io.lettuce.core.protocol.TracedCommand], empty stack: true","throwable":{}}

Then repeatedly logs the below on attempts to use the connection -

i.l.c.RedisCommandExecutionException: NOAUTH Authentication required.
	at i.l.c.ExceptionFactory.createExecutionException(ExceptionFactory.java:135)
	at i.l.c.ExceptionFactory.createExecutionException(ExceptionFactory.java:108)
	at i.l.c.p.AsyncCommand.completeResult(AsyncCommand.java:120)
	at i.l.c.p.AsyncCommand.complete(AsyncCommand.java:111)
	at i.l.c.p.CommandWrapper.complete(CommandWrapper.java:59)
	at i.l.c.p.CommandWrapper.complete(CommandWrapper.java:59)
	at i.l.c.p.CommandHandler.complete(CommandHandler.java:654)
	at i.l.c.p.CommandHandler.decode(CommandHandler.java:614)
	at i.l.c.p.CommandHandler.channelRead(CommandHandler.java:565)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at i.n.c.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at i.n.h.ssl.SslHandler.unwrap(SslHandler.java:1478)
	at i.n.h.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227)
	at i.n.h.ssl.SslHandler.decode(SslHandler.java:1274)
	at i.n.h.c.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503)
	at i.n.h.c.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
	at i.n.h.c.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at i.n.c.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at i.n.c.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at i.n.c.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
	at i.n.c.e.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
	at i.n.c.e.EpollEventLoop.processReady(EpollEventLoop.java:502)
	at i.n.c.e.EpollEventLoop.run(EpollEventLoop.java:407)
	at i.n.u.c.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
	at i.n.u.i.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 common frames omitted
Wrapped by: o.s.d.r.RedisSystemException: Error in execution; nested exception is io.lettuce.core.RedisCommandExecutionException: NOAUTH Authentication required.
	at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:54)
	at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:52)
	at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
	at o.s.d.r.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
	at o.s.d.r.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
	at o.s.d.r.c.l.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:270)
	at o.s.d.r.c.l.LettuceServerCommands.convertLettuceAccessException(LettuceServerCommands.java:571)
	at o.s.d.r.c.l.LettuceServerCommands.info(LettuceServerCommands.java:215)
	at o.s.d.r.c.DefaultedRedisConnection.info(DefaultedRedisConnection.java:1291)
	at o.s.b.a.r.RedisHealthIndicator.doHealthCheck(RedisHealthIndicator.java:64)
	at o.s.b.a.h.AbstractHealthIndicator.health(AbstractHealthIndicator.java:82)
	at o.s.b.a.h.HealthIndicator.getHealth(HealthIndicator.java:37)
	at o.s.b.a.h.HealthEndpointWebExtension.getHealth(HealthEndpointWebExtension.java:95)
	at o.s.b.a.h.HealthEndpointWebExtension.getHealth(HealthEndpointWebExtension.java:43)
	at o.s.b.a.h.HealthEndpointSupport.getContribution(HealthEndpointSupport.java:108)
	at o.s.b.a.h.HealthEndpointSupport.getAggregateHealth(HealthEndpointSupport.java:119)
	at o.s.b.a.h.HealthEndpointSupport.getContri...

Expected behavior/code

I would expect the reconnect to be handled cleanly and be able to authenticate properly.

From looking at the logs, it seems that during reconnection, Lettuce receives a response from Redis/AWS ElastiCache that indicates it should no longer send the password/auth token for future attempts.

Environment

  • Lettuce version(s): 5.2.1.RELEASE
  • Redis version: 5.0.5 (AWS ElasticCache)

Being used within a Spring Boot/Spring Data Redis project

  • Spring Boot 2.2.2.RELEASE
  • Spring Data Redis 2.2.3.RELEASE
  • Spring Framework 5.2.2.RELEASE

Settings

  • SSL enabled
  • Authentication required

Possible Solution

None known.

@chadlwilson chadlwilson added the type: bug A general bug label Jan 5, 2020
@mp911de
Copy link
Collaborator

mp911de commented Jan 8, 2020

Thanks for the excellent report, it contains all details to trace down the issue.

The connected server behaves weirdly. During the reconnect, we receive the following response ERR Client sent AUTH, but no password is set. Five seconds later, on the same connection, we see NOAUTH Authentication required.

Below are both log events that indicate what's going on. Please ask your AWS/Redis team to fix the issue on the Redis side.

This issue is hard to trace especially if you didn't had debug logging enabled. It makes sense to log command failures that result out of reconnection otherwise, these issues aren't visible.

{"ts":"2020-01-05T22:57:25.036+08:00","level":"TRACE","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Buffer: -ERR Client sent AUTH, but no password is set","throwable":{}}

{"ts":"2020-01-05T22:57:30.658+08:00","level":"TRACE","thread":"lettuce-epollEventLoop-4-2","logger":"io.lettuce.core.protocol.CommandHandler","message":"[channel=0x3f9c9f7b, /172.23.8.187:41932 -> master.redis.oiwzdu.apse1.cache.amazonaws.com/172.23.11.219:6379, chid=0x2] Buffer: -NOAUTH Authentication required.","throwable":{}}

@mp911de mp911de added type: feature A new feature and removed type: bug A general bug labels Jan 8, 2020
@mp911de mp911de added this to the 5.2.2 milestone Jan 8, 2020
mp911de added a commit that referenced this issue Jan 13, 2020
Lettuce now logs failures of asynchronously fired commands during connection activation. Previously, failures during e.g. reconnect went unnoticed and could result in a subsequent NOAUTH Authentication required errors although the password was provided.
mp911de added a commit that referenced this issue Jan 13, 2020
Lettuce now logs failures of asynchronously fired commands during connection activation. Previously, failures during e.g. reconnect went unnoticed and could result in a subsequent NOAUTH Authentication required errors although the password was provided.
@mp911de
Copy link
Collaborator

mp911de commented Jan 13, 2020

Logging on AUTH failure (and a few other commands) during reconnect is now in place.

@mp911de mp911de closed this as completed Jan 13, 2020
mp911de added a commit that referenced this issue Jan 13, 2020
Lettuce now logs failures of asynchronously fired commands during connection activation.
@chadlwilson
Copy link
Author

Hi @mp911de - thanks for looking at this, and improving the logging. I will try and raise this with AWS Support. I suspect it relates to some kind of race condition where the AWS side starts the server and accepts connections before it has fully enabled/enforced the authentication token, leading to the messed up state.

I certain understand your perspective, however a difficulty here from a user perspective is that we have essentially instructed the driver to authenticate using a token/password, yet the driver essentially overrides the user request based on the confusing response from the server, rather than treating the connection as invalid and then trying to reconnect (with authentication enabled) and honouring the user configuration.

This leaves us in a bit of a difficult position in terms of workarounds to force a reconnect - is there any way you would suggest is best to deal with this in the short-term?

@mp911de
Copy link
Collaborator

mp911de commented Jan 14, 2020

Thanks for the additional details. Lettuce does not has the notion of an invalid connection when reconnecting by default. Instead, you can enable PingBeforeActivateConnection in ClientOptions. Lettuce will issue a PING command on reconnect to check whether the remote peer replies correctly. When authentication is configured, the PING command is replaced with AUTH. I'm sorry this option didn't come earlier to my mind. The command is deprecated with the outlook that a connection handshake is required with Redis 6. We might un-deprecate it if it turns out to be helpful in setups like yours.

If the activation PING fails, Lettuce disconnects and re-attempts connecting.

Other than that, we have no means to catch command failures during command activation as we assume that activation commands (of which AUTH is as well an activation command) will succeed if they worked earlier.

Can you check whether enabling PING on connection activation can help in your setup?

@chadlwilson
Copy link
Author

Thanks a lot for the suggestion! I have tried that out and it seems to workaround the problem which is great. The log for this working is at https://gist.github.com/chadlwilson/bcbf2f13964dd31e0117b16f9de6f073

    @Bean
    @ConditionalOnProperty("spring.redis.ping-before-activate-connection")
    @SuppressWarnings("deprecation") // Suggested in https://github.com/lettuce-io/lettuce-core/issues/1201
    public LettuceClientConfigurationBuilderCustomizer pingBuilderCustomizer() {
        log.info("Redis Connection ping-before-activate enabled");
        return builder -> builder.clientOptions(
            // Default ClientOptions values here were propagated from LettuceClientConfigurationBuilder since
            // it doesn't really allow the ClientOptions builder itself to be injected.
            ClientOptions
                .builder()
                .timeoutOptions(TimeoutOptions.enabled())
                .pingBeforeActivateConnection(true)
                .build()
        );
    }

In the meantime, AWS Support have got back to me and said that they can replicate the issue and have raised to the ElastiCache team for further discussion/investigation.

@chadlwilson
Copy link
Author

So what do you think about the value of un-deprecating this command given this information? :-)

@mp911de
Copy link
Collaborator

mp911de commented Jan 15, 2020

Yeah, makes sense. I filed a ticket for it.

mp911de added a commit that referenced this issue Jan 31, 2020
pingBeforeActivateConnection turns out to be useful when using RESP2 in cases where the connection authentication should be delayed. Another case is when the PING should be used as auth PING to test whether a reconnect was successful.

Related ticket: #1201.
mp911de added a commit that referenced this issue Jan 31, 2020
pingBeforeActivateConnection turns out to be useful when using RESP2 in cases where the connection authentication should be delayed. Another case is when the PING should be used as auth PING to test whether a reconnect was successful.

Related ticket: #1201.
mp911de added a commit that referenced this issue Jan 31, 2020
pingBeforeActivateConnection turns out to be useful when using RESP2 in cases where the connection authentication should be delayed. Another case is when the PING should be used as auth PING to test whether a reconnect was successful.

Related ticket: #1201.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature A new feature
Projects
None yet
Development

No branches or pull requests

2 participants