Handling dead Sentinel slaves #669

Closed
vleushin opened this issue Dec 13, 2017 · 4 comments
vleushin commented Dec 13, 2017

Lettuce 5.0.2.BUILD-SNAPSHOT, running on Kubernetes 1.8.0. Sentinel keeps unreachable nodes in its list, which is normal. After the master pod is killed, Lettuce is unable to update its topology (Sentinel itself worked fine): it tries to ping a dead node and retries forever.

BEFORE taking down the master (note: there were already 2 dead slave nodes when I started the application, and Lettuce was working fine):

127.0.0.1:26379> SENTINEL master redis-service
 1) "name"
 2) "redis-service"
 3) "ip"
 4) "10.233.101.47"
 5) "port"
 6) "6379"
 7) "runid"
 8) "6faabe3ea6174f1fad1417c82ea1ff4de1854e76"
 9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "876"
19) "last-ping-reply"
20) "876"
21) "down-after-milliseconds"
22) "10000"
23) "info-refresh"
24) "2829"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "54231354"
29) "config-epoch"
30) "6"
31) "num-slaves"
32) "3"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "60000"
39) "parallel-syncs"
40) "1"
127.0.0.1:26379> SENTINEL slaves redis-service
1)  1) "name"
    2) "10.233.98.122:6379"
    3) "ip"
    4) "10.233.98.122"
    5) "port"
    6) "6379"
    7) "runid"
    8) "e835e5508e4976b7c9b1652f69ee1b023482c599"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "950"
   19) "last-ping-reply"
   20) "950"
   21) "down-after-milliseconds"
   22) "10000"
   23) "info-refresh"
   24) "2527"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "54116925"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "10.233.101.47"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "65575218"
2)  1) "name"
    2) "10.233.101.45:6379"
    3) "ip"
    4) "10.233.101.45"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave"
   11) "link-pending-commands"
   12) "100"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "54267642"
   17) "last-ok-ping-reply"
   18) "54267642"
   19) "last-ping-reply"
   20) "54267642"
   21) "s-down-time"
   22) "54257615"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1513165498275"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "54267642"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
3)  1) "name"
    2) "10.233.98.107:6379"
    3) "ip"
    4) "10.233.98.107"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave"
   11) "link-pending-commands"
   12) "100"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "54267642"
   17) "last-ok-ping-reply"
   18) "54267642"
   19) "last-ping-reply"
   20) "54267642"
   21) "s-down-time"
   22) "54257615"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1513165498275"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "54267642"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"

AFTER:

127.0.0.1:26379> SENTINEL master redis-service
 1) "name"
 2) "redis-service"
 3) "ip"
 4) "10.233.98.122"
 5) "port"
 6) "6379"
 7) "runid"
 8) "e835e5508e4976b7c9b1652f69ee1b023482c599"
 9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "171"
19) "last-ping-reply"
20) "171"
21) "down-after-milliseconds"
22) "10000"
23) "info-refresh"
24) "5749"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "206891"
29) "config-epoch"
30) "7"
31) "num-slaves"
32) "4"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "60000"
39) "parallel-syncs"
40) "1"
127.0.0.1:26379> SENTINEL slaves redis-service
1)  1) "name"
    2) "10.233.101.45:6379"
    3) "ip"
    4) "10.233.101.45"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave,disconnected"
   11) "link-pending-commands"
   12) "3"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "121847"
   17) "last-ok-ping-reply"
   18) "121847"
   19) "last-ping-reply"
   20) "121847"
   21) "s-down-time"
   22) "111821"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1513165735452"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "121847"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
2)  1) "name"
    2) "10.233.101.47:6379"
    3) "ip"
    4) "10.233.101.47"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave"
   11) "link-pending-commands"
   12) "14"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "121847"
   17) "last-ok-ping-reply"
   18) "121847"
   19) "last-ping-reply"
   20) "121847"
   21) "s-down-time"
   22) "111821"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1513165735452"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "121847"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
3)  1) "name"
    2) "10.233.98.107:6379"
    3) "ip"
    4) "10.233.98.107"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave"
   11) "link-pending-commands"
   12) "14"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "121847"
   17) "last-ok-ping-reply"
   18) "121847"
   19) "last-ping-reply"
   20) "121847"
   21) "s-down-time"
   22) "111821"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "1513165735452"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "121847"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
4)  1) "name"
    2) "10.233.101.48:6379"
    3) "ip"
    4) "10.233.101.48"
    5) "port"
    6) "6379"
    7) "runid"
    8) "59191dbfbefe571b11c543792cd737e1b2e80a1c"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "569"
   19) "last-ping-reply"
   20) "569"
   21) "down-after-milliseconds"
   22) "10000"
   23) "info-refresh"
   24) "1195"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "51520"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "10.233.98.122"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "65617551"

And Lettuce keeps trying to ping the 10.233.101.45 node and is unable to refresh the topology.

@mp911de mp911de added the type: bug A general bug label Dec 13, 2017
@mp911de mp911de added this to the Lettuce 4.4.3 milestone Dec 13, 2017
Collaborator

mp911de commented Dec 13, 2017

Thanks a lot. Host 10.233.101.45 is reported with the flags s_down,slave,disconnected, which should stop Lettuce from connecting to it or sending a PING. Looks like a bug in the client.
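
To illustrate the expected behavior, here is a minimal sketch of filtering nodes by the `flags` field of a `SENTINEL slaves` reply. It is not Lettuce's actual implementation: the class `SentinelFlagFilter`, its methods, and the choice of `s_down`, `o_down` and `disconnected` as "do not connect" flags are assumptions made for this example. The addresses and flag strings are taken from the AFTER output above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lettuce: decides from the "flags" field of a
// SENTINEL slaves entry whether a client should try to connect to the node.
final class SentinelFlagFilter {

    // Flags treated as "node unavailable" in this sketch (an assumed set).
    private static final Set<String> UNAVAILABLE =
            new HashSet<>(Arrays.asList("s_down", "o_down", "disconnected"));

    // Returns true only if none of the comma-separated flags marks the node as unavailable.
    static boolean isConnectable(String flags) {
        return Arrays.stream(flags.split(","))
                .map(String::trim)
                .noneMatch(UNAVAILABLE::contains);
    }

    // Keeps the node names whose flags indicate a reachable node, preserving input order.
    static List<String> connectableNodes(Map<String, String> flagsByNode) {
        return flagsByNode.entrySet().stream()
                .filter(entry -> isConnectable(entry.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Node names and flags as reported by Sentinel after the failover.
        Map<String, String> slaves = new LinkedHashMap<>();
        slaves.put("10.233.101.45:6379", "s_down,slave,disconnected");
        slaves.put("10.233.101.47:6379", "s_down,slave");
        slaves.put("10.233.98.107:6379", "s_down,slave");
        slaves.put("10.233.101.48:6379", "slave");

        System.out.println(connectableNodes(slaves));
    }
}
```

With the data above, only 10.233.101.48:6379 passes the filter, so a client following this rule would not attempt to ping the three dead nodes.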

mp911de added a commit that referenced this issue Dec 19, 2017
Lettuce now considers Sentinel messages on channels +sdown, -sdown and +slave as signals to refresh topology. This change allows new nodes to be added at runtime and connections to unavailable (unconnectable) nodes to be closed. Temporary failures to connect to a slave result in the client connection being closed until the node is reachable again.
Collaborator

mp911de commented Dec 19, 2017

The issue was caused by not considering messages on the -sdown, +sdown and +slave channels: these messages were not used to refresh the topology. That's fixed now and available in all affected builds. Care to give 5.0.2.BUILD-SNAPSHOT a spin?
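
The idea of treating these channels as refresh signals can be sketched as follows. This is a simplified illustration, not the code from the fix: `TopologyRefreshSignals`, `shouldRefresh` and `onMessage` are hypothetical names, the sketch models only the three channels named above (other channels may also matter to a real client), and the sample payload in `main` is constructed from addresses in this thread rather than captured from a real Sentinel.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Lettuce's code: maps Sentinel Pub/Sub channels
// to a "refresh the topology" decision.
final class TopologyRefreshSignals {

    // Only the three channels discussed in this issue are modeled here.
    private static final Set<String> REFRESH_CHANNELS =
            new HashSet<>(Arrays.asList("+sdown", "-sdown", "+slave"));

    // True if a message on this channel should trigger a topology refresh.
    static boolean shouldRefresh(String channel) {
        return REFRESH_CHANNELS.contains(channel);
    }

    // Runs the refresh action when the channel is a refresh signal.
    // The payload is ignored in this sketch.
    // Returns whether a refresh was triggered.
    static boolean onMessage(String channel, String message, Runnable refresh) {
        if (shouldRefresh(channel)) {
            refresh.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Runnable refresh = () -> System.out.println("refreshing topology");

        // Illustrative payload in Sentinel's event format (not a captured log line).
        onMessage("+sdown",
                "slave 10.233.101.45:6379 10.233.101.45 6379 @ redis-service 10.233.98.122 6379",
                refresh);

        // A channel outside the set modeled here does not trigger the action.
        onMessage("+tilt", "", refresh);
    }
}
```

A real client would typically also debounce refreshes, since several Sentinels can report the same event within a short interval.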

@mp911de mp911de closed this as completed Dec 19, 2017
@vleushin
Author

@mp911de Tried this version:

#Tue Dec 19 14:35:43 UTC 2017
version=5.0.2.BUILD-SNAPSHOT

It did not work! The logs exploded (notice the timestamps) with these error messages (the IPs are those of the dead slaves):

2017-12-21 11:43:30.225 [WARN] [lettuce-epollEventLoop-49-2] [i.l.c.m.MasterSlaveTopologyRefresh] - Unable to connect to RedisURI [host='10.233.98.107', port=6379]
java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.98.107:6379
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at io.lettuce.core.AbstractRedisClient.lambda$initializeChannelAsync$1(AbstractRedisClient.java:275) ~[lettuce-core-5.0.2.BUILD-SNAPSHOT.jar:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:600) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.98.107:6379
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:598) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	... 8 more
2017-12-21 11:43:30.225 [WARN] [lettuce-epollEventLoop-49-2] [i.l.c.m.MasterSlaveTopologyRefresh] - Unable to connect to RedisURI [host='10.233.101.48', port=6379]
java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.101.48:6379
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at io.lettuce.core.AbstractRedisClient.lambda$initializeChannelAsync$1(AbstractRedisClient.java:275) ~[lettuce-core-5.0.2.BUILD-SNAPSHOT.jar:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:600) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.101.48:6379
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:598) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	... 8 more
2017-12-21 11:43:30.226 [WARN] [lettuce-epollEventLoop-22-3] [i.l.c.m.MasterSlaveTopologyRefresh] - Unable to connect to RedisURI [host='10.233.98.107', port=6379]
java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.98.107:6379
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at io.lettuce.core.AbstractRedisClient.lambda$initializeChannelAsync$1(AbstractRedisClient.java:275) ~[lettuce-core-5.0.2.BUILD-SNAPSHOT.jar:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:600) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.98.107:6379
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:598) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	... 8 more
2017-12-21 11:43:30.226 [WARN] [lettuce-epollEventLoop-31-2] [i.l.c.m.MasterSlaveTopologyRefresh] - Unable to connect to RedisURI [host='10.233.101.45', port=6379]
java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.101.45:6379
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
	at io.lettuce.core.AbstractRedisClient.lambda$initializeChannelAsync$1(AbstractRedisClient.java:275) ~[lettuce-core-5.0.2.BUILD-SNAPSHOT.jar:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:600) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ~[netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.233.101.45:6379
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:598) ~[netty-transport-native-epoll-4.1.16.Final-linux-x86_64.jar:4.1.16.Final]

The topology was not updated, and there is endless error spam in the logs.

Collaborator

mp911de commented Dec 22, 2017

From the stack traces, it looks like the connected Sentinel did not recognize that the slave is actually down. You could track the issue down yourself by connecting to all Sentinels with PSUBSCRIBE *, checking which one emits the down message last, and seeing whether the connection attempts stop after the last +sdown message.
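
One way to run that check without extra tooling is a small program that opens a plain socket to each Sentinel, sends `PSUBSCRIBE *` in the Redis wire protocol (RESP), and prints every reply line with a timestamp. This is a rough sketch: the class name `SentinelEventTail` and the `host:port` command-line arguments are assumptions for this example, and the output is the raw RESP stream (length prefixes included), not parsed messages.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic tool: tails all Pub/Sub events from several Sentinels.
final class SentinelEventTail {

    // Encodes a command as a RESP array of bulk strings.
    static String encode(String... args) {
        StringBuilder sb = new StringBuilder("*").append(args.length).append("\r\n");
        for (String arg : args) {
            int byteLength = arg.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLength).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString();
    }

    // Subscribes to all channels on one Sentinel and prints each raw reply line
    // prefixed with the local time in milliseconds and the Sentinel address.
    static void tail(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(encode("PSUBSCRIBE", "*").getBytes(StandardCharsets.UTF_8));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(System.currentTimeMillis() + " " + host + ":" + port + " " + line);
            }
        }
    }

    public static void main(String[] args) {
        // Each argument is one Sentinel address in host:port form.
        for (String address : args) {
            String[] hostAndPort = address.split(":");
            new Thread(() -> {
                try {
                    tail(hostAndPort[0], Integer.parseInt(hostAndPort[1]));
                } catch (IOException e) {
                    System.err.println(address + ": " + e);
                }
            }).start();
        }
    }
}
```

Started with the addresses of all three Sentinels (port 26379 in this setup), the timestamps show which Sentinel reports `+sdown` last, which can then be compared against the time of the last `Unable to connect` warning in the application log.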
