
Client cannot handle DNS updates #144

Closed
pulse00 opened this issue Oct 2, 2015 · 15 comments

Labels
type: feature A new feature
Milestone
Lettuce 3.4

Comments

@pulse00

pulse00 commented Oct 2, 2015

We're running a Disque cluster on AWS. Today we tested a complete failure of a single node. The setup looks like this:

  • node1.example.com running one disque node
  • node2.example.com running another disque node
  • client.example.com running the client (in our case a spinach client)

Both nodes were clustered using disque cluster meet ... .

Now the outage test looked like this:

  1. Terminate instance node1.example.com
  2. Re-provision instance node1.example.com (gets a new IP address on AWS)
  3. Join the cluster again using disque cluster meet node1.example.com 7711 on node2.example.com
  4. Terminate node2.example.com
  5. Re-provision node2.example.com (gets a new IP address on AWS)
  6. Join the cluster by running disque cluster meet node2.example.com 7711 on node1.example.com

From the Disque perspective this test was successful; however, spinach could not handle the DNS change:

Cannot connect: io.netty.channel.ConnectTimeoutException: connection timed out: node1.example.com/10.0.0.1:7711 [c.l.r.p.ConnectionWatchdog]
Reconnecting, last destination was node1.example.com/10.0.0.1:7711 [c.l.r.p.ConnectionWatchdog]

I'm creating this issue in this repository because I think the underlying ConnectionWatchdog handles the failover scenario.

@mp911de mp911de added the type: feature A new feature label Oct 2, 2015
@mp911de
Collaborator

mp911de commented Oct 2, 2015

You're totally right here. Hostname-to-IP resolution is handled by Java's built-in mechanisms, which means that once a hostname has been looked up, the address is cached for the duration of the JVM's uptime.

Custom DNS lookups are possible but require additional infrastructure, such as providing a DNS server address. Although I like the failover (and DNS lookup) idea, I see that it has natural limits. It's true that these failovers can be handled at the DNS level. However, I also see the following options:

  1. Maintain the cluster topology within the client (as for Redis Cluster). You then require at least one stable server from which the cluster view is obtained. Everything happens on the IP level (not hostnames), and we're fine.
  2. That functionality could be built into lettuce. My usual experience from the past was: then just restart the client. It's not nice, and it does not feel cloud native.
  3. The functionality could perhaps be covered at the RedisURI/DisqueURI level, since they cache the SocketAddress. You could provide one that performs the lookup just in time and handles caching on its own (a conceptual sketch follows this list).
  4. I also had an exposed SocketAddressProvider API in mind. It is an API that is already in use, and it could be made public so users could bring their own implementation.
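
For illustration, a conceptual sketch of options 3 and 4 (not lettuce API; the class name and the plain java.util.function.Supplier shape are illustrative only) that resolves the hostname just in time, so every (re)connect attempt sees the current DNS record:

import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.function.Supplier;

// Conceptual only: resolves the hostname on every call instead of caching the address.
public class JustInTimeAddressSupplier implements Supplier<SocketAddress> {

    private final String hostname;
    private final int port;

    public JustInTimeAddressSupplier(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
    }

    @Override
    public SocketAddress get() {
        // new InetSocketAddress(host, port) performs a fresh DNS lookup at construction
        // time, subject to the JVM's networkaddress.cache.ttl setting.
        return new InetSocketAddress(hostname, port);
    }
}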

We should collect our thoughts and discuss these.

@mp911de
Collaborator

mp911de commented Oct 9, 2015

Just checked: the netty DNS resolver is part of netty 4.1/5.0, not 4.0.

The Java VM allows tweaking DNS caching by setting -Dnetworkaddress.cache.ttl=0 or -Dsun.net.inetaddr.ttl=0; see http://docs.oracle.com/javase/6/docs/technotes/guides/net/properties.html#nct for details.

Update: Using system properties works only if no security manager is in use. If a security manager is in place, the following properties need to be set before the first TCP connection is initiated (otherwise they are ignored):

static {
    // Security properties take effect only when set before the first lookup/connection.
    // TTL (in seconds) for successful name lookups:
    java.security.Security.setProperty("networkaddress.cache.ttl", "60");
    // TTL (in seconds) for failed (negative) lookups:
    java.security.Security.setProperty("networkaddress.cache.negative.ttl", "10");
}

This means not caching the SocketAddress could help to meet your needs.

@mp911de
Collaborator

mp911de commented Oct 14, 2015

@pulse00 Could you give it a try by subclassing DisqueURI.DisqueHostAndPort and overriding getResolvedAddress() to always return a new InetSocketAddress, together with the network-cache settings property? Maybe a variant of https://github.com/mp911de/spinach/blob/hello-cluster/src/main/java/biz/paluch/spinach/impl/SocketAddressSupplierFactory.java#L41 could also help to mitigate the DNS issue (dynamic topology updates still to be done).
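
A rough sketch of this suggestion, assuming DisqueURI.DisqueHostAndPort offers a (host, port) constructor, getHost()/getPort() accessors and an overridable getResolvedAddress() returning SocketAddress; names may differ between spinach versions:

import java.net.InetSocketAddress;
import java.net.SocketAddress;

import biz.paluch.spinach.DisqueURI;

public class ResolvingHostAndPort extends DisqueURI.DisqueHostAndPort {

    public ResolvingHostAndPort(String host, int port) {
        super(host, port); // assumed constructor
    }

    @Override
    public SocketAddress getResolvedAddress() {
        // Create a fresh InetSocketAddress on every call so the hostname is re-resolved;
        // combine with the networkaddress.cache.ttl setting mentioned above so the JVM
        // does not pin the old IP address.
        return new InetSocketAddress(getHost(), getPort());
    }
}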

@mp911de
Collaborator

mp911de commented Oct 22, 2015

@pulse00 Perhaps https://github.com/mp911de/spinach/wiki/SocketAddress-Supplier-API can help you. I just pushed code that allows updating the cluster view, so there is no more reliance on DNS once the initial connect is done.

@pulse00
Author

pulse00 commented Oct 23, 2015

Cool, thanks. We'll give that a try and let you know if it solves the issue.

@mp911de
Collaborator

mp911de commented Oct 25, 2015

To cope with the Java DNS resolution settings, I would change the client to no longer cache the InetSocketAddress. The RedisURI.getResolvedAddress method can still be overridden to implement a custom lookup if needed.

mp911de added a commit that referenced this issue Oct 25, 2015
The SocketAddress obtained by getResolvedAddress is no longer cached. This is to allow dynamic DNS updates at the time of establishing a connection. This change introduces the possibility to change the Redis connection point details because the connection point details are obtained directly from the RedisURI argument.
mp911de added a commit that referenced this issue Oct 25, 2015
The SocketAddress obtained by getResolvedAddress is no longer cached. This is to allow dynamic DNS updates at the time of establishing a connection. This change introduces the possibility to change the Redis connection point details because the connection point details are obtained directly from the RedisURI argument.
mp911de referenced this issue in mp911de/spinach Oct 25, 2015
The SocketAddress obtained by getResolvedAddress is no longer cached. This is to allow dynamic DNS updates at the time of establishing a connection. This change introduces the possibility to change the Disque connection point details because the connection point details are obtained directly from the DisqueURI argument.

Reference: mp911de/lettuce#144
@mp911de
Collaborator

mp911de commented Nov 30, 2015

@pulse00 any update on this topic from your side?

@pulse00
Author

pulse00 commented Dec 3, 2015

@mp911de Sorry for the delay. We'll be testing failover scenarios in our infrastructure over the next couple of days; I hope I can give you some feedback then.

@mp911de
Collaborator

mp911de commented Dec 3, 2015

np, thx for the update.

@mzapletal
Contributor

Hi Marc, we are taking up this issue again. Since spring-data-redis still relies on lettuce 3.3, we cannot switch to the latest lettuce 4.0 release that includes your change. Hence we subclassed RedisURI as you suggested. Since EpollProvider.newSocketAddress(getSocket()) has package-level access, we put the subclassed RedisURI into a com.lambdaworks.redis package :)
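
A minimal sketch of this workaround, assuming lettuce 3.3: placing the subclass in com.lambdaworks.redis gives it access to the package-private EpollProvider.newSocketAddress(...); the constructor signature and accessor names are assumptions and may differ between lettuce versions.

package com.lambdaworks.redis;

import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.concurrent.TimeUnit;

public class ResolvingRedisURI extends RedisURI {

    public ResolvingRedisURI(String host, int port, long timeout, TimeUnit unit) {
        super(host, port, timeout, unit); // assumed constructor
    }

    @Override
    public SocketAddress getResolvedAddress() {
        if (getSocket() != null) {
            // Unix domain sockets do not involve DNS; delegate to the epoll provider.
            return EpollProvider.newSocketAddress(getSocket());
        }
        // Re-resolve the hostname on every call instead of returning the address cached
        // at construction time.
        return new InetSocketAddress(getHost(), getPort());
    }
}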

I'll keep you posted as soon as we have conducted our outage test.

@mp911de
Collaborator

mp911de commented Jan 15, 2016

The fix is (will be) part of the 3.4 release (see 7a7cb47). Give Spring Data Redis with the 3.4-SNAPSHOT build a try.

@mzapletal
Contributor

Thanks Marc, we will do that

@mp911de mp911de added this to the Lettuce 3.4 milestone Jan 21, 2016
@oklahomer

I was thinking of adopting 3.4.1, but just noticed that this issue, labeled "Lettuce 3.4", was still open. I think this issue is already fixed, merged, and part of 3.4.1.Final.
Is that right?

@mp911de
Collaborator

mp911de commented Feb 19, 2016

Sorry for the confusion; I just didn't clean it up properly. The issue is fixed with 3.4, and the fix is also part of 3.4.1. There are, however, two bugs (#194, #196) that are quite ugly, so you may want to wait until 3.4.2, which I plan to release next week.

@oklahomer

Thanks for the updates, and I'm looking forward to having 3.4.2 😄
