StaleDNS Plugin Feedback #277

marlongionazwift · 2022-11-28T14:09:31Z

marlongionazwift
Nov 28, 2022

Hello!

I'm testing the "StaleDns" plugin and wanted to provide some feedback. In my test, 10 threads each insert data into a table every 1-2 seconds, and I keep triggering failovers "to see what happens". (I'm using Hikari.)

I could see that some of the connections created after the failover were still pointing to the wrong instance. I suspected the problem was the topology cache, so I changed the plugin code to call "forceRefreshHostList" instead of "refreshHostList", and the number of errors dropped significantly. I believe that, because of the cache, there will always be a small time window after a failover during which the cached topology is outdated, and connections created during that window will have problems.
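
To make the stale window concrete, here is a minimal sketch of a TTL-based topology cache. It is not the wrapper's actual cache: `TopologyCache`, the injected clock, the 30-second TTL used in the example, and the convention that the writer is the first entry are all hypothetical. `refresh()` plays the role of `refreshHostList` and `forceRefresh()` the role of `forceRefreshHostList`.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical TTL-based topology cache (not the wrapper's implementation).
 * Assumption: the first entry of the topology list is the writer.
 */
class TopologyCache {
    private final Supplier<List<String>> queryTopology; // queries the cluster for [writer, readers...]
    private final LongSupplier clockMillis;             // injectable clock so the example is deterministic
    private final long ttlMillis;
    private List<String> cached;
    private long cachedAtMillis;

    TopologyCache(Supplier<List<String>> queryTopology, LongSupplier clockMillis, long ttlMillis) {
        this.queryTopology = queryTopology;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    /** Like refreshHostList: reuse the cached topology while it is younger than the TTL. */
    synchronized List<String> refresh() {
        long now = clockMillis.getAsLong();
        if (cached == null || now - cachedAtMillis >= ttlMillis) {
            return forceRefresh();
        }
        return cached; // may be stale if a failover happened after cachedAtMillis
    }

    /** Like forceRefreshHostList: always query the cluster, paying a round trip every time. */
    synchronized List<String> forceRefresh() {
        cached = List.copyOf(queryTopology.get());
        cachedAtMillis = clockMillis.getAsLong();
        return cached;
    }
}
```

With a 30-second TTL, a `refresh()` five seconds after a failover still returns the old writer; `forceRefresh()` closes that window, but at the cost of a topology query on every call.
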
The number of errors dropped, but I could still occasionally see connections pointing to the wrong instance. So I also changed the code to always return a connection pointing to the writer instance (i.e., never use the cluster endpoint). This solved the problem, but in this case I don't know how to explain the cause.
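
The second change amounts to a host-selection step after connecting. This is a hypothetical sketch, not the plugin's code: `StaleDnsCheck`, `chooseHost`, and the `alwaysUseWriter` flag are illustrative names, and the resolver is injected so the logic can be tested without real DNS (in practice it might wrap `InetAddress.getByName(host).getHostAddress()`).

```java
import java.util.function.Function;

/** Hypothetical stale-DNS check: decide which host a new connection should use. */
class StaleDnsCheck {
    private final Function<String, String> resolveIp; // host name -> IP address (null if unresolvable)

    StaleDnsCheck(Function<String, String> resolveIp) {
        this.resolveIp = resolveIp;
    }

    /**
     * If the cluster endpoint resolves to the same IP as the writer instance, keep using it;
     * otherwise its DNS record is assumed stale and the writer instance endpoint is used.
     * alwaysUseWriter mimics the experiment of never using the cluster endpoint.
     */
    String chooseHost(String clusterEndpoint, String writerInstanceEndpoint, boolean alwaysUseWriter) {
        if (alwaysUseWriter) {
            return writerInstanceEndpoint;
        }
        String clusterIp = resolveIp.apply(clusterEndpoint);
        String writerIp = resolveIp.apply(writerInstanceEndpoint);
        boolean dnsUpToDate = clusterIp != null && clusterIp.equals(writerIp);
        return dnsUpToDate ? clusterEndpoint : writerInstanceEndpoint;
    }
}
```

Note that `writerInstanceEndpoint` comes from the topology, so this check is only as accurate as the cached topology it is given.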

With these two changes, my tests started running very well: 100% of the connections work after every failover I trigger.

Regards!

davecramer · 2022-11-28T16:10:10Z

davecramer
Nov 28, 2022
Maintainer

Any chance you can share your test code?


marlongionazwift · 2022-11-28T19:46:07Z

marlongionazwift
Nov 28, 2022
Author

Yes. That's very simple.

(I'm pointing to version "1.1.0" of the wrapper in pom.xml because that's the version with my modifications)

testing.zip


davecramer · 2022-12-01T21:45:56Z

davecramer
Dec 1, 2022
Maintainer

@marlongionazwift finally got around to looking at your code. Thanks!
So while your solution works, it may come with a performance hit.
I'm wondering whether we should instead invalidate the cache when the connection fails.
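
One way to read this suggestion: keep caching, but drop the cached topology as soon as a connection attempt through it fails, so that only the next request pays for a fresh query. A minimal sketch, where `InvalidatingTopologyCache`, `Connector`, and the writer-first convention are hypothetical names and assumptions:

```java
import java.sql.SQLException;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical cache that discards the topology whenever a connection attempt fails. */
class InvalidatingTopologyCache {
    @FunctionalInterface
    interface Connector<T> {
        T connect(String host) throws SQLException;
    }

    private final Supplier<List<String>> queryTopology;
    private List<String> cached; // null means "re-query on next use"

    InvalidatingTopologyCache(Supplier<List<String>> queryTopology) {
        this.queryTopology = queryTopology;
    }

    synchronized List<String> topology() {
        if (cached == null) {
            cached = List.copyOf(queryTopology.get());
        }
        return cached;
    }

    synchronized void invalidate() {
        cached = null;
    }

    /** Connects to the cached writer (assumed to be the first entry); on failure, invalidates and rethrows. */
    <T> T connectToWriter(Connector<T> connector) throws SQLException {
        String writer = topology().get(0);
        try {
            return connector.connect(writer);
        } catch (SQLException e) {
            invalidate(); // the next call re-queries instead of reusing a possibly stale topology
            throw e;
        }
    }
}
```

Compared with always forcing a refresh, the query cost is paid only after a failure. The request that hits the stale entry still fails, so the caller (or the pool) has to retry.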

3 replies

marlongionazwift Dec 2, 2022
Author

Yes, the change I made was just to validate my assumption that it was a topology cache problem, but indeed it is a performance hit.

About invalidating the cache: maybe it could work. But there is a small time window after a failover during which the topology has no writer, right? It is important to check how the wrapper/cache would behave in that case (I'm thinking of the scenario where the cache is invalidated but is repopulated while there is no writer yet).

davecramer Dec 2, 2022
Maintainer

But there is a small time window after a failover in which the topology has no writer, right?

Currently Aurora MySQL keeps the readers active. I don't believe Aurora Postgres does, although, as I understand it, that is on the roadmap.

Do you have evidence to the contrary?

marlongionazwift Dec 2, 2022
Author

No, that's a scenario I didn't test. I was just thinking:

a) The current topology is cached.
b) A failover happens (at this point the cached topology is stale).
c) We try to update the topology, but there is no writer yet.
d) A new connection is requested.

I think it's important to check how the wrapper behaves in steps c) and d).
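
The open question in steps c) and d) is what a refresh should do when the cluster reports no writer. One possible policy, sketched below with hypothetical names (`WriterAwareRefresher`, `queryWriter`, the retry count and delay), is to never overwrite the cache with a writer-less result and to make new connection requests retry for a bounded time. This is not a description of how the wrapper currently behaves.

```java
import java.sql.SQLException;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical refresher for steps c) and d); not the wrapper's implementation. */
class WriterAwareRefresher {
    private final Supplier<Optional<String>> queryWriter; // empty while the cluster reports no writer
    private final int maxAttempts;
    private final long retryDelayMillis;
    private String lastKnownWriter; // may be stale; never replaced by "no writer"

    WriterAwareRefresher(Supplier<Optional<String>> queryWriter, int maxAttempts, long retryDelayMillis) {
        this.queryWriter = queryWriter;
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Step c): query the cluster; only a result that contains a writer updates the cache. */
    synchronized Optional<String> refresh() {
        Optional<String> writer = queryWriter.get();
        writer.ifPresent(w -> lastKnownWriter = w);
        return writer;
    }

    synchronized String lastKnownWriter() {
        return lastKnownWriter;
    }

    /** Step d): wait (bounded) for a fresh writer instead of handing out the stale cached one. */
    String writerForNewConnection() throws SQLException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> writer = refresh();
            if (writer.isPresent()) {
                return writer.get();
            }
            if (attempt < maxAttempts) {
                Thread.sleep(retryDelayMillis);
            }
        }
        throw new SQLException("No writer reported after " + maxAttempts + " attempts");
    }
}
```

Bounding the retries means a connection request fails with an `SQLException` instead of blocking indefinitely if the failover takes longer than expected.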

davecramer · 2022-12-02T16:48:07Z

davecramer
Dec 2, 2022
Maintainer

So I spent some time looking at this.

I added:

```java
ds.setExceptionOverrideClassName(HikariCPSQLException.class.getName());
targetDataSourceProps.setProperty("wrapperPlugins", "failover,auroraHostList,auroraStaleDns");
```

With these, it works as advertised.

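It's also important to catch FailoverSuccessSQLException:

```java
try {
    // ... execute statements on the connection ...
} catch (FailoverSuccessSQLException e) {
    // failover succeeded: reconfigure the connection (session state) as necessary, then retry
}
```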
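
The catch pattern above can be wrapped in a small retry helper. This sketch assumes the failover plugin reports a successful failover with SQLState `08S02` (the state we understand `FailoverSuccessSQLException` to carry; verify against your driver version), so it can be shown with plain `java.sql.SQLException`. `FailoverRetry`, `SqlAction`, and `runWithFailoverRetry` are hypothetical names.

```java
import java.sql.SQLException;

/** Hypothetical helper: retry once after a successful failover, re-applying session state first. */
final class FailoverRetry {
    @FunctionalInterface
    interface SqlAction<T> {
        T run() throws SQLException;
    }

    // Assumption: SQLState used by the wrapper to signal "failover succeeded".
    static final String FAILOVER_SUCCESS_SQLSTATE = "08S02";

    private FailoverRetry() {}

    static <T> T runWithFailoverRetry(SqlAction<T> action, SqlAction<Void> reconfigure) throws SQLException {
        try {
            return action.run();
        } catch (SQLException e) {
            if (!FAILOVER_SUCCESS_SQLSTATE.equals(e.getSQLState())) {
                throw e; // not a successful failover: propagate unchanged
            }
            reconfigure.run();   // e.g. re-apply auto-commit, schema, or other session settings
            return action.run(); // retry once on the new writer
        }
    }
}
```

Retrying the last statement is only safe for idempotent work outside a transaction; if failover interrupted a transaction, the whole transaction would need to be replayed.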

3 replies

marlongionazwift Dec 2, 2022
Author

Yes, my idea was to test with the staleDns plugin only. I believe the failover plugin somehow forces a refresh of the topology, right? Also, most of the connections that would be created through the staleDns plugin will actually just be replaced by the failover plugin.

But, anyway, don't you think the staleDns plugin should be able to work more independently?

davecramer Dec 2, 2022
Maintainer

@sergiyvamz ??

hsuamz Dec 7, 2022
Maintainer

Thanks for adding additional detail @marlongionazwift and @davecramer. @sergiyvamz is out sick, so a response will be delayed. We have someone on the team taking a look and will continue this conversation soon.

karenc-bq · 2022-12-13T02:29:11Z

karenc-bq
Dec 13, 2022
Maintainer

Hi @marlongionazwift, thank you for raising these issues.

We are currently working on a fix for the first issue you pointed out and investigating the second issue. For visibility, we don't believe the second issue is caused by the Stale DNS plugin.

Thank you for your patience.

