StaleDNS Plugin Feedback #277
Replies: 5 comments 6 replies
-
Any chance you can share your test code?
-
Yes. That's very simple. (I'm pointing to version "1.1.0" of the wrapper in pom.xml because that's the version with my modifications)
-
@marlongionazwift finally got around to looking at your code. Thanks!
-
so I spent some time looking at this. I added and it works as advertised. it's also important to
-
Hi @marlongionazwift, thank you for raising these issues. We are currently working on a fix for the first issue you pointed out and investigating the second issue. For visibility, we don't believe the second issue is caused by the Stale DNS plugin. Thank you for your patience.
-
Hello!
I'm testing the "StaleDns" plugin and wanted to provide some feedback. My test runs 10 threads, each inserting data into a table every 1-2 seconds, and then I keep triggering failovers to see what happens. (I'm using Hikari.)
I could see that some of the connections created after the failover were still pointing to the wrong instance. I suspected the problem was with the topology cache, so I changed the plugin code to call "forceRefreshHostList" instead of "refreshHostList", and the number of errors dropped significantly. I believe that, because of the cache, there will always be a small time window after the failover in which the cached topology is outdated. Connections created during this time window will have problems.
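The stale window described above can be illustrated with a toy model. This is only a sketch under assumptions: `TopologyCache`, its `refresh` and `forceRefresh` methods, the 30-second TTL, and the instance names are all hypothetical and are not the wrapper's actual code. It only shows why a time-based cache can keep returning the old writer right after a failover, while a forced refresh cannot.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Toy model of a TTL-based topology cache (hypothetical, not the wrapper's implementation). */
class StaleTopologyDemo {

    /** Caches the writer host name and re-queries the cluster only after the TTL expires. */
    static final class TopologyCache {
        private final Supplier<String> clusterQuery; // hypothetical: asks the cluster who the writer is
        private final LongSupplier clock;            // injected clock, in milliseconds
        private final long ttlMillis;                // assumed cache lifetime
        private String cachedWriter;
        private long fetchedAt;
        private boolean loaded;

        TopologyCache(Supplier<String> clusterQuery, LongSupplier clock, long ttlMillis) {
            this.clusterQuery = clusterQuery;
            this.clock = clock;
            this.ttlMillis = ttlMillis;
        }

        /** Returns the cached writer unless nothing is cached yet or the entry is older than the TTL. */
        synchronized String refresh() {
            long now = clock.getAsLong();
            if (!loaded || now - fetchedAt >= ttlMillis) {
                return forceRefresh();
            }
            return cachedWriter;
        }

        /** Always queries the cluster, ignoring the age of the cached entry. */
        synchronized String forceRefresh() {
            cachedWriter = clusterQuery.get();
            fetchedAt = clock.getAsLong();
            loaded = true;
            return cachedWriter;
        }
    }

    public static void main(String[] args) {
        AtomicReference<String> actualWriter = new AtomicReference<>("instance-1");
        long[] now = {0L};
        TopologyCache cache = new TopologyCache(actualWriter::get, () -> now[0], 30_000L);

        System.out.println(cache.refresh());      // first call populates the cache: instance-1

        actualWriter.set("instance-2");           // simulated failover
        now[0] = 1_000L;                          // one second later, still inside the TTL

        System.out.println(cache.refresh());      // stale answer: instance-1
        System.out.println(cache.forceRefresh()); // bypasses the cache: instance-2
    }
}
```

In this model, any connection opened between the failover and the TTL expiry is routed using the old writer, which matches the behaviour seen in the test. Forcing a refresh on every connect closes that window at the cost of an extra topology query per connection.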
The number of errors dropped, but I could still occasionally see connections pointing to the wrong instance. For this case I changed the code to always return a connection pointing to the writer instance (i.e. never use the cluster endpoint). This solved the problem, but I don't know how to explain the cause.
With these 2 changes my tests started running very well: 100% of the connections work after every failover I trigger.
Regards!