Wrapper with Connection Pooling #256
-
Hello! I'm testing the wrapper with Hikari. What I'm seeing is that the wrapper works as expected for the connection I'm using at the moment the failover happens, but all the other connections in the pool remain pointing at the "cluster endpoint" and have to wait for DNS to update before they work again. Is this really what we should expect, or am I probably doing something wrong? I just wanted to confirm, because if that's true, we won't really be able to achieve a "fast failover" when using connection pooling with the wrapper. Regards,
Replies: 5 comments 2 replies
-
@marlongionazwift Thanks for the report. I don't think you are doing anything wrong. I'm trying to figure out how to deal with this scenario, though. We almost need to track which connections Hikari has open in the driver and either 1) invalidate all of them or 2) figure out how to fail them all over. Thinking about this some more, we would need to invalidate all of them, as the session would need to be reset. Interesting!
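To make option 1 concrete, here is a minimal sketch of what "track which connections are open and invalidate all of them" could look like. All class and method names below are hypothetical, not the wrapper's actual API; the only real types used are `java.sql.Connection` and the JDK's dynamic proxies. The idea is that once the driver marks a connection invalid, the pool's `Connection.isValid()` check fails and the pool evicts it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: the driver keeps a registry of every connection it has
 * handed out and, on failover, marks them all invalid so that the pool's
 * validity check (Connection.isValid) evicts them.
 */
public class ConnectionRegistry {
    private final Set<InvalidatableConnection> open = ConcurrentHashMap.newKeySet();

    /** Wrap a physical connection so it can be invalidated later. */
    public Connection track(Connection physical) {
        InvalidatableConnection handler = new InvalidatableConnection(physical);
        open.add(handler);
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                handler);
    }

    /** Called when the wrapper detects a cluster failover. */
    public void invalidateAll() {
        open.forEach(InvalidatableConnection::invalidate);
        open.clear();
    }

    private static final class InvalidatableConnection implements InvocationHandler {
        private final Connection delegate;
        private volatile boolean invalidated;

        InvalidatableConnection(Connection delegate) { this.delegate = delegate; }

        void invalidate() { invalidated = true; }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // After failover, report the connection as dead so the pool evicts it.
            if (invalidated && method.getName().equals("isValid")) {
                return false;
            }
            return method.invoke(delegate, args);
        }
    }
}
```

This sidesteps the need for any pool-specific eviction API, at the cost of the pool only discovering the dead connections when it next validates them.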
-
Hello, @davecramer, thanks for your response!

Regarding your first point: I was able to configure a validation query in Hikari to evict "read-only" connections, but that does not solve the problem yet, since new connections will be opened pointing at the cluster endpoint and we will continue to depend on the DNS refresh.

Regarding point 2: failing over all the current connections would solve the problem only partially, because newly opened connections would still depend on the DNS refresh.

Two things I thought about:

Regards,
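Since both points come back to new connections resolving the cluster endpoint through stale DNS, one standard client-side mitigation worth noting (plain JDK behavior, independent of the wrapper) is shortening the JVM's own DNS cache, which otherwise adds its own delay on top of the DNS record's TTL. A minimal sketch; the values are illustrative, not recommendations:

```java
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // The JVM caches successful hostname lookups. Until that cache entry
        // expires, every new connection opened against the cluster endpoint
        // resolves to the cached (possibly old-writer) address, even after
        // the DNS record itself has been updated. These security properties
        // must be set before the first lookup of the hostname.
        Security.setProperty("networkaddress.cache.ttl", "1");          // seconds
        Security.setProperty("networkaddress.cache.negative.ttl", "0"); // don't cache failures
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Note this only narrows the window; it cannot remove the dependency on the server-side DNS record actually being updated.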
-
Hello @marlongionazwift, @davecramer. I wanted to bring some ideas that I hope will help us choose the right direction.

Usually, connection pools perform a connection check before returning a connection to the user application. Such a validity check may happen while the failover process on the DB cluster is not yet over. In that case I'd expect it to trigger the failover process in the driver and, sooner or later, lead to a valid connection.

The other possible scenario is when the validity check happens after the DB cluster failover is over. This case is a bit tricky, because it's not clear whether the physical connection to a database node has survived or not. I'd expect failover on the DB cluster to close all open connections, since the DB node is reconfigured with a new role and needs to be restarted. A validity check on a closed connection leads to the connection being evicted from the pool; the pool can move on to another idle connection and eventually return a valid connection to the user app.

If the physical connection survives the DB failover (and I'm quite dubious about the probability of that scenario), the user application may get a valid, healthy connection to the same node it was connected to before the failover. However, there's a high chance that the node's role has changed, and that may cause dramatic side effects.

A quick summary of the cases mentioned above:
- Validity check during DB failover: triggers the driver's failover process and eventually yields a valid connection.
- Validity check after DB failover, physical connection closed: the connection is evicted and the pool moves on to another connection.
- Validity check after DB failover, physical connection survived: the connection looks healthy, but the node's role may have changed.

All of the above is my understanding of how things work, and it needs practical confirmation. As for opening new connections with the cluster endpoint, that still depends on DNS resolving to the new writer.

About invalidating all affected connections in a pool: more investigation is needed to determine whether this is a robust solution, and it depends on the particular connection pool and the API it provides. I'd like to see a public method that accepts a list of connections to evict from the pool, or a method that evicts all connections matching some criterion such as the connection URL. I'm not sure whether any such public API exists in popular connection pool implementations.
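The eviction path described above can be sketched as a borrow-time validity loop. This is purely illustrative (all names are hypothetical; real pools such as HikariCP are far more involved), but it shows where eviction happens and why fresh connections still depend on DNS:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Queue;

/**
 * Illustrative sketch of a pool's borrow-time validity check:
 * poll idle connections, evict dead ones, return the first live one.
 */
public class BorrowSketch {
    /** Returns a live idle connection, evicting dead ones; null if the pool ran dry. */
    public static Connection borrow(Queue<Connection> idle, int validationTimeoutSec)
            throws SQLException {
        Connection c;
        while ((c = idle.poll()) != null) {
            if (isUsable(c, validationTimeoutSec)) {
                // Either survived the DB failover or already failed over in the driver.
                return c;
            }
            c.close(); // evicted: the DB-side failover killed this physical connection
        }
        // Pool exhausted: the caller opens a brand-new connection, which still
        // depends on the cluster endpoint's DNS resolving to the new writer.
        return null;
    }

    private static boolean isUsable(Connection c, int timeoutSec) {
        try {
            return !c.isClosed() && c.isValid(timeoutSec);
        } catch (SQLException e) {
            return false;
        }
    }
}
```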
-
From what I can tell, HikariCP does not do anything onBorrow(); it does have a keepalive setting (keepaliveTime) where it will call isValid() periodically.
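For reference, a configuration sketch of that setting (HikariCP 4.x or later). The values are illustrative only, and the JDBC URL is a placeholder rather than the wrapper's actual URL scheme:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolConfigSketch {
    public static HikariDataSource build() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://example-cluster.example.com/db"); // placeholder URL
        // keepaliveTime makes Hikari call Connection.isValid() on idle
        // connections periodically, so connections killed by a DB-side
        // failover are evicted within roughly one keepalive interval
        // instead of only when a thread next borrows them.
        config.setKeepaliveTime(30_000);     // ms; must be less than maxLifetime
        config.setMaxLifetime(600_000);      // ms
        config.setValidationTimeout(5_000);  // ms; upper bound on each isValid() check
        return new HikariDataSource(config);
    }
}
```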
-
Hello, @davecramer and @sergiyvamz! I tested "auroraStaleDns" and it's working very well; with that I was able to recover pretty nicely (and fast) from a failover.

I said that "other connections in the pool remain pointing to the cluster endpoint", but that's not really what is happening (since after the failover all connections are dead, as @sergiyvamz said). What happened is that new connections were created before the DNS refresh, so they were pointing to the old writer. If I understood correctly, those new connections are being created because the "connection override class" for Hikari does not seem to be called when executing the validation query, only when executing a normal query from the client application; that causes the broken connections to be evicted from the pool, and the pool then creates new connections (which point to the wrong instance).