Pathfinder prevents the PostgreSQL container from shutting down gracefully. #323

Open
jmontleon opened this issue Mar 30, 2023 · 2 comments

Comments

@jmontleon
Member

jmontleon commented Mar 30, 2023

If the postgres container needs to be shut down for any reason (for example, a node being drained for maintenance or an upgrade), Pathfinder appears to prevent PostgreSQL from shutting down gracefully. The container always runs up to the grace period timeout and PostgreSQL then gets killed, which risks corruption or data loss.

After some investigation it's my understanding that the container runtime for Kubernetes/OpenShift sends SIGTERM to stop the process. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination

When PostgreSQL receives SIGTERM it stops accepting new connections, but it waits indefinitely for all existing connections to terminate before shutting down.
https://www.postgresql.org/docs/current/server-shutdown.html

We're not setting a grace period on the PostgreSQL pod, so we're getting the default terminationGracePeriodSeconds: 30.

Meanwhile, the default for quarkus.datasource.jdbc.idle-removal-interval is 5m, so idle pooled connections can stay open far longer than the grace period.

I raised the grace period to about 120s and reduced the idle-removal-interval to 60 seconds using the environment variable, and so far the DB has stopped cleanly after about 80-90 seconds each time. I think we need to settle on a reasonable pair of values, with the grace period a fair bit larger than the removal interval, and I'm looking for input on what would be acceptable on the JDBC side.

I'm also curious whether we have any means of keeping the use of any single connection in the pool fairly short, since I think the grace period needs to exceed non-idle time + idle time with some reasonable margin to spare.
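
To make that inequality concrete, here is a minimal sketch of the budget check. It is not Pathfinder code: the function names, the 30-second default margin, and the assumed maximum in-use time are hypothetical values for illustration, and the model simply assumes a connection is closed once it has been in use for at most `max_in_use_s` and then idle for `idle_removal_s`.

```python
def required_grace_period(idle_removal_s: int, max_in_use_s: int, spare_s: int = 30) -> int:
    """Smallest grace period (seconds) that covers non-idle time + idle time + spare.

    Assumption: a pooled connection is released after at most max_in_use_s of
    active use and is then closed after idle_removal_s of idleness.
    """
    return max_in_use_s + idle_removal_s + spare_s


def is_sufficient(grace_period_s: int, idle_removal_s: int,
                  max_in_use_s: int, spare_s: int = 30) -> bool:
    """True if the grace period leaves room for all connections to drain."""
    return grace_period_s >= required_grace_period(idle_removal_s, max_in_use_s, spare_s)


if __name__ == "__main__":
    # Defaults mentioned in this issue: 30s grace period, 5m idle removal interval.
    print(is_sufficient(grace_period_s=30, idle_removal_s=300, max_in_use_s=0))
    # Adjusted values from this issue, with a hypothetical 30s of in-use time.
    print(is_sufficient(grace_period_s=120, idle_removal_s=60, max_in_use_s=30))
```

With the defaults, the check fails even if connections are never in use, because the idle removal interval alone is ten times the grace period.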

@jmontleon
Member Author

Any other approach that allows the DB to shut down cleanly would also be welcome.

@PhilipCattanach

I have asked @m-brophy to look into this. Thanks.
