Trino pods go down instantly when autoscaling causes pods to terminate, even though terminationGracePeriodSeconds
is set to 300 seconds
#23775
Replies: 9 comments
-
Is this about the Trino Helm chart? If yes, can you include the values to reproduce this?
-
It is about the Trino Helm chart. Attaching the deployment config and values file to reproduce the issue.
-
Which chart version are you using? How do you apply the changes you included? In the latest chart version, you have to set
-
We are using Helm chart version:
-
That's very old. I don't know how the chart was structured back then, and I can't help anymore. Can you try using the latest version?
-
We have upgraded the Helm chart to 0.25.0, and the
-
I checked that the default Trino Docker image entrypoint doesn't handle signals sent to the container in any special way. The Trino server also doesn't do this. To handle graceful shutdown, you have to configure the pod's lifecycle in the
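A minimal sketch of what that could look like in the chart's values file. The exact keys (`worker.lifecycle`, `worker.terminationGracePeriodSeconds`), the port, and the `X-Trino-User` value are assumptions here and need to be checked against your chart version and authentication setup; the shutdown endpoint itself is the one described in the Trino graceful-shutdown docs:

```yaml
# Hypothetical values.yaml fragment for the Trino Helm chart (key names are
# assumptions). The preStop hook asks the worker to drain itself via Trino's
# graceful-shutdown endpoint; the kubelet then waits up to
# terminationGracePeriodSeconds before sending SIGKILL.
worker:
  terminationGracePeriodSeconds: 300
  lifecycle:
    preStop:
      exec:
        command:
          - /bin/sh
          - -c
          - >-
            curl -sS -X PUT
            -H "Content-Type: application/json"
            -H "X-Trino-User: admin"
            -d '"SHUTTING_DOWN"'
            http://localhost:8080/v1/info/state
```

Without a hook like this, the container only receives SIGTERM, which the default entrypoint does not translate into a graceful shutdown, so the grace period alone changes nothing.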
-
Hi, we have set the lifecycle preStop hook and
also we have set
We still see the worker pods getting terminated abruptly, without staying in the Terminating state for 300s, which causes queries to fail. Is there anything else that needs to be set to make sure the pods shut down gracefully?
-
If any of the tasks take longer than the termination grace period, queries are going to fail. See the docs at https://trino.io/docs/current/admin/graceful-shutdown.html, which explain how graceful shutdown works. The grace period therefore needs to be at least as long as the longest tasks (for simplicity, assume queries) that execute on your cluster.
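For reference, the worker-side setting lives in `config.properties`; the sequence in the comments below is paraphrased from the graceful-shutdown docs linked above:

```properties
# Trino worker config.properties fragment.
# On a graceful shutdown request the worker: (1) waits shutdown.grace-period
# so the coordinator observes the SHUTTING_DOWN state and stops assigning it
# new tasks, (2) waits for all active tasks to finish, then (3) waits another
# grace period before exiting.
shutdown.grace-period=300s
```

This means the pod's `terminationGracePeriodSeconds` should budget for roughly two grace periods plus the longest task, not just the grace period itself; otherwise the kubelet sends SIGKILL while the worker is still draining.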
-
We have set `terminationGracePeriodSeconds` to 300 on the Trino coordinator and worker pods. During autoscaling, when the number of worker pods increases and decreases, pods terminate instantly without waiting for the queries running on them to finish. We have also set `shutdown.grace-period=300s` on the Trino coordinator and workers. The expectation is that the Trino worker pods wait up to 300 seconds for the tasks on the worker to complete, instead of terminating instantly.
In Starburst, we have set `starburstWorkerShutdownGracePeriodSeconds: 300` (which corresponds to `shutdown.grace-period=300s`) and `deploymentTerminationGracePeriodSeconds: 300` (which corresponds to `terminationGracePeriodSeconds`), and there the worker pods terminate only after waiting up to 300 seconds for query tasks to run to completion, as expected.