Trino pods go down instantly when autoscaling causes pods to terminate, even though terminationGracePeriodSeconds
is set to 300 seconds
#23775
Replies: 9 comments
-
Is this about the Trino Helm chart? If yes, can you include the values to reproduce this?
-
It is about the Trino Helm chart. Attaching the deployment config and values file to reproduce the issue.
-
Which chart version are you using? How do you apply the changes you included? In the latest chart version, you have to set
-
We are using Helm chart version:
-
That's very old. I don't know how the chart was structured back then, and I can't help anymore. Can you try using the latest version?
-
We have upgraded the Helm chart to 0.25.0, and the
-
I checked that the default Trino Docker image entrypoint doesn't handle signals sent to the container in any special way. The Trino server also doesn't do this. To handle graceful shutdown, you have to configure the pod's lifecycle in the
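A minimal sketch of what that could look like in the chart's values file. The exact keys (`worker.lifecycle`, `worker.terminationGracePeriodSeconds`), the port, and the `X-Trino-User` value are assumptions here and need to be checked against your chart version and authentication setup; the shutdown endpoint itself is the one described in the Trino graceful-shutdown docs:

```yaml
# Hypothetical values.yaml fragment for the Trino Helm chart (key names are
# assumptions). The preStop hook asks the worker to drain itself via Trino's
# graceful-shutdown endpoint; the kubelet then waits up to
# terminationGracePeriodSeconds before sending SIGKILL.
worker:
  terminationGracePeriodSeconds: 300
  lifecycle:
    preStop:
      exec:
        command:
          - /bin/sh
          - -c
          - >-
            curl -sS -X PUT
            -H "Content-Type: application/json"
            -H "X-Trino-User: admin"
            -d '"SHUTTING_DOWN"'
            http://localhost:8080/v1/info/state
```

Without a hook like this, the container only receives SIGTERM, which the default entrypoint does not translate into a graceful shutdown, so the grace period alone changes nothing.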
-
Hi, we have set the lifecycle preStop hook and
also we have set
We still see the worker pods getting terminated abruptly, without staying in the Terminating state for 300s, which causes queries to fail. Is there anything else that needs to be set to make sure the pods shut down gracefully?
-
If any of the tasks take longer than the termination grace period, queries are going to fail. See the docs at https://trino.io/docs/current/admin/graceful-shutdown.html, which explain how graceful shutdown works. The grace period therefore needs to be at least as long as the longest tasks (for simplicity, assume queries) that execute on your cluster.
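For reference, the worker-side setting lives in `config.properties`; the sequence in the comments below is paraphrased from the graceful-shutdown docs linked above:

```properties
# Trino worker config.properties fragment.
# On a graceful shutdown request the worker: (1) waits shutdown.grace-period
# so the coordinator observes the SHUTTING_DOWN state and stops assigning it
# new tasks, (2) waits for all active tasks to finish, then (3) waits another
# grace period before exiting.
shutdown.grace-period=300s
```

This means the pod's `terminationGracePeriodSeconds` should budget for roughly two grace periods plus the longest task, not just the grace period itself; otherwise the kubelet sends SIGKILL while the worker is still draining.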
-
We have set `terminationGracePeriodSeconds` to 300 on the Trino coordinator and worker pods. During autoscaling, when the number of worker pods increases and decreases, pods terminate instantly without waiting for the queries running on them to finish. We have also set `shutdown.grace-period=300s` on the Trino coordinator and workers. The expectation is that the Trino worker pods wait up to 300 seconds for the tasks on the worker to complete, instead of terminating instantly.
In Starburst, we have set `starburstWorkerShutdownGracePeriodSeconds: 300` (which corresponds to `shutdown.grace-period=300s`) and `deploymentTerminationGracePeriodSeconds: 300` (which corresponds to `terminationGracePeriodSeconds`), and there the worker pods terminate only after waiting up to 300 seconds for query tasks to run to completion, as expected.