proper graceful shutdown settings #381
Comments
After making this change for a shorter action_runner.graceful_shutdown, we still see actions getting stuck in a running state.
Let this be a lesson to everyone. Do NOT put inline comments in your config file.

Never comment anything ever. EVER.
Whelp, it turns out I STILL am not seeing graceful shutdowns. All executions immediately get abandoned.
I can confirm turning on the service registry fixed my graceful shutdown.
This looks like the right place to add anything in the chart: stackstorm-k8s/templates/configmaps_st2-conf.yaml, lines 18 to 21 at cc94bd0.
We could do this:

```diff
 {{- if index .Values "redis" "enabled" }}
 [coordination]
+service_registry = True
 url = redis://{{ template "stackstorm-ha.redis-password" $ }}{{ template "stackstorm-ha.redis-nodes" $ }}
 {{- end }}
```

I do not use the redis subchart, so that would not change the default for me (I pass this in via my own overrides). Or we could do this instead:

```diff
+[coordination]
+service_registry = True
 {{- if index .Values "redis" "enabled" }}
-[coordination]
 url = redis://{{ template "stackstorm-ha.redis-password" $ }}{{ template "stackstorm-ha.redis-nodes" $ }}
 {{- end }}
```

Which option would you prefer? Or something else entirely?
Second option is probably best.
We have all the proper graceful shutdown settings for actionrunner and workflow, but we are still seeing action executions get stuck in a running state. At a minimum they should be abandoned per the actionrunner code.
I do notice that we are performing a query:

`coordinator.get_members(service.encode("utf-8")).get()`

and then adding to a counter to determine if we are past the expiration.
What may be happening is that if the action_runner.graceful_shutdown config is set to, say, 100 seconds, then over 100 seconds need to actually pass before the logic to abandon executions is called (sketched below).
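For illustration only, here is a minimal sketch of that kind of counter-based wait. The names (`graceful_shutdown_timeout`, `sleep_interval`, `running_execution_ids`) are hypothetical and the real actionrunner code differs, but it shows why the full configured timeout must elapse before anything is abandoned:

```python
import time

def graceful_shutdown_wait(coordinator, service, running_execution_ids,
                           graceful_shutdown_timeout=100, sleep_interval=10):
    """Hypothetical counter-based wait, as described above."""
    elapsed = 0
    # Keep waiting while executions are still running and the counter
    # has not yet reached the configured timeout.
    while running_execution_ids and elapsed < graceful_shutdown_timeout:
        # Query the service registry for live members of this service
        # (the call quoted above from the issue).
        coordinator.get_members(service.encode("utf-8")).get()
        time.sleep(sleep_interval)
        elapsed += sleep_interval

    # The abandon logic is only reached after the loop exits, i.e. after
    # graceful_shutdown_timeout seconds of sleeps if executions never finish.
    return running_execution_ids  # these would then be abandoned
```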
This would mean that we could set the terminationGracePeriodSeconds to, say, 300 seconds longer (or some other generous margin) than the action_runner.graceful_shutdown seconds, to ensure that the pod stays alive long enough to let the abandon process finish. I have made this change so we can monitor our prod cluster.
A code fix would be to use the action_runner.graceful_shutdown config to calculate an end datetime and then check against that in the while loop.
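A sketch of that proposed fix, again with hypothetical names rather than the actual st2 source: compute the deadline once from the configured timeout and compare wall-clock time against it in the loop, so slow iterations (such as the blocking `get_members` call) cannot stretch the effective shutdown window:

```python
import time
from datetime import datetime, timedelta

def graceful_shutdown_wait_with_deadline(coordinator, service, running_execution_ids,
                                         graceful_shutdown_timeout=100, sleep_interval=10):
    """Hypothetical deadline-based variant of the wait loop."""
    # Calculate the end datetime up front from the configured timeout.
    deadline = datetime.utcnow() + timedelta(seconds=graceful_shutdown_timeout)

    # Check wall-clock time against the deadline, not an iteration counter.
    while running_execution_ids and datetime.utcnow() < deadline:
        coordinator.get_members(service.encode("utf-8")).get()
        time.sleep(sleep_interval)

    # Anything still running at the deadline would be abandoned here.
    return running_execution_ids
```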