Group shutdown_delay #6746
Force-pushed from 50f59a1 to 718d716.
Great start!
Force-pushed from 598cf73 to 91d8faa.
How does deregistering a service out from underneath Consul Connect affect Envoy? I'm concerned it could detect the deregistration and immediately stop proxying, which would defeat the purpose of shutdown_delay for Connect tasks.
We'd also still need to open an issue to cover the case where shutdown_delay is used without a service.
Alternatively, we could propagate the group shutdown_delay to each task and make a new delay prekill hook for tasks that runs after the service hook. I think that addresses all of the use cases in #6704 and allows users to set either a group delay that applies to all tasks or a per-task delay for fine-grained control (e.g. kill the log shipper last).
It also avoids conflating shutdown delay hooks with services which is the source of the original issue: we coupled the logic instead of making them distinct hooks.
Don't forget docs for the new field.
Force-pushed from ee1e180 to e8ba20f.
The way this is currently implemented, shutdown is delayed regardless of whether there are registered group services, so it should cover the case linked in #6704. I'll look into how it affects Envoy. It's implemented the same way individual task shutdown_delays deregister from Consul.
@drewbailey @nickethier could we make …
Force-pushed from e8ba20f to 2381202.
@schmichael I'm confused, I thought the purpose of the …
-- @drewbailey Right, sorry I wasn't clear. I meant to say this implementation doesn't fix the bug where a task's shutdown_delay is ignored if the task lacks a service. We need to file a new issue/PR for that if this PR closes the original issue.
-- @djenriquez Yes, but we need to ensure Envoy won't immediately stop proxying in-flight requests to the tasks. I've never tested how deregistering a service affects requests that are in flight through Envoy.
-- @jippi As currently implemented, I think that would cause a double delay: first at the group level, then at each individual task's level. I also don't think we propagate parameters "up" through stanzas, so this seems like potentially surprising behavior. Which brings up a good point: as implemented, the group and task shutdown delays are completely distinct. When an allocation is stopping for whatever reason, first the group delay applies, then the tasks' delays, then the tasks are killed. That may be what users want, but we need to document it carefully, since settings with the same name at different levels usually just propagate down from job -> group -> task.
Force-pushed from b70e5ec to e055b40.
Any updates on this PR?
Looks great! Just a couple comments but nothing I need to re-review. Remember the changelog entry (can be a followup PR).
@@ -4736,6 +4740,8 @@ type TaskGroup struct {
	// Volumes is a map of volumes that have been requested by the task group.
	Volumes map[string]*VolumeRequest

	ShutdownDelay *time.Duration
This doesn't need to be a pointer, as there's no difference between nil and 0.
This needed to be a pointer so that the job diff could tell whether it was set; otherwise I got this failure:
Failed
=== RUN TestJobDiff
--- FAIL: TestJobDiff (0.00s)
diff_test.go:1196: case 19: got:
Job "" (Edited):
Group "bam" (Added):
"Count" (Added): "" => "1"
"ShutdownDelay" (Added): "" => "0"
Force-pushed from 4571038 to 37a6373.
copy struct values; ensure groupserviceHook implements RunnerPreKillhook; run deregister first; test that shutdown times are delayed; move magic number into variable
more explicit test case, remove select statement
update docs, address pr comments; ensure pointer is not nil; use pointer for diff tests, set vs unset
Force-pushed from 37a6373 to 22d521c.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Adds a shutdown_delay option to the group stanza to allow time between deregistering a Consul service and shutting down the task group.