allocator: Less aggressive retry #2021
Codecov Report
@@            Coverage Diff             @@
##           master    #2021      +/-   ##
==========================================
+ Coverage   53.66%   53.71%   +0.05%
==========================================
  Files         109      109
  Lines       18991    19008      +17
==========================================
+ Hits        10191    10210      +19
+ Misses       7578     7564      -14
- Partials     1222     1234      +12
manager/allocator/network.go
@@ -401,12 +416,22 @@ func (a *Allocator) doNetworkAlloc(ctx context.Context, ev events.Event) {
	case state.EventCreateNode, state.EventUpdateNode, state.EventDeleteNode:
		a.doNodeAlloc(ctx, ev)
	case state.EventCreateTask, state.EventUpdateTask, state.EventDeleteTask:
		a.doTaskAlloc(ctx, ev)
		a.doTaskAlloc(ctx, ev, nc.pendingTasks)
Couldn't doTaskAlloc(ctx, ev) retrieve pendingTasks on its own via a.netCtx.pendingTasks?
manager/allocator/network.go
func (a *Allocator) procUnallocatedTasksNetwork(ctx context.Context) {
	nc := a.netCtx
	allocatedTasks := make([]*api.Task, 0, len(nc.unallocatedTasks))
func (a *Allocator) procTasksNetwork(ctx context.Context, toAllocate map[string]*api.Task, quiet bool) {
If working on the nc retrieved from the context is equivalent, would it make sense to write this method as:
func (a *Allocator) procTasksNetwork(ctx context.Context, onRetryInterval bool) {
	nc := a.netCtx
	quiet := false
	toAllocate := nc.pendingTasks
	if onRetryInterval {
		toAllocate = nc.unallocatedTasks
		quiet = true
	}
	...
Logic looks good to me.
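A minimal sketch of the single-method shape suggested above, under stated assumptions: Task, networkContext, and the allocate and logf hooks are hypothetical, simplified stand-ins, not swarmkit's real api.Task or allocator internals, and parking a failed pending task in unallocatedTasks is likewise an assumption. One flag picks both the map to walk and whether failures are logged quietly, and allocatedTasks is sized only after toAllocate is settled.

```go
package main

// Task is a hypothetical, simplified stand-in for api.Task.
type Task struct {
	ID string
}

// networkContext is a hypothetical stand-in holding only the two task sets
// discussed in review.
type networkContext struct {
	pendingTasks     map[string]*Task // tasks changed since the last pass
	unallocatedTasks map[string]*Task // tasks that previously failed to allocate
}

// Allocator is a hypothetical stand-in; allocate and logf are illustrative
// hooks for the real network allocation call and logger.
type Allocator struct {
	netCtx   *networkContext
	allocate func(t *Task) bool // reports whether allocation succeeded
	logf     func(debug bool, format string, args ...interface{})
}

// procTasksNetwork walks either the pending set or, on the retry interval,
// the previously failed set, and returns the tasks it allocated.
func (a *Allocator) procTasksNetwork(onRetryInterval bool) []*Task {
	nc := a.netCtx
	quiet := false
	toAllocate := nc.pendingTasks
	if onRetryInterval {
		toAllocate = nc.unallocatedTasks
		quiet = true
	}
	// Sized only once toAllocate points at the right map.
	allocatedTasks := make([]*Task, 0, len(toAllocate))

	for id, t := range toAllocate {
		if !a.allocate(t) {
			// Repeated failures (retry pass) are logged at debug level.
			a.logf(quiet, "failed to allocate network resources for task %s", id)
			if !onRetryInterval {
				// Assumption: park the task until the next retry interval.
				nc.unallocatedTasks[id] = t
				delete(nc.pendingTasks, id)
			}
			continue
		}
		allocatedTasks = append(allocatedTasks, t)
		// Deleting entries during range iteration is safe in Go.
		delete(toAllocate, id)
		delete(nc.unallocatedTasks, id)
	}
	return allocatedTasks
}
```

Folding both passes into one function keeps the source map and the log level from drifting apart; the cost is a boolean parameter whose meaning every caller has to know.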
Force-pushed from 6e78fc2 to 456c2ec.
Updated, thanks
manager/allocator/network.go
	allocatedTasks := make([]*api.Task, 0, len(nc.unallocatedTasks))
	quiet := false
	toAllocate := nc.pendingTasks
	allocatedTasks := make([]*api.Task, 0, len(toAllocate))
This line should go below the if block, after which we know what toAllocate points to.
Updated
Instead of retrying unallocated tasks, services, and networks every time data changes in the store, limit these retries to every 5 minutes.
When a repeated attempt to allocate one of these objects fails, log it at the debug log level, to reduce noise in the logs.
Signed-off-by: Aaron Lehmann <[email protected]>
Force-pushed from 456c2ec to 513d028.
Looks good to me
Do we want to handle the potentially impossible case in which we don't get another commit? For example: we receive a commit (and it turns out that it freed up an IP address), the 5-minute limit is still in effect so we don't retry, and no other commit comes after it, so we never allocate the task.
I think that's a very good point. I had considered this but didn't want to add too much complexity, especially because I think this should be backported. Do you think it's a good idea to add a timer that triggers after 5 minutes if no commits happen during that interval?
I think it's such a rare case that we may not need to bother... I guess it depends on whether the fix would be extremely tiny. Could this simply be another select case with a time.After?
Or a timer that we reset every time we receive a commit.
Or maybe we shouldn't bother :) This is going to be so rare that the code to handle this case may be buggy and we'll never notice.
Yeah, let's not bother. I liked the suggestion of adding a
LGTM
cc @alexmavr @yongtang @aboch