node endpoints: do not create evals for sysbatch jobs #23858
nomad/node_endpoint.go
@@ -1695,6 +1695,12 @@ func (n *Node) createNodeEvals(node *structs.Node, nodeIndex uint64) ([]string,
}
jobIDs[alloc.JobNamespacedID()] = struct{}{}

// If it's a sysbatch job, skip it. Sysbatch job evals should only
// ever be created by structs.EvalTriggerPeriodicJob
Hm, I'm having a hard time tracking down the relationship between sysbatch and EvalTriggerPeriodicJob. sysbatch jobs aren't necessarily periodic.
I think the following logic is appropriate: non-periodic sysbatch jobs are "one shot": whatever nodes are eligible when they're scheduled are the nodes they run on. We don't guarantee that nodes that were down when the sysbatch job was submitted will ever receive the job if they come back and re-register.
Right! So the comment needs some rephrasing, I think, but the logic still holds. If it's a system batch job and it's periodic, it will be triggered by periodic-job and should not be triggered by anything else. If it's not periodic, job-register is the appropriate trigger, and again, I believe the system scheduler should not be called on node update events.
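The trigger rule described in the reply can be sketched as a small Go predicate. This is a hypothetical helper, not part of Nomad: the constant and function names are assumptions, and only the trigger strings (periodic-job, job-register, node-update) come from this thread. It models just those three triggers.

```go
package main

import "fmt"

// Hypothetical constants mirroring the trigger names used in this thread.
// They are not imported from Nomad's structs package.
const (
	triggerPeriodicJob = "periodic-job"
	triggerJobRegister = "job-register"
	triggerNodeUpdate  = "node-update"
)

// sysbatchTriggerAllowed reports whether an evaluation with the given trigger
// may place allocations for a sysbatch job, following the rule in the reply:
// periodic sysbatch jobs run only via periodic-job, non-periodic ones only via
// job-register, and node-update never qualifies. Other triggers (for example
// job deregistration) are not modeled and simply return false here.
func sysbatchTriggerAllowed(periodic bool, trigger string) bool {
	if periodic {
		return trigger == triggerPeriodicJob
	}
	return trigger == triggerJobRegister
}

func main() {
	triggers := []string{triggerPeriodicJob, triggerJobRegister, triggerNodeUpdate}
	for _, periodic := range []bool{true, false} {
		for _, t := range triggers {
			fmt.Printf("periodic=%t trigger=%s allowed=%t\n",
				periodic, t, sysbatchTriggerAllowed(periodic, t))
		}
	}
}
```

Note that node-update returns false in both branches, which is the property the PR relies on.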
LGTM. Thanks for the expansive comment!
node-update triggers should never result in sysbatch allocations; these should only ever be created by periodic-job or job-register.

An example scenario: an allocation spawned by a periodic sysbatch job is running on a node, the allocation gets stopped, GC runs, and the node becomes ineligible and then eligible again, all within the parent sysbatch job's period window. If this happens, node-update will trigger the system scheduler and prematurely start an allocation. This is not desired behavior; in fact, it is a bug.

Ref: https://hashicorp.atlassian.net/browse/NET-9323