node endpoints: do not create evals for sysbatch jobs #23858

Merged
pkazmierczak merged 3 commits into main from b-sysbatch-node-update-evals
Aug 27, 2024

Conversation

pkazmierczak
Contributor

@pkazmierczak pkazmierczak commented Aug 22, 2024

node-update triggers should never trigger sysbatch allocations. These should only ever be created by periodic-job or job-register.

An example scenario: an allocation spawned by a sysbatch periodic job is running on a node, the allocation gets stopped, GC runs, and the node becomes ineligible and then eligible again, all within the parent sysbatch job's period window. If this happens, node-update will trigger the system scheduler and prematurely start an allocation. This behavior is not desired and is in fact a bug.

Ref: https://hashicorp.atlassian.net/browse/NET-9323

@pkazmierczak pkazmierczak self-assigned this Aug 22, 2024
@pkazmierczak pkazmierczak added this to the 1.8.4 milestone Aug 22, 2024
@pkazmierczak pkazmierczak added the backport/ent/1.6.x+ent, backport/ent/1.7.x+ent, backport/ent/1.8.x+ent, and backport/1.8.x labels Aug 22, 2024
@@ -1695,6 +1695,12 @@ func (n *Node) createNodeEvals(node *structs.Node, nodeIndex uint64) ([]string,
}
jobIDs[alloc.JobNamespacedID()] = struct{}{}

// If it's a sysbatch job, skip it. Sysbatch job evals should only
// ever be created by structs.EvalTriggerPeriodicJob
Member

Hm, I'm having a hard time tracking down the relationship between sysbatch and EvalTriggerPeriodicJob. sysbatch jobs aren't necessarily periodic.

I think the following logic is appropriate: non-periodic sysbatch jobs are "one shot": whatever nodes are eligible when they're scheduled, are the nodes they run on. We don't guarantee that nodes that were down when the sysbatch job was submitted ever receive the job if they come back and re-register.

Contributor Author

Right! So the comment needs some re-phrasing I think, but the logic still holds. If it's a system batch job, and it's periodic, it'll get triggered by periodic-job and should not be triggered by anything else. If it's not periodic, job-register is the appropriate trigger and again, I believe the system scheduler should not be called on node update events.
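
The rule discussed in this thread can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: the `job` and `alloc` types, the `jobsNeedingNodeEvals` helper, and the local job type constants are all hypothetical stand-ins for Nomad's real structs. It only shows the idea that allocations belonging to sysbatch jobs are skipped when collecting the jobs that should get a node-update evaluation.

```go
package main

import "sort"

// Local stand-ins for Nomad's job type strings. These are assumptions made
// for this sketch; the real constants are defined in Nomad's structs package.
const (
	jobTypeService  = "service"
	jobTypeSystem   = "system"
	jobTypeSysBatch = "sysbatch"
)

// job and alloc are simplified, hypothetical shapes, not Nomad's structs.
type job struct {
	ID   string
	Type string
}

type alloc struct {
	Job job
}

// jobsNeedingNodeEvals returns the sorted, deduplicated IDs of jobs that
// should receive a node-update evaluation for the allocations on a node.
func jobsNeedingNodeEvals(allocs []alloc) []string {
	seen := make(map[string]struct{})
	for _, a := range allocs {
		// Skip sysbatch jobs: per the discussion, their evals should come
		// from periodic-job or job-register, never from a node update.
		if a.Job.Type == jobTypeSysBatch {
			continue
		}
		seen[a.Job.ID] = struct{}{}
	}

	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids) // map iteration order is random, so sort for stable output
	return ids
}
```

Given allocations for a service job, a system job, and a sysbatch job on the same node, this helper would return only the service and system job IDs, so a node flapping between ineligible and eligible would not restart the sysbatch work.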

Member

@schmichael schmichael left a comment

LGTM. Thanks for the expansive comment!

@pkazmierczak pkazmierczak merged commit 82f0f00 into main Aug 27, 2024
19 checks passed
@pkazmierczak pkazmierczak deleted the b-sysbatch-node-update-evals branch August 27, 2024 07:51
2 participants