Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't allow overlapping queues to be simultaneously started #4884

Open
garlick opened this issue Jan 23, 2023 · 6 comments
Open

don't allow overlapping queues to be simultaneously started #4884

garlick opened this issue Jan 23, 2023 · 6 comments

Comments

@garlick
Copy link
Member

garlick commented Jan 23, 2023

Problem: queues with overlapping resources should not be started (scheduled) concurrently due to flux-framework/flux-sched#939.

We do want to support overlapping queues so that we can have an "all" queue for DATs configured at the same time as the usual batch and debug queues, but not have "all" running at the same time as the queues it overlaps.

Perhaps we could check for this in the job manager somehow so flux queue start could fail with "queues with overlapping resources may not be simultaneously started", until the scheduler grows some smarts for this.

@grondo
Copy link
Contributor

grondo commented Jan 23, 2023

Seems like this approach assumes only the Fluxion scheduler is valid for use with queues. Not that that isn't true now, but it seems like that breaks separation of concerns. Seems like a valid solution would be to document the current problem?

@garlick
Copy link
Member Author

garlick commented Jan 24, 2023

Worries me a bit since it seems like it ought to work and will, sort of, then subtly misbehave.

What if we required overlapping queues to declare their relationship with the other queue(s) in the config? For example, if "all" overlaps "debug" and "batch", maybe the config could say

[queues.all]
overlap-queues = [ "batch", "debug" ]
overlap-mode = "exclusive"  # vs "shared"?
[queues.debug]
requires = [ "debug" ]
[queues.batch]
requires = [ "batch" ]

Then

  • the person creating the config (one hopes referencing the flux admin guide) can intentionally enable/disable exclusivity
  • the job manager can remain ignorant of all things R but still enforce scheduler exclusivity for overlapping queues
  • the resource module could detect unintentional queue overlap and make that fatal

@grondo
Copy link
Contributor

grondo commented Jan 24, 2023

Well my point is whether it works or not is dependent on the loaded scheduler, but the proposed logic is going in core code. This would mean we are de facto stating that queues must work this way. I.e., someone can't build a scheduler that works. That being said it isn't a big deal, I just wanted to bring up the point.

I like the idea of being able to compose queues from other queues, though! That is a pretty neat idea.

@grondo
Copy link
Contributor

grondo commented Jan 24, 2023

if the constraint logic supported or, then we could build queues from other queues by adding or to their constraints in requires. Then you could do something like:

[queues.all]
contains = [ "batch", "debug" ]
exclusive = true
[queues.debug]
requires = [ "debug" ]
[queues.batch]
requires = [ "batch" ]

Eh, now that I look at it. it doesn't really seem that useful..

Unless... if overlapping queues are internally kept in a hierarchy it would be easy to stop all "child" queues when enabling an exclusive parent, or disable a parent when enabling a "child".

Ok, I'm done hijacking your issue. Proceed.

@garlick
Copy link
Member Author

garlick commented Jan 24, 2023

Heh, I wasn't proposing that we compose queues from other queues. I didn't put any constraints in "all" because I wanted it to have no constraints and I just meant the config to say "all overlaps both batch and debug". But maybe you're onto something there... 😄

Yeah I got your point. What I meant was that we don't have to limit the general job manager behavior, but we could offer a way for admins to configure such a limitation. Then we could recommend they do that currently if using fluxion (e.g. in a warning box in the admin guide).

@grondo
Copy link
Contributor

grondo commented Jan 24, 2023

Yes, I was agreeing with you in the end 🙂 Sorry for the tangent!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants