
Improving the scheduling logic of the local engine #185

Merged · 5 commits · Apr 29, 2024

Conversation

dthulke
Member

@dthulke dthulke commented Apr 19, 2024

In the current implementation the LocalEngine always tries to schedule the first runnable task in its input queue (FIFO). If this job requires resources that are currently not available (e.g. a GPU), the scheduler waits until they free up, even if there are smaller jobs that could be scheduled now.

As a very simple fix, this change moves a task that cannot be scheduled to the end of the queue and checks the next task instead.
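The idea can be sketched roughly like this (a minimal illustration, not the actual sisyphus code; `schedule_next` and `resources_available` are hypothetical names):

```python
import queue


def schedule_next(task_queue, resources_available):
    """Sketch of the fix: instead of blocking on the task at the head of
    the FIFO queue until its resources free up, rotate unschedulable
    tasks to the back and try the next one."""
    # Check each task currently in the queue at most once.
    for _ in range(task_queue.qsize()):
        task = task_queue.get()
        if resources_available(task):
            return task  # this task can be run now
        task_queue.put(task)  # move it to the end of the queue
    return None  # nothing is schedulable right now
```

Note that this rotation changes the relative order of waiting tasks, which is what the next comment picks up on.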

@dthulke dthulke changed the title Improves the scheduling logic of the local engine Improving the scheduling logic of the local engine Apr 19, 2024
@critias
Contributor

critias commented Apr 19, 2024

Good point! It bothers me a little that this now changes the order of the queue if it finds a job which needs fewer resources. It would be nice to keep the FIFO property.

You could just pull everything out of the Queue and store it in a normal list. Since the LocalEngine has only one thread checking for new tasks, it shouldn't be a problem that a normal list isn't thread-safe like the Queue. You can then simply iterate over the list and remove submitted jobs from it.
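The list-based variant suggested here might look like the following sketch (illustrative only; `schedule_runnable` and `resources_available` are made-up names, not the sisyphus API):

```python
def schedule_runnable(runnable_tasks, resources_available):
    """Sketch of the list-based approach: iterate over a plain list
    (acceptable here because only the single LocalEngine scheduling
    thread touches it), submit every task whose resources are free,
    and keep the remaining tasks in their original FIFO order."""
    submitted = []
    still_waiting = []
    for task in runnable_tasks:
        if resources_available(task):
            submitted.append(task)  # would be handed off for execution
        else:
            still_waiting.append(task)  # keeps its queue position
    # Remove submitted jobs from the list in place.
    runnable_tasks[:] = still_waiting
    return submitted
```

Unlike the queue-rotation version, waiting tasks never change their relative order, so the FIFO property is preserved.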

@dthulke dthulke force-pushed the localengine-scheduling branch from ef5e080 to eb77402 on April 21, 2024 at 12:08
@dthulke dthulke force-pushed the localengine-scheduling branch from eb77402 to 43470da on April 21, 2024 at 12:09
@dthulke
Member Author

dthulke commented Apr 21, 2024

Good point, changed it to a normal list (wrapped in sync_object to be safe).

Contributor

@critias critias left a comment


Thanks for changing the PR. I think it looks good like that, but why are you using total_runnable_tasks instead of just using len(runnable_tasks) directly?


# run next task if the capacities are available
if next_task is not None:
    while runnable_task_idx < total_runnable_tasks:
Contributor


Why not use len(runnable_tasks) directly?

Member Author

@dthulke dthulke Apr 24, 2024


Good question :D now I also don't see a reason to introduce the extra variable.

@dthulke dthulke requested a review from critias April 24, 2024 16:20
Contributor

@critias critias left a comment


Thanks for the changes, I like this version much better :)

@dthulke dthulke merged commit 0da764a into rwth-i6:master Apr 29, 2024
2 checks passed