WIP: Remove the lock at the scheduler level #1212
Closed
fixes #1194 #786
This PR removes the `Lock` at the `scheduler` level and moves it to `node.Node` for better granularity. `cluster` now maintains a list of Nodes with resource accounting protected by a `Lock`, which avoids races for resource reservation. Processes not lucky enough to get a resource slice on a machine that fills up in the meantime retry the scheduling process until they get the resources they need (it fails if there are no more resources available cluster-wide).

This is, I think, the simplest approach to removing the `lock` at the scheduler level for now and avoiding the races. Note that this does not solve the naming issue (we should settle whether we allow duplicate names and resolve the ambiguity through the prefix, which is the node name, or whether each container must absolutely have a unique name).

Downside: the blocked `pull` problem still persists, so some resources will be locked up for a while when this happens, because we have no way to know whether it is due to that issue or just a very slow `pull`.
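To make the approach concrete, here is a minimal Go sketch of per-node resource accounting with a retry over the remaining candidates. The `Node`, `Cluster`, `Reserve`, and `Schedule` names are simplified stand-ins for this PR's `node.Node` and `cluster` types, not the actual code, and memory is the only resource tracked.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Node is a hypothetical stand-in for node.Node: it owns its resource
// accounting and protects it with its own lock instead of relying on a
// scheduler-wide one.
type Node struct {
	sync.Mutex
	Name        string
	TotalMemory int64
	UsedMemory  int64
}

// Reserve tries to account for `memory` bytes on this node. It returns
// false if the node filled up since the scheduler picked it.
func (n *Node) Reserve(memory int64) bool {
	n.Lock()
	defer n.Unlock()
	if n.UsedMemory+memory > n.TotalMemory {
		return false
	}
	n.UsedMemory += memory
	return true
}

// Cluster is a hypothetical stand-in for the cluster keeping a list of Nodes.
type Cluster struct {
	sync.Mutex
	nodes []*Node
}

// Schedule picks a node with enough free resources and reserves them on it.
// If the chosen node got full in the meantime, it retries with the remaining
// candidates and fails only when no node cluster-wide can satisfy the request.
func (c *Cluster) Schedule(memory int64) (*Node, error) {
	c.Lock()
	candidates := append([]*Node(nil), c.nodes...)
	c.Unlock()

	for _, n := range candidates {
		if n.Reserve(memory) {
			return n, nil
		}
		// The node filled up between selection and reservation: move on
		// to the next candidate instead of holding a global lock.
	}
	return nil, errors.New("no resources available cluster-wide")
}

func main() {
	c := &Cluster{nodes: []*Node{
		{Name: "node-1", TotalMemory: 1 << 30},
		{Name: "node-2", TotalMemory: 2 << 30},
	}}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if n, err := c.Schedule(512 << 20); err == nil {
				fmt.Printf("container %d placed on %s\n", id, n.Name)
			} else {
				fmt.Printf("container %d: %v\n", id, err)
			}
		}(i)
	}
	wg.Wait()
}
```

In this sketch each node's lock only guards its own counters, so a slow operation on one node no longer blocks scheduling on the others; the resources reserved on a stuck node, however, stay tied up, which is the downside mentioned above.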
TODO items:
Signed-off-by: Alexandre Beslic [email protected]