Fixing ArrayNode integration with backoff controller #4640
Merged
Tracking issue
NA
Why are the changes needed?
ArrayNode integrates into the core execution log of FlytePropeller. This means that, unlike the existing maptask implementation, it goes through FlytePropeller's backoff controller when creating k8s resources. If the backoff controller detects that FlytePropeller is creating too many resources, it throttles by setting the task phase to WaitingForResources. Previously, ArrayNode mapped this phase to Running and then immediately failed when the resource did not exist in the next evaluation round.
What changes were proposed in this pull request?
This PR treats the task phase WaitingForResources as NotYetStarted so that evaluation of the subNode integrates with FlytePropeller's backoff controller.
How was this patch tested?
Tested locally with varying resource requests and k8s resource quotas, and also on Union Cloud.
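The behavior exercised by this testing can be illustrated with a small, self-contained Go sketch. It is not the FlytePropeller code: the types TaskPhase and SubNodePhase, the functions toSubNodePhase and evaluate, and the throttledRounds parameter are all hypothetical names introduced here for illustration. The sketch assumes a backoff controller that rejects resource creation for a fixed number of evaluation rounds, and shows that mapping WaitingForResources to NotYetStarted lets the subNode retry creation on the next round instead of being treated as Running.

```go
package main

import "fmt"

// TaskPhase is a simplified, hypothetical stand-in for the task phases
// reported by a plugin. The real enums live in the Flyte codebase.
type TaskPhase int

const (
	TaskPhaseWaitingForResources TaskPhase = iota
	TaskPhaseRunning
	TaskPhaseSuccess
)

// SubNodePhase is a simplified, hypothetical stand-in for the phase that
// ArrayNode tracks for each subNode.
type SubNodePhase int

const (
	SubNodeNotYetStarted SubNodePhase = iota
	SubNodeRunning
	SubNodeSucceeded
)

// toSubNodePhase sketches the mapping described in the PR:
// WaitingForResources is treated as NotYetStarted, so a throttled subNode
// is started again in a later round rather than assumed to be running.
func toSubNodePhase(p TaskPhase) SubNodePhase {
	switch p {
	case TaskPhaseWaitingForResources:
		// Throttled: no k8s resource was created, so start over next round.
		return SubNodeNotYetStarted
	case TaskPhaseRunning:
		return SubNodeRunning
	case TaskPhaseSuccess:
		return SubNodeSucceeded
	default:
		// Simplification: unhandled phases are treated as not yet started.
		return SubNodeNotYetStarted
	}
}

// evaluate simulates evaluation rounds for one subNode. The assumed backoff
// controller throttles creation during the first throttledRounds rounds.
// It returns the final subNode phase and the number of rounds evaluated.
func evaluate(throttledRounds, maxRounds int) (SubNodePhase, int) {
	phase := SubNodeNotYetStarted
	for round := 1; round <= maxRounds; round++ {
		switch phase {
		case SubNodeNotYetStarted:
			if round <= throttledRounds {
				phase = toSubNodePhase(TaskPhaseWaitingForResources)
			} else {
				phase = toSubNodePhase(TaskPhaseRunning)
			}
		case SubNodeRunning:
			// The resource exists, so the task can complete.
			phase = toSubNodePhase(TaskPhaseSuccess)
		}
		if phase == SubNodeSucceeded {
			return phase, round
		}
	}
	return phase, maxRounds
}

func main() {
	phase, rounds := evaluate(2, 10)
	fmt.Printf("succeeded=%t rounds=%d\n", phase == SubNodeSucceeded, rounds)
}
```

With two throttled rounds, the simulated subNode stays in NotYetStarted for rounds 1 and 2, is created in round 3, and succeeds in round 4. If throttling outlasts the round budget, it simply remains NotYetStarted rather than failing.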
Setup process
Create a k8s resource quota, example:
And start an ArrayNode that will exceed this quota, example:
with
pyflyte run --remote array-node.py wf --names='["foo","bar","baz"]'
Screenshots
NA
Check all the applicable boxes
Related PRs
NA
Docs link
NA