[Core][Scale] Large number of launchplans yielded (8k+) in dynamic workflows cause memory bloat #1660

Closed
2 tasks done
kumare3 opened this issue Oct 14, 2021 · 0 comments · Fixed by flyteorg/flytekit#705
Labels
enhancement New feature or request flytekit FlyteKit Python related issue scale Scale, Reliability and Performance of the platform
kumare3 commented Oct 14, 2021

Motivation: Why do you think this is important?

Consider an example like this:

from typing import List
from flytekit import dynamic

@dynamic
def dynamic_fan_out_task(input_integers: List[int]) -> None:
    for input_integer in input_integers:
        complex_single_integer_subworkflow(some_integer=input_integer)

Suppose input_integers is large, 8k entries in our example. The node-ids that flytekit creates are extremely verbose, and for 8k nodes they add more than 1MB of overhead in total. The node name, which is derived from the function name, is also very long and adds another 0.6MB of bloat.

The node-ids cause so much bloat because each one is repeated in several places:

  1. Upstream and downstream connections (start-node, individual nodes, and end-node)
  2. Once per node, as the node's own id
  3. Output bindings

In the example above, a node-id looks like dynamic-launch-lps-n7999 when it could simply be n7999 or dn7999.
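As a rough illustration of why the prefix matters, the sketch below estimates how many bytes node-ids take up across 8k nodes with the verbose prefix and with the short one. The repeat count per id (`REPEATS_PER_ID`) is a hypothetical assumption standing in for the edges, node entries, and bindings listed above. It was not measured from a real compiled closure, and the helper name `total_id_bytes` is illustrative only.

```python
NUM_NODES = 8000

# Assumption (hypothetical): each node-id string appears this many times in the
# compiled closure (connections, the node's own id, output bindings).
REPEATS_PER_ID = 4


def total_id_bytes(prefix: str, num_nodes: int = NUM_NODES,
                   repeats: int = REPEATS_PER_ID) -> int:
    """Sum the UTF-8 size of every occurrence of every node-id."""
    return sum(
        len(f"{prefix}{i}".encode("utf-8")) * repeats
        for i in range(num_nodes)
    )


verbose = total_id_bytes("dynamic-launch-lps-n")  # ids like dynamic-launch-lps-n7999
short = total_id_bytes("n")                       # ids like n7999
saved = verbose - short

print(f"verbose ids: {verbose} bytes, short ids: {short} bytes, saved: {saved} bytes")
```

The prefix costs 19 extra bytes per occurrence, so the saving grows linearly with both the node count and the number of places each id is repeated.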

Moreover, the name is about 60 characters long because it is fully qualified, which is unnecessary.

**Note: This example should probably use a map task, but it is a good enough use case for debugging.**

Goal: What should the final outcome look like, ideally?

Compiled closures should be small; their size drastically affects performance.

Describe alternatives you've considered

NA

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@kumare3 kumare3 added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers and removed untriaged This issues has not yet been looked at by the Maintainers labels Oct 14, 2021
@kumare3 kumare3 added this to the 0.18.1 milestone Oct 14, 2021
@kumare3 kumare3 added flytekit FlyteKit Python related issue scale Scale, Reliability and Performance of the platform labels Oct 14, 2021