[Core feature] Allow pod spec in Ray plugin #4674

Open
2 tasks done
vkaiser-mb opened this issue Jan 4, 2024 · 2 comments
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request

Comments

@vkaiser-mb

Motivation: Why do you think this is important?

The Flyte Ray plugin does not currently seem to offer the same parameters as RayJob. For us, it is necessary to set the pod configuration (e.g. which node pool to use or which volumes to mount).
Looking at the Ray documentation, there is an option to define a pod template like this:
https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html

Looking at the plugin code, it seems that the pod template for both head and worker nodes is taken from the pod template defined on the task.

Goal: What should the final outcome look like, ideally?

You should be able to specify your resources for every group (the head and multiple worker groups) the same way you do for a regular Flyte task.

ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(
        requests=Resources(mem="64Gi", cpu="4"),
        limits=Resources(mem="64Gi", cpu="4"),
        pod_template_name="ray_head_node",
    ),
    worker_node_config=[
        WorkerNodeConfig(
            group_name="V100-group",
            replicas=4,
            requests=Resources(mem="256Gi", cpu="64", gpu="1"),
            limits=Resources(mem="256Gi", cpu="64", gpu="1"),
            pod_template=V1PodSpec(node_selector={"node_group": "V100"}),
        ),
        WorkerNodeConfig(
            group_name="A100-group",
            replicas=2,
            requests=Resources(mem="480Gi", cpu="60", gpu="2"),
            limits=Resources(mem="480Gi", cpu="60", gpu="2"),
            pod_template=V1PodSpec(node_selector={"node_group": "A100"}),
        ),
    ],
)
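
To make the intended semantics a bit more concrete, here is a rough, self-contained sketch of how a per-group pod spec could be resolved against the task-level one. It is plain Python and does not use flytekit or the Kubernetes client: PodSpecSketch, NodeGroupSketch and resolve_pod_spec are hypothetical names made up for illustration, and the merge rule (group values win, otherwise fall back to the task level) is an assumption, not how the plugin behaves today.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PodSpecSketch:
    """Hypothetical, heavily simplified stand-in for a Kubernetes pod spec."""
    node_selector: Dict[str, str] = field(default_factory=dict)
    volumes: List[str] = field(default_factory=list)


@dataclass
class NodeGroupSketch:
    """Hypothetical per-group settings (the head or one worker group)."""
    group_name: str
    pod_spec: Optional[PodSpecSketch] = None  # None means "use the task-level spec"


def resolve_pod_spec(group: NodeGroupSketch, task_level: PodSpecSketch) -> PodSpecSketch:
    """Overlay a group's pod spec on the task-level one (assumed rule: group values win)."""
    if group.pod_spec is None:
        return task_level
    return PodSpecSketch(
        # Keys set on the group override keys set at the task level.
        node_selector={**task_level.node_selector, **group.pod_spec.node_selector},
        # A group without its own volumes inherits the task-level ones.
        volumes=list(group.pod_spec.volumes or task_level.volumes),
    )


# Task-level defaults plus two worker groups pinned to different node pools.
task_level = PodSpecSketch(node_selector={"node_group": "cpu"}, volumes=["shared-data"])
groups = [
    NodeGroupSketch("head"),
    NodeGroupSketch("V100-group", PodSpecSketch(node_selector={"node_group": "V100"})),
    NodeGroupSketch("A100-group", PodSpecSketch(node_selector={"node_group": "A100"})),
]

resolved = {g.group_name: resolve_pod_spec(g, task_level) for g in groups}
for name, spec in resolved.items():
    print(name, spec.node_selector, spec.volumes)
```

With something like this, the head keeps the task-level settings while each worker group lands on its own node pool, which is the behaviour the config above is asking for.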

Describe alternatives you've considered

Forward all parameters of a Ray job from Python to the Go plugin and ignore the task-level pod template.

Propose: Link/Inline OR Additional context

I think this has a high overlap with #4377

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@vkaiser-mb vkaiser-mb added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Jan 4, 2024

dosubot bot commented Jan 4, 2024

🤖

It's great to see your interest in enhancing the Flyte Ray plugin! Your proposal for allowing the pod spec in the Ray plugin seems well thought out and could definitely add value to the platform. Your detailed explanation and the example provided will be very helpful for the maintainers to understand the proposed changes. It's also good to see that you've linked the proposal to the relevant issue on GitHub. Your contribution would certainly be appreciated. Keep up the good work!



@eapolinario eapolinario added backlogged For internal use. Reserved for contributor team workflow. and removed untriaged This issues has not yet been looked at by the Maintainers labels Jan 18, 2024
@davidmirror-ops
Contributor

@vkaiser-mb this is coming in flytekit 1.14: #5933
Would you mind taking a look when it's out to see if it addresses your needs?
