Introduce pod_spec_from_resources() ray helper function #2943

Merged
25 commits merged on Dec 13, 2024

Conversation

Contributor

@fiedlerNr9 commented Nov 20, 2024

Tracking issue

Related to flyteorg/flyte#5666

Why are the changes needed?

These changes update the flytekit Ray plugin so that users can construct Ray pod definitions with a new helper function, pod_spec_from_resources(). The function allows setting Resources for the Ray head and worker nodes.

What changes were proposed in this pull request?

  • introducing pod_spec_from_resources()

How was this patch tested?

  • added unit tests
  • ran the following example:
from flytekit import ImageSpec, Resources, task, workflow
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig
import ray
import typing
from flytekit.models.task import K8sPod, K8sObjectMetadata
from flytekit.core.resources import pod_spec_from_resources

flytekit_hash = "248cda186dd95253f432507b42e906f6de48201b"
flytekitplugins_ray = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}#subdirectory=plugins/flytekit-ray"
new_flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"

container_image = ImageSpec(
    name="ray-union-demo",
    python_version="3.11.9",
    apt_packages=["wget", "gdb", "git"],
    packages=[
        new_flytekit,
        flytekitplugins_ray,
        "kubernetes",
    ],
    registry="ghcr.io/fiedlerNr9",
)
ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(
        ray_start_params={"num-cpus": "0", "log-color": "true"},
        k8s_pod=K8sPod(pod_spec=pod_spec_from_resources(k8s_pod_name="ray-head", requests=Resources(cpu="4",mem="5Gi")))
    ),
    worker_node_config=[
        WorkerNodeConfig(
            group_name="ray-group",
            replicas=2,
            min_replicas=0,
            max_replicas=2,
            k8s_pod=K8sPod(pod_spec=pod_spec_from_resources(k8s_pod_name="ray-worker", requests=Resources(cpu="1",mem="1Gi")))
        )
    ],
    shutdown_after_job_finishes=True,
    ttl_seconds_after_finished=60,
    enable_autoscaling=False,
)


@ray.remote
def f(x):
    return x * x


@task(
    task_config=ray_config,
    requests=Resources(mem="2Gi", cpu="3000m"),
    container_image=container_image,
)
def ray_task(n: int) -> typing.List[int]:
    futures = [f.remote(i) for i in range(n)]
    return ray.get(futures)


@workflow
def wf(n: int = 50):
    ray_task(n=n)

Ray Head node Resources

Limits:
  cpu:     4
  memory:  5Gi
Requests:
  cpu:     4
  memory:  5Gi

Ray controller Resources

Limits:
  cpu:     2
  memory:  3Gi
Requests:
  cpu:     2
  memory:  3Gi

Ray worker Resources

Limits:
  cpu:     1
  memory:  1Gi
Requests:
  cpu:     1
  memory:  1Gi
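
In each pod listed above, limits equal requests even though the example only passes requests. The text here does not say whether flytekit or the Flyte backend fills in the missing side. As a minimal sketch of one way a helper could behave, assuming it falls back to whichever of the two was supplied: `SimpleResources`, `to_k8s_entries` and `sketch_pod_spec` are hypothetical stand-ins (not flytekit's API), and plain dicts replace the Kubernetes client objects.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SimpleResources:
    """Hypothetical stand-in for flytekit's Resources (cpu and mem only)."""

    cpu: Optional[str] = None
    mem: Optional[str] = None


def to_k8s_entries(resources: Optional[SimpleResources]) -> Optional[Dict[str, str]]:
    """Map the set fields to Kubernetes resource names; None stays None."""
    if resources is None:
        return None
    k8s_names = {"cpu": "cpu", "mem": "memory"}
    return {
        k8s_names[f.name]: getattr(resources, f.name)
        for f in fields(resources)
        if getattr(resources, f.name) is not None
    }


def sketch_pod_spec(
    container_name: str,
    requests: Optional[SimpleResources] = None,
    limits: Optional[SimpleResources] = None,
) -> Dict[str, Any]:
    """Build a dict shaped like a pod spec with a single container."""
    req = to_k8s_entries(requests)
    lim = to_k8s_entries(limits)
    # Assumption: if only one side is given, reuse it for the other.
    req = req or lim
    lim = lim or req
    return {
        "containers": [
            {"name": container_name, "resources": {"requests": req, "limits": lim}}
        ]
    }


head = sketch_pod_spec("ray-head", requests=SimpleResources(cpu="4", mem="5Gi"))
print(head["containers"][0]["resources"])
```

With this fallback, the head node sketch reports 4 CPUs and 5Gi of memory for both requests and limits, which matches the head node listing above.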

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@fiedlerNr9 changed the title from "Construct ray k8spods" to "Decouple Ray Resources: Construct ray k8spods from Resources" on Nov 20, 2024
@fiedlerNr9 force-pushed the construct-ray-k8spods branch from 2778db2 to b79cea4 on November 20, 2024 18:40
@fiedlerNr9 force-pushed the construct-ray-k8spods branch from f2b811c to da7d6ae on November 20, 2024 19:04

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 20.00000% with 16 lines in your changes missing coverage. Please review.

Project coverage is 46.65%. Comparing base (faee3da) to head (cad3d33).
Report is 36 commits behind head on master.

Files with missing lines | Patch % | Lines
flytekit/core/resources.py | 20.00% | 16 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (faee3da) and HEAD (cad3d33).

HEAD has 6 uploads less than BASE
Flag | BASE (faee3da) | HEAD (cad3d33)
(none) | 8 | 2
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2943       +/-   ##
===========================================
- Coverage   79.32%   46.65%   -32.67%     
===========================================
  Files         199      200        +1     
  Lines       20870    20962       +92     
  Branches     2684     2709       +25     
===========================================
- Hits        16555     9780     -6775     
- Misses       3566    10694     +7128     
+ Partials      749      488      -261     

Collaborator

@eapolinario left a comment

Just a few comments, nothing major.

flytekit/core/resources.py (outdated, resolved)
flytekit/core/resources.py (outdated, resolved)
plugins/flytekit-ray/flytekitplugins/ray/models.py (outdated, resolved)
Collaborator

@eapolinario left a comment

Sorry for flip-flopping on this; I only just realized that we were removing the pod spec template. We should instead build helper functions around it, but still produce pod specs that get passed to the Ray IDL objects.

requests: Optional[Resources],
limits: Optional[Resources],
) -> dict[str, Any]:
def _construct_k8s_pods_resources(resources: Optional[Resources], k8s_gpu_resource_key: str = "nvidia.com/gpu"):
Collaborator

Supporting other GPU types is going to be hard, even if we push this parameter up to the outer function (i.e. construct_k8s_pod_spec_from_resources).
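
To illustrate the concern, here is a minimal sketch of a resource mapping in which the GPU key is a parameter. `k8s_resource_entries` is a hypothetical name, not the function in this PR. `nvidia.com/gpu` is the default shown in the diff above, and `amd.com/gpu` is used only as an example of another vendor's resource key.

```python
from typing import Dict, Optional


def k8s_resource_entries(
    cpu: Optional[str] = None,
    mem: Optional[str] = None,
    gpu: Optional[str] = None,
    k8s_gpu_resource_key: str = "nvidia.com/gpu",
) -> Dict[str, str]:
    """Hypothetical helper: map simple resource values to Kubernetes keys.

    The GPU count is stored under whatever key the caller supplies, so the
    same function can serve clusters with different GPU device plugins.
    """
    candidates = {"cpu": cpu, "memory": mem, k8s_gpu_resource_key: gpu}
    # Keep only the resources that were actually requested.
    return {key: value for key, value in candidates.items() if value is not None}


nvidia = k8s_resource_entries(cpu="1", gpu="1")
amd = k8s_resource_entries(cpu="1", gpu="1", k8s_gpu_resource_key="amd.com/gpu")
print(nvidia)
print(amd)
```

Every call site that wants a non-default GPU type has to pass the key explicitly, which is the plumbing cost the comment points at.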

    ):
        self._group_name = group_name
        self._replicas = replicas
        self._max_replicas = max(replicas, max_replicas) if max_replicas is not None else replicas
        self._min_replicas = min(replicas, min_replicas) if min_replicas is not None else replicas
        self._ray_start_params = ray_start_params
        self._k8s_pod = k8s_pod
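
The two replica assignments in this hunk clamp the bounds so that min_replicas <= replicas <= max_replicas always holds, and both bounds default to replicas when omitted. A standalone sketch of that logic, with `clamp_replica_bounds` as a hypothetical name that is not part of the plugin:

```python
from typing import Optional, Tuple


def clamp_replica_bounds(
    replicas: int,
    min_replicas: Optional[int] = None,
    max_replicas: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (min, max) replica bounds that always bracket `replicas`."""
    # A lower bound above `replicas` is pulled down to `replicas`.
    lower = min(replicas, min_replicas) if min_replicas is not None else replicas
    # An upper bound below `replicas` is pushed up to `replicas`.
    upper = max(replicas, max_replicas) if max_replicas is not None else replicas
    return lower, upper


print(clamp_replica_bounds(2, min_replicas=0, max_replicas=2))
print(clamp_replica_bounds(3, min_replicas=5, max_replicas=1))
print(clamp_replica_bounds(2))
```

The first call uses the values from the worker group in the PR description; the second shows inconsistent bounds collapsing onto the replica count.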
Collaborator

We should keep this as part of the interface and instead build helper functions that construct valid pod specs (as mentioned in the original flyte PR). This will also help with the other problem we're having, passing the GPU resource name around: the GPU key can be an argument of one of the helper functions that build pod specs.

Contributor Author

I get what you are saying. So we want users to construct the pod specs themselves, e.g. by calling construct_k8s_pod_spec_from_resources() or by specifying pod templates in user code?

Contributor

I would make the method name simple, maybe pod from resources

plugins/flytekit-ray/flytekitplugins/ray/models.py (outdated, resolved)
Signed-off-by: Jan Fiedler <[email protected]>
@fiedlerNr9 changed the title from "Decouple Ray Resources: Construct ray k8spods from Resources" to "Introduce pod_spec_from_resources() ray helper function" on Dec 7, 2024
Collaborator

@eapolinario left a comment

Thank you!

@eapolinario merged commit 2da64ef into master on Dec 13, 2024
104 of 107 checks passed
4 participants