[Plugin] Flux Operator #3829

Open
vsoch opened this issue Jul 4, 2023 · 8 comments

@vsoch

vsoch commented Jul 4, 2023

Hi! 👋 I develop the Flux operator https://flux-framework.org/flux-operator/, which is conceptually similar to the MPI operator: it brings up a Flux Framework cluster (that acts as a job) to run a scoped piece of work. I'm interested in adding it as a plugin (and can also do the development work for it), but I wanted to check first about the order of operations.

I had first cloned https://github.com/flyteorg/flyteplugins, and I started adding the operator under k8s until I noticed that the others (e.g., dask) had a DaskJob that is also defined under the flyteidl repository. So I think the correct order of operations (and what I want to check here) is:

Are there any more pieces? Thanks for the help! I tried Flyte out this week and really loved it; it already supports several CRDs I've been hoping to see in one place, so I'm eager to see our operator supported here as well.

@hamersaw
Contributor

hamersaw commented Jul 7, 2023

@vsoch this is great to see, happy to help where we can! The main changes needed to add a new plugin are:

  • flyteidl: add a proto message to describe the job metadata
  • flyteplugins: add support for handling the job definition
  • flytekit: add conversion from the Python definition into the Flyte proto
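To make the three layers concrete, here is a minimal, self-contained Python sketch of how data might flow between them. It is not flytekit's or flyteplugins' API: `FluxJobConfig`, `to_custom`, `build_resource`, and the fields `size`/`image` are hypothetical stand-ins for a FluxJob proto message that would be defined in flyteidl. `to_custom` plays the role of the flytekit conversion (in existing plugins this roughly corresponds to a task's `get_custom` hook), and `build_resource` imitates, in Python, what the Go handler in flyteplugins would do. The MiniCluster field names are assumptions to check against the Flux Operator CRD.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class FluxJobConfig:
    """Hypothetical task config; its fields mirror a hypothetical FluxJob proto message."""

    size: int = 1                # hypothetical: number of pods in the Flux MiniCluster
    image: Optional[str] = None  # optional; None means "use the task's default image"


def to_custom(cfg: FluxJobConfig) -> Dict[str, Any]:
    """flytekit side: turn the Python config into a proto-shaped dict.

    Unset optional fields are dropped, mimicking how proto3 JSON output omits unset fields.
    """
    if cfg.size < 1:
        raise ValueError("size must be at least 1")
    return {key: value for key, value in asdict(cfg).items() if value is not None}


def build_resource(custom: Dict[str, Any], default_image: str) -> Dict[str, Any]:
    """Backend side (done in Go in flyteplugins): map the proto fields onto a CRD object.

    The MiniCluster field names used here are assumptions, not a verified schema.
    """
    return {
        "kind": "MiniCluster",
        "spec": {
            "size": custom.get("size", 1),
            "containers": [{"image": custom.get("image", default_image)}],
        },
    }


if __name__ == "__main__":
    custom = to_custom(FluxJobConfig(size=4))
    print(custom)  # {'size': 4}
    print(build_resource(custom, "ghcr.io/example/app:latest"))
```

The point of the split is that the proto message is the contract: flytekit only needs to produce it, and the backend plugin only needs to consume it, so each side can evolve independently as long as the message stays compatible.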

There are a few relatively recent examples that lay a great foundation for how backend plugins can be added. This is a comment laying out the PRs that added the dask plugin and here is the issue for the ray plugin. I would advise you to take a look through the process that both of those plugins went through and I would be happy to fill in the gaps.

@kumare3
Contributor

kumare3 commented Jul 11, 2023

Also @vsoch, please join slack.flyte.org; I am sure the community would love this and would love to help. We would also love to learn what you plan to do with it.

Cc @davidmirror-ops

@davidmirror-ops
Contributor

@vsoch This is great to see. Please let us know if any further question arises in the process, we'd like to see this integration happening too.

Also, we host a bi-weekly contributor meetup where all the maintainers, steering committee members, and new/existing contributors discuss ideas. It'd be great to have you there. More info here

@vsoch
Author

vsoch commented Jul 15, 2023

heyo! Apologies for the short silence; I had 3x my normal number of meetings this week and didn't have enough time to program! This is on my TODO list, and worst case it's a few weeks away. I will definitely keep you in the loop. Thank you for keeping the issue open!

@vsoch
Author

vsoch commented Jul 17, 2023

Okay, I'm starting with flyteidl. My question is pretty simple: how do I know which fields to create here?

```protobuf
syntax = "proto3";

import "flyteidl/core/tasks.proto";

package flyteidl.plugins;

option go_package = "github.com/flyteorg/flyteidl/gen/pb-go/flyteidl/plugins";

// Custom Proto for Dask Plugin.
message DaskJob {
    // Spec for the scheduler pod.
    DaskScheduler scheduler = 1;

    // Spec of the default worker group.
    DaskWorkerGroup workers = 2;
}

// Specification for the scheduler pod.
message DaskScheduler {
    // Optional image to use. If unset, will use the default image.
    string image = 1;

    // Resources assigned to the scheduler pod.
    core.Resources resources = 2;
}

message DaskWorkerGroup {
    // Number of workers in the group.
    uint32 number_of_workers = 1;

    // Optional image to use for the pods of the worker group. If unset, will use the default image.
    string image = 2;

    // Resources assigned to all pods of the worker group.
    // As per https://kubernetes.dask.org/en/latest/kubecluster.html?highlight=limit#best-practices
    // it is advised to only set limits. If requests are not explicitly set, the plugin will make
    // sure to set requests==limits.
    // The plugin sets `--memory-limit` as well as `--nthreads` for the workers according to the limit.
    core.Resources resources = 3;
}
```
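The `DaskWorkerGroup` comment describes two small rules: if requests are not set, use requests == limits, and derive `--memory-limit` and `--nthreads` from the limit. A minimal Python sketch of that logic follows, assuming resources arrive as plain dicts of Kubernetes quantity strings. The helper names, the whole-dict fallback, the rounding up of fractional CPUs, and passing the memory string through unchanged are all hypothetical choices for illustration, not the actual flyteplugins (Go) implementation.

```python
import math
from typing import Dict, List


def default_requests(requests: Dict[str, str], limits: Dict[str, str]) -> Dict[str, str]:
    """If no requests are set at all, fall back to requests == limits."""
    return dict(requests) if requests else dict(limits)


def cpu_to_threads(cpu: str) -> int:
    """Convert a Kubernetes CPU quantity ("2", "1.5", "500m") to a whole thread count (at least 1).

    Assumption: fractional CPUs are rounded up.
    """
    millicores = int(cpu[:-1]) if cpu.endswith("m") else int(float(cpu) * 1000)
    return max(1, math.ceil(millicores / 1000))


def worker_args(limits: Dict[str, str]) -> List[str]:
    """Derive dask worker flags from the resource limits."""
    args: List[str] = []
    if "cpu" in limits:
        args += ["--nthreads", str(cpu_to_threads(limits["cpu"]))]
    if "memory" in limits:
        # Assumption: the Kubernetes memory string is passed through unchanged.
        args += ["--memory-limit", limits["memory"]]
    return args


if __name__ == "__main__":
    limits = {"cpu": "1500m", "memory": "4Gi"}
    print(default_requests({}, limits))  # {'cpu': '1500m', 'memory': '4Gi'}
    print(worker_args(limits))           # ['--nthreads', '2', '--memory-limit', '4Gi']
```

This illustrates why the proto can stay small: only the fields the plugin actually acts on (image, worker count, resources) need to be exposed, and the plugin derives the rest of the CRD settings from them.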

It looks like this covers only a tiny subset of the resources described here: https://kubernetes.dask.org/en/latest/operator_resources.html. Is this the overlapping set between what Flyte needs and what Dask offers? What about the other fields in the CRD?

@kumare3
Contributor

kumare3 commented Oct 11, 2023

@vsoch any updates on this?

@vsoch
Author

vsoch commented Oct 11, 2023

I don’t think anyone ever answered my question?

github-actions bot commented Jul 8, 2024

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Jul 8, 2024
Projects: None yet
Development: No branches or pull requests
4 participants