
feat: add run launcher for GCP Cloud Run Jobs #21864

Closed


timchap

timchap commented May 15, 2024

Summary & Motivation

I have been working on a cloud-native Dagster deployment for my company. I have opted to use Cloud Run Jobs for the run launcher(s), as they provide a containerised batch job environment that:

  • is highly managed
  • is highly scalable (consider increasing the Cloud Run Admin API quota, which is 10 req/min by default)
  • has zero idle cost
  • is fast to spin up
  • works elegantly in conjunction with Cloud Run Services for code server hosting
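Because the default Cloud Run Admin API quota is only 10 requests per minute, a burst of run launches can hit it quickly. A minimal sketch of client-side pacing, assuming each launch costs one Admin API request (other calls, such as status polling, may also count against the quota). `SlidingWindowLimiter` is a hypothetical helper, not part of Dagster or the Google Cloud client libraries:

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window.

    Hypothetical helper. The defaults (10 calls per 60 s) mirror the default
    Cloud Run Admin API quota; adjust them if the quota has been raised.
    """

    def __init__(
        self,
        max_calls: int = 10,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested
        self._calls: Deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the current window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit; otherwise False."""
        now = self._clock()
        self._evict(now)
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False

    def seconds_until_available(self) -> float:
        """How long a caller would need to wait before the next call fits."""
        now = self._clock()
        self._evict(now)
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])
```

A launcher could call `try_acquire()` before each Admin API request and defer or sleep for `seconds_until_available()` when it returns False; raising the quota, as suggested above, is the simpler option at higher volumes.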

I noticed this discussion indicating that others are interested in a similar approach, which motivated this PR.

Note that on GCP a Cloud Run Job is a fairly persistent resource. That is, instead of creating a new Job resource for each run, it is preferable to have one persistent Job per code location and to re-execute that Job once per run with argument/environment overrides. As such, the configuration of the Job itself (resources, image, service account) is not managed by Dagster in this implementation; instead, I manage it with Terraform to ensure the proper coupling between code servers and run launchers. I may publish this as a public Terraform module when I have the chance; otherwise, if there's a suitable way to share it within the Dagster project, let me know.
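To make the "one persistent Job per code location, re-executed with overrides" pattern concrete, here is a minimal sketch that builds the request for the Cloud Run Admin API v2 `jobs.run` method (`POST https://run.googleapis.com/v2/{name}:run`). It is not the CloudRunRunLauncher's actual code: the function names, the code-location-to-Job mapping, and the Job names are hypothetical, the JSON field names follow the v2 REST reference as I understand it, and authentication and the HTTP call itself are omitted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping: one persistent, Terraform-managed Job per code location.
# Names are illustrative only.
JOBS_BY_CODE_LOCATION: Dict[str, str] = {
    "analytics": "dagster-runs-analytics",
    "ml": "dagster-runs-ml",
}


def job_resource_name(project: str, region: str, job: str) -> str:
    """Full resource name of a Cloud Run Job in the Admin API v2."""
    return f"projects/{project}/locations/{region}/jobs/{job}"


def build_run_body(
    args: List[str],
    env: Dict[str, str],
    task_count: int = 1,
    timeout_seconds: Optional[int] = None,
) -> dict:
    """JSON body for POST https://run.googleapis.com/v2/{name}:run.

    Overrides container args/env for this execution only; the Job's image,
    resources and service account stay as defined outside Dagster.
    """
    container_override = {
        # No "name" key: assumes the Job has a single container.
        "args": list(args),
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }
    overrides: dict = {
        "containerOverrides": [container_override],
        "taskCount": task_count,
    }
    if timeout_seconds is not None:
        # Durations are JSON strings with an "s" suffix, e.g. "3600s".
        overrides["timeout"] = f"{timeout_seconds}s"
    return {"overrides": overrides}


def plan_launch(
    project: str,
    region: str,
    code_location: str,
    run_args: List[str],
    run_env: Dict[str, str],
) -> Tuple[str, dict]:
    """Pick the code location's persistent Job and build its per-run overrides."""
    job = JOBS_BY_CODE_LOCATION[code_location]
    return job_resource_name(project, region, job), build_run_body(run_args, run_env)
```

The design keeps Dagster's responsibility narrow: it only chooses which existing Job to execute and what per-run args/env to pass, while everything about the Job definition stays in Terraform. A real launcher would send this body with an authenticated HTTP client or the `google-cloud-run` client library.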

How I Tested These Changes

I've been using the CloudRunRunLauncher in our corporate Dagster deployment for several months with no issues. I've included unit tests that mock the GCP clients. I am open to adding more integration-style tests that interact with cloud services directly, but I may need some guidance from the maintainers to do so (e.g. whether the CI test environment has permissions for Cloud Run).

…sed on the multiple code location feature of the new run launcher
@timchap
Author

timchap commented May 22, 2024

PR has been updated with scripts for an example deployment provided by @baumann-t.

We remain available for any other questions or change requests @garethbrickman. Thanks! 🙏

stickyhipp requested a review from neilfulwiler May 28, 2024 19:00
@clement-chaneching

That's great, thank you for this PR! I was also waiting for Cloud Run Jobs to be supported.
I'm also very interested in the Terraform part; would it be possible for you to share it?

@AndreaGiardini
Contributor

I would love to see this as well :)

@timchap
Author

timchap commented Jul 3, 2024

For those who were interested in the Terraform module, I have finally organised my deployment into a module. I also prepared a walkthrough to assist in running a demo/POC deployment. It comes with a few caveats as you will see, but with a bit of tweaking you should be able to get a fairly fully-featured prod-ready Dagster deployment running on GCP managed services.

@AndreaGiardini
Contributor

Now that GPUs are available in preview on Google Cloud Run, we need this more than ever 💯

https://cloud.google.com/run/docs/configuring/services/gpu

@leonardovff

any news?

@mkleinbort-ic

any news?

@cmpadden
Contributor

Hey all!

This is super impressive, and we're very appreciative of all the hard work here. We have a few sets of eyes looking over this internally and hope to get feedback out soon.

Before we merge we are also exploring solutions to make it easier to incorporate large-scale community contributions like this one, like having a separate repository for community contributions to hopefully expedite the review process.

Will keep you all posted!

@john-ramsey

Wondering if there's an update here? This is exciting stuff, and I would love to see what Dagster's broader vision is for implementing this work.

@cmpadden
Contributor

cmpadden commented Oct 28, 2024

Hey @john-ramsey and @timchap -

Last week we made the https://github.com/dagster-io/community-integrations repository public, with the intention of providing an outlet to more easily contribute integrations, and support the community. The intention was that this Cloud Runner work would be moved there under the dagster-contrib-gcp package. I've started porting over these changes, using git commit --author timchap to maintain attribution.

Since this PR was started, there have been some changes to the core library; notably, ExternalJob is now RemoteJob. @timchap, would you have the bandwidth to help bring these changes up to date in the new pull request?

Thanks for the patience, and I'm hoping we can move faster with this new outlet for community maintained integrations!

@timchap
Author

timchap commented Nov 8, 2024

@cmpadden thanks for the update! Yes, sure, I will have a look at that in the coming week.

@timchap
Author

timchap commented Nov 18, 2024

Work has been migrated to dagster-io/community-integrations#26.

@cmpadden @garethbrickman ok to close this PR?

@cmpadden
Contributor

Fantastic, thanks so much @timchap.

Will close this one, and will also post an update here once work completes in dagster-io/community-integrations#26.

cmpadden closed this Nov 18, 2024
@cmpadden
Contributor

@timchap's work has been merged in dagster-io/community-integrations#26 and is available for install via:

pip install dagster_contrib_gcp

For now we ask you to reference the repository itself as documentation as we improve that experience. Thanks!
