Helm chart for Flyte #550

Closed
wants to merge 5 commits into from

rstanevich
Contributor

This PR contains a Helm chart for Flyte with sandbox and EKS configurations.

The sandbox configuration (values-sandbox.yaml) is ready for deployment in Minikube. The EKS configuration (values-eks.yaml), however, must be edited before installing in the cloud: the S3 bucket, RDS hosts, IAM roles, secrets, etc. need to be configured.

@kumare3 kumare3 linked an issue Oct 15, 2020 that may be closed by this pull request
23 tasks
@kumare3
Contributor

kumare3 commented Jan 8, 2021

@rstanevich I have been looking at Helm and I am liking it. I will review your PR, and I feel we can build on it. One problem seems to be how to use remote configurations, such as the PyTorch operator.

@@ -0,0 +1,136 @@
{{- if .Values.contour.enabled }}
Contributor

On EKS you can now use ALB.

Contributor Author

Oh yes, I've just found this announcement: https://aws.amazon.com/blogs/aws/new-application-load-balancer-support-for-end-to-end-http-2-and-grpc/
It looks like this is now possible; it also requires aws-load-balancer-controller 2.0+ installed in Kubernetes.
Thank you!

Contributor

Yeah, I already set it up on a personal account and it works really well. I will be updating the EKS manifests.

@sbrunk
Member

sbrunk commented Jan 25, 2021

Since we're stuck with Helm for the time being we'd like to contribute here. We could help test the chart and perhaps work on the GKE config.

@rstanevich
Contributor Author

So, I still haven't tried the new AWS ALB feature with gRPC support. Provisioning it in K8s requires the new AWS Load Balancer Controller for Kubernetes. I need some time to set up my own devbox with the new controller to test this.

@kumare3
Contributor

kumare3 commented Feb 7, 2021

@sbrunk I would love to help you with some testing as well.

@sbrunk
Member

sbrunk commented Feb 8, 2021

@kumare3 @rstanevich what do you think about an approach that first minimizes the diff between the kustomize output and the helm template output for core Flyte without dependencies, and then iterates from there?

That way we can make sure the helm installation is on par with what we have right now and it could also provide a smoother upgrade path.

My first crude try looks like this:

gh pr checkout 550
kustomize build kustomize/base/single_cluster/complete > base_deployment.yaml
kubectl apply -f base_deployment.yaml
helm template . -f values-sandbox.yaml | kubectl diff -f - 

Then incrementally work through the errors (and the diff later), change the helm chart accordingly to minimize the diff and run helm template again.

A slightly better approach could be to use a structural diff of the rendered YAML output, because kubectl diff checks some API constraints (immutable fields etc.) that can slow us down here. I just haven't tried that yet because I couldn't find good tooling at first sight.
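
For illustration, here is a minimal Python sketch of what such a structural diff might look like. It is not an existing tool and not part of this PR. It assumes both outputs have already been parsed into lists of dictionaries (for example with a YAML library), and all function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Manifest = Dict[str, Any]
Key = Tuple[str, str, str]  # (kind, namespace, name)


def index_manifests(docs: Iterable[Manifest]) -> Dict[Key, Manifest]:
    """Index parsed Kubernetes manifests by (kind, namespace, name)."""
    indexed: Dict[Key, Manifest] = {}
    for doc in docs:
        if not doc:  # skip empty YAML documents
            continue
        meta = doc.get("metadata", {})
        key = (doc.get("kind", ""), meta.get("namespace", ""), meta.get("name", ""))
        indexed[key] = doc
    return indexed


def diff_values(left: Any, right: Any, path: str = "") -> List[str]:
    """Recursively compare two parsed values and list the paths that differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        changes: List[str] = []
        for field in sorted(set(left) | set(right)):
            child = f"{path}.{field}" if path else field
            if field not in right:
                changes.append(f"- {child}")  # present only on the left
            elif field not in left:
                changes.append(f"+ {child}")  # present only on the right
            else:
                changes.extend(diff_values(left[field], right[field], child))
        return changes
    if isinstance(left, list) and isinstance(right, list) and len(left) == len(right):
        changes = []
        for i, (l_item, r_item) in enumerate(zip(left, right)):
            changes.extend(diff_values(l_item, r_item, f"{path}[{i}]"))
        return changes
    # Scalars, type mismatches, and lists of different length
    return [] if left == right else [f"~ {path}"]


def diff_manifests(kustomize_docs: Iterable[Manifest],
                   helm_docs: Iterable[Manifest]) -> List[str]:
    """Report resources missing on either side and fields that differ."""
    left, right = index_manifests(kustomize_docs), index_manifests(helm_docs)
    report: List[str] = []
    for key in sorted(set(left) | set(right)):
        label = "/".join(part for part in key if part)
        if key not in right:
            report.append(f"only in kustomize: {label}")
        elif key not in left:
            report.append(f"only in helm: {label}")
        else:
            report.extend(f"{label}: {c}" for c in diff_values(left[key], right[key]))
    return report
```

Because this compares parsed structures rather than asking the API server, it would not trip over immutable-field checks, but it also would not see server-side defaulting.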

@rstanevich
Contributor Author

@sbrunk, do you mean we just need to compare the generated helm manifest with flyte_generated.yaml? Do we need to run this once for this PR, or do we need a script to check it regularly? At first glance, I see one evident problem:

  • A ConfigMap generated by Kustomize has a hash suffix in its name, but the one rendered by Helm does not, so the diff for ConfigMaps won't work.
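
One possible workaround, sketched in Python purely for illustration: strip the suffix from generated names before comparing. The pattern is an assumption (a hyphen followed by 10 lowercase alphanumeric characters), and the function names and the example name `flyte-admin-config` are hypothetical.

```python
import re

# Assumption: the generated suffix is "-" plus 10 lowercase alphanumeric
# characters at the end of the name. Verify against real output before use.
HASH_SUFFIX = re.compile(r"-[a-z0-9]{10}$")


def strip_hash_suffix(name: str) -> str:
    """Remove a trailing hash-like suffix from a resource name, if present."""
    return HASH_SUFFIX.sub("", name)


def normalize_name(manifest: dict) -> dict:
    """Return a copy of a parsed manifest with the suffix stripped.

    Only ConfigMap and Secret names are touched, to limit false positives
    on ordinary names that happen to end in a 10-character segment.
    """
    if manifest.get("kind") not in ("ConfigMap", "Secret"):
        return manifest
    meta = dict(manifest.get("metadata", {}))
    meta["name"] = strip_hash_suffix(meta.get("name", ""))
    return {**manifest, "metadata": meta}


# Example with a made-up name
print(strip_hash_suffix("flyte-admin-config-6ct58987ht"))
```

This only normalizes the resource's own name. References to it from other resources (for example volume definitions in a Deployment) would carry the same suffix and would need the same treatment.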

If the main goal is just to check for a smooth upgrade from the kustomize installation to the helm installation, I can check it out.

And an obvious note: if we'd like to use helm install (I don't like this option :) ), we cannot override existing k8s resources; we'll get something like "resource already exists".

@sbrunk
Member

sbrunk commented Feb 9, 2021

@rstanevich yes I meant to use the diff only to help during development of the chart. It actually came up when I was looking into this PR to see how far you got compared with the kustomize based deployment, i.e. is the sandbox on par. I guess this is something you can answer, too. 😉

For us the upgrade path is actually not important because we don't run Flyte in prod yet but I guess for most people running Flyte in prod it's quite important.

@rstanevich
Contributor Author

resolved in #916

@rstanevich rstanevich closed this Jun 14, 2021
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Dec 20, 2022
* fix tag issue in ci

Signed-off-by: Yuvraj <[email protected]>

* remove welcome bot from boilerplate config

Signed-off-by: Yuvraj <[email protected]>

Co-authored-by: Yuvraj <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Jul 24, 2023
* Infer GOOS and GOARCH from environment

Signed-off-by: Jeev B <[email protected]>

* Multiarch builds for flytescheduler

Signed-off-by: Jeev B <[email protected]>

* fix makefile to read variables from environment and overrides

Signed-off-by: Jeev B <[email protected]>

---------

Signed-off-by: Jeev B <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Aug 9, 2023
* updated flyteidl to local to get ArrayNode

Signed-off-by: Daniel Rammer <[email protected]>

* added boilerplate to support ArrayNode

Signed-off-by: Daniel Rammer <[email protected]>

* pushing forward

Signed-off-by: Daniel Rammer <[email protected]>

* refactored node executor interfaces to fix dependency cycle

Signed-off-by: Daniel Rammer <[email protected]>

* refactoring almost complete

Signed-off-by: Daniel Rammer <[email protected]>

* refactor complete

Signed-off-by: Daniel Rammer <[email protected]>

* supporting environment variables

Signed-off-by: Daniel Rammer <[email protected]>

* minimum viable product

Signed-off-by: Daniel Rammer <[email protected]>

* update print statements for debugging

Signed-off-by: Daniel Rammer <[email protected]>

* massive refactor fixing NodeExecutionContext override for ArrayNode

Signed-off-by: Daniel Rammer <[email protected]>

* refactoring TODOs

Signed-off-by: Daniel Rammer <[email protected]>

* subnode retries working

Signed-off-by: Daniel Rammer <[email protected]>

* parallelism working

Signed-off-by: Daniel Rammer <[email protected]>

* cache and cache_serialize working - first new functionality in maptask

Signed-off-by: Daniel Rammer <[email protected]>

* adding implementation notes

Signed-off-by: Daniel Rammer <[email protected]>

* removed eventing from subtasks

Signed-off-by: Daniel Rammer <[email protected]>

* adding correct requirements

Signed-off-by: Daniel Rammer <[email protected]>

* working end-2-end with flytekit

Signed-off-by: Daniel Rammer <[email protected]>

* reporting output directory on success

Signed-off-by: Daniel Rammer <[email protected]>

* fixed output directory append

Signed-off-by: Daniel Rammer <[email protected]>

* mocking TaskTemplate interface to enable caching

Signed-off-by: Daniel Rammer <[email protected]>

* capture failure reasons

Signed-off-by: Daniel Rammer <[email protected]>

* wrapped up abort and finalize functionality

Signed-off-by: Daniel Rammer <[email protected]>

* mocking initialization events

Signed-off-by: Daniel Rammer <[email protected]>

* sending all events

Signed-off-by: Daniel Rammer <[email protected]>

* minor refactoring of debug prints and formatting

Signed-off-by: Daniel Rammer <[email protected]>

* intratask checkpointing working

Signed-off-by: Daniel Rammer <[email protected]>

* support for  and

Signed-off-by: Daniel Rammer <[email protected]>

* setting node log ids correctly

Signed-off-by: Daniel Rammer <[email protected]>

* reporting cache status

Signed-off-by: Daniel Rammer <[email protected]>

* correctly setting subnode abort phase

Signed-off-by: Daniel Rammer <[email protected]>

* removing dead code

Signed-off-by: Daniel Rammer <[email protected]>

* cleaned up most random TODO items

Signed-off-by: Daniel Rammer <[email protected]>

* refactored into new files

Signed-off-by: Daniel Rammer <[email protected]>

* refactoring for ArrayNode unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* refactored for unit testing to allow creation of NodeExecutor in array package

Signed-off-by: Daniel Rammer <[email protected]>

* first unit test for handling ArrayNodePhaseNone

Signed-off-by: Daniel Rammer <[email protected]>

* most of executing unit tests completed

Signed-off-by: Daniel Rammer <[email protected]>

* finished executing unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* finished succeeding unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* wrote failing phase unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* moving towards complete unit_test success

Signed-off-by: Daniel Rammer <[email protected]>

* unit tests passing

Signed-off-by: Daniel Rammer <[email protected]>

* fixed lint issues

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl dep

Signed-off-by: Daniel Rammer <[email protected]>

* added unit tests for Abort

Signed-off-by: Daniel Rammer <[email protected]>

* adding unit test for Finalize

Signed-off-by: Daniel Rammer <[email protected]>

* added utils unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* moved state structs to handler package

Signed-off-by: Daniel Rammer <[email protected]>

* added docs

Signed-off-by: Daniel Rammer <[email protected]>

* cleaned up abort event reporting

Signed-off-by: Daniel Rammer <[email protected]>

* fixed RecordNodeEvent unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* removed taskEventRecorder from nodes package

Signed-off-by: Daniel Rammer <[email protected]>

* adding interface checking for arraynode

Signed-off-by: Daniel Rammer <[email protected]>

* added transform unit test

Signed-off-by: Daniel Rammer <[email protected]>

* fixed input bindings issue

Signed-off-by: Daniel Rammer <[email protected]>

* fixed unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* fixed unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* go generate

Signed-off-by: Daniel Rammer <[email protected]>

* addressing random TODO

Signed-off-by: Daniel Rammer <[email protected]>

* fixed unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* addressing pr comments

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Aug 21, 2023
Development

Successfully merging this pull request may close these issues.

[Feature] Convert Flyte deployment from Kustomize to Helm!
3 participants