Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update apis to allow for compatibility with k8s 1.22 #1972

Merged
merged 3 commits into from
Jan 26, 2022

Conversation

fsz285
Copy link
Contributor

@fsz285 fsz285 commented Dec 21, 2021

The CRD for the flyteworkflows is incomplete, atm. A schema should be
added in the near future.

See #1273

@welcome
Copy link

welcome bot commented Dec 21, 2021

Thank you for opening this pull request! 🙌
These tips will help get your PR across the finish line: - Most of the repos have a PR template; if not, fill it out to the best of your knowledge. - Sign off your commits (Reference: DCO Guide).

The CRD for the flyteworkflows is incomplete, atm. A schema should be
added in the near future.

See flyteorg#1273

Signed-off-by: ferdinand.szekeresch <[email protected]>
Signed-off-by: Sören Brunk <[email protected]>
@sbrunk
Copy link
Member

sbrunk commented Dec 22, 2021

@EngHabu @kumare3
Docker desktop is now at k8s 1.22 so this would help people installing sandbox directly inside the k8s coming with docker desktop.

Looking at the failing test it seems we also need to update Flytepropeller to be able to deal with apiextensions.k8s.io/v1 for CRDs:

goroutine 681 [running]:
github.com/flyteorg/flytepropeller/pkg/apis/flyteworkflow/v1alpha1.(*WorkflowSpec).GetID(...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/apis/flyteworkflow/v1alpha1/workflow.go:278
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).TryMutateWorkflow(0xc000b7b350, 0x2898430, 0xc000785ce0, 0xc00045ef00, 0x0, 0x0, 0x0)

@EngHabu
Copy link
Contributor

EngHabu commented Jan 10, 2022

@sbrunk @fsz285 Do you have an idea of what are the changes needed in propeller?

@sbrunk
Copy link
Member

sbrunk commented Jan 10, 2022

@EngHabu I haven't looked at the code yet (my go knowledge is still somewhat limited) but AFAIK propeller dynamically creates flyteworkflow custom resources when a workflow is executed.

Since 1.22 removes apiextensions.k8s.io/v1beta1, @fsz285 has changed the CRD here in this PR to conform to apiextensions.k8s.io/v1 according to https://kubernetes.io/docs/reference/using-api/deprecation-guide/#customresourcedefinition-v122

I initially thought we'd have to change propeller to create flyteworkflow custom resources conforming to that updated CRD version, but the structure of a flyteworkflow custom resource instance itself actually shouldn't change at all. 🤔

The only difference in our case is that a schema is required in v1 while it was optional in v1beta1 so we allow any JSON object but perhaps we are doing it wrong.

@sbrunk
Copy link
Member

sbrunk commented Jan 10, 2022

The error in the test is caused by this line when we're trying to update the custom resource status:

// TODO we will need to call updatestatus when it is supported. But to preserve metadata like (label/finalizer) we will need to use update

// update the GetExecutionStatus block of the FlyteWorkflow resource. UpdateStatus will not
// allow changes to the Spec of the resource, which is ideal for ensuring
// nothing other than resource status has been updated.
p.wfStore.Update(ctx, mutatedWf, workflowstore.PriorityClassCritical)

@hamersaw
Copy link
Contributor

hamersaw commented Jan 11, 2022

@sbrunk I'm interested in helping out with the where I can. FlyteAdmin actually creates the FlyteWorkflow CRD to start a workflow execution which is then detected by FlytePropeller using it's k8s watch. Then as you indicated FlytePropeller updates the CRD as needed in the line you mentioned. I suspect the CRD update may require changed in both FlyteAdmin and FlytePropeller. Hopefully it's as easy as updating the API version.

@EngHabu
Copy link
Contributor

EngHabu commented Jan 11, 2022

@hamersaw thank you! You should be able to repro this locally by starting flytectl sandbox start, updating the CRD in the cluster (like, this file). And trying to run any example Workflow...

I'm not sure where things go wrong... Updating the Api version may require regeneration of clientset in propeller using -maybe- some updated versions of k8s code-gen repo? but yeah I would start with a simple repo and looking at the FlyteWorkflow CRD Instance Admin creates after launching an execution...

@hamersaw
Copy link
Contributor

@fsz285 @sbrunk I think you missed the x-kubernetes-preserve-unknown-fields in the FlyteWorkflow CRD v1 schema. As I understand it, unless this is explicitly set all fields not conforming to the schema will be automatically pruned. In my testing, all CRDs made in k8s v1.22 with the previous setup (this PR) had completely empty specs which then makes sense that retrieving the workflow ID gives us an "invalid memory address or nil pointer dereference" error as you saw. Would you be willing to run a quick test including the x-kubernetes-preserve-unknown-fields as shown below? it worked for me.

{{- if .Values.flytepropeller.enabled }}
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: flyteworkflows.flyte.lyft.com
spec:
  group: flyte.lyft.com
  names:
    kind: FlyteWorkflow
    plural: flyteworkflows
    shortNames:
    - fly
    singular: flyteworkflow
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
        properties:
{{- end }}

cc @EngHabu this, of course, should be a temporary workaround. I think we really should flesh out a fully defined schema for the FlyteWorkflow CRD moving forward. Additionally, although it seems to work with the 0.20 k8s.io/api dependencies in a majority of our repos (I suspect we are using fairly stable APIs), we should probably update these as well. Thoughts?

Required to prevent pruning of fields by the API server until we have a
schema for workflows.

Signed-off-by: Sören Brunk <[email protected]>
@sbrunk
Copy link
Member

sbrunk commented Jan 26, 2022

@hamersaw yeah that totally makes sense. Thanks for looking into it. I just added x-kubernetes-preserve-unknown-fields: true to see if the tests pass now.

Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the work!

@EngHabu EngHabu merged commit 94327d6 into flyteorg:master Jan 26, 2022
@welcome
Copy link

welcome bot commented Jan 26, 2022

Congrats on merging your first pull request! 🎉

@bigbitbus
Copy link

@EngHabu This change does break the helm chart for everyone who is on Kubernetes 1.18 or earlier (ingress v1beta1)... I know those are old versions but still a good number of folks use those versions, and EKS supports that until March 2022.

@sbrunk
Copy link
Member

sbrunk commented Feb 1, 2022

If we really need this (March 2022 is only one month away though), it is possible to use helm capabilities to define an ingress differently depending on the available API/k8s version:

https://stackoverflow.com/questions/66147091/is-it-possible-to-configure-a-helm-chart-to-dynamically-use-a-value-based-on-ano

@EngHabu
Copy link
Contributor

EngHabu commented Feb 1, 2022

I've an attempt at that here: #2114
but trying to figure out why make helm is producing a diff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants