
Upgrade to use Kustomize v4+ #1797

Closed
zijianjoy opened this issue Apr 5, 2021 · 46 comments

@zijianjoy (Contributor) commented Apr 5, 2021

From the README of kubeflow/manifests (https://github.com/kubeflow/manifests#prerequisites), Kustomize v3.2.0 is a prerequisite for building resources. However, Kustomize v4+ is available. Can we upgrade the prerequisite to the new version of Kustomize?

One thing that needs to change: the load_restrictor flag was renamed in Kustomize v4+, so we need to make the following update in docs and build commands:

Change from --load_restrictor=none to --load-restrictor LoadRestrictionsNone. Note the difference between _ and -.
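
For example, a build command would change roughly like this (the target path is just illustrative):

# Kustomize v3.2.0
kustomize build --load_restrictor=none example

# Kustomize v4+
kustomize build --load-restrictor LoadRestrictionsNone example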

yanniszark self-assigned this Apr 5, 2021
@yanniszark (Contributor)

@zijianjoy thanks a lot for opening this issue to switch our target version to the latest Kustomize v4.
This was not in our plan for 1.3.0, the reason being that we didn't want to change too many things together.
But it can very well be in our plans for the point release or 1.4. Let's use this issue to track the effort.

@davidspek (Contributor) commented Apr 6, 2021

I am using Kustomize v4.0.5 for https://github.com/kubeflow-onprem/ArgoFlow and everything works as expected. The --load_restrictor flag is also not necessary except for the Katib manifests, which should be fixed; I already have a PR open for this. For the Argo installation I am not using load_restrictor (for Katib I am using my fixed version of the manifests), so this can be seen as proof that the flag is not necessary for deploying Kubeflow.

@zijianjoy One interesting anomaly I did encounter is that running kustomize build github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-multi-user-pns works for kustomize 3.2.1 and 4.0.5, but not for 3.9.4 (none of the other manifests had any issue with 3.9.4, the default for Argo CD v2). There was a breaking change between v3 and v4 in how kustomize fetches remote bases (s3 isn't supported anymore), so this is something to potentially keep in mind (not that I think we should update to an outdated version though).

@berndverst (Member) commented Apr 7, 2021

This issue burns me over and over. I am used to installing the latest versions of tools, and I ran into this again trying to work out the latest release configs for Azure.

As far as I can tell 3.9.2+ are broken with Knative also:
kustomize build --load_restrictor=none common/knative/knative-eventing-install/base

errors with

... at path 'spec/template/metadata/labels': wrong Node Kind for spec.template.metadata.labels expected: MappingNode was AliasNode: value: {*ref_0}

The latest version that works for me is 3.9.1 or 3.8.10. I installed and tested about 20 different versions.

@berndverst (Member)

FYI @davidspek, with 4.0.5 Knative is also broken as-is:

 ~/src/kubeflow/manifests-1.3-branch:  kustomize build --load-restrictor LoadRestrictionsNone common/knative/knative-eventing-install/base
wrong Node Kind for spec.template.metadata.labels expected: MappingNode was AliasNode: value: {*ref_0}
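
(For context: the {*ref_0} in the error is a YAML alias. The Knative manifests defined labels once with an anchor and reused them via aliases, and the YAML engine in newer kustomize expects a plain mapping there. A hypothetical sketch of the offending pattern, with made-up label values:

metadata:
  labels: &ref_0
    app: eventing-webhook
spec:
  template:
    metadata:
      labels: *ref_0   # AliasNode where kustomize expects a MappingNode

The fix in #1795, "removing pointers from the Knative YAML", expands such aliases into plain mappings.)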

@yanniszark I really think this is a release-blocking issue because neither the latest 3.X version nor any 4.X version of kustomize works. On the one hand we are telling people that kfdef is no longer the recommended tool and to use kustomize instead; on the other hand we don't support any modern version of kustomize.

@berndverst (Member)

@yanniszark with the PR to remove pointers (YAML anchors/aliases) from the Knative YAML, everything seems to be working fine.

I can generate everything with v4.0.5 by running from the example folder you provided:

kustomize build --load-restrictor LoadRestrictionsNone .

@davidspek (Contributor)

@berndverst You are correct. I had forgotten about this one, but I had already created a PR to fix this in the Knative manifests 4 days ago.

#1795

Also, you don't need the --load-restrictor LoadRestrictionsNone flag, so you shouldn't be using it, as it disables a security check in kustomize.

@berndverst (Member)

@davidspek I noticed I apparently created 2 PRs that were dupes of yours when they were marked as dupes this morning, sigh. At least we are getting those issues addressed now!

@berndverst (Member)

@davidspek I found I needed the load restrictor flag for Katib when I tried this yesterday. But I saw that this is also being addressed somewhere.

@davidspek (Contributor)

@berndverst Indeed, Katib should be the only one. I am working with @andreyvelich to remove this requirement in kubeflow/katib#1498.

On my branch with these fixes you can run:
kustomize build github.com/davidspek/katib/manifests/v1beta1/installs/katib-with-kubeflow-cert-manager?ref=fix-manifests
And it will build the manifests directly, no need to download them locally.

@Bobgy (Contributor) commented Apr 8, 2021

I want to add another data point that sticking to kustomize 3.2.0 is not a good idea.

See kubernetes-sigs/kustomize#1500 (comment): starting from kubectl release 1.21, it ships with kustomize v3.9. In the not-so-far future, people will be able to use a relatively new version of kustomize from kubectl. That can be an onboarding-experience improvement if kubectl is the only tool people need.
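
(For illustration, assuming a kubectl new enough to bundle kustomize, the whole flow could be:

kubectl kustomize example | kubectl apply -f -

with no standalone kustomize binary needed; the example directory is illustrative.)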

I don't think this is too urgent, considering the speed at which people upgrade Kubernetes clusters; it could be enough to start supporting later kustomize versions from Kubeflow 1.4.

@Bobgy (Contributor) commented Apr 8, 2021

> @berndverst You are correct. I had forgotten about this one, but I had already created a PR to fix this in the Knative manifests 4 days ago.
>
> #1795
>
> Also, you don't need the --load-restrictor LoadRestrictionsNone flag, so you shouldn't be using it, as it disables a security check in kustomize.

@davidspek I want to remind you why we decided to use load restrictor none; read https://bit.ly/kf_kustomize_v3. It was introduced mainly because we want people to build reusable patches that others can compose, instead of relying on overlays (which are an inheritance model that people cannot compose).

So far, I think https://kubectl.docs.kubernetes.io/guides/config_management/components/ is a feature designed to solve this problem: allowing reusable composition of kustomization transformations without turning off the security check.

But I haven't used it much and I don't see much of the community using it; I wonder what people think about it.
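
(For reference, a minimal sketch of a component, based on the linked guide; all names and paths here are made up:

# components/my-patch/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patchesStrategicMerge:
  - patch.yaml

# consumer kustomization.yaml -- composes the patch without --load-restrictor
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
components:
  - ../components/my-patch

)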

@davidspek (Contributor)

@Bobgy I understand why the load-restrictor was useful, but currently only one component still needs it (and those manifests are better suited to overlays if you ask me). Indeed, the components feature is something I ran into as well, but it cannot be used at this point because for some reason kustomize 3.2.1 is being used (the load restrictor can be used with newer versions, and all the manifests build with newer versions).
The only reason the kustomize version matters at this point is that the load-restrictor flag changed in newer versions, and this flag is being used for all the manifests even though only the Katib manifests need it (for now). If the load-restrictor flag is removed from the instructions, the kustomize version pin can also be removed.

@Bobgy (Contributor) commented Apr 8, 2021

Good point! Then I agree it's very useful to fix Katib and let everyone stop using the flag; that makes the manifests compatible between v3 and v4.

@davidspek (Contributor)

@Bobgy For reference, here is the PR for updating the Katib manifests: kubeflow/katib#1498.

@yanniszark (Contributor)

@berndverst we have merged all linked PRs necessary in order to use kustomize 4.0.5.
However, I have bumped into what seems to be a blocking issue in the latest kustomize.
The latest kustomize version orders admission webhooks (Validating/MutatingWebhookConfiguration) last.
This means that the installation is now prone to the following race condition:

  1. Pods are created before the istio injection webhook, thus they don't get a sidecar.
  2. Apps that integrate with CentralDashboard like KFP UI fail because they don't have a sidecar.

I tested the current example kustomization and it actually failed. Did you also encounter this problem?
I will open an upstream kustomize issue on Monday.
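
(A quick, illustrative way to spot that failure mode, i.e. Pods created without the istio-proxy sidecar:

kubectl get pods -n kubeflow -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' | grep -v istio-proxy

Any Pod listed here was created before the injection webhook was registered. The namespace is illustrative.)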

@Bobgy (Contributor) commented Apr 10, 2021

@yanniszark I think we had better break up the installation into several steps to avoid the race condition in the first place. There is no guarantee how long each webhook server will take to start, either.

@davidspek (Contributor)

I've deployed the manifests manually with Kustomize 4.0.5 as well, and have only run into the chicken-and-egg situation a few times; rerunning the command (instantly), as stated in the current instructions, solved it.

@yanniszark It looks like there is an existing discussion about the ordering, see kubernetes-sigs/kustomize#821. In particular this comment and the following one are about the ordering of MutatingWebhookConfigurations.

@yanniszark (Contributor)

> @yanniszark I think we had better break up the installation into several steps to avoid the race condition in the first place. There is no guarantee how long each webhook server will take to start, either.

True, but this is not a problem with the current single-command installation, right? Because if a webhook Pod is not up yet, kubectl exits with a non-zero code and is retried.
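
(That retry is roughly a loop like the following; the example path and interval are illustrative:

while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying to apply resources"
  sleep 10
done

)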

> I've deployed the manifests manually with Kustomize 4.0.5 as well, and have only run into the chicken-and-egg situation a few times; rerunning the command (instantly), as stated in the current instructions, solved it.

@davidspek you are referring to the scenario described by @Bobgy, right? That is, trying to create an object while its registered webhook Service does not have any ready Pods yet.

In the scenario I described, I believe retrying won't help, as the Pods need to be recreated in order to be injected with the Istio sidecar.
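
(Concretely, recovering from that state means recreating the Pods once the webhook is up, e.g. something like:

kubectl rollout restart deployment -n kubeflow

so the recreated Pods go through the injection webhook. The namespace is illustrative.)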

> @yanniszark It looks like there is an existing discussion about the ordering, see kubernetes-sigs/kustomize#821. In particular this comment and the following one are about the ordering of MutatingWebhookConfigurations.

Yeap! I am currently going through the relevant kustomize PRs / issues and I will open an upstream issue to discuss.

@davidspek (Contributor)

@Bobgy @yanniszark The PR to make Katib's manifests not need the load-restrictor (and some other enhancements/optimizations) has just been merged: kubeflow/katib#1498. It does still need to be cherry-picked into the current release branch. After that is done, the load-restrictor shouldn't be needed for any of the Kubeflow manifests anymore, and v3/v4 should both work without issue. I've been deploying everything with 4.0.5 and without the load-restrictor flag.

@Bobgy (Contributor) commented Apr 13, 2021

@yanniszark I think we are talking about the same problem: if Istio sidecars are not injected, retrying doesn't help.

So I suggest separating steps where necessary -- e.g. Istio setup should finish before anything that depends on it. It's OK not to separate steps for race conditions that can be resolved via retry.

Right now, it seems we need two steps (see the sketch after this list):

  1. apply istio & its dependencies
  2. apply other stuff
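
(A rough sketch of those two steps; the directory names are illustrative, and webhook readiness still needs a check in between:

# step 1: Istio and its dependencies
kustomize build common/istio-1-9/istio-install/base | kubectl apply -f -
kubectl wait --for=condition=Available deployment --all -n istio-system --timeout=300s

# step 2: everything else
kustomize build example | kubectl apply -f -

)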

@yanniszark (Contributor)

@Bobgy for this:

> I think we had better break up the installation into several steps to avoid the race condition in the first place

We already have the step-by-step instructions. From the later message, I see that you are talking about just 2 steps, instead of a full step-by-step installation. This is definitely a possible workaround, but there might still be some webhook configurations that are skipped because of kustomize's new ordering, so we need to look into that carefully if we decide to go down this path.

In addition, I have opened an upstream issue in kustomize and I'm testing a fix:
kubernetes-sigs/kustomize#3794

@davidspek (Contributor)

Just for reference, I've not seen this issue of the Istio sidecars not being injected and causing problems when deploying with Argo CD. I'm not sure if this is just luck, or if Argo CD is handling this in some way behind the scenes.

@connorlwilkes

Are the manifests going to revert to referencing the 'upstream' manifests through URLs for the next release, or is this planned for further in the future? This would drastically reduce the overhead of pulling all the upstream manifests into this repository; it would simply be a link.
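
(For illustration, a kustomization.yaml consuming an upstream manifest by URL; the repo path and ref here are made up:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/kubeflow/katib/manifests/v1beta1/installs/katib-with-kubeflow?ref=v0.11.1

)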

@yanniszark (Contributor) commented Apr 13, 2021

> Are the manifests going to revert to referencing the 'upstream' manifests through URLs for the next release, or is this planned for further in the future? This would drastically reduce the overhead of pulling all the upstream manifests into this repository; it would simply be a link.

That is something that should be discussed for after the 1.3 release. @connorlwilkes would you like to open an issue to propose this change and the benefits / drawbacks? cc @davidspek who has also used this method in Argoflow.
Since this issue is for using kustomize v4 and the issues blocking us from doing so, let's spin this discussion off into a separate issue.

@davidspek (Contributor)

@yanniszark @connorlwilkes After the release I was also planning on starting this discussion. I think this will be particularly useful once I have Renovate properly configured for everything, as the only upkeep would then be merging auto-created PRs. Distributions would then simply fork the repo and add their customizations to it, similar to how people are using my Argo installation now. But indeed, this is off-topic for this issue and should be discussed somewhere else.

stale bot removed the lifecycle/stale label Jan 17, 2022
stale bot commented Apr 18, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

@mvoitko (Contributor) commented Apr 20, 2022

@yanniszark any updates?

@elamaran11

When can we expect the Kubeflow manifests to support kustomize v4+?

@yanniszark (Contributor)

Hey folks, just a quick update that the PR to introduce support for custom ordering in kustomize is in the final review stage: kubernetes-sigs/kustomize#4019
Thanks for your patience :)

apo-ger added a commit to apo-ger/manifests that referenced this issue Nov 22, 2022
Istio 1.16.0 generated manifests include some policy/v1
PodDisruptionBudget resources that we need to remove. See:
- istio/istio#12602
- istio/istio#24000

The current manifests utilize two "delete" patches:
- common/istio-1-16/istio-install/base/patches/remove-pdb.yaml
- common/istio-1-16/cluster-local-gateway/base/patches/remove-pdb.yaml

However these patches do not work with kustomize v3.2.0. The root
cause is that v3.2.0 doesn't have the appropriate openapi schema for
the policy/v1 API version resources. This is fixed in kustomize v4+.
See:
- kubernetes-sigs/kustomize#3694 (comment)
- kubernetes-sigs/kustomize#4495

Changes:
- We disable the delete patches until we upgrade to kustomize v4+.
  - tracked in: kubeflow#1797

- As a temporary workaround we remove PodDisruptionBudget resources
  manually with yq before deploying Istio manifests.

- Update README file with instructions.

Refs: kubeflow#2325

Signed-off-by: Apostolos Gerakaris <[email protected]>
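
(The yq workaround could look roughly like this; the authoritative command is in the README, so treat this as a sketch:

kustomize build common/istio-1-16/istio-install/base | yq eval 'select(.kind != "PodDisruptionBudget")' - | kubectl apply -f -

)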
apo-ger added a commit to apo-ger/manifests that referenced this issue Nov 22, 2022 (same commit message as above)
apo-ger added a commit to apo-ger/manifests that referenced this issue Nov 24, 2022 (same commit message as above)
google-oss-prow bot pushed a commit that referenced this issue Nov 24, 2022
* common: Add Istio v1.16.0 manifests

Signed-off-by: Apostolos Gerakaris <[email protected]>

* Update kustomization file in example to deploy istio-1-16

Signed-off-by: Apostolos Gerakaris <[email protected]>

* tests: Update install Istio GH action script

Use Istio 1.16 for testing

Signed-off-by: Apostolos Gerakaris <[email protected]>

* common: Remove PodDisruptionBudget resources for Istio

(full commit message as quoted above)

* Update README

Use the --cluster-specific flag when generating Istio 1.16 manifests
for K8s-1.25. See:
* istio/istio#41220

Signed-off-by: Apostolos Gerakaris <[email protected]>

* tests: Update GH Action workflows

Trigger the workflows when the Istio version changes

Signed-off-by: Apostolos Gerakaris <[email protected]>

Signed-off-by: Apostolos Gerakaris <[email protected]>
@yanniszark (Contributor)

Kustomize PR is merged: kubernetes-sigs/kustomize#4019
The new functionality should be coming in the next kustomize release!

@juliusvonkohout (Member)

@yanniszark is there an easy way to enable the ordering in kustomize 4, and who is going to change the install instructions?

@yanniszark (Contributor)

@juliusvonkohout the way to customize the order is through a new sortOptions field in kustomization.yaml.
You can see an example here:
https://github.com/kubernetes-sigs/kustomize/blob/71eb865cea2646ccf5e4bdeeb555f85c6604b537/api/krusty/sortordertransformer_test.go#L113-L144

This would go in the example kustomization. I think @kimwnasptd will coordinate the effort, it should be a straightforward PR.
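
(A hedged sketch of that field, modeled on the linked test; the kind lists here are illustrative:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
sortOptions:
  order: legacy
  legacySortOptions:
    orderFirst:
      - Namespace
      - CustomResourceDefinition
      - MutatingWebhookConfiguration
    orderLast:
      - ValidatingWebhookConfiguration

)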

kevin85421 pushed a commit to juliusvonkohout/manifests that referenced this issue Feb 28, 2023 (same commit message as the merged commit above)
@juliusvonkohout (Member)

We are now at kustomize 5.

/close

There has been no activity for a long time. Please reopen if necessary.

@google-oss-prow (bot)

@juliusvonkohout: Closing this issue.

In response to this:

> We are now at kustomize 5.
>
> /close
>
> There has been no activity for a long time. Please reopen if necessary.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
