
Clean up helm chart + kustomize overlays #797

Closed
wongma7 opened this issue Mar 15, 2021 · 22 comments · Fixed by #856
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

wongma7 (Contributor) commented Mar 15, 2021

/kind bug

  • snapshot is not alpha anymore
  • the snapshot CRD is just randomly in the cluster dir (but not present in the Helm chart? should add it to Helm first and then generate the kustomize from Helm)
  • file naming doesn't match the Helm template files
k8s-ci-robot added the kind/bug label on Mar 15, 2021
ayberk (Contributor) commented Mar 15, 2021

+1. We also need to go over the values.yaml to see if we can restructure it to be more consistent. For example, controller values are at the top level but daemonset values are under node:.
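One way to picture that restructuring is to give every component its own top-level section. This is only an illustrative sketch, not the chart's actual or final schema:

```yaml
# Hypothetical restructured values.yaml: controller values move under
# their own section instead of sitting at the root, mirroring node:.
controller:
  replicaCount: 2
  resources: {}
  tolerations: []
node:               # the daemonset
  resources: {}
  tolerations: []
snapshotController:
  enabled: true
  resources: {}
```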

wongma7 (Contributor, Author) commented Mar 16, 2021

  • make tolerations consistent. tolerateAllTaints should default to false, and

        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
          tolerationSeconds: 300

    should be the default everywhere

krmichel (Contributor) commented Mar 19, 2021

I can take a look at this. I will likely need some info from someone more familiar with the driver itself. I don't believe I have the necessary permissions to assign myself this issue. I can pull in several of the other Helm issues while looking at this as well.

ayberk (Contributor) commented Mar 19, 2021

Thanks @krmichel. Feel free to reach out to me or @wongma7 if you have any questions.

/assign @krmichel

krmichel (Contributor) commented:

@ayberk @wongma7 I can make the values file more consistent, but it will require a breaking change: people will have to update their values files before upgrading to the new chart version. I just want to make sure that is okay before I make those changes.

ayberk (Contributor) commented Mar 22, 2021

Ideally we should have at least one version where we support both the old fields and the new fields, with new fields being prioritized. Then we can remove the old values in the next update. Is that feasible? I am not sure how big the changes are so the templates might get spaghetti real fast :/
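One common way to support both field sets during the transition is Helm's default pipeline, with the new nested field taking priority. Field names here are hypothetical, and this assumes the new controller: section exists in values.yaml:

```yaml
# deployment.yaml (sketch): prefer the new controller.replicaCount,
# fall back to the deprecated top-level replicaCount, then a literal.
spec:
  replicas: {{ .Values.controller.replicaCount | default .Values.replicaCount | default 2 }}
```

The drawback anticipated above is real: every migrated value needs a fallback chain like this, which is exactly where the templates start to sprawl.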

krmichel (Contributor) commented Mar 25, 2021

@ayberk @wongma7 Here is the list of other issues I have looked at and think I can incorporate into the PR for this issue. I will likely need some advice on dealing with the CRDs. I will also need some information about what to call the organization on artifacthub.io when registering this project (they require a name for the org, and can optionally take a display name, home URL, and description). Let me know if you see any issues with my selections or any other information you think would be relevant. Is this project still supporting Helm v2, or should we bump the apiVersion in the Chart.yaml?

Issue #448
ability to change kubelet default path

Issue #758
Issue #672
toleration issues for things like eviction and Fargate, etc.

Issue #746
Migrate to non deprecated apiVersions on Helm Chart

Issue #722
AWS EBS CSI Driver Helm chart to inject environment variables

Issue #512
PodDisruptionBudget needed for autoscaling

krmichel (Contributor) commented Mar 25, 2021

Ideally we should have at least one version where we support both the old fields and the new fields, with new fields being prioritized. Then we can remove the old values in the next update. Is that feasible? I am not sure how big the changes are so the templates might get spaghetti real fast :/

I can do this in the chart. Do you want an additional PR that does the second part of the move to new fields?

Also, there are a fair number of values that are currently shared between the controller and the snapshot controller. Is that desired or should those be separated while we are making these changes?

krmichel (Contributor) commented Mar 25, 2021

  • make tolerations consistent. tolerateAllTaints should default to false, and

        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
          tolerationSeconds: 300

    should be the default everywhere

@wongma7 It seems to me like we should leave tolerateAllTaints on for the node, since that is a fairly common toleration for daemonsets, and just get rid of that functionality for the controller and the snapshot controller. Both controllers can already add arbitrary tolerations, so if someone really wants tolerate-all they can add it back. It isn't a normal toleration to have on something that isn't a daemonset, and it causes issues with draining nodes and evictions. Thoughts?
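A hypothetical split along those lines in values.yaml, keeping tolerate-all for the daemonset only while the controllers rely on explicit toleration lists (field names are illustrative):

```yaml
node:
  tolerateAllTaints: true   # daemonset: tolerating every taint is conventional here
  tolerations: []
controller:
  tolerations:              # explicit list only; no tolerate-all knob
    - key: CriticalAddonsOnly
      operator: Exists
```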

ayberk (Contributor) commented Mar 25, 2021

@krmichel Really appreciate the help here.

I will also need some information about what to call the organization on artifacthub.io when registering this project

Let's skip the artifacthub for now because there might be some internal approval requirements. I can follow up on that.

Is this project still supporting helm v2 or should we bump the api version in the Chart.yaml?

Hmmm. I'm pretty sure we've been using helm v3 exclusively, so we can bump it.

Do you want an additional PR that does the second part of the move to new fields?

If, in the first PR, you can add the new fields and give them priority over the old ones, that'd be great. Then in the next PR, with a new release, we can eventually remove the old fields. Does that answer your question?

Also, there are a fair number of values that are currently shared between the controller and the snapshot controller. Is that desired or should those be separated while we are making these changes?

They should be separated. The upstream recommendation is to deploy the snapshot controller with your Kubernetes distribution, so we will probably just delete the snapshot controller from the driver. It's better if we split everything now.

ayberk (Contributor) commented Mar 25, 2021

On second thought, maybe we shouldn't split the snapshot controller, because it might end up being throw-away work.

krmichel (Contributor) commented:

@ayberk I assigned Issue #586 to myself so when you get the information for artifacthub.io you can reply on that issue.

krmichel (Contributor) commented:

I will bump the helm api in the second PR where I remove the deprecated values since it will be a breaking change anyway.

krmichel (Contributor) commented:

@wongma7 @ayberk I need to know whether to remove the snapshot controller or to add the CRDs. It seems to me that removing the snapshot controller is the better course of action. One issue is that if we add the CRDs to the chart, I can't make them install only when snapshotting is enabled: CRDs aren't templated, so they would always install (unless the --skip-crds flag is used), and they would continually get out of date. It also seems a little strange to install resources that aren't really part of the project. If that project had a chart it could be included as a conditional dependency, but it doesn't look like they have one. Let me know how to proceed.

krmichel (Contributor) commented Mar 26, 2021

  • file naming doesn't match the helm template files

@wongma7 Is this indicating to rename the kustomize files to have the same name as the files from the helm chart that generate them?

krmichel (Contributor) commented:

  • snapshot is not alpha anymore

@wongma7 I assume this means to move the snapshot related kustomize files from alpha to stable? Is that correct? Also, is the resizer still in alpha or should it move too?

krmichel (Contributor) commented Mar 26, 2021

While we are splitting things out, do we want separate resources for each of the containers in the controller, or should they continue to share the same values? Same question for the containers in the node. I don't know a lot about the resource requirements of the various images, but it does seem unlikely that a liveness sidecar would need the same amount of resources as the controller itself.

ayberk (Contributor) commented Mar 26, 2021

We should just get rid of the snapshot controller, honestly. We just need to update the docs to point to where the installation steps can be found. This will most certainly break some customers, but we have to pull the plug eventually.

@krmichel Let's not change what we have right now. One step at a time. We'll have perf testing later, we can adjust them then.

krmichel (Contributor) commented:

I can remove the snapshot controller when I remove the old values and bump the helm api. I will create an issue for round two of these changes.

wongma7 (Contributor, Author) commented Apr 27, 2021

@krmichel sorry I missed your questions.

What I meant by "snapshot is not alpha anymore" is we should not gate its installation by a flag, and it should not be in the alpha kustomize overlay, it should be installed by default.

As for the snapshot controller and its associated CRDs, if there is a way to install it only if it does not already exist, then I would like that. I think the vast majority of users will appreciate it being installed together with the driver, even if technically it can be installed separately.
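Helm 3's lookup function could approximate that "install only if it does not already exist" behavior, with the caveat that lookup returns empty during helm template and --dry-run, so the resource would always render in those modes. Resource names and the enableVolumeSnapshot flag wiring below are illustrative:

```yaml
{{- if .Values.enableVolumeSnapshot }}
{{- if not (lookup "apps/v1" "Deployment" "kube-system" "snapshot-controller") }}
# Render the snapshot controller only when the cluster doesn't already run one.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: snapshot-controller
  namespace: kube-system
# ...rest of the deployment spec...
{{- end }}
{{- end }}
```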

wongma7 (Contributor, Author) commented Apr 27, 2021

(sorry, just catching up with past discussion).

However, I am fine if we postpone the snapshot controller/CRD question for later. Because currently we don't do it, if we start doing it then we are setting the expectation that we will always do it. BTW, I was planning to do a refactor similar to kubernetes-sigs/aws-efs-csi-driver#406 that more clearly splits up the controller values and node ones, but haven't found the time yet.

wongma7 (Contributor, Author) commented May 21, 2021

latest discussion on enableVolumeSnapshot:
#856 (comment)

Maybe we should deprecate it and rename it to installSnapshotController. On one hand, I have heard the argument that the snapshot controller should be pre-installed by the k8s distro/vendor or cluster admin, because it is a reusable component for multiple CSI drivers, not specific to EBS, so we should not install it at all. On the other, I know for a fact that EKS doesn't do that :) and so users should at least have the option to deploy the snapshot controller.
