Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protect admission against empty keys #5900

Conversation

dbenque
Copy link
Contributor

@dbenque dbenque commented Jun 28, 2023

Which component this PR applies to?

vertical-pod-autoscaler/admission-controller

What type of PR is this?

/kind bug

What this PR does / why we need it:

Using external recommender that was coming with a bug, we ended up with a VPA object that was presenting a recommendation for a resource that has an empty name:

 resources:
          limits:
            cpu: "1"
            memory: 3Gi
          requests:
            "": "0"              <== empty key
            cpu: "1"
            ephemeral-storage: 10Gi
            memory: 3Gi

In this PR we are proposing to sanitise the recommendation by removing any resource key that would be empty so that the admission is not pushing "garbage" to the pods.

Possible extension: instead of purging empty key, we could have a list of allowed resourceName given as input parameter of the admission. This would also allow us to purge unknown resource or the one coming with typo.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

If a recommendation comes with a resource key that is empty, this resource will be ignored in the admission-controller.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

domenicbozzuto and others added 16 commits May 17, 2023 16:02
The provisioning state reflects the status of the last provisioning
action, which means the instance can enter a failed state after it's
running. Protect against unnecessary scaledowns by checking the
power state to avoid scaling down running VMs
- this taint leads to unexpected behavior
- users expect CA to consider the taint when autoscaling

Signed-off-by: vadasambar <[email protected]>
- happens when CA tries to check if the unmanaged fargate node is a part of ASG (it isn't)
- and keeps on logging error
Signed-off-by: vadasambar <[email protected]>
Signed-off-by: vadasambar <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/vertical-pod-autoscaler labels Jun 28, 2023
@k8s-ci-robot k8s-ci-robot requested review from jbartosik and kgolab June 28, 2023 09:39
kawych and others added 3 commits June 28, 2023 10:04
Previous "CropNodes" function of ScaleDownBudgetProcessor had an
assumption that atomically-scaled node groups should be classified as
"empty" or "drain" as a whole, however Cluster Autoscaler may classify
some of the nodes from a single group as "empty" and other as "drain".
update agnhost image to pull from registry.k8s.io
Copy link
Collaborator

@jbartosik jbartosik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked for some changes. I think we should allow only cpu and memory

Also I'd add a short release note

@@ -89,6 +90,16 @@ func Resources(cpu, mem string) apiv1.ResourceList {
return result
}

// AddResource add a resource to the given resource list
func AddResource(rl apiv1.ResourceList, resourceName apiv1.ResourceName, value string) apiv1.ResourceList {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only one place we're using this function. Please move it to vertical-pod-autoscaler/pkg/utils/test/test_recommendation.go and unexport it (rename to addResource)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -55,6 +57,15 @@ func (b *recommendationBuilder) WithTarget(cpu, memory string) RecommendationBui
return &c
}

func (b *recommendationBuilder) WithResource(resource apiv1.ResourceName, value string) RecommendationBuilder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (b *recommendationBuilder) WithResource(resource apiv1.ResourceName, value string) RecommendationBuilder {
func (b *recommendationBuilder) WithTargetResource(resource apiv1.ResourceName, value string) RecommendationBuilder {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -131,6 +133,12 @@ func (b *verticalPodAutoscalerBuilder) WithTarget(cpu, memory string) VerticalPo
return &c
}

func (b *verticalPodAutoscalerBuilder) WithResourceInTarget(resource core.ResourceName, value string) VerticalPodAutoscalerBuilder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it the same as the function in vertical-pod-autoscaler/pkg/utils/test/test_recommendation.go

Also please add a TODO, it looks like verticalPodAutoscalerBuilder (here) and recommendationBuilder (in test_recommendation.go) could use some deduplication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

jbartosik and others added 4 commits June 28, 2023 16:01
Generated by runing:

```
go mod tidy
go mod vendor
```
…r-cleanup

Replace `BuildTestContainer` with use of builder
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 29, 2023
k8s-ci-robot and others added 12 commits August 22, 2023 15:45
Allow using an externally created secret instead of using the one the Helm chart creates
fix: Broken links to testgrid dashboard
…ial-backoff

feat(hetzner): use less requests while waiting for server create
…proposal-5700

[addon-resizer] docs: add KEP to add nanny configuration automatic reload.
update RBAC to only use verbs that exist for the resources
…re-s390x-arch

CA - Git ignore s390x arch binaries
…mp-pod-scaleup

Add support to filter out pods being deleted.
…recommendation/recommendation_provider_test.go

Co-authored-by: Joachim <[email protected]>
@dbenque dbenque requested a review from jbartosik August 28, 2023 09:28
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/addon-resizer and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 28, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dbenque
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dbenque
Copy link
Contributor Author

dbenque commented Aug 28, 2023

Closing this PR due to a bad rebase that makes the history not clear.

Taking all the relevant changes to a new clean branch:
#6062

@dbenque dbenque closed this Aug 28, 2023
@@ -311,8 +311,7 @@ func TestUpdateResourceRequests(t *testing.T) {
return
}

_, foundEmpty := resources[0].Requests[""]
assert.Equal(t, foundEmpty, false, "empty resourceKey have not been purged")
assert.Contains(t, resources, "", "expected empty resource to be removed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is closed but isn't this check kind of opposite of the old one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/addon-resizer area/balancer area/cluster-autoscaler area/helm-charts area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/gce area/provider/hetzner Issues or PRs related to Hetzner provider area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.