Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support object count quota for custom resources #64201

Closed
wants to merge 4 commits into from

Conversation

nikhita
Copy link
Member

@nikhita nikhita commented May 23, 2018

Fixes #53777
Fixes #59826
Draws a lot of inspiration from #47665

The resource quota controller now uses a dynamic client and a rest mapper to understand custom resources created via CRDs and aggregated apiservers.

To calculate the quota usage, the quota controller needs informers and listers.

  • A new shared informer is created for custom resources.
  • The controller also uses a object count quota evaluator to calculate the usage. The evaluator uses a lister to list the resources. So a new lister is created (by creating an indexer) for the custom resource.

Please note that the resource quota controller syncs every 30 seconds, so it might take some time for the quota status to be updated after the CRD is created. Until the quota status is updated and synced, we cannot create any custom resources.

Also, the custom resources created need to be namespaced to work with the quota controller.

Release note:

Object count quota is now supported for namespaced custom resources using the `count/<resource>.<group>` syntax.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 23, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nikhita
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: deads2k

Assign the PR to them by writing /assign @deads2k in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nikhita
Copy link
Member Author

nikhita commented May 23, 2018

In action with kubectl:

# create the object count resource quota
$ kubectl create quota test --hard=count/crontabs.example.com=2
resourcequota/test created

# the usage will not be reflected till the CRD is created
$ kubectl get quota test -oyaml
apiVersion: v1
kind: ResourceQuota
metadata:
  creationTimestamp: 2018-05-23T13:39:25Z
  name: test1
  namespace: default
  resourceVersion: "319"
  selfLink: /api/v1/namespaces/default/resourcequotas/test1
  uid: b5826694-5e8e-11e8-8ec1-54e1ad6c2d05
spec:
  hard:
    count/crontabs.example.com: "2"
status:
  hard:
    count/crontabs.example.com: "2"

# create the CRD and wait for sometime (the crd should be available via discovery and the quota controller needs to sync)
$ kubectl create -f crd.yaml
customresourcedefinition.apiextensions.k8s.io/crontabs.example.com created

# the quota now contains the usage
$ kubectl get quota test -oyaml
apiVersion: v1
kind: ResourceQuota
metadata:
  creationTimestamp: 2018-05-23T13:39:37Z
  name: test
  namespace: default
  resourceVersion: "651"
  selfLink: /api/v1/namespaces/default/resourcequotas/test
  uid: bc8d1a09-5e8e-11e8-8ec1-54e1ad6c2d05
spec:
  hard:
    count/crontabs.example.com: "2"
status:
  hard:
    count/crontabs.example.com: "2"
  used:
    count/crontabs.example.com: "0"

# create custom resources
$ kubectl create -f cr1.yaml
crontab.example.com/my-new-cron-object-1 created

$ kubectl create -f cr2.yaml
crontab.example.com/my-new-cron-object-2 created

$ kubectl create -f cr3.yaml
Error from server (Forbidden): error when creating "cr3.yaml": crontabs.example.com "my-new-cron-object-3" is forbidden: exceeded quota: test, requested: count/crontabs.example.com=1, used: count/crontabs.example.com=2, limited: count/crontabs.example.com=2

@nikhita
Copy link
Member Author

nikhita commented May 23, 2018

@nikhita
Copy link
Member Author

nikhita commented May 23, 2018

/sig api-machinery
/sig scheduling
/area custom-resources

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. area/custom-resources labels May 23, 2018
@nikhita nikhita force-pushed the cr-quota branch 2 times, most recently from 0756a8e to 3b592a8 Compare May 24, 2018 06:03
nikhita added 2 commits May 24, 2018 11:36
The resource quota controller now uses a dynamic client
and a rest mapper to understand custom resources created
via CRDs and aggregated apiservers.

It now uses shared informers. It uses the existing known
informer for known resources and creates a new shared informer
for custom resources.

To calculate the usage in the quota, it needs to create a
new object count evaluator for the custom resource. A evaluator
contains a lister. So a new lister is created (by creating an indexer)
for the custom resource. This lister is then used to create the evaluator.

Please note that the resource quota controller syncs every 30 seconds,
so it might take some time for the quota status to be updated after the
CRD is created. Until the quota status is updated and synced, we cannot
create any custom resources.
// like those created via CRDs or aggregated apiservers.
DynamicClient dynamic.Interface
// RESTMapper can reset itself from discovery.
RESTMapper resettableRESTMapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the underlying impl for this threadsafe? We use a generally dynamic restmapper in the controller context, but I don't know if it is designed to tolerate concurrent resets.

Copy link
Member Author

@nikhita nikhita May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the underlying impl for this threadsafe?

Yes.

// Reset resets the internally cached Discovery information and will
// cause the next mapping request to re-discover.
func (d *DeferredDiscoveryRESTMapper) Reset() {
glog.V(5).Info("Invalidating discovery information")
d.initMu.Lock()
defer d.initMu.Unlock()
d.cl.Invalidate()
d.delegate = nil
}

replenishmentFunc: rq.replenishQuota,
registry: rq.registry,
informersStarted: options.InformersStarted,
ignoredResources: options.IgnoredResourcesFunc(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becomes interesting.

// TODO we either find a way to discover this or find a way to provide it via config

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #64310 to allow it to be provided via config (same as GC)

// Resetting the REST mapper will also invalidate the underlying discovery
// client. This is a leaky abstraction and assumes behavior about the REST
// mapper, but we'll deal with it for now.
rq.restMapper.Reset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we passed a restmapper that was already being periodically refreshed. Why do we need to punch this out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do, outside of the consuming controllers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already being periodically refreshed

The Sync is supposed to periodically refresh the restmapper with new discovery information.

The reset occurs within this Sync function.

newResources, err := GetQuotableResources(discoveryFunc)
if err != nil {
utilruntime.HandleError(err)
newResources := GetQuotableResources(discoveryClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems like you really want to flush your discovery cache and it doesn't have to do with the restmapper

}

// ServeHTTP logs the action that occurred and always returns the associated status code
func (f *fakeActionHandler) ServeHTTP(response http.ResponseWriter, request *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a lot of mock I didn't expect to see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the listWatch test doesn't add anything here. That certainly can go, removing some of the mock code.

Copy link
Member Author

@nikhita nikhita May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the listWatch test, along with some others. Weren't really adding too much. This also removed most of the mock code.

Please see this commit: 9247d03. If removing those tests seems fine, I'll squash.

@lavalamp
Copy link
Member

cc @yliaog

@lavalamp
Copy link
Member

cc @mbohlool

}
}

type fakeServerResources struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fakeDiscoveryClient

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the mock code from the quota test but fixed this in the GC test


// TestQuotaControllerSync ensures that a discovery client error
// will not cause the quota controller to block infinitely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not: no new line

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 25, 2018

@nikhita: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-kubemark-e2e-gce-big d6d5197 link /test pull-kubernetes-kubemark-e2e-gce-big
pull-kubernetes-e2e-kops-aws d6d5197 link /test pull-kubernetes-e2e-kops-aws
pull-kubernetes-e2e-gce d6d5197 link /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@nikhita
Copy link
Member Author

nikhita commented May 25, 2018

For the record: we are not going to target this for 1.11. Since this ends up creating a new informer and lister for custom resources, we are increasing the number of times we cache resources (GC does the same as well).

This is solvable if we have something like dynamic informers and listers. We will first focus on creating them and then refocus on the quota and GC controller. See #64310 (comment) and #64319 for more details.

@k8s-ci-robot
Copy link
Contributor

@nikhita: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 5, 2018
@yliaog
Copy link
Contributor

yliaog commented Oct 1, 2018

/cc @yliaog

@k8s-ci-robot k8s-ci-robot requested a review from yliaog October 1, 2018 21:32
@nikhita
Copy link
Member Author

nikhita commented Oct 3, 2018

I'm going to close this PR and reopen a proper fix once @p0lyn0mial's PR (#69308) gets in.

@nikhita nikhita closed this Oct 3, 2018
@p0lyn0mial
Copy link
Contributor

I'm going to close this PR and reopen a proper fix once @p0lyn0mial's PR (#69308) gets in.

@nikhita done :)

@raushan2016
Copy link

Is this feature now supported. Where i want to limit number of instance of a CRD created in a namespace like number of knative services created in a namespace ?

@liggitt
Copy link
Member

liggitt commented May 23, 2019

Support is merged in master and will be part of 1.15

@raushan2016
Copy link

@liggitt can you point me to sample/doc about how to use it for CRD objects

@liggitt
Copy link
Member

liggitt commented May 23, 2019

the docs at https://kubernetes.io/docs/concepts/policy/resource-quotas/#object-count-quota are accurate

in 1.15+, to quota a custom resource "widgets" in the "example.com" API group, you would use count/widgets.example.com

@liggitt
Copy link
Member

liggitt commented May 23, 2019

opened doc PR for the 1.15 branch in kubernetes/website#14492

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/custom-resources cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource Quota controller should allow discover-resources errors quota support for custom resources
10 participants