
[common-go] Add keyed gRPC rate limits #9547

Merged: 3 commits merged into main, Apr 29, 2022

Conversation

csweichel (Contributor)
Description

This PR adds support for message-keyed gRPC rate limits to common-go.
With this functionality we can, for example, add per-user rate limits in ws-manager.

For example:

{
    "rateLimits": {
        "/wsman.WorkspaceManager/StartWorkspace": {
            "bucketSize": 5,
            "refillInterval": "1s",
            "key": "metadata.owner",
            "keyCacheSize": 200
        }
    }
}

would impose a rate limit of five workspace starts per second per workspace owner.

How to test

  • See unit test
  • Manually add the config to ws-manager and use loadgen

Release Notes

[ws-manager] Support more fine-grained rate limits

csweichel force-pushed the cw/grpc-rl-fields branch 2 times, most recently from f57737e to 681fb56, April 26, 2022 10:29
csweichel (Contributor, Author) commented Apr 26, 2022

/werft run with-clean-slate

👍 started the job as gitpod-build-cw-grpc-rl-fields.4
(with .werft/ from main)

csweichel marked this pull request as ready for review April 27, 2022 07:29
csweichel requested a review from a team April 27, 2022 07:29
csweichel requested a review from sagor999 as a code owner April 27, 2022 07:29
csweichel requested a review from a team April 27, 2022 07:29
csweichel requested a review from geropl as a code owner April 27, 2022 07:29
csweichel requested a review from a team April 27, 2022 07:29
github-actions bot added the labels team: IDE, team: webapp (Issue belongs to the WebApp team), and team: workspace (Issue belongs to the Workspace team) Apr 27, 2022
easyCZ (Member) commented Apr 27, 2022

https://github.com/grpc-ecosystem/go-grpc-middleware offers a rate limit implementation; it may be cheaper to use that than to build our own.

I've used it in the past and it worked well. It has built-in support for sharing rate limits across instances (but can be used without it).

csweichel (Contributor, Author) commented Apr 27, 2022

> https://github.com/grpc-ecosystem/go-grpc-middleware offers a rate limit implementation, this may be possibly cheaper to use than building our own.
>
> I've used it in the past and worked well. It has built in support for sharing the rate limits across instances (but can be used without it).

We looked at that implementation before rolling our own.
AFAIK it does not support:

  • per-method rate limiting
  • the "keyed" rate limiting introduced in this PR

Under the hood we use the exact same rate limiting package as go-grpc-middleware (we basically took inspiration from that package), so we could also support rate limiting across multiple instances.

easyCZ (Member) left a comment:

Looking good, left a couple of comments. Could you please also add metrics to expose the following:

  1. Rate-limited count, by method
  2. Hit rate on the LRU

These can be in separate follow-up PRs, but we should be able to observe this before we deploy it to prod.

// RatelimitingInterceptor limits how often a gRPC function may be called. If the limit has been
// exceeded, we'll return resource exhausted.
type RatelimitingInterceptor map[string]ratelimitedFunction
type RatelimitingInterceptor map[string]*ratelimitedFunction
A member commented:

The map here should probably be made private. As implemented, updates to the map are not concurrency safe so we shouldn't be exposing it.
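One way to address this, sketched with hypothetical names (the real ratelimitedFunction holds the limiter, key function and LRU): hide the map behind a struct and guard it with a sync.RWMutex so lookups and updates are concurrency safe.

```go
package main

import (
	"fmt"
	"sync"
)

// ratelimitedFunction stands in for the per-method state in this PR.
type ratelimitedFunction struct {
	name string
}

// RatelimitingInterceptor hides the method map behind a struct so
// callers cannot mutate it directly; the RWMutex makes concurrent
// reads and writes safe.
type RatelimitingInterceptor struct {
	mu    sync.RWMutex
	funcs map[string]*ratelimitedFunction
}

func NewInterceptor() *RatelimitingInterceptor {
	return &RatelimitingInterceptor{funcs: make(map[string]*ratelimitedFunction)}
}

func (r *RatelimitingInterceptor) set(method string, f *ratelimitedFunction) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.funcs[method] = f
}

func (r *RatelimitingInterceptor) get(method string) (*ratelimitedFunction, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	f, ok := r.funcs[method]
	return f, ok
}

func main() {
	ri := NewInterceptor()
	ri.set("/wsman.WorkspaceManager/StartWorkspace", &ratelimitedFunction{name: "StartWorkspace"})
	f, ok := ri.get("/wsman.WorkspaceManager/StartWorkspace")
	fmt.Println(ok, f.name)
}
```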


GlobalLimit *rate.Limiter
Key keyFunc
KeyedLimit *lru.Cache
}

// UnaryInterceptor creates a unary interceptor that implements the rate limiting
func (r RatelimitingInterceptor) UnaryInterceptor() grpc.UnaryServerInterceptor {
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
A member commented:

Suggested change
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {

we're not using the named returns

csweichel (Author) replied:

they indicate what's returned though :)

return nil, err
}

f.KeyedLimit.ContainsOrAdd(key, f.RateLimit.Limiter())
A member commented:

question: this immediately creates and trashes a limiter struct AFAIU, which could be avoided by not using ContainsOrAdd.

csweichel (Author) commented Apr 27, 2022:

Agreed: avoiding that requires extra locking, because this operation can happen concurrently.

We could also ignore the concurrency for the time being and go with a Peek followed by an Add, running the risk of mis-accounting for a key during very high call-rate situations.
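The trade-off being discussed can be sketched with a plain map (names hypothetical): ContainsOrAdd constructs the value up front so the insert is atomic, whereas a check-then-create avoids the throwaway allocation but needs its own lock to prevent concurrent double-creation.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter stands in for the per-key rate limiter; counting
// constructions lets us observe how often one is allocated.
type limiter struct{ key string }

var constructed int

func newLimiter(key string) *limiter {
	constructed++
	return &limiter{key: key}
}

// keyCache sketches the check-then-create alternative: look the key up
// first and only construct a limiter on a miss, holding a lock for the
// whole operation so concurrent callers cannot double-create. An LRU's
// ContainsOrAdd does this atomically, at the cost of constructing the
// value up front on every call.
type keyCache struct {
	mu sync.Mutex
	m  map[string]*limiter
}

func (c *keyCache) getOrCreate(key string) *limiter {
	c.mu.Lock()
	defer c.mu.Unlock()
	if l, ok := c.m[key]; ok {
		return l // hit: no allocation
	}
	l := newLimiter(key)
	c.m[key] = l
	return l
}

func main() {
	c := &keyCache{m: make(map[string]*limiter)}
	c.getOrCreate("owner-a")
	c.getOrCreate("owner-a")
	c.getOrCreate("owner-b")
	fmt.Println("constructed:", constructed) // 2, not 3: hits allocate nothing
}
```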

geropl (Member) left a comment:

Code LGTM, and seems to work. Left minor comments.

roboquat removed the size/L label Apr 28, 2022

csweichel (Author):
Added metrics

}

// NewRatelimitingInterceptor creates a new rate limiting interceptor
func NewRatelimitingInterceptor(f map[string]RateLimit) RatelimitingInterceptor {
funcs := make(map[string]ratelimitedFunction, len(f))
callCounter := prometheus.NewCounterVec(prometheus.CounterOpts{
A member commented:

I believe the metrics here also need to be registered to be picked up by Prometheus; otherwise they are just local counters that won't show up on the /metrics endpoint.

See https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#Register

easyCZ (Member) commented Apr 28, 2022:

In this case I'd actually hoist the counter definition out of this instantiation and make it a package-private variable. You can use https://pkg.go.dev/github.com/prometheus/client_golang/prometheus/promauto to simplify the registration.

This will work because the metrics scraper also adds labels for the pod/service so we can differentiate the various metrics when querying.

A contributor commented:

It registers them here:

metrics.Registry.MustRegister(ratelimits)

A member replied:

My bad, missed this. Normally, I'd expect the registration to happen in this file and automatically without requiring an extra call. The risk here is that a component uses the rate limiting interceptor but doesn't register the metric.

The other issue is that this would panic on startup, but CI doesn't catch this.

easyCZ (Member) commented Apr 29, 2022:

One way to address this would be to accept the metrics registry when constructing the interceptor. This would signal to the caller that they control the registry and need to register the metrics.

It would also make it possible to inject a registry in tests which would allow us to test the metrics, if we wanted to.

csweichel (Author) replied:

For a while now we've adopted the pattern of making things implement the prometheus.Collector interface so that they can be registered.
Testing is just as easy as passing in the registry, but it doesn't mandate that someone uses metrics.
The "registration error at runtime" behaviour is also the same.

A member replied:

Thanks. I'd personally like the need for registration to be more explicit (and to use DI to pass the registry in) but given this is an existing pattern, better to stick to it than introduce yet another. Thanks for the info.

cacheHitCounter := prometheus.NewCounterVec(prometheus.CounterOpts{
Namespace: "grpc",
Subsystem: "server",
Name: "rate_limiter_calls_total",
easyCZ (Member) commented Apr 28, 2022:

Copy-paste error with the name in callCounter. Once the metrics are registered (other comment), this would cause a panic/error.

sagor999 (Contributor) left a comment:

Just need to fix the metric name copy-paste error.


sagor999 (Contributor):

/hold to make sure the metric name gets fixed before merge. Feel free to remove the hold once addressed.

csweichel (Author):

Fixed the metric name.

csweichel (Author):

/hold cancel

felladrin (Contributor) left a comment:

Checked the code, and also took into account the other reviewers' comments and the test case. ✅

roboquat merged commit ff6c381 into main Apr 29, 2022
roboquat deleted the cw/grpc-rl-fields branch April 29, 2022 10:31
roboquat added the "deployed: webapp" label (change is running in production) May 4, 2022