[common-go] Add keyed gRPC rate limits #9547
Conversation
/werft run with-clean-slate 👍 started the job as gitpod-build-cw-grpc-rl-fields.4
https://github.com/grpc-ecosystem/go-grpc-middleware offers a rate limit implementation; it may be cheaper to use than building our own. I've used it in the past and it worked well. It has built-in support for sharing the rate limits across instances (but can be used without it).
We looked at that implementation prior to rolling our own.
We use the exact same rate limiting package as go-grpc-middleware under the hood (we basically took inspiration from that package), so we could also support rate limiting across multiple instances.
Looking good, left a couple comments. Could you please also add metrics to expose the following:
- Rate limited count, by method
- Hit rate on the LRU
Can be in separate follow-up PRs, but we should be able to observe this before we deploy it to prod.
```diff
 // RatelimitingInterceptor limits how often a gRPC function may be called. If the limit has been
 // exceeded, we'll return resource exhausted.
-type RatelimitingInterceptor map[string]ratelimitedFunction
+type RatelimitingInterceptor map[string]*ratelimitedFunction
```
The map here should probably be made private. As implemented, updates to the map are not concurrency safe so we shouldn't be exposing it.
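A sketch of that suggestion (the type names and fields here are hypothetical stand-ins, not the PR's actual code): hide the map behind an unexported field and guard reads with a mutex, so callers can neither mutate it nor race against updates.

```go
package main

import (
	"fmt"
	"sync"
)

// ratelimitedFunction stands in for the PR's per-method limiter state
// (hypothetical field, for illustration only).
type ratelimitedFunction struct {
	limitPerSecond int
}

// RatelimitingInterceptor hides the function map behind an unexported field
// so callers cannot mutate it; the mutex guards concurrent access.
type RatelimitingInterceptor struct {
	mu    sync.RWMutex
	funcs map[string]*ratelimitedFunction
}

// NewRatelimitingInterceptor builds the map once, up front; afterwards it is
// only read through limitFor, which is safe for concurrent use.
func NewRatelimitingInterceptor(limits map[string]int) *RatelimitingInterceptor {
	funcs := make(map[string]*ratelimitedFunction, len(limits))
	for method, l := range limits {
		funcs[method] = &ratelimitedFunction{limitPerSecond: l}
	}
	return &RatelimitingInterceptor{funcs: funcs}
}

// limitFor is the only accessor for the map.
func (r *RatelimitingInterceptor) limitFor(method string) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	f, ok := r.funcs[method]
	if !ok {
		return 0, false
	}
	return f.limitPerSecond, true
}

func main() {
	ri := NewRatelimitingInterceptor(map[string]int{"/ws.Manager/StartWorkspace": 5})
	l, ok := ri.limitFor("/ws.Manager/StartWorkspace")
	fmt.Println(l, ok) // prints: 5 true
}
```

If the map is truly write-once at construction time, the mutex could even be dropped; keeping the field unexported is what prevents external, unsynchronized writes.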
```go
	GlobalLimit *rate.Limiter
	Key         keyFunc
	KeyedLimit  *lru.Cache
}
```
```go
// UnaryInterceptor creates a unary interceptor that implements the rate limiting
func (r RatelimitingInterceptor) UnaryInterceptor() grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
```
```diff
-	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
+	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
```
We're not using the named returns.
they indicate what's returned though :)
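For illustration of the two styles under discussion (a hypothetical function, not code from this PR): named results document what comes back and allow a bare `return`, while unnamed results force every return statement to be explicit.

```go
package main

import (
	"errors"
	"fmt"
)

// divNamed uses named results: they are zero-initialized, readable in the
// signature, and a bare "return" returns their current values.
func divNamed(a, b int) (quotient int, err error) {
	if b == 0 {
		err = errors.New("division by zero")
		return // returns quotient=0 and err
	}
	quotient = a / b
	return
}

// divUnnamed is the style the review suggestion prefers when the names are
// never used for a bare return: every return is fully explicit.
func divUnnamed(a, b int) (int, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	q, _ := divNamed(10, 2)
	fmt.Println(q) // prints: 5
}
```

Both compile to the same thing; the trade-off is purely about documentation value versus the risk of accidental bare returns.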
```go
		return nil, err
	}
```

```go
	f.KeyedLimit.ContainsOrAdd(key, f.RateLimit.Limiter())
```
Question: this immediately creates and trashes a limiter struct on the stack, AFAIU, which could be avoided by not using ContainsOrAdd.
Agreed - not doing it this way requires extra locking, because this operation can happen concurrently.
We could also ignore the concurrency for the time being and go with a `Peek` followed by an `Add`, running the risk of mis-accounting for a key during very high call-rate situations.
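The trade-off discussed above can be sketched with stdlib types only (a plain mutex-guarded map stands in for the LRU cache, and an allocation-counting struct stands in for the real limiter; all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// limiter stands in for the real rate limiter; allocations counts how many
// we construct, to make the cost of the ContainsOrAdd pattern visible.
type limiter struct{ id int }

var allocations int

func newLimiter() *limiter {
	allocations++
	return &limiter{id: allocations}
}

// cache stands in for the LRU used by the PR (a plain map here).
type cache struct {
	mu sync.Mutex
	m  map[string]*limiter
}

// containsOrAdd mirrors the ContainsOrAdd shape: the caller must construct
// the value up front, so a limiter is allocated even when the key exists.
func (c *cache) containsOrAdd(key string, l *limiter) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.m[key]; ok {
		return true // l is thrown away
	}
	c.m[key] = l
	return false
}

// getOrCreate allocates only on a miss, at the cost of holding the lock
// while constructing -- the "extra locking" mentioned in the thread.
func (c *cache) getOrCreate(key string) *limiter {
	c.mu.Lock()
	defer c.mu.Unlock()
	if l, ok := c.m[key]; ok {
		return l
	}
	l := newLimiter()
	c.m[key] = l
	return l
}

func main() {
	c := &cache{m: map[string]*limiter{}}
	for i := 0; i < 3; i++ {
		c.containsOrAdd("user-a", newLimiter()) // allocates on every call
	}
	wasted := allocations
	for i := 0; i < 3; i++ {
		c.getOrCreate("user-b") // allocates once
	}
	fmt.Println(wasted, allocations) // prints: 3 4
}
```

An unsynchronized Peek-then-Add avoids both costs but can double-create a limiter for the same key under concurrent calls, which is exactly the mis-accounting risk described above.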
Code LGTM, and seems to work. Left minor comments.
Added metrics
```go
}

// NewRatelimitingInterceptor creates a new rate limiting interceptor
func NewRatelimitingInterceptor(f map[string]RateLimit) RatelimitingInterceptor {
	funcs := make(map[string]ratelimitedFunction, len(f))
	callCounter := prometheus.NewCounterVec(prometheus.CounterOpts{
```
I believe the metrics here also need to be registered to be picked up by Prometheus; otherwise they are just local counters which won't show up on the /metrics endpoint.
See https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#Register
In this case, I'd actually hoist the counter definition outside of this instantiation and make it a package-private variable. You can use https://pkg.go.dev/github.com/prometheus/client_golang/prometheus/promauto to simplify the registration.
This will work because the metrics scraper also adds labels for the pod/service so we can differentiate the various metrics when querying.
It registers them here: gitpod/components/ws-manager/cmd/run.go, line 122 in 29ba28f:

```go
metrics.Registry.MustRegister(ratelimits)
```
My bad, missed this. Normally, I'd expect the registration to happen in this file and automatically without requiring an extra call. The risk here is that a component uses the rate limiting interceptor but doesn't register the metric.
The other issue is that this would panic on startup, but CI doesn't catch this.
One way to address this would be to accept the metrics registry when constructing the interceptor. This would signal to the caller that they control the registry and need to register the metrics.
It would also make it possible to inject a registry in tests which would allow us to test the metrics, if we wanted to.
For a while now we've adopted the pattern of making things implement the prometheus.Collector interface so that they can be registered.
Testing is just as easy as passing in the registry, but it doesn't mandate that someone uses metrics.
The "registration error at runtime" behaviour is also the same.
Thanks. I'd personally like the need for registration to be more explicit (and to use DI to pass the registry in) but given this is an existing pattern, better to stick to it than introduce yet another. Thanks for the info.
```go
	cacheHitCounter := prometheus.NewCounterVec(prometheus.CounterOpts{
		Namespace: "grpc",
		Subsystem: "server",
		Name:      "rate_limiter_calls_total",
```
Copy-paste error with the name in callCounter. Once the metrics are registered (other comment), this would cause a panic/error.
Just need to fix the metrics name copy-paste error.
/hold to make sure the metrics name gets fixed before merge. Feel free to remove the hold once addressed.
Fixed the metrics name
/hold cancel
Checked the code, and also took into account other reviewers' comments and the test case. ✅
Description
This PR adds support for message-keyed gRPC rate limits to common-go.
Using this functionality, we can e.g. add per-user rate limits in ws-manager.
For example:
would impose a 5 workspaces per second rate limit per workspace owner.
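The per-owner limit described above can be sketched, self-contained, with a minimal keyed token bucket. This is an illustration of the idea only, not the PR's actual configuration or code; the bucket type is a hand-rolled stand-in for a real limiter package, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a minimal token bucket: it refills at `rate` tokens per second
// up to `max`, and each allowed call consumes one token.
type bucket struct {
	mu     sync.Mutex
	tokens float64
	max    float64
	rate   float64
	last   time.Time
}

func newBucket(ratePerSec, burst float64) *bucket {
	return &bucket{tokens: burst, max: burst, rate: ratePerSec, last: time.Now()}
}

func (b *bucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.max {
		b.tokens = b.max
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// keyedLimiter keeps one bucket per key, e.g. per workspace owner, so each
// owner gets an independent budget.
type keyedLimiter struct {
	mu      sync.Mutex
	buckets map[string]*bucket
	rate    float64
	burst   float64
}

func newKeyedLimiter(ratePerSec, burst float64) *keyedLimiter {
	return &keyedLimiter{buckets: map[string]*bucket{}, rate: ratePerSec, burst: burst}
}

func (k *keyedLimiter) Allow(key string) bool {
	k.mu.Lock()
	b, ok := k.buckets[key]
	if !ok {
		b = newBucket(k.rate, k.burst)
		k.buckets[key] = b
	}
	k.mu.Unlock()
	return b.Allow()
}

func main() {
	// 5 StartWorkspace calls per second, keyed by owner.
	kl := newKeyedLimiter(5, 5)
	allowed := 0
	for i := 0; i < 10; i++ {
		if kl.Allow("owner-1") {
			allowed++
		}
	}
	fmt.Println(allowed) // burst of 5 passes; the rest are rate limited
}
```

A production version would add an eviction policy (hence the LRU in the PR) so the per-key bucket map cannot grow without bound.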
How to test
Release Notes