-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relay rate limits #906
Relay rate limits #906
Conversation
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
relay/config.go
Outdated
// MaxKeysPerGetChunksRequest is the maximum number of keys that can be requested in a single GetChunks request. | ||
// Default is 1024. // TODO should this be the max batch size? What is that? | ||
MaxKeysPerGetChunksRequest int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ian-shim what is the maximum number of keys we should permit in a single request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can start with something like 512? It will probably hit the grpc limit anyways if it goes beyond that.
Actually let's leave this with 1024 and talk about this offline.
@@ -177,7 +177,7 @@ func buildChunkStore(t *testing.T, logger logging.Logger) (chunkstore.ChunkReade | |||
|
|||
func randomBlob(t *testing.T) (*v2.BlobHeader, []byte) { | |||
|
|||
data := tu.RandomBytes(128) | |||
data := tu.RandomBytes(225) // TODO talk to Ian about this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ian-shim a blob size of 128 bytes results in 5 commitment length of 5, which the method to determine the length of a chunk says is illegal because it's not a power of 2. Can you explain this to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a requirement v2 started imposing - to force the blob length to be a power of 2. GetCommitments
method should be refactored so that it automatically outputs a length in power of 2. @mooselumph is working on that.
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
relay/config.go
Outdated
MaxKeysPerGetChunksRequest int | ||
|
||
// RateLimits contains configuration for rate limiting. | ||
RateLimits limiter.Config | ||
} | ||
|
||
// DefaultConfig returns the default configuration for the relay Server. | ||
func DefaultConfig() *Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we move this to test file? actual default values for optional fields can be set in flags.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
relay/config.go
Outdated
// MaxKeysPerGetChunksRequest is the maximum number of keys that can be requested in a single GetChunks request. | ||
// Default is 1024. // TODO should this be the max batch size? What is that? | ||
MaxKeysPerGetChunksRequest int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can start with something like 512? It will probably hit the grpc limit anyways if it goes beyond that.
Actually let's leave this with 1024 and talk about this offline.
relay/limiter/blob_rate_limiter.go
Outdated
return fmt.Errorf("global concurrent request limit exceeded for getBlob operations, try again later") | ||
} | ||
|
||
allowed := l.opLimiter.AllowN(now, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this different from opLimiter.Allow()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference is that the time is supplied from the outside context. This allows me to write unit tests that aren't wall clock dependent.
relay/limiter/blob_rate_limiter.go
Outdated
countInFlight := l.operationsInFlight.Add(1) | ||
if countInFlight > int64(l.config.MaxConcurrentGetBlobOps) { | ||
l.operationsInFlight.Add(-1) | ||
return fmt.Errorf("global concurrent request limit exceeded for getBlob operations, try again later") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this whole sequence be atomic? otherwise, can operationsInFlight
go over MaxConcurrentGetBlobOps
and potentially reject valid operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The situation where this logic may fail is if the request rate is so high that there are always MaxConcurrentGetBlobOps
or more goroutines that happen to be inside this one block of code, which could in theory prevent any actual operations from being served.
Although such a scenario is unlikely, probably better to err on the side of safety. This now uses a mutex to guarantee atomicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing around with this a little, the code gets really ugy when I try and to low resolution locking. Methods in this class (as well as ChunkRateLimiter
) are just doing basic arithmetic... let's just use a single mutext to protect the entire class for the sake of simplicity.
@@ -177,7 +177,7 @@ func buildChunkStore(t *testing.T, logger logging.Logger) (chunkstore.ChunkReade | |||
|
|||
func randomBlob(t *testing.T) (*v2.BlobHeader, []byte) { | |||
|
|||
data := tu.RandomBytes(128) | |||
data := tu.RandomBytes(225) // TODO talk to Ian about this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a requirement v2 started imposing - to force the blob length to be a power of 2. GetCommitments
method should be refactored so that it automatically outputs a length in power of 2. @mooselumph is working on that.
"too many chunk requests provided, max is %d", s.config.MaxKeysPerGetChunksRequest) | ||
} | ||
|
||
// Future work: client IDs will be fixed when authentication is implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: // TODO(cody-littley): ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
// Future work: client IDs will be fixed when authentication is implemented | ||
clientID := fmt.Sprintf("%d", request.RequesterId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we allow unauthenticated traffic or all requests need to be signed (for MVP)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan was to only permit authenticated traffic for GetChunks()
. Is there any use case for allowing unauthenticated traffic? I can add it if you think it would be helpful.
|
||
if req.GetByIndex() != nil { | ||
key := v2.BlobKey(req.GetByIndex().GetBlobKey()) | ||
metadata = mMap[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if the key exists in the map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not be possible, since process will fail before this point if there is missing metadata. But it's a useful safety check, so I don't mind adding it.
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
} | ||
|
||
if l.globalOperationsInFlight >= l.config.MaxConcurrentGetChunkOps { | ||
return fmt.Errorf("global concurrent request limit exceeded for GetChunks operations, try again later") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it would be useful to include the values (globalOperationsInFlight
& MaxConcurrentGetChunkOps
) in the error messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to this error message and all similar error messages
return fmt.Errorf("global rate limit exceeded for GetChunk bandwidth, try again later") | ||
} | ||
|
||
allowed = l.perClientBandwidthLimiter[requesterID].AllowN(now, bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know l.perClientBandwidthLimiter[requesterID]
should exist because this method always gets called after BeginGetChunkOperation
, but this library probably shouldn't assume that. Let's check if l.perClientBandwidthLimiter[requesterID]
exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added sanity check
@@ -153,8 +155,15 @@ func (m *metadataProvider) fetchMetadata(key v2.BlobKey) (*blobMetadata, error) | |||
} | |||
} | |||
|
|||
blobSize := uint32(cert.BlobHeader.BlobCommitments.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad. I meant BlobMetadata. Otherwise, we can also multiply Length
by BYTES_PER_SYMBOL
, but this will overestimate the size
Why are these changes needed?
Adds server side rate limiting to the relay.
Checks