-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Limit reading bytes from request bodies #168
feat: Limit reading bytes from request bodies #168
Conversation
a7b084f
to
18b7911
Compare
This change both limits the number of bytes that the SDK attempts to read and buffer in memory, as well as delays as much work as possible. It requires breaking changes to the exported API. Notes on API changes: - Request.FromHTTPRequest + NewRequest The difference is that NewRequest does not read the request body. FromHTTPRequest was called from HTTP middleware in integrations in the beginning of the request handling flow. Attempting to read the whole body there is often wasteful, since most requests should not result in reporting an error to Sentry. With this change, Scope.SetRequest keeps a lazy buffer that clones a limited portion of the request body only if it is read by the user. - Scope.SetRequest(Request) + Scope.SetRequest(*http.Request) For most integrations (fasthttp being the only exception), this means less work needs to be done when handling a request. Converting from http.Request to sentry.Request is only done when there is an error to be reported. It also means we can keep a reference to the request body without reading all bytes upfront. - Event.Request type Request + Event.Request type *Request Avoids copying of possibly large struct.
18b7911
to
74b8e18
Compare
import ( | ||
"bytes" | ||
"context" | ||
"fmt" | ||
"net" | ||
"strings" | ||
"io/ioutil" | ||
"net/http" | ||
"net/url" | ||
"time" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite changes to imports, I verified manually that the transitive dependencies of sentryfasthttp
did not change.
go list -f '{{range .Deps}}{{println .}}{{end}}' ./fasthttp
{ | ||
Path: "/post/large", | ||
Method: "POST", | ||
Body: largePayload, | ||
Handler: func(ctx *fasthttp.RequestCtx) { | ||
hub := sentryfasthttp.GetHubFromContext(ctx) | ||
hub.CaptureMessage(fmt.Sprintf("post: %d KB", len(ctx.Request.Body())/1024)) | ||
}, | ||
|
||
WantEvent: &sentry.Event{ | ||
Level: sentry.LevelInfo, | ||
Message: "post: 15 KB", | ||
Request: &sentry.Request{ | ||
URL: "http://example.com/post/large", | ||
Method: "POST", | ||
// Actual request body omitted because too large. | ||
Data: "", | ||
Headers: map[string]string{ | ||
"Content-Length": "15360", | ||
"Content-Type": "application/x-www-form-urlencoded", | ||
"Host": "example.com", | ||
"User-Agent": "fasthttp", | ||
}, | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests that the SDK doesn't try to send "large" payloads to the Sentry server.
The current definition of large is the default value adopted by other SDKs.
Making the max payload size configurable is tracked by #169.
There is a similar test for the net/http
integration.
{ | ||
Path: "/post/body-ignored", | ||
Method: "POST", | ||
Body: "client sends, fasthttp always reads, SDK reports", | ||
Handler: func(ctx *fasthttp.RequestCtx) { | ||
hub := sentryfasthttp.GetHubFromContext(ctx) | ||
hub.CaptureMessage("body ignored") | ||
}, | ||
|
||
WantEvent: &sentry.Event{ | ||
Level: sentry.LevelInfo, | ||
Message: "body ignored", | ||
Request: &sentry.Request{ | ||
URL: "http://example.com/post/body-ignored", | ||
Method: "POST", | ||
// Actual request body included because fasthttp always | ||
// reads full request body. | ||
Data: "client sends, fasthttp always reads, SDK reports", | ||
Headers: map[string]string{ | ||
"Content-Length": "48", | ||
"Content-Type": "application/x-www-form-urlencoded", | ||
"Host": "example.com", | ||
"User-Agent": "fasthttp", | ||
}, | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test contrasts with a similar test for the net/http
integration.
Because fasthttp
always reads the request body before calling handlers, the SDK always has access to the body and includes it when reporting events to Sentry. Still, it respects the max body size.
For all servers based on the net/http
stack, the request body is only read on demand as a stream. The SDK replaces the original body with a wrapper that buffers up to the maximum number of bytes so that we can report to Sentry. If the body is never read, nothing gets buffered. This optimizes for not blocking on network IO when not needed.
{ | ||
Path: "/post/large", | ||
Method: "POST", | ||
Body: largePayload, | ||
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
hub := sentry.GetHubFromContext(r.Context()) | ||
body, err := ioutil.ReadAll(r.Body) | ||
if err != nil { | ||
t.Error(err) | ||
} | ||
hub.CaptureMessage(fmt.Sprintf("post: %d KB", len(body)/1024)) | ||
}), | ||
|
||
WantEvent: &sentry.Event{ | ||
Level: sentry.LevelInfo, | ||
Message: "post: 15 KB", | ||
Request: &sentry.Request{ | ||
URL: "/post/large", | ||
Method: "POST", | ||
// Actual request body omitted because too large. | ||
Data: "", | ||
Headers: map[string]string{ | ||
"Accept-Encoding": "gzip", | ||
"Content-Length": "15360", | ||
"User-Agent": "Go-http-client/1.1", | ||
}, | ||
}, | ||
}, | ||
}, | ||
{ | ||
Path: "/post/body-ignored", | ||
Method: "POST", | ||
Body: "client sends, server ignores, SDK doesn't read", | ||
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
hub := sentry.GetHubFromContext(r.Context()) | ||
hub.CaptureMessage("body ignored") | ||
}), | ||
|
||
WantEvent: &sentry.Event{ | ||
Level: sentry.LevelInfo, | ||
Message: "body ignored", | ||
Request: &sentry.Request{ | ||
URL: "/post/body-ignored", | ||
Method: "POST", | ||
// Actual request body omitted because not read. | ||
Data: "", | ||
Headers: map[string]string{ | ||
"Accept-Encoding": "gzip", | ||
"Content-Length": "46", | ||
"User-Agent": "Go-http-client/1.1", | ||
}, | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments on sentryfasthttp_test.go
.
tl;dr there are two new test cases:
- The SDK should not report large bodies to the Sentry server (and it avoids keeping too much in memory)
- The SDK should not read bodies on behalf of the user (never slows programs down doing network IO)
@@ -147,7 +140,7 @@ type Event struct { | |||
User User `json:"user,omitempty"` | |||
Logger string `json:"logger,omitempty"` | |||
Modules map[string]string `json:"modules,omitempty"` | |||
Request Request `json:"request,omitempty"` | |||
Request *Request `json:"request,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This avoids copying a possibly large struct every time we pass events around.
We may do the same for other fields in the future (e.g. User and Exception).
r.Body = readCloser{ | ||
Reader: io.TeeReader(r.Body, buf), | ||
Closer: r.Body, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This replaces the original request body with a wrapper that lazily buffers up to maxRequestBodyBytes
.
func (scope *Scope) SetRequestBody(b []byte) { | ||
scope.mu.Lock() | ||
defer scope.mu.Unlock() | ||
|
||
capacity := maxRequestBodyBytes | ||
overflow := false | ||
if len(b) > capacity { | ||
overflow = true | ||
b = b[:capacity] | ||
} | ||
scope.requestBody = &limitedBuffer{ | ||
Capacity: capacity, | ||
Buffer: *bytes.NewBuffer(b), | ||
overflow: overflow, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We export this new method on Scope
just because of fasthttp
. I wish it was not needed.
It uses the same limitedBuffer
such that we don't send large bodies to Sentry.
Note that limitedBuffer.Buffer
reuses the byte slice b
to avoid copying it.
Note that we retain no more than maxRequestBodyBytes
(although memory from the original backing array can't be garbage collected). This is such that limitedBuffer.Bytes()
returns no more than limitedBuffer.Capacity
bytes.
const basicEvent = "{\"message\":\"mkey\",\"sdk\":{},\"user\":{},\"request\":{}}" | ||
const basicEvent = "{\"message\":\"mkey\",\"sdk\":{},\"user\":{}}" | ||
const enhancedEvent = "{\"extra\":{\"info\":\"Original event couldn't be marshalled. Succeeded by stripping " + | ||
"the data that uses interface{} type. Please verify that the data you attach to the scope is serializable.\"}," + | ||
"\"message\":\"mkey\",\"sdk\":{},\"user\":{},\"request\":{}}" | ||
"\"message\":\"mkey\",\"sdk\":{},\"user\":{}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the "request"
field disappears because finally the omitempty
directive in the JSON struct tag for Event.Request
can take effect.
In the future, the same could happen to "sdk"
and "user"
if we make them pointers to struct.
request *http.Request | ||
// requestBody holds a reference to the original request.Body. | ||
requestBody interface { | ||
// Bytes returns bytes from the original body, lazily buffered as the | ||
// original body is read. | ||
Bytes() []byte | ||
// Overflow returns true if the body is larger than the maximum buffer | ||
// size. | ||
Overflow() bool | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we store a *http.Request
instead of sentry.Request
.
This means that most integrations don't need to perform any extra work to transform *http.Request
into sentry.Request
in their HTTP middleware. We delay the conversion to Scope.ApplyToEvent
, which is only called if there is actually an event to be reported.
requestBody
is most of the time populated by Scope.SetRequest
, the exception being in the fasthttp
integration.
if (reflect.DeepEqual(event.Request, Request{})) { | ||
event.Request = scope.request | ||
if event.Request == nil && scope.request != nil { | ||
event.Request = NewRequest(scope.request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where we convert from *http.Request
to *sentry.Request
.
Note that this code runs in the same goroutine as calls like Hub.CaptureEvent
, synchronously. Before we hit the sentry.Transport
layer, all references to the HTTP request and its body have been resolved / prepared to be serialized. We don't leak references to other goroutines (the default transport sends events in a separate goroutine).
One notable difference to the previous implementation is that if several events are captured within the same request, there will be several type conversions / calls to NewRequest, whereas in the old implementation Request.FromHTTPRequest
was called only once per request.
The advantage of the new implementation is that it delays reading the request body and keep it under the users control (if the user doesn't read the body, the SDK also doesn't).
If deemed necessary, we could cache the computed sentry.Request
and store it in the scope for reuse.
The problem with caching is invalidation: while request headers and other metadata should not change, the observed body may be different depending on when Scope.ApplyToEvent
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through the whole implementation and it looks awesome, thanks!
This change both limits the number of bytes that the SDK attempts to
read and buffer in memory, as well as delays as much work as possible.
It requires breaking changes to the exported API.
Notes on API changes:
The difference is that NewRequest does not read the request body.
FromHTTPRequest was called from HTTP middleware in integrations in the
beginning of the request handling flow. Attempting to read the whole
body there is often wasteful, since most requests should not result in
reporting an error to Sentry.
With this change, Scope.SetRequest keeps a lazy buffer that clones a
limited portion of the request body only if it is read by the user.
For most integrations (fasthttp being the only exception), this means
less work needs to be done when handling a request. Converting from
http.Request to sentry.Request is only done when there is an error to
be reported.
It also means we can keep a reference to the request body without
reading all bytes upfront.
Avoids copying of possibly large struct.
Closes #96.
Closes #166.