Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Limit reading bytes from request bodies #168

Merged
merged 1 commit into from
Feb 25, 2020

Conversation

rhcarvalho
Copy link
Contributor

This change both limits the number of bytes that the SDK attempts to
read and buffer in memory, as well as delays as much work as possible.

It requires breaking changes to the exported API.

Notes on API changes:

- Request.FromHTTPRequest
+ NewRequest

The difference is that NewRequest does not read the request body.
FromHTTPRequest was called from HTTP middleware in integrations in the
beginning of the request handling flow. Attempting to read the whole
body there is often wasteful, since most requests should not result in
reporting an error to Sentry.

With this change, Scope.SetRequest keeps a lazy buffer that clones a
limited portion of the request body only if it is read by the user.

- Scope.SetRequest(Request)
+ Scope.SetRequest(*http.Request)

For most integrations (fasthttp being the only exception), this means
less work needs to be done when handling a request. Converting from
http.Request to sentry.Request is only done when there is an error to
be reported.
It also means we can keep a reference to the request body without
reading all bytes upfront.

- Event.Request type Request
+ Event.Request type *Request

Avoids copying of possibly large struct.


Closes #96.
Closes #166.

This change both limits the number of bytes that the SDK attempts to
read and buffer in memory, as well as delays as much work as possible.

It requires breaking changes to the exported API.

Notes on API changes:

- Request.FromHTTPRequest
+ NewRequest

  The difference is that NewRequest does not read the request body.
  FromHTTPRequest was called from HTTP middleware in integrations in the
  beginning of the request handling flow. Attempting to read the whole
  body there is often wasteful, since most requests should not result in
  reporting an error to Sentry.

  With this change, Scope.SetRequest keeps a lazy buffer that clones a
  limited portion of the request body only if it is read by the user.

- Scope.SetRequest(Request)
+ Scope.SetRequest(*http.Request)

  For most integrations (fasthttp being the only exception), this means
  less work needs to be done when handling a request. Converting from
  http.Request to sentry.Request is only done when there is an error to
  be reported.
  It also means we can keep a reference to the request body without
  reading all bytes upfront.

- Event.Request type Request
+ Event.Request type *Request

  Avoids copying of possibly large struct.
@rhcarvalho rhcarvalho force-pushed the request-body-limited-lazy branch from 18b7911 to 74b8e18 Compare February 24, 2020 23:11
Comment on lines 3 to 10
import (
"bytes"
"context"
"fmt"
"net"
"strings"
"io/ioutil"
"net/http"
"net/url"
"time"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite changes to imports, I verified manually that the transitive dependencies of sentryfasthttp did not change.

go list -f '{{range .Deps}}{{println .}}{{end}}' ./fasthttp

Comment on lines +96 to +121
{
Path: "/post/large",
Method: "POST",
Body: largePayload,
Handler: func(ctx *fasthttp.RequestCtx) {
hub := sentryfasthttp.GetHubFromContext(ctx)
hub.CaptureMessage(fmt.Sprintf("post: %d KB", len(ctx.Request.Body())/1024))
},

WantEvent: &sentry.Event{
Level: sentry.LevelInfo,
Message: "post: 15 KB",
Request: &sentry.Request{
URL: "http://example.com/post/large",
Method: "POST",
// Actual request body omitted because too large.
Data: "",
Headers: map[string]string{
"Content-Length": "15360",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "example.com",
"User-Agent": "fasthttp",
},
},
},
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests that the SDK doesn't try to send "large" payloads to the Sentry server.
The current definition of large is the default value adopted by other SDKs.

Making the max payload size configurable is tracked by #169.

There is a similar test for the net/http integration.

Comment on lines +122 to +148
{
Path: "/post/body-ignored",
Method: "POST",
Body: "client sends, fasthttp always reads, SDK reports",
Handler: func(ctx *fasthttp.RequestCtx) {
hub := sentryfasthttp.GetHubFromContext(ctx)
hub.CaptureMessage("body ignored")
},

WantEvent: &sentry.Event{
Level: sentry.LevelInfo,
Message: "body ignored",
Request: &sentry.Request{
URL: "http://example.com/post/body-ignored",
Method: "POST",
// Actual request body included because fasthttp always
// reads full request body.
Data: "client sends, fasthttp always reads, SDK reports",
Headers: map[string]string{
"Content-Length": "48",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "example.com",
"User-Agent": "fasthttp",
},
},
},
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test contrasts with a similar test for the net/http integration.

Because fasthttp always reads the request body before calling handlers, the SDK always has access to the body and includes it when reporting events to Sentry. Still, it respects the max body size.

For all servers based on the net/http stack, the request body is only read on demand as a stream. The SDK replaces the original body with a wrapper that buffers up to the maximum number of bytes so that we can report to Sentry. If the body is never read, nothing gets buffered. This optimizes for not blocking on network IO when not needed.

Comment on lines +96 to +149
{
Path: "/post/large",
Method: "POST",
Body: largePayload,
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
hub := sentry.GetHubFromContext(r.Context())
body, err := ioutil.ReadAll(r.Body)
if err != nil {
t.Error(err)
}
hub.CaptureMessage(fmt.Sprintf("post: %d KB", len(body)/1024))
}),

WantEvent: &sentry.Event{
Level: sentry.LevelInfo,
Message: "post: 15 KB",
Request: &sentry.Request{
URL: "/post/large",
Method: "POST",
// Actual request body omitted because too large.
Data: "",
Headers: map[string]string{
"Accept-Encoding": "gzip",
"Content-Length": "15360",
"User-Agent": "Go-http-client/1.1",
},
},
},
},
{
Path: "/post/body-ignored",
Method: "POST",
Body: "client sends, server ignores, SDK doesn't read",
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
hub := sentry.GetHubFromContext(r.Context())
hub.CaptureMessage("body ignored")
}),

WantEvent: &sentry.Event{
Level: sentry.LevelInfo,
Message: "body ignored",
Request: &sentry.Request{
URL: "/post/body-ignored",
Method: "POST",
// Actual request body omitted because not read.
Data: "",
Headers: map[string]string{
"Accept-Encoding": "gzip",
"Content-Length": "46",
"User-Agent": "Go-http-client/1.1",
},
},
},
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments on sentryfasthttp_test.go.

tl;dr there are two new test cases:

  • The SDK should not report large bodies to the Sentry server (and it avoids keeping too much in memory)
  • The SDK should not read bodies on behalf of the user (never slows programs down doing network IO)

@@ -147,7 +140,7 @@ type Event struct {
User User `json:"user,omitempty"`
Logger string `json:"logger,omitempty"`
Modules map[string]string `json:"modules,omitempty"`
Request Request `json:"request,omitempty"`
Request *Request `json:"request,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids copying a possibly large struct every time we pass events around.

We may do the same for other fields in the future (e.g. User and Exception).

Comment on lines +116 to +119
r.Body = readCloser{
Reader: io.TeeReader(r.Body, buf),
Closer: r.Body,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the original request body with a wrapper that lazily buffers up to maxRequestBodyBytes.

Comment on lines +128 to +143
func (scope *Scope) SetRequestBody(b []byte) {
scope.mu.Lock()
defer scope.mu.Unlock()

capacity := maxRequestBodyBytes
overflow := false
if len(b) > capacity {
overflow = true
b = b[:capacity]
}
scope.requestBody = &limitedBuffer{
Capacity: capacity,
Buffer: *bytes.NewBuffer(b),
overflow: overflow,
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We export this new method on Scope just because of fasthttp. I wish it was not needed.

It uses the same limitedBuffer such that we don't send large bodies to Sentry.

Note that limitedBuffer.Buffer reuses the byte slice b to avoid copying it.
Note that we retain no more than maxRequestBodyBytes (although memory from the original backing array can't be garbage collected). This is such that limitedBuffer.Bytes() returns no more than limitedBuffer.Capacity bytes.

Comment on lines -18 to +21
const basicEvent = "{\"message\":\"mkey\",\"sdk\":{},\"user\":{},\"request\":{}}"
const basicEvent = "{\"message\":\"mkey\",\"sdk\":{},\"user\":{}}"
const enhancedEvent = "{\"extra\":{\"info\":\"Original event couldn't be marshalled. Succeeded by stripping " +
"the data that uses interface{} type. Please verify that the data you attach to the scope is serializable.\"}," +
"\"message\":\"mkey\",\"sdk\":{},\"user\":{},\"request\":{}}"
"\"message\":\"mkey\",\"sdk\":{},\"user\":{}}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the "request" field disappears because finally the omitempty directive in the JSON struct tag for Event.Request can take effect.

In the future, the same could happen to "sdk" and "user" if we make them pointers to struct.

Comment on lines +37 to +46
request *http.Request
// requestBody holds a reference to the original request.Body.
requestBody interface {
// Bytes returns bytes from the original body, lazily buffered as the
// original body is read.
Bytes() []byte
// Overflow returns true if the body is larger than the maximum buffer
// size.
Overflow() bool
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we store a *http.Request instead of sentry.Request.

This means that most integrations don't need to perform any extra work to transform *http.Request into sentry.Request in their HTTP middleware. We delay the conversion to Scope.ApplyToEvent, which is only called if there is actually an event to be reported.

requestBody is most of the time populated by Scope.SetRequest, the exception being in the fasthttp integration.

if (reflect.DeepEqual(event.Request, Request{})) {
event.Request = scope.request
if event.Request == nil && scope.request != nil {
event.Request = NewRequest(scope.request)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where we convert from *http.Request to *sentry.Request.

Note that this code runs in the same goroutine as calls like Hub.CaptureEvent, synchronously. Before we hit the sentry.Transport layer, all references to the HTTP request and its body have been resolved / prepared to be serialized. We don't leak references to other goroutines (the default transport sends events in a separate goroutine).

One notable difference to the previous implementation is that if several events are captured within the same request, there will be several type conversions / calls to NewRequest, whereas in the old implementation Request.FromHTTPRequest was called only once per request.

The advantage of the new implementation is that it delays reading the request body and keep it under the users control (if the user doesn't read the body, the SDK also doesn't).

If deemed necessary, we could cache the computed sentry.Request and store it in the scope for reuse.

The problem with caching is invalidation: while request headers and other metadata should not change, the observed body may be different depending on when Scope.ApplyToEvent is called.

@rhcarvalho rhcarvalho marked this pull request as ready for review February 25, 2020 00:05
Copy link
Contributor

@kamilogorek kamilogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through the whole implementation and it looks awesome, thanks!

@rhcarvalho rhcarvalho merged commit db5e5da into getsentry:master Feb 25, 2020
@rhcarvalho rhcarvalho deleted the request-body-limited-lazy branch February 25, 2020 15:54
rvolosatovs pushed a commit to rvolosatovs/lorawan-stack-fork that referenced this pull request May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pull request #94 causes http.ErrBodyReadAfterClose Limit number of bytes read from HTTP request bodies
2 participants