Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server/authorizer: Fix gzip payload handling. #6825

Conversation

philipaconrad
Copy link
Contributor

@philipaconrad philipaconrad commented Jun 17, 2024

What changed in this PR?

This PR fixes an issue where an OPA running authorization policies would be unable to handle gzipped request bodies.

Example OPA CLI setup:

opa run -s --authorization=basic

Example request:

echo -n '{}' | gzip | curl -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data

This would result in unhelpful error messages, like:

{
  "code": "invalid_parameter",
  "message": "invalid character '\\x1f' looking for beginning of value"
}

The cause was that the request body handling system in the server/authorizer package did not take gzipped payloads into account. The fix was to borrow the gzip request body handling function from server/server.go, to transparently decompress the body when needed.

Fixes: #6804

TODOs

  • Add an authorizer test for the gzipped payload case (to prevent regressions!)

@philipaconrad philipaconrad added bug go Pull requests that update Go code labels Jun 17, 2024
@philipaconrad philipaconrad self-assigned this Jun 17, 2024
Copy link

netlify bot commented Jun 17, 2024

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 0182db3
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/667c77b7948c3000083dbe38
😎 Deploy Preview https://deploy-preview-6825--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@philipaconrad philipaconrad force-pushed the philip/fix-authorizer-gzip-handling branch from 6805365 to 34889a4 Compare June 17, 2024 16:37
@srenatus
Copy link
Contributor

I'm a bit surprised about the approach taken here 😅 I'd have expected us to change the ordering of handlers, since we’re setting up the authzHandler (via initRouters) before the compressHandler. One could argue that authz should happen before decompress (to avoid an attack vector, decompressing not-yet-authorized bodies), but if we end up decompressing in the authz handler anyways, that point is moot.

🤔 Am I missing something that makes this idea flawed? 😃

@srenatus
Copy link
Contributor

Ooooh that linked code (CompressHandler) only takes care of compressing the response, doesn't it? That's not what we need, of course.

I guess the alternative approach taken that I had in mind would be to have some sort of middleware for both request payloads and response payloads. 💭

@philipaconrad
Copy link
Contributor Author

philipaconrad commented Jun 17, 2024

@srenatus, I do think you've got a valid point w.r.t. the gzipped not-authorized-yet message bodies. I like the idea of having a uniform bit of middleware for decompressing requests! I'll take a look at how hard that would be to implement, and will ping back in this thread with the results.

@srenatus
Copy link
Contributor

If we can keep authentication -> compression -> authorization, so authentication will be first, then I think it's no problem at all.

@philipaconrad
Copy link
Contributor Author

Gotcha! Authentication first should not be an issue-- it's authorization and decompression that have a fixed ordering relative to each other.

@philipaconrad
Copy link
Contributor Author

philipaconrad commented Jun 17, 2024

I think we actually could do some validation on gzip payload reads, but the method I have in mind relies on low-level knowledge of the format, namely, that the 8-byte trailer at the end of any valid gzip-compressed blob includes the CRC, as well as the decompressed file size in bytes, modulo 2^32 (4GB).

Here's a sketch of what a server/decoding plugin might look like, that would solve this problem a few levels above where the current PR fix is doing it:

  • Identical configuration style to the server/encoding plugin.
  • Identical low-level http.Handler hooks style to server/encoding. (Requiring some more initialization logic in server/server.go, so that the request decompressor wraps everything else)
  • New Gzip decompressor system uses the following techniques to allow rejection of bad gzip payloads:
    • Try creating a gzip.NewReader(request.Body), as NewReader doesn't do any real work beyond reading the header, until (*gzip.Reader).Read is called.
    • Manually grab the last 4 bytes from the gzip blob, and convert to a uint32 for checking against the plugin's configured size limits.
  • Prealloc read buffer, based on the gzip blob's declared size.
  • Use a limited-size read method to read out/decompress the payload into the buffer.

Maybe I'm overthinking things? I do think this approach would both boost performance for gzipped requests, and (if my understanding of the compress/gzip package is correct) would comprehensively resolve issue #6804, even for unauthenticated-but-authorized OPA deployments, at the expense of much heavier refactoring than I'd expected at the outset of trying to solve this issue. 😅

Notes:

@philipaconrad
Copy link
Contributor Author

philipaconrad commented Jun 17, 2024

@srenatus I've added the some gzip + authz policy test cases, so that's no longer a blocker. The tests have some extra cruft in them to allow turning the authz policy parts on/off while I experiment with the server/decoding plugin idea.

srenatus
srenatus previously approved these changes Jun 18, 2024
server/server_test.go Outdated Show resolved Hide resolved
server/server_test.go Outdated Show resolved Hide resolved
server/server_test.go Outdated Show resolved Hide resolved
@johanfylling
Copy link
Contributor

On this branch, if I have no authz policy and run:

echo -n '{"input": {}}' | curlie --data @- http://127.0.0.1:8181/v1/data

I get a 200 OK and the expected eval result.
But if I run:

echo -n '{"input": {}}' | gzip | curlie -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data

I get:

HTTP/1.1 400 Bad Request
Content-Type: application/json
Vary: Accept-Encoding
Date: Tue, 18 Jun 2024 09:10:31 GMT
Content-Length: 96

{
    "code": "invalid_parameter",
    "message": "could not decompress the body: unexpected EOF"
}

Given that I don't do authn/authz on the request, it's not surprising if the changes in this PR aren't exercised, but wouldn't we expect the latter request to succeed?

@philipaconrad
Copy link
Contributor Author

[...] But if I run:

echo -n '{"input": {}}' | gzip | curlie -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data

@johanfylling I tried running that command with plain curl, and got a 200 OK on both main, and this PR branch:

echo -n '{"input": {}}' | gzip | curl -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data

⬇️
Response:

{"result":{}}

I wonder if it's a behavior difference or a bug in curlie?

@johanfylling
Copy link
Contributor

@philipaconrad , you're correct: curl works as expected. This must be a bug in curlie, which is supposed to just be a simple wrapper of curl to get some prettier output .. apparently they mess something up on the way through 😄.

@philipaconrad philipaconrad force-pushed the philip/fix-authorizer-gzip-handling branch from 3fd8c3f to cb39173 Compare June 25, 2024 16:53
@philipaconrad
Copy link
Contributor Author

philipaconrad commented Jun 25, 2024

In tests with a 1GB gzipped input to curl, it looks like we're decompressing twice in some cases, due to this change unmarshaling twice, due to how the authorizer unmarshals inputs before the v1CompilePost endpoint works.

Other endpoints check the request context to see if the input was already parsed for them, but v1CompilePost does not. Since we probably do not want to have 2x the overhead for POST requests against /v1/data, I'm going to investigate adding a context lookup for that handler.

geninput.py:

N = 1024 * 1024
oneKBString = "A" * 1024

if __name__ == "__main__":
    print('{"input":{')
    for i in range(0, N):
        print('"{}": "{}",'.format(i, oneKBString))
    print('"{}": "{}"'.format(N, oneKBString))
    print("}}")

Example CLI (server):

/usr/bin/time -v ./opa-pr run -s --authorization=basic authz.rego

Example CLI (client):

python3 geninput.py | gzip | curl -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data

if err != nil {
return nil, err
}
defer gzReader.Close()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation potentially leaks the gzip.Reader. In util/read_gzip_body.go in this PR, I add back this utility function, but with corrected defer behavior.

This PR fixes an issue where an OPA running authorization policies would
be unable to handle gzipped request bodies.

Example OPA CLI setup:

    opa run -s --authorization=basic

Example request:

    echo -n '{}' | gzip | curl -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data

This would result in unhelpful error messages, like:

```json
{
  "code": "invalid_parameter",
  "message": "invalid character '\\x1f' looking for beginning of value"
}
```

The cause was that the request body handling system in the
`server/authorizer` package did not take gzipped payloads into
account. The fix was to borrow the gzip request body handling function
from `server/server.go`, to transparently decompress the body when
needed.

Fixes: open-policy-agent#6804

Signed-off-by: Philip Conrad <[email protected]>
@philipaconrad philipaconrad force-pushed the philip/fix-authorizer-gzip-handling branch from da71251 to 0182db3 Compare June 26, 2024 20:19
@philipaconrad
Copy link
Contributor Author

philipaconrad commented Jun 26, 2024

ℹ️ Here's the current state of things (for @johanfylling, @srenatus, and anyone else following along):

The good news:

The bad news:

  • Due to how folks have written stuff in the server package, we will end up reading the message body twice for gzipped requests in some cases when authz policies are enabled.
    • This unfortunately is the "intended" behavior that should have been happening before, but wasn't, due to #6804.
    • When Compress OPA response & accept compressed input #5696 went in, it introduced cases where the message body could be io.ReadAll'd 2x or more by accident. That PR was focused on compressed OPA responses, not compressed requests to OPA, but it had to do something with the requests at the time, since gzip compatibility/detection relies on seeing appropriate gzip-related request headers.
  • Resolving the performance problems will require some extensive refactoring, and is unrelated to what this PR is attempting to solve-- I'd like to move the refactoring work to a follow-up PR for the next OPA release.

Unless there's any other comments/critiques/requests around this PR, I think it should be good-for-merge now.

@johanfylling johanfylling merged commit 4e01537 into open-policy-agent:main Jun 27, 2024
28 checks passed
@philipaconrad philipaconrad deleted the philip/fix-authorizer-gzip-handling branch June 27, 2024 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--authorization=basic breaks gzipped request decompression
4 participants