Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

G1: Add proposal for true binary metadata #19

Merged
merged 3 commits into from
Aug 25, 2017
Merged

G1: Add proposal for true binary metadata #19

merged 3 commits into from
Aug 25, 2017

Conversation

ctiller
Copy link
Member

@ctiller ctiller commented Mar 29, 2017


When transmitting metadata on a connection where the peer has specified
GRPC_ALLOW_TRUE_BINARY_METADATA, instead of encoding using base64, an
implementation MAY instead prefix a NUL byte to the metadata and transmit the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/metadata/metadata value/. Or maybe "header field value".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Since this is a HTTP2 extension and other extensions might alias this extension
id, it's possible that this becomes misconfigured. In that case, peers are
required to RST_STREAM with http error PROTOCOL_ERROR. If a binary encoding was
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call out that this is the normal behavior as specified in RFC7540§10.3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Suppose we wanted to send metadata element 'foo-bin: 0x01' (ie a single byte
containing '1').

Under base64, we'd send a http header 'foo-bin: AQ'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add period at end of sentence, or an additional newline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,89 @@
Title
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put in the real title.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

GRPC_ALLOW_TRUE_BINARY_METADATA = 0xfe03.

This setting is randomly chosen (to avoid conflicts with other extensions), and
within the experimental range of HTTP extensions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding a link to https://tools.ietf.org/html/rfc7540#section-11.3 here to refer readers to information about the experimental range.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ctiller ctiller changed the title Add proposal for true binary metadata G1: Add proposal for true binary metadata Mar 31, 2017
### New setting

Expose a custom setting in our HTTP2 settings exchange:
GRPC_ALLOW_TRUE_BINARY_METADATA = 0xfe03.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem gRPC-specific enough to justify GRPC_ prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That it only applies to -bin headers (which are a gRPC thing) makes it gRPC-specific enough (IMO)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only -bin headers? If we're using 0x00 magic prefix, then I don't see a reason for this limitation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So gRPC uses -bin to specify that a header is binary, and says that other metadata must be ascii (per http2). That -bin has semantic meaning (foo-bin is a different header than foo).

We use that as a signal to base64 encode things right now. Under this proposal we'd just switch to an alternate encoding.

Since we can't universally make all of the HTTP2 ecosystem take binary metadata, a mechanism to allow transmitting anything and dropping the -bin isn't possible for us (we need to keep interoperating - so proxies that support this would potentially need to do the base64 conversion if one side didn't support this mechanism).

If there were an extension that allowed sending binary values in metadata, we'd still end up needing a disambiguating byte for backwards compatibility (for the gRPC -bin suffixed headers), and we'd still maintain the requirement that non -bin headers must be ascii.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware of any HTTP/2 proxies that would reject requests because of "true binary" encoding of the header field? Unless it's a well-known header that must be processed by the proxy, most (all?) will just pass-through headers without ever looking at the content of the header fields.

Don't get me wrong, I'm all in for allowing binary header fields, but in order get this accepted into HTTP/2 ecosystem, this proposal should be a little more than gRPC implementation detail, that's why -bin suffix feels like an artificial limitation.

I believe that standardizing on "0x00 prefix of the header field indicates binary/raw representation" is much more reasonable, and doesn't prevent gRPC from only using it in headers with -bin suffix.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer this myself, RFC7540, sec. 10.3. Intermediary Encapsulation Attacks says:

Similarly, HTTP/2 allows header field values that are not valid.
While most of the values that can be encoded will not alter header
field parsing, carriage return (CR, ASCII 0xd), line feed (LF, ASCII
0xa), and the zero character (NUL, ASCII 0x0) might be exploited by
an attacker if they are translated verbatim. Any request or response
that contains a character not permitted in a header field value MUST
be treated as malformed (Section 8.1.2.6). Valid characters are
defined by the "field-content" ABNF rule in Section 3.2 of [RFC7230].

I also did some digging on the http-wg mailing list and it seems that sending binary blobs in header fields was discussed during HTTP/2 standardization, but they eventually decided against it, because apparently base64 in HPACK is "good enough".

containing '1').

Under base64, we'd send a http header 'foo-bin: AQ'.
Under binary, we'd send 'foo-bin: 0x00 0x01' (ie prefixing a NUL byte and then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need NUL-byte here? Why can't we send binary header as foo-bin: 0x01?

I actually spent some time on Wednesday looking into this (for different reasons), and I couldn't find anything in the spec that forbids binary header values.

This was a limitation in SPDY, which used NUL-byte as a separator between multiple header values, but it seems that this "feature" was removed from HPACK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://tools.ietf.org/html/rfc7540#section-8.1 says that https://tools.ietf.org/html/rfc7230#section-3.2 is normative, which does not allow binary headers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...which means that foo-bin: 0x00 0x01 is also illegal.

My point is that nothing in HPACK prevents it from transmitting binary data, so we could add generic SETTINGS parameter for it (for example: SETTING_ALLOW_BINARY_HEADER_FIELD) and carry it over HPACK as any other value.

But I guess 0x00 prefix is here to differentiate binary from text? If so, do we really need -bin prefix in header name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-bin is not going away so long as there are clients, servers, and intermediaries that do not support binary headers.

containing '1').

Under base64, we'd send a http header 'foo-bin: AQ'.
Under binary, we'd send 'foo-bin: 0x00 0x01' (ie prefixing a NUL byte and then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...which means that foo-bin: 0x00 0x01 is also illegal.

My point is that nothing in HPACK prevents it from transmitting binary data, so we could add generic SETTINGS parameter for it (for example: SETTING_ALLOW_BINARY_HEADER_FIELD) and carry it over HPACK as any other value.

But I guess 0x00 prefix is here to differentiate binary from text? If so, do we really need -bin prefix in header name?

### New setting

Expose a custom setting in our HTTP2 settings exchange:
GRPC_ALLOW_TRUE_BINARY_METADATA = 0xfe03.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only -bin headers? If we're using 0x00 magic prefix, then I don't see a reason for this limitation.

peers MAY use a 'true binary' encoding (described below), instead of the current
base64 encoding for -bin metadata.

Implementations SHOULD transmit this setting only once, and as part of the first

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SETTINGS apply only to a connection and not end-to-end, what about intermediaries? Are proxies supposed to re-encode header field to base64 if they received binary header field but are talking to a backend that doesn't support it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or they can just not advertise support for this feature.

@PiotrSikora
Copy link

Per PROTOCOL-HTTP2.md:

Duplicate header names may have their values joined with "," as the delimiter and be considered semantically equivalent. Implementations must split Binary-Headers on "," before decoding the Base64-encoded values.

However, this proposal doesn't cover this case, and there is no way for implementations to detect whether , is a delimiter or just happens to be a part of binary blob.

On receipt, we transparently reverse the transformation.

This transformation is costly in terms of CPU, and there exist use cases where
this transformation can become the CPU bottleneck for gRPC.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some data here (i.e. metadata to payload ratio, metadata size, how much CPU does base64 encoding, base64 decoding, Huffman encoding and Huffman decoding each take)? It makes a lot of difference whether we're talking about 2% or 50% of total CPU processing time

@ejona86
Copy link
Member

ejona86 commented Apr 10, 2017

@ctiller, let's prohibits concatenating raw-binary-encoded headers that is normally possible per RFC7230§3.2.2?

@ctiller
Copy link
Member Author

ctiller commented Apr 10, 2017

That SGTM.

@markdroth
Copy link
Member

Should we merge this? It's already implemented in C-core, and there's been more than enough time for discussion.

@a11r a11r merged commit 4de5b08 into grpc:master Aug 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants