-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Header to metadata filter (#2436) #3633
Header to metadata filter (#2436) #3633
Conversation
Hi @rgs1 ! Very nice! Let me know when you think it's ready to be tested. I see you're still adding commits. I'm happy to test it and confirm that it works for me too. |
@bmetzdorf yep sorry, still ironing out a few things... Will let you know! |
@bmetzdorf @ggreenway ok, I think it's going through now... Sorry for the noise. Let me know if this goes along with what you are thinking. Thanks! |
Add support for extracting dynamic metadata from requests. This can then be used as static metadata would be used (e.g.: for subset load balancer metadata matches, logging, etc). Risk Level: Low Testing: unit-test Docs Changes: Basic docs, more later. Release Notes: N/A Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Somehow, my version of clang-format didn't catch this. Signed-off-by: Raul Gutierrez Segales <[email protected]>
Also, rebase on master because well-known-names.h changed. Signed-off-by: Raul Gutierrez Segales <[email protected]>
5f8816e
to
950cbb1
Compare
@bmetzdorf sorry, had to rebase plus i added one more commit to avoid string copies. |
Btw, I am not sold on calling this filter |
Hi @rgs1. So, the simple request_rules use case that I'm interested in works fine for plain HTTP, but as you already mentioned in #2436 it is missing regex extraction support. In addition to that though, it seems as if currently (and I know there's some work in progress to overhaul this) this (or any) http filter does not get called for websocket connections, which unfortunately is what I'd require (sigh..). So I'm wondering if we could provide some functionality of this in the route matcher (which does work for websocket connections) instead:
We could also have two different implementations, until the websocket stuff gets refactored. What do you think? |
@bmetzdorf we've got many many route matchers, so if we implement it there we'd need to be able to configure this at the cc: @derekargueta [0] https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/rds.proto#L39 |
@rgs1 If you have many route matchers, would the Long term I'd think that the HTTP filter is the right approach. The filter should work with the upcoming websocket changes. Until then the limited functionality I require could be provided in the |
@bmetzdorf yeah, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a first pass through this. I think it's headed in the right direction.
syntax = "proto3"; | ||
|
||
package envoy.config.filter.http.header_to_metadata.v2; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs option go_package = "v2";
There should be a preamble that briefly describes what the filter does and refers to the docs you start, e.g.
// [#protodoc-title: Header-To-Metadata Filter]
// The Header-to-Metadata filter ....
// :ref:`configuration overview <config_http_filters_header_to_metadata>`.
string metadata_namespace = 1; | ||
|
||
// The key — must be present. | ||
string key = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you append [(validate.rules).string.min_bytes = 1]
(and import "validate/validate.proto"
)
you should get automatic validation that a value is present. (This depends on an invocation of ProtobufUtil::validate, which will happen automatically if you switch to FactoryBase for your NamedHttpFilterConfigFactory as mentioned in another comment of mine.)
string key = 2; | ||
|
||
// The value — may be absent. | ||
string value = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either here or in the Rule message's on_header_present field, please explain the filter's logic when this value is present or not.
For on_header_missing, this isn't really optional.
} | ||
} else { | ||
// Should we add metadata if the header is missing? | ||
if (rule.onHeaderMissing.key.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate this at configuration time and remove this check.
|
||
if (value.empty()) { | ||
// No value, skip. we could allow this though. | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding debug-level logging for all these false return cases.
enum ValueType { | ||
STRING = 0; | ||
NUMBER = 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to inline these enums and messages within the Config message.
|
||
message Rule { | ||
// The header from which to extract a value. | ||
string header = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate that this is non-empty and document that it's required?
/** | ||
* Config registration for the header-to-metadata filter. @see NamedHttpFilterConfigFactory. | ||
*/ | ||
class HeaderToMetadataConfig : public Server::Configuration::NamedHttpFilterConfigFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, inherit from the Envoy::Extensions::HttpFilters::Common::FactoryBase template. That will allow you to just implement createFilterFactoryFromProtoTyped
, which takes care of the dynamic cast (and protobuf validation rules) for you. (See, for example, the ext_authz http filter's config.)
namespace HttpFilters { | ||
namespace HeaderToMetadataFilter { | ||
|
||
enum class MetadataType { String, Number }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like most of these enums, structs or typedefs are not something you want to publicly expose outside your filter, so they should probably become private members of the Config class. Also, consider just using the Protobuf types for MetadataKeyValue and MetadataType. I don't think the conversion buys you much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still undecided about avoiding the conversion from Protobuf
messages to internal structs, mostly because of the LowerCaseString
header field that I'd like to avoid converting/creating on every
request. So there's a small win there — we might also want to extend
things a bit more of which point the conversions might be useful. I
did not make the Rule related structs private because they are shared
between Config and the Filter class.. I could make them private to
Config and make Filter a friend of class but at that point I am
not sure if pays off anymore. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's worth keeping the LowerCaseString. You could store the protobuf Rule objects directly along side their LowerCaseString names as a std::vector<std::pair<Http::LowerCaseString, envoy::config::filter::http::header_to_metadata::v2::Config::Rule>>
or with a struct that has the LowerCaseString and the protobuf Rule as fields.
In the future, if there are other pre-computed structures necessary, I think it makes sense to start introducing extra classes.
* updates to the proto defs (comments, validations, moved all under Config) * debug-level logging for special early exit cases * use FactoryBase instead of NamedHttpFilterConfigFactory * made some typedefs private I am still undecided about avoiding the conversion from Protobuf messages to internal structs, mostly because of the LowerCaseString header field that I'd like to avoid converting/creating on every request. So there's a small win there — we might also want to extend things a bit more of which point the conversions might be useful. I did not make the Rule related structs private because they are shared between Config and the Filter class.. I could make them private to Config and make Filter a friend of class but at that point I am not sure if pays off anymore. Thoughts? I also left the `rule.onHeaderMissing.key.empty()` call as a way to validate if the `onHeaderMissing` case for a rule is set. Looking around for a better way to handle missing/null inner messages. Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
// | ||
// The value field is ignored for this case, since the header value | ||
// will be used. | ||
KeyValuePair on_header_present = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From reading the code, I thought it uses the value, if given, or else the header's value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are correct, thanks for catching this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the second sentence on the on_header_present docs still need revision (or deletion)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuercher oops — fixed in the last commit.
namespace HttpFilters { | ||
namespace HeaderToMetadataFilter { | ||
|
||
enum class MetadataType { String, Number }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's worth keeping the LowerCaseString. You could store the protobuf Rule objects directly along side their LowerCaseString names as a std::vector<std::pair<Http::LowerCaseString, envoy::config::filter::http::header_to_metadata::v2::Config::Rule>>
or with a struct that has the LowerCaseString and the protobuf Rule as fields.
In the future, if there are other pre-computed structures necessary, I think it makes sense to start introducing extra classes.
Instead, lets pair the LowerCaseString header along with its corresponding Rule protobuf message. Signed-off-by: Raul Gutierrez Segales <[email protected]>
* Specify that a given non empty value for `on_header_present` takes precedence over a header value. And, if the header value is empty then no metadata will be added. * Specify that an empty value for `on_header_missing` means that no metadata will be added. Signed-off-by: Raul Gutierrez Segales <[email protected]>
@zuercher i addressed your latest comments, lmk if i missed anything — thanks! |
Hmm, we should probably emit some stats (e.g.: total count of metadata keys added, total per key, etc)? We haven't had a need for them internally (yet), but I actually think they could be useful for users to validate things are working as expected. Thoughts? |
I think per https://github.com/envoyproxy/envoy/blob/master/GOVERNANCE.md#extension-addition-policy |
Yup per @alyssawilk that's fine with me. I don't have time to review this right now. Please make sure that all the docs are in place though. Thank you!! |
@alyssawilk ah nice, will do! @zuercher are you ok with being added to CODEOWNERS for this? :-) (no pressure) |
Signed-off-by: Raul Gutierrez Segales <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just discovered you can view the generated docs from CircleCI's artifacts tab -- I noticed a couple of small things. Let's fix those and then I'll approve it and merge.
|
||
import "validate/validate.proto"; | ||
|
||
// [#protodoc-title: Header-To-Metadata Fitlter] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filter is misspelled.
// The configuration for transforming headers into metadata. This is useful | ||
// for matching load balancer subsets, logging, etc. | ||
// | ||
// :ref:`configuration overview <config_http_filters_header_to_metadata>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird in the generated docs (https://65061-65214191-gh.circle-artifacts.com/0/source/generated/docs/api-v2/config/filter/http/header_to_metadata/v2/header_to_metadata.proto.html). Should probably have "Header-to-Metadata" at before the link so it reads "Header-to-metadata configuration overrview."
Signed-off-by: Raul Gutierrez Segales <[email protected]>
@zuercher done — i went with |
@bmetzdorf so #3301 was fixed/addressed, which means this should now work with websockets... We'd probably need to extend it support regexes, which was another requirement you had.... |
Hi @rgs1 . Unfortunately my use case requires websockets with early body, which violates the websocket spec. While #3301 / #3376 might still allow this for now (I haven't tested it yet) it will be a problem in the future when strict websocket support will be enforced, in which case the http_filter approach for extracting metadata won't work for me. @alyssawilk @ggreenway Do you see any chance that we might allow early websocket bodies even in the long term? |
We've got regression tests for early websocket bodies for both the old style and new style websockets so hopefully you'll be set. When you say "websockets with early body" do you mean explicit content-length != 0, or websocket payload arriving with the upgrade request? I could see Envoy rejecting GET-with-body but I don't think anyone plans on killing off early data handling, especially if you comment on that regression test that there are users using it. |
@alyssawilk I use both, upgrade request coming in with body payload as well as a upgrade response with content-length > 0. Is that something we could entertain to support long term (maybe hidden behind a |
More likely a more generic allow_disallowed_bodies or something but I suspect we could leave it config available. As I'd likely be the person implementing any adhere-to-the-spec feature and @mattklein123 or I would likely be the ones to review if someone else writes it, we'll keep your traffic in mind :-) |
That sounds great! Thanks @alyssawilk ! |
@rgs1 In regards to regex, what do you think about something like this?
|
@bmetzdorf generally lgtm, lets try to make it consistent with how exact match vs regex match is implemented in other places... Something like this perhaps: https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/route/route.proto#L941 What do you think? |
@rgs1 Sure! Do you think we should support all these cases?
Or just
for now? |
@bmetzdorf lets start with the ones you'll need :-) So we have to keep |
@rgs1 I think Examples: Now:
Future:
with |
@bmetzdorf I like the idea of reusing/emulating |
@rgs1 We could introduce a new key |
Yeah that would work. |
@rgs1 Ok, so what do you think about this (
I think we would have to copy route.HeaderMatcher and add the missing definitions though.. |
@bmetzdorf that looks good |
@rgs1 While thinking about this more I'm realizing that it might be easier to create a new filter and retire the current one later. I think it could be a little bit messy trying to cram both types of configuration and logic into the existing one. What do you think? |
@bmetzdorf sure that's fine too |
Add support for extracting dynamic metadata from requests. This can then
be used as static metadata would be used (e.g.: for subset load balancer
metadata matches, logging, etc).
Risk Level: Low
Testing: unit-test
Docs Changes:
Basic docs, more later.
Release Notes:
N/A
Signed-off-by: Raul Gutierrez Segales [email protected]