Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header to metadata filter (#2436) #3633

Merged

Conversation

rgs1
Copy link
Member

@rgs1 rgs1 commented Jun 14, 2018

Add support for extracting dynamic metadata from requests. This can then
be used as static metadata would be used (e.g.: for subset load balancer
metadata matches, logging, etc).

Risk Level: Low

Testing: unit-test

Docs Changes:
Basic docs, more later.

Release Notes:
N/A

Signed-off-by: Raul Gutierrez Segales [email protected]

@bmetzdorf
Copy link
Contributor

Hi @rgs1 ! Very nice! Let me know when you think it's ready to be tested. I see you're still adding commits. I'm happy to test it and confirm that it works for me too.

@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2018

@bmetzdorf yep sorry, still ironing out a few things... Will let you know!

@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2018

@bmetzdorf @ggreenway ok, I think it's going through now... Sorry for the noise. Let me know if this goes along with what you are thinking. Thanks!

cc: @zuercher who filed #2436.

Raul Gutierrez Segales added 8 commits June 14, 2018 15:22
Add support for extracting dynamic metadata from requests. This can then
be used as static metadata would be used (e.g.: for subset load balancer
metadata matches, logging, etc).

Risk Level: Low

Testing: unit-test

Docs Changes:
Basic docs, more later.

Release Notes:
N/A

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Somehow, my version of clang-format didn't catch this.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Also, rebase on master because well-known-names.h changed.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
@rgs1 rgs1 force-pushed the add-header-to-metadata-filter-for-issue-2436 branch from 5f8816e to 950cbb1 Compare June 14, 2018 22:28
@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2018

@bmetzdorf sorry, had to rebase plus i added one more commit to avoid string copies.

@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2018

Btw, I am not sold on calling this filter header-to-metadata... That's just what we called it internally. Something more descriptive would be better.

@bmetzdorf
Copy link
Contributor

Hi @rgs1.

So, the simple request_rules use case that I'm interested in works fine for plain HTTP, but as you already mentioned in #2436 it is missing regex extraction support. In addition to that though, it seems as if currently (and I know there's some work in progress to overhaul this) this (or any) http filter does not get called for websocket connections, which unfortunately is what I'd require (sigh..).

So I'm wondering if we could provide some functionality of this in the route matcher (which does work for websocket connections) instead:

    - match:
        prefix: "/"
        headers:
        - name: x-something
          value: abc([0-9]+)
          regex: true
      route:
        use_websocket: true
        cluster: mycluster
        metadata_match:
          filter_metadata:
            envoy.lb:
              digits_from_x_something: \1

We could also have two different implementations, until the websocket stuff gets refactored.

What do you think?

@rgs1
Copy link
Member Author

rgs1 commented Jun 14, 2018

@bmetzdorf we've got many many route matchers, so if we implement it there we'd need to be able to configure this at the RouteConfiguration [0] or Route [1] level — to avoid the copy/pasta. But yeah, I think it would work for us.

cc: @derekargueta

[0] https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/rds.proto#L39
[1] https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/route/route.proto#L116

@bmetzdorf
Copy link
Contributor

@rgs1 If you have many route matchers, would the Route level still be ok for you?

Long term I'd think that the HTTP filter is the right approach. The filter should work with the upcoming websocket changes. Until then the limited functionality I require could be provided in the Route level, which would be referencing a regex capture group inside metadata_match. Regex matching is already supported.

@rgs1
Copy link
Member Author

rgs1 commented Jun 15, 2018

@bmetzdorf yeah, Route level is good I think.

Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first pass through this. I think it's headed in the right direction.

syntax = "proto3";

package envoy.config.filter.http.header_to_metadata.v2;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs option go_package = "v2";

There should be a preamble that briefly describes what the filter does and refers to the docs you start, e.g.

// [#protodoc-title: Header-To-Metadata Filter]
// The Header-to-Metadata filter ....
// :ref:`configuration overview <config_http_filters_header_to_metadata>`.

string metadata_namespace = 1;

// The key — must be present.
string key = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you append [(validate.rules).string.min_bytes = 1] (and import "validate/validate.proto")
you should get automatic validation that a value is present. (This depends on an invocation of ProtobufUtil::validate, which will happen automatically if you switch to FactoryBase for your NamedHttpFilterConfigFactory as mentioned in another comment of mine.)

string key = 2;

// The value — may be absent.
string value = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either here or in the Rule message's on_header_present field, please explain the filter's logic when this value is present or not.

For on_header_missing, this isn't really optional.

}
} else {
// Should we add metadata if the header is missing?
if (rule.onHeaderMissing.key.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate this at configuration time and remove this check.


if (value.empty()) {
// No value, skip. we could allow this though.
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding debug-level logging for all these false return cases.

enum ValueType {
STRING = 0;
NUMBER = 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to inline these enums and messages within the Config message.


message Rule {
// The header from which to extract a value.
string header = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate that this is non-empty and document that it's required?

/**
* Config registration for the header-to-metadata filter. @see NamedHttpFilterConfigFactory.
*/
class HeaderToMetadataConfig : public Server::Configuration::NamedHttpFilterConfigFactory {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, inherit from the Envoy::Extensions::HttpFilters::Common::FactoryBase template. That will allow you to just implement createFilterFactoryFromProtoTyped, which takes care of the dynamic cast (and protobuf validation rules) for you. (See, for example, the ext_authz http filter's config.)

namespace HttpFilters {
namespace HeaderToMetadataFilter {

enum class MetadataType { String, Number };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like most of these enums, structs or typedefs are not something you want to publicly expose outside your filter, so they should probably become private members of the Config class. Also, consider just using the Protobuf types for MetadataKeyValue and MetadataType. I don't think the conversion buys you much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still undecided about avoiding the conversion from Protobuf
messages to internal structs, mostly because of the LowerCaseString
header field that I'd like to avoid converting/creating on every
request. So there's a small win there — we might also want to extend
things a bit more of which point the conversions might be useful. I
did not make the Rule related structs private because they are shared
between Config and the Filter class.. I could make them private to
Config and make Filter a friend of class but at that point I am
not sure if pays off anymore. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's worth keeping the LowerCaseString. You could store the protobuf Rule objects directly along side their LowerCaseString names as a std::vector<std::pair<Http::LowerCaseString, envoy::config::filter::http::header_to_metadata::v2::Config::Rule>> or with a struct that has the LowerCaseString and the protobuf Rule as fields.

In the future, if there are other pre-computed structures necessary, I think it makes sense to start introducing extra classes.

@zuercher zuercher self-assigned this Jun 15, 2018
* updates to the proto defs (comments, validations, moved all under
  Config)
* debug-level logging for special early exit cases
* use FactoryBase instead of NamedHttpFilterConfigFactory
* made some typedefs private

I am still undecided about avoiding the conversion from Protobuf
messages to internal structs, mostly because of the LowerCaseString
header field that I'd like to avoid converting/creating on every
request. So there's a small win there — we might also want to extend
things a bit more of which point the conversions might be useful. I
did not make the Rule related structs private because they are shared
between Config and the Filter class.. I could make them private to
Config and make Filter a friend of class but at that point I am
not sure if pays off anymore. Thoughts?

I also left the `rule.onHeaderMissing.key.empty()` call as a way to
validate if the `onHeaderMissing` case for a rule is set. Looking around
for a better way to handle missing/null inner messages.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
@rgs1
Copy link
Member Author

rgs1 commented Jun 15, 2018

@zuercher thanks for the review! I've addressed most of your comments in 471e25f, the commit description has details and questions about the things I am not so sure about... LMK what you think. Thanks!

Raul Gutierrez Segales added 2 commits June 15, 2018 15:17
Signed-off-by: Raul Gutierrez Segales <[email protected]>
//
// The value field is ignored for this case, since the header value
// will be used.
KeyValuePair on_header_present = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the code, I thought it uses the value, if given, or else the header's value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct, thanks for catching this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the second sentence on the on_header_present docs still need revision (or deletion)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuercher oops — fixed in the last commit.

namespace HttpFilters {
namespace HeaderToMetadataFilter {

enum class MetadataType { String, Number };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's worth keeping the LowerCaseString. You could store the protobuf Rule objects directly along side their LowerCaseString names as a std::vector<std::pair<Http::LowerCaseString, envoy::config::filter::http::header_to_metadata::v2::Config::Rule>> or with a struct that has the LowerCaseString and the protobuf Rule as fields.

In the future, if there are other pre-computed structures necessary, I think it makes sense to start introducing extra classes.

Raul Gutierrez Segales added 2 commits June 15, 2018 16:49
Instead, lets pair the LowerCaseString header along with its
corresponding Rule protobuf message.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
* Specify that a given non empty value for `on_header_present` takes
  precedence over a header value. And, if the header value is
  empty then no metadata will be added.
* Specify that an empty value for `on_header_missing` means that no
  metadata will be added.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
@rgs1
Copy link
Member Author

rgs1 commented Jun 16, 2018

@zuercher i addressed your latest comments, lmk if i missed anything — thanks!

@rgs1
Copy link
Member Author

rgs1 commented Jun 16, 2018

Hmm, we should probably emit some stats (e.g.: total count of metadata keys added, total per key, etc)? We haven't had a need for them internally (yet), but I actually think they could be useful for users to validate things are working as expected. Thoughts?

@alyssawilk
Copy link
Contributor

I think per https://github.com/envoyproxy/envoy/blob/master/GOVERNANCE.md#extension-addition-policy
If you add a CODEOWNERS with you and zuercher (assuming he volunteers to own) you're good to go with his approval. And fixed format :-)

@mattklein123
Copy link
Member

Yup per @alyssawilk that's fine with me. I don't have time to review this right now. Please make sure that all the docs are in place though. Thank you!!

@rgs1
Copy link
Member Author

rgs1 commented Jun 20, 2018

@alyssawilk ah nice, will do!

@zuercher are you ok with being added to CODEOWNERS for this? :-) (no pressure)

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just discovered you can view the generated docs from CircleCI's artifacts tab -- I noticed a couple of small things. Let's fix those and then I'll approve it and merge.


import "validate/validate.proto";

// [#protodoc-title: Header-To-Metadata Fitlter]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter is misspelled.

// The configuration for transforming headers into metadata. This is useful
// for matching load balancer subsets, logging, etc.
//
// :ref:`configuration overview <config_http_filters_header_to_metadata>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird in the generated docs (https://65061-65214191-gh.circle-artifacts.com/0/source/generated/docs/api-v2/config/filter/http/header_to_metadata/v2/header_to_metadata.proto.html). Should probably have "Header-to-Metadata" at before the link so it reads "Header-to-metadata configuration overrview."

Signed-off-by: Raul Gutierrez Segales <[email protected]>
@rgs1
Copy link
Member Author

rgs1 commented Jun 20, 2018

@zuercher done — i went with Header to Metadata though, since the IPTagging filter also interchanges IP Tagging and IPTagging.... though I'll switch back to Header-To-Metadata if you have strong feelings about it :-)

Output is here: https://65121-65214191-gh.circle-artifacts.com/0/source/generated/docs/api-v2/config/filter/http/header_to_metadata/v2/header_to_metadata.proto.html

@rgs1
Copy link
Member Author

rgs1 commented Jul 12, 2018

@bmetzdorf so #3301 was fixed/addressed, which means this should now work with websockets... We'd probably need to extend it support regexes, which was another requirement you had....

@bmetzdorf
Copy link
Contributor

Hi @rgs1 . Unfortunately my use case requires websockets with early body, which violates the websocket spec. While #3301 / #3376 might still allow this for now (I haven't tested it yet) it will be a problem in the future when strict websocket support will be enforced, in which case the http_filter approach for extracting metadata won't work for me. @alyssawilk @ggreenway Do you see any chance that we might allow early websocket bodies even in the long term?

@alyssawilk
Copy link
Contributor

We've got regression tests for early websocket bodies for both the old style and new style websockets so hopefully you'll be set.

When you say "websockets with early body" do you mean explicit content-length != 0, or websocket payload arriving with the upgrade request? I could see Envoy rejecting GET-with-body but I don't think anyone plans on killing off early data handling, especially if you comment on that regression test that there are users using it.

@bmetzdorf
Copy link
Contributor

@alyssawilk I use both, upgrade request coming in with body payload as well as a upgrade response with content-length > 0. Is that something we could entertain to support long term (maybe hidden behind a use_relaxed_websockets if needed)?

@alyssawilk
Copy link
Contributor

More likely a more generic allow_disallowed_bodies or something but I suspect we could leave it config available. As I'd likely be the person implementing any adhere-to-the-spec feature and @mattklein123 or I would likely be the ones to review if someone else writes it, we'll keep your traffic in mind :-)

@bmetzdorf
Copy link
Contributor

That sounds great! Thanks @alyssawilk !

@bmetzdorf
Copy link
Contributor

@rgs1 In regards to regex, what do you think about something like this?

http_filters:
  - name: envoy.filters.http.header_to_metadata
    config:
      request_rules:
        - header: x-version
          match: "^[0-9]+\.([0-9]+)$"
          on_header_present:
            metadata_namespace: envoy.lb
            key: version
            type: STRING
          on_header_match:
            metadata_namespace: envoy.lb
            key: minor_version
            value: "$1"
            type: STRING
          on_header_missing:
            metadata_namespace: envoy.lb
            key: default
            value: 'true'
            type: STRING
          remove: false

@rgs1
Copy link
Member Author

rgs1 commented Jul 17, 2018

@bmetzdorf generally lgtm, lets try to make it consistent with how exact match vs regex match is implemented in other places... Something like this perhaps:

https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/route/route.proto#L941

What do you think?

@bmetzdorf
Copy link
Contributor

@rgs1 Sure!

Do you think we should support all these cases?

  "exact_match": "...",
  "regex_match": "...",
  "range_match": "{...}",
  "present_match": "...",
  "prefix_match": "...",
  "suffix_match": "...",
  "invert_match": "..."

Or just

  "regex_match": "...",
  "invert_match": "..."

for now?

@rgs1
Copy link
Member Author

rgs1 commented Jul 17, 2018

@bmetzdorf lets start with the ones you'll need :-) So we have to keep exact_match (we can probably deprecated value), regex_match would be nice... What do you think? All the others would be depending on your needs.

@bmetzdorf
Copy link
Contributor

bmetzdorf commented Jul 17, 2018

@rgs1 I think regex_match and invert_match (to not have to use negative lookahead / lookbehind) would be sufficient for now. Should we model on_header_present with present_match instead and on_header_missing with present_match + invert_match and copy/re-use route.HeaderMatcher (see https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto.html#envoy-api-msg-route-headermatcher)?

Examples:

Now:

http_filters:
  - name: envoy.filters.http.header_to_metadata
    config:
      request_rules:
        - header: x-version
          on_header_present:
            metadata_namespace: envoy.lb
            key: version
            type: STRING
          on_header_missing:
            metadata_namespace: envoy.lb
            key: default
            value: 'true'
            type: STRING
          remove: false

Future:

http_filters:
  - name: envoy.filters.http.header_to_metadata
    config:
      request_rules:
        headers:
          - name: x-version
            present_match: true
            metadata_namespace: envoy.lb
            key: version
            type: STRING
          - name: x-version
            present_match: true
            invert_match: true
            metadata_namespace: envoy.lb
            key: default
            value: 'true'
            type: STRING
          - name: x-version
            regex_match: "^[0-9]+\.([0-9]+)$"
            metadata_namespace: envoy.lb
            key: minor_version
            value: "$1"
            type: STRING

with value being optional and defaulting to the actual complete header value (unless header is missing, then defaulting to empty string)?

@rgs1
Copy link
Member Author

rgs1 commented Jul 17, 2018

@bmetzdorf I like the idea of reusing/emulating route.HeaderMatcher... However I am not sure what the policy is with breaking config for existing filters.... @zuercher ?

@bmetzdorf
Copy link
Contributor

@rgs1 We could introduce a new key request_matche(r)s and keep request_rules ?

@rgs1
Copy link
Member Author

rgs1 commented Jul 18, 2018

Yeah that would work.

@bmetzdorf
Copy link
Contributor

bmetzdorf commented Jul 18, 2018

@rgs1 Ok, so what do you think about this (request_headers instead of request_matche(r)s) ?

http_filters:
  - name: envoy.filters.http.header_to_metadata
    config:
      request_headers:
        - name: x-version
          present_match: true
          metadata_namespace: envoy.lb
          key: version
          type: STRING
        - name: x-version
          present_match: false
          metadata_namespace: envoy.lb
          key: default
          value: 'true'
          type: STRING
        - name: x-version
          regex_match: "^[0-9]+\.([0-9]+)$"
          metadata_namespace: envoy.lb
          key: minor_version
          value: "$1"
          type: STRING

I think we would have to copy route.HeaderMatcher and add the missing definitions though..

@rgs1
Copy link
Member Author

rgs1 commented Jul 18, 2018

@bmetzdorf that looks good

@bmetzdorf
Copy link
Contributor

@rgs1 While thinking about this more I'm realizing that it might be easier to create a new filter and retire the current one later. I think it could be a little bit messy trying to cram both types of configuration and logic into the existing one. What do you think?

@rgs1
Copy link
Member Author

rgs1 commented Jul 24, 2018

@bmetzdorf sure that's fine too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants