Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matching: introduce match tree API #13591

Closed
wants to merge 2 commits into from
Closed

Conversation

snowp
Copy link
Contributor

@snowp snowp commented Oct 15, 2020

Introduces the API that will be used to implement the matching tree for
the unified matching API effort. This is intended to be used as a very
generic and expressive matching engine, being both efficient and
flexible.

Risk Level: Low, API only
Testing: n/a
Docs Changes: Inline docs
Release Notes: n/a
#5569

Introduces the API that will be used to implement the matching tree for
the unified matching API effort. This is intended to be used as a very
generic and expressive matching engine, being both efficient and
flexible.

Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13591 was opened by snowp.

see: more, trace.

Copy link
Contributor Author

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some open questions inline. Open for suggestions around naming, degree of flexibility etc.

@envoyproxy/api-shepherds @euroelessar @yangminzhu

}

// If none of the above matchers match, attempt to match against this subtree.
MatchTree no_match_tree = 3;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: should this be a MatchAction? Does making this recursive make things too flexible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to just make this MatchAction unless there is a good reason to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we build logic like "use exact_match for majority of routes, then do the remaining via expensive regular expression"?
Arguably for HTTP scenario specifically it can be solved by "use exact match for majority of routes, then do the rest of selection in "/" prefix match's tree" (so that no_match is never used). It's semantically the same but via some http-specific workarounds.

What issues do you have in mind by making no_match special case (using action vs tree)? And which of this issues are effectively invalidated by using "/" prefix workaround?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a good reason to keep it as a tree (per @euroelessar) no objection from me.

oneof matcher {
option (validate.required) = true;

MatchPredicate predicate = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: Should we reuse this proto? It currently has HTTP awareness, but is set up for further flexibility so we could add a generic multimap matcher to it as well.

Alternatively we could create a net new proto that looks the same but that doesn't carry with it the same API debt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what is the API debt you are referring to? I think we should reuse the existing proto if we can. If we can't that's fine but it would be nice to better understand why. I think I still have unresolved questions about whether this existing proto is going to work OK since it combines HTTP request and response in the same message (without some type of capability control to make sure we don't start buffering unless the user explicitly asks for it).

Also, can you make sure every message and field has documentation? It will make it easier to read/follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to need to switch to whatever is being done in #12938. This is part of the API tech debt, e.g. it move to StringMatchers.

Also, do we want to hardcode HTTP here? If so, why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today MatchPredicate contains

    // HTTP request headers match configuration.
    HttpHeadersMatch http_request_headers_match = 5;

    // HTTP request trailers match configuration.
    HttpHeadersMatch http_request_trailers_match = 6;

    // HTTP response headers match configuration.
    HttpHeadersMatch http_response_headers_match = 7;

    // HTTP response trailers match configuration.
    HttpHeadersMatch http_response_trailers_match = 8;

    // HTTP request generic body match configuration.
    HttpGenericBodyMatch http_request_generic_body_match = 9;

    // HTTP response generic body match configuration.
    HttpGenericBodyMatch http_response_generic_body_match = 10

which supports HTTP in a non-generic way. If we then want to extend this to support matching on arbitrary multimaps similar to how we're supporting it via the sublinear matcher, then we'd most likely add another oneof entry with GenericMultimapMatcher and deprecate all the HTTP ones. This leaves us with two ways of configuring HTTP matches for this proto, potentially going through deprecation for one of them. If we start fresh and never support these HTTP specific matchers we avoid that accumulation of debt.

I think I still have unresolved questions about whether this existing proto is going to work OK since it combines HTTP request and response in the same message (without some type of capability control to make sure we don't start buffering unless the user explicitly asks for it).

How would the user ask for this in any other way than specifying a match criteria that requires matching on buffered data? In my mind we could just look at what they configure for the match tree and if we need to buffer to determine the result we'll buffer, in which case having it all in a single proto should be fine? I might be missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by not special casing HTTP, it will force us to work through the quirks necessary to provide a generic solution that can be expressive enough for an HTTP-like protocol. I think it would be fine to probably do something like MatchPredicate but s/header/key, i.e. it needs to be something functionally equivalent but just drop the HTTP parlance. I wonder if the MatchPredicate fork in #12938 is the right place to start introducing something like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing on this topic, it looks like we actually have two qualitatively different ways to match on headers (or keys). At leaves, we have something that is MatchPredicate-like. At inner nodes, we have a more limited MultiMapMatcher. Is there a good reason for this distinction? Is this because it's that much more expensive to do the more general MatchPredicate in a structured way for inner nodes?

I worry that this mix of expressiveness will cause folks to do gymnastics. Maybe we should have a short doc listing example match structures and how they get turned into this MatchTree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distinction in my mind is about whether you're trying to get a subtree or a result out of the matching. The inner node matcher defines the branching logic while the leaf nodes define a more robust match criteria that results in the final action.

I think it would be feasible to have all matchers (both leaf and node) to both be able to return either a subtree or a final result, and support both the full matching via the predicate or the more efficient matching via the constant time lookups as part of a oneof. This would remove the distinction between leaf and node matchers, which simplifies the API in some ways, though the lack of structure might make it less obvious how to use it and provide fewer guarantees around match performance (compared to saying "if you only use node matchers you will get reasonable performance").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll start a doc to put together some examples using both the currently proposed API and an alternative API that offers no distinction between node and leaf matchers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.google.com/document/d/1PRKQ8WNgNM1IVQt9Na9X31xilwIHyZcBbJ8vhdgVUfI/edit?usp=sharing has a few examples, lmk if you think there are more examples you'd like to see.

I don't have the API specced out for joining the leaf and node protos into one, but I'm thinking we'd just have all the different matchers in a big oneof and have all the matchers be able to either return an action or another subtree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a left a few comments on the doc. It's helpful, but I find the examples to be confusing to read. Can you add more description and/or YAML comments and we can discuss there?

@mattklein123 mattklein123 self-assigned this Oct 15, 2020
@snowp
Copy link
Contributor Author

snowp commented Oct 16, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13591 (comment) was created by @snowp.

see: more, trace.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this LGTM modulo my remaining questions (similar to yours) about the use of MatchPredicate. Thank you for working on this super important feature!

/wait

@@ -16,6 +19,93 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;

// [#protodoc-title: Unified Matcher API]

// A MatchTree defines a matching tree that can be used to match incoming data, arriving at a match action that indicates what action to take in response to the match result.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please wrap to ~100 here and elsewhere.

oneof matcher {
option (validate.required) = true;

MatchPredicate predicate = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what is the API debt you are referring to? I think we should reuse the existing proto if we can. If we can't that's fine but it would be nice to better understand why. I think I still have unresolved questions about whether this existing proto is going to work OK since it combines HTTP request and response in the same message (without some type of capability control to make sure we don't start buffering unless the user explicitly asks for it).

Also, can you make sure every message and field has documentation? It will make it easier to read/follow.


// The namespace of this key. This is used to distinguish multiple maps
// that could be consulted during matching.
string key_namespace = 2 [(validate.rules).string = {min_len: 1}];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify where the namespaces will be documented? I assume at the integration points?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think it also helps to have some example yaml for common use cases, both key and key_namespace seems a bit confusing to use without examples.

}

// If none of the above matchers match, attempt to match against this subtree.
MatchTree no_match_tree = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to just make this MatchAction unless there is a good reason to do so.


message MatchLeaf {
message LeafMatcher {
oneof matcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: leaf_matcher_type

map<string, MatchTree> prefix_matches = 4;
}

oneof matcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sublinear_matcher_type

// Mapping from prefix matches to the matching subtree to use when the match applies.
// If a header value matches both a prefix and an exact match, the exact match path will be taken.
// If multiple prefixes match, the longest prefix path will be taken.
map<string, MatchTree> prefix_matches = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prefixes are valid? Is empty one a valid one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably makes sense to not allow empty prefixes and instead have explicitly specify what you want to happen when the entry exists but doesn't match any of the entries. That gives us exact matches, prefix matches, no match but key exists and no match and key doesn't exist. This seems more explicit to me than relying on matching against empty prefixes.

Beyond this I don't think we should be limiting this at the API level, though maybe we could have the implementations for each multimap perform protocol specific validation. For example, when matching on :authority or :method we could ensure that we're testing against something that makes sense in the HTTP context.

oneof matcher {
option (validate.required) = true;

MatchPredicate predicate = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by not special casing HTTP, it will force us to work through the quirks necessary to provide a generic solution that can be expressive enough for an HTTP-like protocol. I think it would be fine to probably do something like MatchPredicate but s/header/key, i.e. it needs to be something functionally equivalent but just drop the HTTP parlance. I wonder if the MatchPredicate fork in #12938 is the right place to start introducing something like this.

oneof matcher {
option (validate.required) = true;

MatchPredicate predicate = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing on this topic, it looks like we actually have two qualitatively different ways to match on headers (or keys). At leaves, we have something that is MatchPredicate-like. At inner nodes, we have a more limited MultiMapMatcher. Is there a good reason for this distinction? Is this because it's that much more expensive to do the more general MatchPredicate in a structured way for inner nodes?

I worry that this mix of expressiveness will cause folks to do gymnastics. Maybe we should have a short doc listing example match structures and how they get turned into this MatchTree?

@@ -16,6 +19,93 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;

// [#protodoc-title: Unified Matcher API]

// A MatchTree defines a matching tree that can be used to match incoming data, arriving at a match action that indicates what action to take in response to the match result.
// This structure allows for traversal to happen as data is becoming available, allowing for optimized matching.
//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth explaining what the match semantics are in terms of most specific, backtracking etc.

@snowp
Copy link
Contributor Author

snowp commented Oct 21, 2020

One thing that's missing from this API is properly handling multiple values for a given key. Given this is a net new API it seems like a good time to make sure that we consistently handle how we match against multiple header values. We could continue to do the concatenation into a comma delimited string, or build in more descriptive config options. For example, you can imagine the MultimapMatcher have an enum value for enum ValueMatchCriteria { ALL, EXACTLY_ONCE, AT_LEAST_ONCE } etc.

If we don't introduce this we should at the very least be explicit about what the default behavior is for the matcher.

@mattklein123
Copy link
Member

@markdroth can you PTAL at this PR and the comments? I think there is duplication now between your comments in the design doc and here.

// the match applies.
// If a header value matches both a prefix and an exact match, the exact
// match path will be taken.
map<string, MatchTree> exact_matches = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are multiple header values?

message LeafMatcher {
oneof matcher {
option (validate.required) = true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand correctly, other types of matchers can just be supported in this oneof ? e.g. IP, port, x509 certificate, etc.

bool skip = 1 [(validate.rules).bool = {const: true}];

// The specified callback value should be passed to the associated operation.
string callback = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a message to give more flexibility for further extension? for example, passing additional information to the callback, etc.


// The namespace of this key. This is used to distinguish multiple maps
// that could be consulted during matching.
string key_namespace = 2 [(validate.rules).string = {min_len: 1}];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think it also helps to have some example yaml for common use cases, both key and key_namespace seems a bit confusing to use without examples.


// Mapping from prefix matches to the matching subtree to use when the match applies.
// If a header value matches both a prefix and an exact match, the exact match path will be taken.
// If multiple prefixes match, the longest prefix path will be taken.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to make sure I understand this correctly, the key of the map is used to do a prefix (or exact) match against the headers? For example, the following config

key_namespace: request_headers
key: foo
prefix_matches:
  abc: ...

will match the header foo with value of prefix abc (e.g. abc, abcd, etc.)

What are the reasons to use a map here? Avoid duplicate?

message SublinearMatcher {
message MultimapMatcher {
// The key to apply this matcher for.
string key = 1 [(validate.rules).string = {min_len: 1}];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder should this be a message to give more flexibility later, I guess some namespace might have a slight different definition of key, like the dynamic metadata could have multiple keys instead of just 1.

@snowp
Copy link
Contributor Author

snowp commented Nov 6, 2020

Closing this in favor of #13926 with the new API

@snowp snowp closed this Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants