Allow range types to be used for enrich matching #76110
The code looks pretty good here, though I can't seem to remember if we had plans to separate ranges out from the match enrich policy type. Searching for data using indexed ranges feels closer to how geo_match works, but the underlying query DSL is indeed closer to how match would write it.
@martijnvg do you remember if that was the plan or not?
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPolicyRunner.java
x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/EnrichPolicyRunnerTests.java
Apologies for responding late to this PR.
Yes, the plan was to introduce a new enrich policy type for ranges.
Since we have two separate policy types (match and geo_match) for different query types, I think I'd lean more toward keeping design parity and introducing a third policy type. While the underlying query DSL still uses a match query, we always intended the match policy type to enrich based on singular values. Adding a range policy type gives us the flexibility to support future growth, while adding range support in the match policy type may muddy the waters going forward should the functionality be duplicated. Does the policy type pertain more to the query DSL created, or is it more concerned with the field type of the enrich key? I'll throw the team discuss label on here to see if we can reach consensus.
Pinging @elastic/es-core-features (Team:Core/Features)
Muddying the waters when it comes to policy types is a valid concern if this change gets merged. Thinking from that perspective, maybe the match processor should not be changed to support more field types for the match field.
I think it is both. It controls which field type to use for the match field (during policy execution) and the query that is executed against this field (in the processor implementation).
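The "it is both" point can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `PolicyTypeSketch`, `enrichFieldType`, and `queryType` are illustration-only names, and the field and query types listed per policy are assumptions about how each policy type behaves, not code taken from this PR.

```java
final class PolicyTypeSketch {

    /** Hypothetical model: a policy type fixes both the enrich field type and the lookup query. */
    enum PolicyType {
        // Assumption: match indexes the key as keyword and looks it up with a term-level query.
        MATCH("keyword", "term"),
        // Assumption: geo_match indexes a geo_shape and looks it up with a geo_shape query.
        GEO_MATCH("geo_shape", "geo_shape"),
        // Assumption: a range policy keeps the source range type and reuses the term-level query.
        RANGE(null, "term");

        private final String fixedFieldType; // null means "copy the type from the source mapping"
        private final String queryType;

        PolicyType(String fixedFieldType, String queryType) {
            this.fixedFieldType = fixedFieldType;
            this.queryType = queryType;
        }

        /** Field type used for the match field in the enrich index (policy execution). */
        String enrichFieldType(String sourceFieldType) {
            return fixedFieldType != null ? fixedFieldType : sourceFieldType;
        }

        /** Query type executed against the match field (processor implementation). */
        String queryType() {
            return queryType;
        }
    }

    public static void main(String[] args) {
        for (PolicyType type : PolicyType.values()) {
            System.out.println(type + ": field=" + type.enrichFieldType("ip_range") + ", query=" + type.queryType());
        }
    }
}
```

Under these assumptions, a range policy differs from match only in the field type chosen at policy execution, which is why the question of folding it into match came up at all.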
Adding tests for multi level + fix.
No worries, you should enjoy your PTO.
Missed this; I will have a closer look to see what I've missed. I added the changes to make it a separate policy but realized I forgot about the configuration part. I'll see if I can pick this up later.
Looking into #65781, it seems a format field is introduced for dates, but no such parameter is described for the term query in our docs at https://www.elastic.co/guide/en/elasticsearch/reference/7.14/query-dsl-term-query.html (which seem incomplete, as they don't describe range matching either). It is described in the type description for the mapping at https://www.elastic.co/guide/en/elasticsearch/reference/7.14/range.html and it is also described for the range query, but that query takes a range as the input and matches it against a value (or possibly a range). While I'm sure some users can find a reason to match like that, I think that would be a different policy (and PR). It does make me wonder whether the name of the policy we introduce here is the right one, and whether we should re-evaluate the split. I should add some tests for dates, including exotic date formats that it won't magically pick up, to check whether we need the format configured and whether it needs to be user-specified or copied from the mapping of the source / enrich index.
@mjmbischoff Looks like the
Right, perhaps ranges should be handled by a different kind of policy, like the one proposed in #65781.
Going with RANGE_TYPE, which leaves RANGE_MATCH_TYPE (or a better name) free to be used for matching ranges against values or ranges. Added tests for date_range, which indeed failed because the format was missing. Similar to types, we require a single distinct format specification; otherwise we fail and inform the user. If someone feels they need multiple source indices with different format fields, feel free to open an ER arguing your case. Merging these is non-trivial, because one format might mask the other, etc.
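The "single distinct format, otherwise fail" rule described above can be sketched roughly as follows. This is not the code from the PR: `RangeFormatSketch` and `resolveFormat` are hypothetical names, and the input is assumed to be the list of explicitly configured `format` values collected from the source index mappings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class RangeFormatSketch {

    /**
     * Returns the single format shared by all source indices, or empty if none
     * declares a format. Fails when the source indices disagree, because merging
     * formats is non-trivial (one format might mask the other).
     */
    static Optional<String> resolveFormat(List<String> formatsFromSourceMappings) {
        // De-duplicate while keeping first-seen order for a readable error message.
        Set<String> distinct = new LinkedHashSet<>(formatsFromSourceMappings);
        if (distinct.size() > 1) {
            throw new IllegalArgumentException(
                "Multiple distinct date formats specified for the match field: " + distinct
            );
        }
        return distinct.stream().findFirst();
    }

    public static void main(String[] args) {
        System.out.println(resolveFormat(List.of("yyyy-MM-dd", "yyyy-MM-dd")));
        System.out.println(resolveFormat(List.of()));
    }
}
```

Returning an empty `Optional` when no format is configured fits the idea of omitting the format field from the enrich index mapping when the default applies.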
@martijnvg @jbaiera I think it's ready; can you give it a check?
Looks good @mjmbischoff! I left a few more comments around simplification.
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPolicyRunner.java
@elasticmachine update branch
Thanks for the quick update! I left a few more comments.
...n/enrich/qa/common/src/main/java/org/elasticsearch/test/enrich/CommonEnrichRestTestCase.java
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPolicyRunner.java
adding randomness to test
omit format field if default
Thanks for iterating on this PR @mjmbischoff.
I think this PR looks good now.
@jbaiera do you think this PR is good to get merged?
LGTM, thanks for the changes!
@martijnvg I did the honors. Is it OK if I leave backporting (if applicable) to you?
🎉
Yes, I will do the backporting to the 7.x branch.
Backporting #76110 to 7.x. Added RANGE_POLICY. The code expects source indices, if multiple are specified, to have a consistent type and, if applicable, a consistent format across them. Pulled in https://github.com/elastic/elasticsearch/pull/65781/files#diff-be6652cff75cf1e283570c14fd5dea042ddb586ac5645f7deef6ef3716732906 and https://github.com/elastic/elasticsearch/pull/65781/files#diff-d893db3430284d2635214c41f0cb66b3ca154d6cd73c29c5c97031ece40d2904 from probakowski. Co-authored-by: Michael Bischoff <[email protected]>
Currently, when the MATCH policy is used, the data type of the match field on the enrich index is hardcoded to keyword. This breaks lookups when source indices use a range type for the lookup field. For example, when an ip_range is used, the literal value from the source is indexed as a keyword and therefore doesn't match any of the IPs.
This PR improves the code by checking the field type on the source indices and then using the range type in the enrich index if that is the type in all source indices. If the types differ across source indices, we fall back to keyword. For non-range types we also fall back to keyword, emulating the current behavior.
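The fallback rule described above can be sketched as below. This is a minimal illustration, not the actual `EnrichPolicyRunner` code: `EnrichFieldTypeSketch` and `resolveMatchFieldType` are hypothetical names, and the input is assumed to be the mapped type of the match field in each source index.

```java
import java.util.List;
import java.util.Set;

final class EnrichFieldTypeSketch {

    // The range field types documented for Elasticsearch mappings.
    private static final Set<String> RANGE_TYPES = Set.of(
        "integer_range", "long_range", "float_range", "double_range", "date_range", "ip_range"
    );

    /**
     * Picks the type of the match field in the enrich index: the shared range type
     * if every source index maps the field to the same range type, otherwise keyword.
     */
    static String resolveMatchFieldType(List<String> sourceFieldTypes) {
        if (sourceFieldTypes.isEmpty()) {
            return "keyword";
        }
        String first = sourceFieldTypes.get(0);
        boolean allSame = sourceFieldTypes.stream().allMatch(first::equals);
        // Mixed types and non-range types both fall back to the previous behavior.
        return allSame && RANGE_TYPES.contains(first) ? first : "keyword";
    }

    public static void main(String[] args) {
        System.out.println(resolveMatchFieldType(List.of("ip_range", "ip_range")));
        System.out.println(resolveMatchFieldType(List.of("ip_range", "date_range")));
        System.out.println(resolveMatchFieldType(List.of("text")));
    }
}
```

Note that this reflects the behavior in the description only; later in the conversation the change moved to a separate range policy type that fails on inconsistent types instead of silently falling back.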