-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New terms_enum API for discovering terms in the index. #66452
Conversation
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markharwood Exciting API , I envision it to be very useful . I did an initial pass in review, mostly questions.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermCount.java
Outdated
Show resolved
Hide resolved
...plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermEnumResponse.java
Outdated
Show resolved
Hide resolved
...k/plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermEnumAction.java
Outdated
Show resolved
Hide resolved
When it comes to the HLRC I'm unsure where best to place the logic:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass to review the main API and the options.
I think we're mixing multiple use cases and allow too many options. I left some comments to simplify the API. I'll make a second one to review the concrete actions in a follow up.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermCount.java
Outdated
Show resolved
Hide resolved
0a1a2d3
to
b48a473
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay.
I left another round of comments regarding the _terms action.
...in/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/ShardTermEnumRequest.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Outdated
Show resolved
Hide resolved
fa6eaf1
to
7e2ce35
Compare
eb50f69
to
9d955e1
Compare
@elastic-jb Just updated this PR if you want to AB test the sort-by-popularity with current Kibana terms-agg approach. This PR only considers up to a max of 10k matching terms on a node and returns Note, in the above example of music artists sorted by popularity, the results are incomplete because there are more than 10k bands starting with |
@jimczi @elastic-jb @lizozom @giladgal I tried this PR out compared with the existing Kibana impl (terms agg with I had plans to automate the comparisons but there were some interesting findings just from manually playing with some datasets. 1) The existing Kibana impl is case sensitiveUnless the user has chosen to add normalisation to the mapping at index-time, there is no case insensitivity to searches: 2) The existing Kibana impl misses stuffSampling the first 100k docs has no guarantee of finding the term the user is looking for as in this example: 3) The existing Kibana impl is slow in ways we didn't realiseOne of my datasets has many values per doc - person profiles have a list of band "likes" related to the user. This acts as big multiplier to the number of regex tests the 4) The new impl can be inaccurate (but there's a simple solution)For a high cardinality field and a short search string the results from the new impl can be inaccurate. In this example search for |
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Outdated
Show resolved
Hide resolved
8d528a9
to
6869b7f
Compare
Two things came out of discussions today
|
d481104
to
7bbf5b2
Compare
… core/src/yamlRestTest
…Added security tests
…to case insensitive search
…mRequest constructor to TransportTermsEnumAction#asyncNodeOperation
bcf76a4
to
22312cf
Compare
A search string is supplied which is used as prefix for matching terms found in a given field in the index.
A timeout can limit the amount of time spent looking for matches.
Designed for use in Kibana auto-complete use cases.
Kibana requests for this API would typically look like this:
The time range would avoid any indices that fall outside of the range but does not filter any doc values in overlapping indices. The tier clause would avoid hitting frozen/cold indices.
An optional
timeout
time value can also be passed (default is "1s", one second).The response looks like this:
Any requests that hit the
timeout
setting will return"complete":false