-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DOCS] Reformat delimited payload token filter docs #49380
[DOCS] Reformat delimited payload token filter docs #49380
Conversation
Pinging @elastic/es-search (:Search/Analysis) |
Pinging @elastic/es-docs (>docs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrodewig Thanks, nice work, but needs some changes.
==== | ||
A payload is user-defined binary data associated with a token position and | ||
stored as base64-encoded bytes. Payloads are often used with the | ||
<<query-dsl-script-score-query,`script_score`>> query to calculate custom scores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess would be nice to do this some time, but currently there is no way for script_score
to access payloads. I don't know any other ES query that can deal with payloads either. The only way to access them is through _termvectors API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching this. I've updated this note to state that you can view stored payloads using the term vectors API API.
`with_positions_offsets_payloads` | ||
|
||
* Use an index analyzer that includes the `delimted_payload` filter | ||
==== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the correctness of this paragraph. So, for a text/keyword field ES can create 3 Lucene fields: 1) usual indexed field broken into terms and used for search. Here we don't sore payloads (I think) and definitely never use them in any search query. 2) a stored field with index option store:true
. But I am not sure why you mentioned it here (it seems to me that it doesn't have to do anything with payloads). 3) a term vectors field with index option of term_vector
. these are primarily used for highlighting, but can be used just for retrieval purpose as well (as in your examples). And here we can store payloads but again we don't use payloads for anything intelligent besides just retrieval.
nit: delimted_payload
-> delimited_payload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for another great catch. I've removed references to the store
requirement throughout. I also fixed the typo.
Use the <<indices-create-index,create index API>> to create an index that: | ||
|
||
* Includes a field that stores payloads. For this field, set the | ||
<<mapping-store,`store`>> mapping parameter to `true` and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why store
: true
is necessary?
Includes a field that stores payloads -> I would reformulate it to something like "stores term vectors with payloads", as it not a usual indexed field with payloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not. I've rephrased this bullet to "Includes a field that stores term vectors with payloads." as you suggested. Thanks!
Thanks for your thorough review, @mayya-sharipova. You cleared up some of my misunderstandings around storing payloads and their use cases. This PR is ready for another look at your convenience. Thanks again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jrodewig, the changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jrodewig, the changes LGTM
* Adds a title abbreviation * Relocates the older name deprecation warning * Updates the description and adds a Lucene link * Adds a note to explain payloads and how to store them * Adds analyze and custom analyzer snippets * Adds a 'Return stored payloads' example
* Adds a title abbreviation * Relocates the older name deprecation warning * Updates the description and adds a Lucene link * Adds a note to explain payloads and how to store them * Adds analyze and custom analyzer snippets * Adds a 'Return stored payloads' example
* Adds a title abbreviation * Relocates the older name deprecation warning * Updates the description and adds a Lucene link * Adds a note to explain payloads and how to store them * Adds analyze and custom analyzer snippets * Adds a 'Return stored payloads' example
Reformats the delimted payload token filter docs as part of #44726.
Changes
@mayya-sharipova @jtibshirani Would you mind looking at this at your convenience? I want to ensure the added information about payloads and term vectors works. Thanks!