Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOCS] Reformat delimited payload token filter docs #49380

Merged
merged 6 commits into from
Nov 25, 2019
Merged

[DOCS] Reformat delimited payload token filter docs #49380

merged 6 commits into from
Nov 25, 2019

Conversation

jrodewig
Copy link
Contributor

Reformats the delimted payload token filter docs as part of #44726.

Changes

  • Adds a title abbreviation
  • Relocates the older name deprecation warning
  • Updates the description and adds a Lucene link
  • Adds a note to explain payloads and how to store them
  • Adds analyze and custom analyzer snippets
  • Adds a 'Return stored payloads' example

@mayya-sharipova @jtibshirani Would you mind looking at this at your convenience? I want to ensure the added information about payloads and term vectors works. Thanks!

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (>docs)

@jrodewig jrodewig changed the title [DOCS] Reformat delimited payload token filters [DOCS] Reformat delimited payload token filter docs Nov 20, 2019
Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrodewig Thanks, nice work, but needs some changes.

====
A payload is user-defined binary data associated with a token position and
stored as base64-encoded bytes. Payloads are often used with the
<<query-dsl-script-score-query,`script_score`>> query to calculate custom scores
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess would be nice to do this some time, but currently there is no way for script_score to access payloads. I don't know any other ES query that can deal with payloads either. The only way to access them is through _termvectors API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this. I've updated this note to state that you can view stored payloads using the term vectors API API.

`with_positions_offsets_payloads`

* Use an index analyzer that includes the `delimted_payload` filter
====
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the correctness of this paragraph. So, for a text/keyword field ES can create 3 Lucene fields: 1) usual indexed field broken into terms and used for search. Here we don't sore payloads (I think) and definitely never use them in any search query. 2) a stored field with index option store:true. But I am not sure why you mentioned it here (it seems to me that it doesn't have to do anything with payloads). 3) a term vectors field with index option of term_vector. these are primarily used for highlighting, but can be used just for retrieval purpose as well (as in your examples). And here we can store payloads but again we don't use payloads for anything intelligent besides just retrieval.

nit: delimted_payload -> delimited_payload

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for another great catch. I've removed references to the store requirement throughout. I also fixed the typo.

Use the <<indices-create-index,create index API>> to create an index that:

* Includes a field that stores payloads. For this field, set the
<<mapping-store,`store`>> mapping parameter to `true` and the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why store : true is necessary?

Includes a field that stores payloads -> I would reformulate it to something like "stores term vectors with payloads", as it not a usual indexed field with payloads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. I've rephrased this bullet to "Includes a field that stores term vectors with payloads." as you suggested. Thanks!

@jrodewig
Copy link
Contributor Author

Thanks for your thorough review, @mayya-sharipova. You cleared up some of my misunderstandings around storing payloads and their use cases.

This PR is ready for another look at your convenience. Thanks again!

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jrodewig, the changes LGTM

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jrodewig, the changes LGTM

@jrodewig jrodewig merged commit 1471f34 into elastic:master Nov 25, 2019
@jrodewig jrodewig deleted the reformat.delimited-payload-token-filter branch November 25, 2019 20:38
jrodewig added a commit that referenced this pull request Nov 25, 2019
* Adds a title abbreviation
* Relocates the older name deprecation warning
* Updates the description and adds a Lucene link
* Adds a note to explain payloads and how to store them
* Adds analyze and custom analyzer snippets
* Adds a 'Return stored payloads' example
jrodewig added a commit that referenced this pull request Nov 25, 2019
* Adds a title abbreviation
* Relocates the older name deprecation warning
* Updates the description and adds a Lucene link
* Adds a note to explain payloads and how to store them
* Adds analyze and custom analyzer snippets
* Adds a 'Return stored payloads' example
jrodewig added a commit that referenced this pull request Nov 25, 2019
* Adds a title abbreviation
* Relocates the older name deprecation warning
* Updates the description and adds a Lucene link
* Adds a note to explain payloads and how to store them
* Adds analyze and custom analyzer snippets
* Adds a 'Return stored payloads' example
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants