Add remove types token filter (as opposite to keep_types token filter) #29277
Comments
Pinging @elastic/es-search-aggs |
@edovac Can you please provide your use-case? How are you going to use |
Hi, sorry for the delay. The company I work in is developing an analyzer which can manage (among other things) named entity extractions and structured text.
Section names are not fixed: they can repeat within the same text, and they change on a per-project basis. We need to support queries per extraction type, and also to constrain the search to a specific section. We also need to support near queries between different extraction types like:
Here is a mapping example:
structure-tokenizer is developed by us.
From these records we generate index tokens. Example for text: first token
second token
example for people:
example for section: first token
second token
For each token we have an overlapping one in the section field. The "text" field contains all tokens and will be used for match and phrase queries. To support constraints by section we use the dedicated field "section" in conjunction with field_masking_span and span_containing.
Coming back to our main point, the "text" field requires all tokens except those related to "section", hence the request for a "remove_types" token filter, i.e.:
vs.
We often have tens of different named extractions. I hope this will clarify my request :) |
Discussed in FixitFriday: we agreed to do it. Here is the plan we discussed:
|
Thanks :) |
Currently the `keep_types` token filter includes all token types specified using its `types` parameter. Lucene's TypeTokenFilter also provides a second mode where, instead of keeping the specified token types (include), it filters them out (exclude). This change exposes this option as a new `mode` parameter that can take either the value `include` (the default, if not specified) or `exclude`. Closes elastic#29277
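As a rough illustration of the `mode` parameter described above, here is a minimal Python sketch that builds index analysis settings using `keep_types` with `mode` set to `exclude`. The helper `keep_types_filter`, the filter name `drop_sections`, the analyzer name `text_without_sections` and the token type `section` are all hypothetical names chosen for this example; `structure-tokenizer` is the custom tokenizer mentioned earlier in the thread, and the type names it actually emits are an assumption.

```python
import json

VALID_MODES = ("include", "exclude")


def keep_types_filter(types, mode="include"):
    """Build the settings body for a keep_types token filter.

    Hypothetical helper: `mode` defaults to "include", matching the
    default described in the change above.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"type": "keep_types", "types": list(types), "mode": mode}


# Assumed names: "drop_sections", "text_without_sections" and the
# "section" token type are illustrative only.
settings = {
    "analysis": {
        "filter": {
            "drop_sections": keep_types_filter(["section"], mode="exclude"),
        },
        "analyzer": {
            "text_without_sections": {
                "type": "custom",
                "tokenizer": "structure-tokenizer",
                "filter": ["drop_sections"],
            }
        },
    }
}

print(json.dumps(settings, indent=2))
```

The resulting dictionary could be sent as the `settings` part of an index creation request; leaving `mode` out keeps the existing include behaviour.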
Describe the feature:
Hi, Elasticsearch provides the `keep_types` token filter, but does not provide a token filter to exclude specific token types from the token stream.
As I understand, the `keep_types` token filter is implemented using Lucene `org.apache.lucene.analysis.core.TypeTokenFilter.TypeTokenFilter(TokenStream, Set<String>, boolean)`, which implements both behaviours.
It would be nice to have the remove filter too.
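To make the "both behaviours" point concrete, the following is a small Python sketch of the idea behind the boolean argument in that constructor. It is a simulation, not Lucene code. Tokens are modelled as `(term, type)` pairs, and the sample terms and type names (`people`, `word`, `section`) are assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]  # (term, token type)


def type_filter(tokens: Iterable[Token], types: Set[str], use_whitelist: bool) -> List[Token]:
    """Simulate a type-based token filter.

    use_whitelist=True  keeps only tokens whose type is in `types`.
    use_whitelist=False removes tokens whose type is in `types`.
    """
    return [tok for tok in tokens if (tok[1] in types) == use_whitelist]


# Hypothetical token stream; the type names are illustrative only.
tokens = [("John", "people"), ("works", "word"), ("intro", "section")]

kept = type_filter(tokens, {"section"}, use_whitelist=True)
removed = type_filter(tokens, {"section"}, use_whitelist=False)

print(kept)     # only the section token
print(removed)  # everything except the section token
```

The requested remove filter corresponds to the second call: the same set of types, with the flag flipped.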
Elasticsearch version: 6.2