
Add support for wildcard field type #5639

Closed
epiphone opened this issue Dec 27, 2022 · 16 comments · Fixed by #13461
Labels
enhancement (Enhancement or improvement to existing feature or request) · feature (New feature or request) · help wanted (Extra attention is needed) · Indexing (Indexing, Bulk Indexing and anything related to indexing) · Search:Query Capabilities · Search (Search query, autocomplete ...etc) · v2.15.0 (Issues and PRs related to version 2.15.0)

Comments

@epiphone

Elasticsearch added the wildcard field type in v7.9. Are there any plans to support the field type in OpenSearch?
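For context, declaring the field type in Elasticsearch looks like the following mapping (index and field names here are placeholders, not from the original question):

```json
{
  "mappings": {
    "properties": {
      "my_wildcard_field": {
        "type": "wildcard"
      }
    }
  }
}
```

The field then accepts `wildcard` and `regexp` queries directly, without an analyzer configuration.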

Thanks!

@epiphone epiphone added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 27, 2022
@tlfeng
Collaborator

tlfeng commented Dec 27, 2022

A clarification for others: Elasticsearch added the wildcard field type as an X-Pack feature in version 7.9.0, so it was not an open-source feature at that point. See the Elasticsearch 7.9 user guide: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/keyword.html#wildcard-field-type

@tlfeng tlfeng added the Indexing Indexing, Bulk Indexing and anything related to indexing label Dec 27, 2022
@dblock dblock changed the title wildcard field type Add support for wildcard field type Dec 30, 2022
@macrakis

@epiphone Could you describe your use case?
I do understand that it speeds up wildcard searches, but why is wildcard search performance critical in your application? How many unique values are in your dataset and how big are they?
Are there any good workarounds?

@josefschiefer27

Some good use cases for wildcard:

  • Matching error messages and stack traces
  • Matching URL and file paths
  • Matching fields that have encoded content (e.g. "(8-10)||86128||Women's Apparel||...")

@epiphone
Author

@macrakis my use case is a large index of user-submitted names where most names are short (<100 characters) and unique, and I want to query the names by arbitrary substrings.

As a workaround I'm using an ngram tokenizer which works well enough but is more complicated to set up than the wildcard field type.
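For comparison, an ngram-based workaround along those lines might look like the sketch below (index settings, names, and gram sizes are illustrative, not the poster's actual configuration):

```json
{
  "settings": {
    "index.max_ngram_diff": 1,
    "analysis": {
      "tokenizer": {
        "substring_ngram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "substring_analyzer": {
          "type": "custom",
          "tokenizer": "substring_ngram",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "substring_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

Note the extra moving parts this involves: gram sizes must be tuned, the index grows with every stored gram, and query terms longer than `max_gram` need additional handling. That setup overhead is exactly what the wildcard field type avoids.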

@macrakis

Josef, Epiphone, thanks for your answers -- very helpful!

So it sounds like you need to find arbitrary substrings in your corpus, not just strings starting at token boundaries.

That would be not just "clerc" in "Leclerc", but also "ecle", not just "org/open" in "server/src/main/java/org/opensearch/index/query" but also "ense" in that pathname.

Could the problems be solved by different tokenization?

@josefschiefer27

josefschiefer27 commented Jan 31, 2023

What makes the 'wildcard' data type nice is that it is optimized for wildcard and regexp queries on fields with large values or high cardinality, without changing the search experience (e.g. searching via *ense*) and without worrying about tokenization.
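As a sketch, a substring search like the *ense* example above is just a standard wildcard query against the field (field name hypothetical):

```json
{
  "query": {
    "wildcard": {
      "my_wildcard_field": {
        "value": "*ense*"
      }
    }
  }
}
```

The same query also works against a plain keyword field; the point of the wildcard type is that it answers such leading-wildcard queries efficiently instead of scanning every term.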

@stevesimpson418

stevesimpson418 commented Feb 8, 2023

We have a use case for this where we need to index large XML documents that are > 32766 bytes. Our users want to be able to search for a string in an XML document, e.g. *failuremessage122*, or just *failure*, or even *fail*. A keyword field type would make sense (despite the poor leading-wildcard performance), but this is not possible because the values exceed the 32766-byte limit.
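For background, 32766 bytes is Lucene's cap on a single indexed term. A keyword mapping either rejects longer values at index time or, if `ignore_above` is set, silently leaves them out of the index, so they become unsearchable either way. A minimal sketch (field name and threshold are illustrative):

```json
{
  "mappings": {
    "properties": {
      "xml_body": {
        "type": "keyword",
        "ignore_above": 8191
      }
    }
  }
}
```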

Tokenisation is also problematic with XML docs; we also hit token explosion, with > 10000 terms generated when using most of the analyzers.

The XML logstash filter was considered but has similar issues with large documents producing a huge amount of fields. We don't always know ahead of time which elements we need to search for so that pre-processing of data isn't really an option.

Support for a "wildcard" field type would really improve our user experience

@vindurriel

The wildcard field type has been supported sans X-Pack since Elasticsearch 7.11:
https://www.elastic.co/guide/en/elasticsearch/reference/7.11/keyword.html#wildcard-field-type
any plans to support it in OpenSearch?

@dblock
Member

dblock commented Mar 3, 2023

AFAIK nobody is working on this.

If someone wants to give it a shot, there are folks contributing a flattened field type via #1018, and it looks like there's a draft PR in #6507 that can be used as inspiration.

Please note that we cannot accept any code from ES > 7.10.2, which was the last version under APLv2. Would welcome an independent implementation that doesn't look at anything under an incompatible license.

@macohen macohen added help wanted Extra attention is needed Search Search query, autocomplete ...etc labels Mar 23, 2023
@macrakis

Re using wildcard field for XML (#5639 (comment)), I wonder if you could use the XML logstash filter and then the Flat field type which is coming out in 2.7? (#1018 (comment)).

@stevesimpson418

@macrakis I've had a look through the docs for a Flat field type and without ruling it out completely I'd have some concerns:

  • Flat fields are useful when a field and its subfields will mostly be read, and not be used as search criteria.
  • Performance should be similar to a keyword field. I imagine this will be awful for wildcard searches where we need to search for "foo" OR "bar" in the XML.

That being said I'd be willing to give it a try in our development environments when this feature is released

@msfroh
Collaborator

msfroh commented Feb 29, 2024

There was some good discussion over on #12500, which highlighted the value of wildcard fields.

Also, Elastic's blog post about the feature provides a really good explanation: https://www.elastic.co/blog/find-strings-within-strings-faster-with-the-new-elasticsearch-wildcard-field

@sandervandegeijn

Another reason to implement it: if you want to use ECS 8.12, it's used in the standard component templates. Trying to load them:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"No handler for type [wildcard] declared on field [content]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: No handler for type [wildcard] declared on field [content]","caused_by":{"type":"mapper_parsing_exception","reason":"No handler for type [wildcard] declared on field [content]"}},"status":400}

(from _component_template/ecs_8.0.0_http)

@stowns

stowns commented May 10, 2024

> What makes the 'wildcard' data-type nice is that it is optimized for fields with large values or high cardinality for wildcard and regexp queries without changing the search experiences (e.g. searching via *ense*) and without worrying about tokenization.

This is similar to our use-case. We are storing large json objects (log data) where the json keys are not known in advance. We are using flat_object for this but cannot store values larger than 32kb. The wildcard type allows for values > 32kb and would save us from having to drop fields > 32kb before indexing.

@getsaurabh02
Member

There is a draft PR out now: #13461 (comment)

@getsaurabh02 getsaurabh02 added the v2.15.0 Issues and PRs related to version 2.15.0 label May 28, 2024
@sandervandegeijn

Fantastic, thanks!

Projects
Status: 2.15.0 (Release window opens on June 10th, 2024 and closes on June 25th, 2024)
Status: Done