[KQL] Support fuzzy queries #54343

Closed
Tracked by #166068
timroes opened this issue Jan 9, 2020 · 11 comments
Labels
enhancement New value added to drive a business result Feature:KQL KQL Feature:Search Querying infrastructure in Kibana impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:medium Medium Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.

Comments

@timroes
Contributor

timroes commented Jan 9, 2020

Currently, the Elasticsearch query string query supports fuzzy queries, which allows searching for terms similar to a search term.

For example, quikc~ would search for terms similar to "quikc" (such as "quick"). The query string syntax also supports edit distances (see https://www.elastic.co/guide/en/elasticsearch/reference/8.0/query-dsl-fuzzy-query.html for more details).
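To make "edit distance" concrete, here is a minimal sketch of an optimal string alignment distance, where an insertion, deletion, substitution, or swap of two adjacent characters each counts as one edit. It is an illustration only: the function name `osa_distance` is made up for this example, and it is not how Lucene or Elasticsearch evaluate fuzzy queries internally.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions, and transpositions of two
    adjacent characters each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "kc" <-> "ck".
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("quikc", "quick"))
```

Under this definition "quikc" and "quick" are a single edit apart, because the only difference is the swapped "k" and "c".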

This issue is a placeholder for adding support directly in KQL for fuzzy queries. The syntax may not remain the same, but the concept & functionality would.

(Original description)

Add a fuzzy operator (similar to Lucene's ~) to KQL. This would make fuzzy search available in KQL; its absence is currently one of the more common reasons people still switch back to Lucene.

In KQL I wouldn't make the fuzzy distance configurable, but would rather introduce a syntax like name :~ Mat, which would automatically translate to the AUTO fuzziness setting. (The syntax is just a suggestion.)
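As a rough sketch of what that proposal could mean in practice, the snippet below turns the suggested `field :~ value` form into an Elasticsearch `match` query with `"fuzziness": "AUTO"`. Everything about the parsing is an assumption: the `:~` operator does not exist in KQL, `translate_fuzzy_clause` is a hypothetical helper, and real KQL is parsed by a grammar rather than a regular expression. The `auto_fuzziness` helper mirrors the default AUTO thresholds described in the Elasticsearch documentation (0 edits for 1 to 2 characters, 1 edit for 3 to 5, 2 edits beyond that).

```python
import re

# Hypothetical pattern for the suggested syntax: <field> :~ <single term>
_FUZZY_CLAUSE = re.compile(r"\s*([\w.]+)\s*:~\s*(\S+)\s*")


def auto_fuzziness(term: str) -> int:
    """Edits allowed for a term under the default AUTO thresholds."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def translate_fuzzy_clause(expression: str) -> dict:
    """Translate 'field :~ value' into a match query with AUTO fuzziness."""
    match = _FUZZY_CLAUSE.fullmatch(expression)
    if match is None:
        raise ValueError(f"not a fuzzy clause: {expression!r}")
    field, value = match.groups()
    return {"match": {field: {"query": value, "fuzziness": "AUTO"}}}


if __name__ == "__main__":
    print(translate_fuzzy_clause("name :~ Mat"))
    print(auto_fuzziness("Mat"))
```

Keeping the distance fixed at AUTO means the user never has to think about edit counts; the term length decides how forgiving the match is.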

@timroes timroes added enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:KQL KQL labels Jan 9, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@Bargs
Contributor

Bargs commented Jan 9, 2020

I think this is the first I've heard a request for fuzzy search in KQL. Has it come up for you often? We intentionally left stuff like this out of the first iteration to keep KQL very simple. I think a lot of magic syntax makes the query language hard to understand. If it's a niche use case we could also consider making it possible in a filter instead of in KQL. I think it's easier to support more complex querying like this in filters since it's possible to put helpful descriptions in the filter editor UI, for both the person creating the filter and anyone looking at it later on.

@timroes
Contributor Author

timroes commented Jan 10, 2020

I agree with you. Haven't heard it coming up often yet. @tamros since you brought this up, what is your feeling of how often people are actually looking for this, or do you think it's coming up in trainings more often, because we kind of hint people towards it?

@tamros

tamros commented Jan 13, 2020

We have students asking about this while we teach the Kibana Data Analyst course. You could be right that we hint people towards it: since we teach Lucene fuzziness, they want to know how to do the same in KQL. They really like the contextual search.

As for use cases, it is useful in security work, since you can find variations of a website name.

@timroes timroes added Team:AppArch and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Feb 20, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@besimorhino

This would be very helpful for many infosec use cases. Attackers will do typo squatting or rename files to confuse defenders.

Typo squatting use case:
attacker registers m1crosoft.com (with the right font, you might not see the i/1 swap... most often this is with trickier homoglyphs)

The following makes a dead easy detect:
-domain.name:"microsoft.com" AND domain.name:"microsoft.com"~

File name confusion:
svchost.exe is used to launch services in Windows. Attackers may name malware scvhost.exe

Fuzzy search to the rescue:
-file.name:"svchost.exe" AND file.name:"svchost.exe"~
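For readers who want the same two detections outside the Lucene query bar, the following is a minimal sketch of the equivalent Elasticsearch query DSL body, built as a plain Python dict. It assumes the field is mapped as `keyword`, so that the `fuzzy` and `term` queries see the whole value as one term; `like_but_not` is a hypothetical helper name, not part of any client library.

```python
import json


def like_but_not(field: str, value: str) -> dict:
    """Match values within AUTO fuzziness of `value`, excluding exact hits."""
    return {
        "bool": {
            # Near matches, such as "scvhost.exe" for "svchost.exe".
            "must": [
                {"fuzzy": {field: {"value": value, "fuzziness": "AUTO"}}}
            ],
            # Drop the legitimate, exactly matching value.
            "must_not": [{"term": {field: value}}],
        }
    }


if __name__ == "__main__":
    body = {"query": like_but_not("file.name", "svchost.exe")}
    print(json.dumps(body, indent=2))
```

The same helper covers the domain case by calling `like_but_not("domain.name", "microsoft.com")`.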

I get that this might be a tricky request to implement, but there are many who would benefit from this.

@markharwood
Contributor

> The following makes a dead easy detect:
> -domain.name:"microsoft.com" AND domain.name:"microsoft.com"~

I call that the "Like this but not this" querying pattern and it's very useful.

You can also use it to find mis-classified content. One example was examining police intel reports searching for those tagged with one of their "customers" - entity:31567. I used significant_text aggregation on text of reports to discover the name of the cafe where he sold heroin, the girlfriend's name etc. These discriminating words associated with 31567 are then ORed and searched for - but NOT-ing entity:31567. Results are a relevance-ranked list of reports that should have been tagged with 31567 but weren't. Shame we don't support significant_text and therefore this technique.
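The two steps of that technique can be sketched as raw Elasticsearch request bodies. This is an illustration under assumptions, not the setup used in the anecdote: the text field name `text` and both helper names are hypothetical, and the keyword list would come from the buckets returned by the first request.

```python
def significant_keywords_request(
    tag_field: str, tag_value: str, text_field: str, size: int = 10
) -> dict:
    """Step 1: ask which words are unusually common in tagged documents."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {tag_field: tag_value}},
        "aggs": {
            "keywords": {
                "significant_text": {"field": text_field, "size": size}
            }
        },
    }


def missed_tag_query(
    tag_field: str, tag_value: str, text_field: str, keywords: list
) -> dict:
    """Step 2: OR the discovered words together and exclude the tag itself."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {text_field: kw}} for kw in keywords],
                "minimum_should_match": 1,
                "must_not": [{"term": {tag_field: tag_value}}],
            }
        }
    }


if __name__ == "__main__":
    print(significant_keywords_request("entity", "31567", "text"))
    # Placeholder keywords; real ones come from the step 1 response.
    print(missed_tag_query("entity", "31567", "text", ["word_a", "word_b"]))
```

Because the `should` clauses contribute to scoring, documents that mention more of the discovered words rank higher, which gives the relevance-ranked list described above.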

@thefoodiecoder

I think this is useful for quick POCs. Although a final implementation may not require it, I have had to write code multiple times just to test my concepts.

@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 2, 2021
@smyttie

smyttie commented Nov 26, 2021

Would be very useful for our "non-scripting-abled" users. They have to look up people's names a lot, and many names are written in different ways. Fuzzy searching to the rescue, but then techies have to jump in to write the Lucene scripts for them.

@lukasolson lukasolson changed the title Add a fuzzy operator to KQL [KQL] Support fuzzy queries Feb 22, 2022
@exalate-issue-sync exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Apr 7, 2022
@petrklapka petrklapka added Feature:Search Querying infrastructure in Kibana Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. and removed Team:AppServicesSv labels Nov 23, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@kertal
Member

kertal commented Oct 30, 2023

Closing this because it's not planned to be resolved in the foreseeable future. It will be tracked in our Icebox and will be re-opened if our priorities change. Feel free to re-open if you think it should be melted sooner.

@kertal kertal closed this as not planned Won't fix, can't repro, duplicate, stale Oct 30, 2023
@github-project-automation github-project-automation bot moved this to Done in current release in kibana-app-arch Aug 28, 2024
Projects
Status: Done in current release
Development

No branches or pull requests

10 participants