ESQL searches only case-sensitive #107

Open
Mat0vu opened this issue Dec 10, 2024 · 4 comments

Comments

@Mat0vu
Contributor

Mat0vu commented Dec 10, 2024

Hi everyone,

last week I discovered some issues regarding case-insensitive search in Elasticsearch using ESQL.
I was testing something with a very simple rule, trying to find a commandline containing git.exe:

detection:
  condition: selection
  selection:
    CommandLine|contains: git.exe

which resulted in ... | where process.command_line Like "*git.exe*" (using the ecs pipeline). This worked as expected until I changed the search term to *Git.exe*; with that term, ESQL returned no hits.

In the ECS documentation I found that the regular process.command_line field is of type keyword but has a .text subfield, which should be case-insensitive. So I ran some experiments with both fields, comparing ESQL against KQL.

| KQL Search Query | ESQL Search Query | Found with KQL | Found with ESQL | Comment |
|---|---|---|---|---|
| process.command_line: git.exe | Where process.command_line Like "git.exe" | Yes | Yes | |
| process.command_line: Git.exe | Where process.command_line Like "Git.exe" | No | No | |
| process.command_line.text: git.exe | Where process.command_line.text Like "git.exe" | Yes | Yes | |
| process.command_line.text: Git.exe | Where process.command_line.text Like "Git.exe" | Yes | No | .text subfield is case-insensitive for KQL; ESQL is still case-sensitive |

I expected that using the .text subfield would also produce a hit with ESQL, because that field should allow case-insensitive searches, but this was not the case.
After some digging in Elastic's GitHub repo I found that case-insensitive operators had been implemented for ESQL but were disabled again, and there doesn't seem to be any current development on them (elastic/elasticsearch#105603).

We contacted Elastic support to ask how to get case-insensitive search behaviour with ESQL and received the following feedback:

  • like and rlike operators are case-sensitive in ESQL
  • to achieve case-insensitive behaviour we should use the TO_LOWER or TO_UPPER function in combination with a lower-case/upper-case search term, e.g. to match IntelliJ one could use ... | Where TO_LOWER(process.command_line) Like "*intellij*"
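
The workaround suggested by support could look roughly like this on the backend side. This is only a minimal sketch: `esql_contains_ci` is a hypothetical helper, not an existing pySigma or backend function, and it deliberately rejects terms that would need escaping instead of guessing the escaping rules.

```python
# Hypothetical helper for illustration only; not part of pySigma or the
# Elasticsearch backend.

# Characters that would need escaping inside an ESQL LIKE pattern or string
# literal. Handling them properly is out of scope for this sketch.
_UNSUPPORTED = set('*?\\"')


def esql_contains_ci(field: str, term: str) -> str:
    """Build a case-insensitive 'contains' condition for an ESQL WHERE clause."""
    if any(ch in _UNSUPPORTED for ch in term):
        raise ValueError(f"term needs escaping, not handled here: {term!r}")
    # Lowercase both sides: the field via TO_LOWER, the search term in Python.
    return f'TO_LOWER({field}) LIKE "*{term.lower()}*"'


print(esql_contains_ci("process.command_line", "IntelliJ"))
# prints: TO_LOWER(process.command_line) LIKE "*intellij*"
```

One caveat with this approach: Python's `str.lower()` and Elasticsearch's TO_LOWER may not agree for every non-ASCII character, so that would need checking before relying on it.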

I tested this approach and it works, although I think needing such a function is not very nice.

To sum up: the ESQL backend currently searches case-sensitively, which does not follow the Sigma specification (all strings are to be treated as case-insensitive) and will lead to missed detections.
Probably the only way to get case-insensitive behaviour is to follow the suggestion from support and implement these functions in the backend. What do you think?

@abulhol

abulhol commented Dec 18, 2024

Please note that I have encountered another big problem with LIKE operators in ESQL:
elastic/elasticsearch#118932
This causes a total of 645 rules (~25% of all rules) from the main Sigma repo to fail.

We definitely need to do some work on this project here to make Sigma fully work with ESQL.

@thomaspatzke
Member

Oh my...Elastic case-sensitiveness really drives me crazy 🙁

Just to clarify, this:

  • like and rlike operators are case-sensitive in ESQL

Is falsified by your insights, correct?

TO_LOWER seems to be the only option here, but I fear it could result in really bad query performance. Did you test the performance?

@Mat0vu
Contributor Author

Mat0vu commented Dec 30, 2024

Hi Thomas,

unfortunately the statement

like and rlike operators are case-sensitive in ESQL

is true, and it matches the results of my small experiment.
Unless these operators are combined with something like TO_LOWER, they miss every string whose capitalization differs from the search term, even in a single character.
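
To make the effect concrete, here is a small Python emulation. It is not ESQL: `like` is a hypothetical stand-in built on `fnmatch.fnmatchcase`, which is case-sensitive and understands `*` and `?` (it also treats `[...]` specially, which is irrelevant for these patterns).

```python
from fnmatch import fnmatchcase


def like(value: str, pattern: str) -> bool:
    """Rough stand-in for a case-sensitive LIKE with '*' and '?' wildcards."""
    return fnmatchcase(value, pattern)


command_line = "Git.exe clone https://example.invalid/repo.git"

# Case-sensitive match: a single differing capital letter is enough to miss.
print(like(command_line, "*git.exe*"))          # False
print(like(command_line, "*Git.exe*"))          # True

# Lowercasing the value (what TO_LOWER does to the field) plus a lowercase
# pattern makes the comparison independent of the original capitalization.
print(like(command_line.lower(), "*git.exe*"))  # True
```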

The only case-insensitive operator that I have found in ESQL is =~. However, this operator is currently deactivated again (see the linked issue in the initial text) and would only return full matches anyways.

Since Elastic support wrote that (r)like + TO_LOWER is the only solution, I fear there is no alternative right now or in the near future. However, I haven't tested the performance impact yet. I can try to run some experiments in the new year and also ask Elastic whether they can share some statistics on this. If you agree that we should follow this path, I can also start adapting the ESQL backend to use TO_LOWER...

I agree with you, this topic is a real pain :(
Also, I do not understand why ESQL does not support (r)like~, which works case-insensitively but is only available in EQL...

Anyways, happy new year in advance :)

@Mat0vu
Contributor Author

Mat0vu commented Jan 14, 2025

Hi,

I wanted to share the latest information from the Elastic Support regarding ESQL:

  • There are currently no plans to make ESQL case-insensitive by default.
  • The only method to perform case-insensitive searches in ESQL is by using the TO_UPPER/TO_LOWER functions (see ESQL: look to pushdown case insensitive functions elastic/elasticsearch#118304).
  • While they are not able/willing to share specific benchmarks, some customers have reported significant performance decreases to them. I won't disclose the exact figures due to confidentiality statements in the support communications, but the performance impact is quite substantial, particularly when multiple functions are used on various fields within a single query.

Given the lack of alternatives, I have begun updating the ESQL backend to incorporate these functions (you can find the progress here: https://github.com/Mat0vu/pySigma-backend-elasticsearch/tree/case-insensitive-esql). This still requires some additional work and will contain a lot of changes because all unit tests must also be revised to accommodate the need for lowercase search strings.

We should probably include a warning that ESQL may not be the best option for basic search operations and that users are encouraged to consider alternative query languages where possible. This applies especially when a rule does not exclusively use the |cased Sigma modifier, which is likely true of most rules.
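
A check for when to emit such a warning could be sketched as follows. This is an assumption-laden illustration, not backend code: `uses_only_cased` and `warn_if_case_insensitive` are hypothetical names, the detection section is taken as a plain dict (as loaded from the rule's YAML), and only the simple selection shapes are handled.

```python
import warnings


def uses_only_cased(detection: dict) -> bool:
    """Return True if every field in every selection carries the 'cased' modifier."""
    for name, selection in detection.items():
        if name == "condition":
            continue
        # A selection is either one mapping or a list of mappings/keywords.
        items = selection if isinstance(selection, list) else [selection]
        for item in items:
            if not isinstance(item, dict):
                # Plain keyword search: no modifiers, so case-insensitive.
                return False
            for key in item:
                modifiers = key.split("|")[1:]
                if "cased" not in modifiers:
                    return False
    return True


def warn_if_case_insensitive(detection: dict) -> bool:
    """Warn (and return True) when the rule relies on case-insensitive matching."""
    if uses_only_cased(detection):
        return False
    warnings.warn(
        "Rule needs case-insensitive matching; the ESQL query will rely on "
        "TO_LOWER, which may be slow. Consider another query language.",
        UserWarning,
    )
    return True


# The rule from the first comment has no 'cased' modifier, so it would warn.
rule = {"condition": "selection", "selection": {"CommandLine|contains": "git.exe"}}
print(warn_if_case_insensitive(rule))  # True
```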
