Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Introduce repeatable queries #75082

Merged
merged 6 commits into from
Sep 29, 2021
Merged

Conversation

costin
Copy link
Member

@costin costin commented Jul 7, 2021

Allow individual queries within a sequence to be repeated through a
dedicated keyword without having physical duplication.

sequence
queryA [runs=2]
queryB
queryC [runs=3]
queryD

is the same as:

sequence
queryA
queryA
queryB
queryC
queryC
queryC
queryD

but more concise.

costin added 2 commits July 7, 2021 19:41
Allow individual queries within a sequence to be repeated through a
dedicated keyword without having physical duplication.

sequence
  queryA repeat=2
  queryB
  queryC repeat=3
  queryD

is the same as:

sequence
  queryA
  queryA
  queryB
  queryC
  queryC
  queryC
  queryD

but more concise.
@costin costin added the :Analytics/EQL EQL querying label Jul 7, 2021
@costin costin requested a review from astefan July 7, 2021 17:01
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.
Some unit tests need to be added for the new exceptions. And the scenarios where multiple repeats are used in the same sequence need clarification.

query = '''
sequence
[process where opcode == 1] by unique_pid, process_path
[file where opcode == 0] by unique_pid, process_path repeat=3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following scenario?

[file where opcode == 0] by unique_pid, process_path repeat=2
[file where opcode == 0] by unique_pid, process_path repeat=1

if (numberOfQueries > 256) {
throw new ParsingException(
source(sequenceTermCtx),
"Sequence cannot contains more than 256 queries; found [{}]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Sequence cannot contains more than 256 queries; found [{}]",
"Sequence cannot contain more than 256 queries; found [{}]",

}

if (queries.size() < 2) {
throw new ParsingException(source, "A sequence requires a minimum of 2 queries, found [{}]", queries.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new ParsingException(source, "A sequence requires a minimum of 2 queries, found [{}]", queries.size());
throw new ParsingException(source, "A sequence requires a minimum of 2 queries, found [{}]", queries.size());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also test this new exception?

@costin costin marked this pull request as ready for review September 29, 2021 10:11
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left one minor comment.

}
}

int runs = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is limited to 100, why not using byte?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Habit - since it's a local variable it doesn't make a difference.

@costin costin merged commit c7ef3a6 into elastic:master Sep 29, 2021
@costin costin deleted the eql/repeatable-queries branch September 29, 2021 12:09
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 75082

costin added a commit to costin/elasticsearch that referenced this pull request Sep 29, 2021
Allow individual queries within a sequence to be ran multiple times
through using the [runs=number] construct as a suffix without
having to redeclare the query.

sequence
  queryA [runs=2]
  queryB
  queryC [runs=3]
  queryD

is the same as:

sequence
  queryA
  queryA
  queryB
  queryC
  queryC
  queryC
  queryD

but more concise.

(cherry picked from commit c7ef3a6)
@costin
Copy link
Member Author

costin commented Sep 29, 2021

/cc @jrodewig

costin added a commit that referenced this pull request Sep 29, 2021
Allow individual queries within a sequence to be ran multiple times
through using the [runs=number] construct as a suffix without
having to redeclare the query.

sequence
queryA [runs=2]
queryB
queryC [runs=3]
queryD

is the same as:

sequence
queryA
queryA
queryB
queryC
queryC
queryC
queryD

but more concise.

(cherry picked from commit c7ef3a6)
jrodewig added a commit that referenced this pull request Sep 30, 2021
Documents the `runs` keyword for running the same event criteria successively in a sequence query.

Relates to #75082.
elasticsearchmachine pushed a commit that referenced this pull request Sep 30, 2021
Documents the `runs` keyword for running the same event criteria successively in a sequence query.

Relates to #75082.

# Conflicts:
#	docs/reference/release-notes/highlights.asciidoc
@rw-access
Copy link
Contributor

rw-access commented Oct 4, 2021

what's up with the brackets? the [runs=X] syntax doesn't make sense (choice of word aside) and now the meaning of brackets is very overloaded and confusing.

before, it just had exactly one meaning that was used in a lot of contexts. it wasn't on accident that it always contained an event filter: [event category where condition]

some of those usages: terms in a sequence or join, descendant of, until, etc.
it could also be used to index into arrays but clearly distinct field[1]

also the placement (after by) does seem a little odd in choice.

i would expect this to be more intuitive: (again, choice of word aside)

sequence
  [process where opcode == 1]          by unique_pid
  [file where    opcode == 0] runs=3   by unique_pid

@costin
Copy link
Member Author

costin commented Oct 7, 2021

Hi Ross. Thanks for the feedback.

tl;dr

To avoid stretching [], it is being replaced with with a construct that already has the semantics we want and sounds good enough.

sequence
  [process where opcode == 1] by unique_pid
  [file where    opcode == 0] by unique_pid with runs=2
  [file where    opcode == 0] by unique_pid

Long story

There were several discussions going back and forth on how to declare runs which is a meta property of the query.
The overall goal in the syntax is to preserve the duplication without affecting the core query so that moving from one declaration to multiple and back is as easy as possible:

sequence
  [queryA] by X
  [queryA] by X
  [queryA] by X

// somehow becomes
sequence
  [queryA] by X // make it 3 times

This favors declaring the property as suffix, just like by and with.
Per this PR, another goal was to minimize the amount of symbols so to avoid adding a new one (such as {}), [] was used which at the time seemed like a good idea (famous last words).

Thanks to your feedback we reconsidered the reuse of [] and opted for with since it fits a lot of the properties wanted and continues the style of using prepositions/glue words as markers and block starters.
It keeps the same rules as within sequence declaration (needs to be as a suffix), allows enumeration for possible extensions in the future and allows other potential prepositions to be added further down the road:

// variants

[queryA]
[queryB] by X
[queryC] with runs=3
[queryD] by Z with runs=2

@rw-access
Copy link
Contributor

rw-access commented Oct 7, 2021

I like it! Thanks!

costin added a commit to costin/elasticsearch that referenced this pull request Oct 10, 2021
Allow individual queries within a sequence to be repeated through a
dedicated keyword without having physical duplication.
Change from using [runs=2] to "with runs=2"

Before:

sequence
queryA [runs=2]
queryB
queryC [runs=3]
queryD

Now:

sequence
queryA with runs=2
queryB
queryC with runs=3
queryD

Which essentially is the same as:

sequence
queryA
queryA
queryB
queryC
queryC
queryC
queryD

but more concise.

Supersedes elastic#75082
costin added a commit that referenced this pull request Oct 10, 2021
Allow individual queries within a sequence to be repeated through a
dedicated keyword without having physical duplication.
Change from using [runs=2] to "with runs=2"

Before:

sequence
queryA [runs=2]
queryB
queryC [runs=3]
queryD

Now:

sequence
queryA with runs=2
queryB
queryC with runs=3
queryD

Which essentially is the same as:

sequence
queryA
queryA
queryB
queryC
queryC
queryC
queryD

but more concise.

Supersedes #75082
costin added a commit to costin/elasticsearch that referenced this pull request Oct 10, 2021
Allow individual queries within a sequence to be repeated through a
dedicated keyword without having physical duplication.
Change from using [runs=2] to "with runs=2"

Before:

sequence
queryA [runs=2]
queryB
queryC [runs=3]
queryD

Now:

sequence
queryA with runs=2
queryB
queryC with runs=3
queryD

Which essentially is the same as:

sequence
queryA
queryA
queryB
queryC
queryC
queryC
queryD

but more concise.

Supersedes elastic#75082
elasticsearchmachine pushed a commit that referenced this pull request Oct 10, 2021
Allow individual queries within a sequence to be repeated through a
dedicated keyword without having physical duplication.
Change from using [runs=2] to "with runs=2"

Before:

sequence
queryA [runs=2]
queryB
queryC [runs=3]
queryD

Now:

sequence
queryA with runs=2
queryB
queryC with runs=3
queryD

Which essentially is the same as:

sequence
queryA
queryA
queryB
queryC
queryC
queryC
queryD

but more concise.

Supersedes #75082
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants