
Add support for 'size' in EQL Sample queries #87846

Conversation

luigidellaquila
Contributor

EQL queries allow passing a "size" parameter to limit the number of returned results:

{
  "size": 3,
  "query": "<an EQL query>"
}

This PR adds support for this parameter on Sample queries.
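
For illustration, here's a minimal sketch of issuing such a request with the low-level Java REST client; the index name and the sample query text are made up for the example:

```java
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;

// Hypothetical request: cap an EQL sample query at 3 results via "size".
Request request = new Request("GET", "/my-index/_eql/search");
request.setJsonEntity("""
    {
      "size": 3,
      "query": "sample by host [any where uptime > 0] [any where port > 1024]"
    }
    """);
Response response = restClient.performRequest(request); // restClient: a previously built RestClient
```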

@luigidellaquila luigidellaquila marked this pull request as ready for review June 21, 2022 09:19
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Jun 21, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

Contributor

@astefan astefan left a comment

Left some comments

@@ -15,21 +15,28 @@

public class Sample extends AbstractJoin {

public Sample(Source source, List<KeyedFilter> queries) {
Contributor

I think you need to override hashCode and equals to include limit as well.
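
For reference, a minimal sketch of what that could look like, assuming Sample keeps the new value in a field named limit and AbstractJoin exposes queries() (names are illustrative, not taken from the PR):

```java
// Sketch only: fold the new `limit` field into equality and hashing.
@Override
public int hashCode() {
    return Objects.hash(limit, queries()); // java.util.Objects
}

@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
        return false;
    }
    Sample other = (Sample) obj;
    return Objects.equals(limit, other.limit) && Objects.equals(queries(), other.queries());
}
```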

@@ -24,20 +25,22 @@
public class SampleExec extends PhysicalPlan {

private final List<List<Attribute>> keys;
private final Limit limit;
Contributor

why not an Integer here?

Contributor Author

That's what I did at first, but then I realized that a Limit could have a few advantages:

  • it's consistent with the other EQL command implementations (e.g. SequenceMatcher uses a Limit as well)
  • it anticipates the need for pagination, which is pretty natural in most scenarios (see also here)
  • it doesn't really add any significant complication or overhead

Anyway, it's not strictly necessary for the scope of this PR, so let's discuss it; if there is no agreement, I can easily reduce it to a number.

Contributor

Limit it is.

@@ -24,20 +25,22 @@
public class SampleExec extends PhysicalPlan {
Contributor

And, again, hashCode and equals methods updated.

@@ -174,8 +178,16 @@ private void finalStep(ActionListener<Payload> listener) {
if (docGroupsCounter == maxCriteria) {
List<SearchHit> match = matchSample(sample, maxCriteria);
if (match != null) {
finalSamples.add(match);
samples.add(new Sample(sampleKeys.get(responseIndex / maxCriteria), match));
if (samplesDiscarded < limit.offset()) {
Contributor

Why do you use both offset and limit?

Contributor Author

As long as we have Limit (see previous comment), IMHO it makes sense to have complete support for it.
The logic is trivial and the added value could be significant: with a small additional effort on syntax, we'll have at least basic support for pagination.
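
A rough sketch of the bookkeeping under discussion, using the names from the diff above (the Limit accessor names are guessed, not verbatim from the PR):

```java
// Skip the first `offset` matches, then keep matches until the limit is reached.
if (samplesDiscarded < limit.offset()) {
    samplesDiscarded++;                  // this match falls before the requested page
} else if (samples.size() < limit.limit()) {
    samples.add(new Sample(key, match)); // this match is inside the page
}
```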

Contributor

Then why not have Limit as a node in the logical plan tree (vs. a property of the Sample plan), like SQL and EQL do?

Contributor Author

Good point, checking

Contributor

Pagination won't work without state.
Without reading the comments first, I also found this code confusing and had to check whether we provide a non-null offset anywhere. I guess this could be clarified with a comment, but I'm not sure where such a comment should be placed (if we ever add an offset, a comment placed here will almost certainly not be updated/removed).
IMO, since the logic is trivial, it could be added when actually needed. But I guess it could also be kept.

Contributor

If the functionality is not there, the code should be removed. Let's not add code for functionality that is to be implemented sometime in the future or, worse, not at all. It will only confuse anyone reading the code.

- LimitWithOffset in the logical plan tree
- proper equals()/hashCode()
@@ -102,7 +102,13 @@ public Object visitStatement(StatementContext ctx) {
if (ctx.pipe().size() > 0) {
throw new ParsingException(source(ctx.pipe().get(0)), "Samples do not support pipes yet");
}
return plan;
LimitWithOffset limitPlan = new LimitWithOffset(
Contributor

nitpick

Suggested change
LimitWithOffset limitPlan = new LimitWithOffset(
return new LimitWithOffset(

@@ -102,7 +102,13 @@ public Object visitStatement(StatementContext ctx) {
if (ctx.pipe().size() > 0) {
throw new ParsingException(source(ctx.pipe().get(0)), "Samples do not support pipes yet");
Contributor

Shouldn't support for pipes now also be possible since limit and offset are both supported?

Contributor

No, because the limit/size comes from a request parameter, not from the query itself (through the query parser).

Contributor

but it works with sequence, doesn't it? why would sample be different in this regard?

Contributor

With sequence you have | head 5 | tail 3 | head 1. How do you do this with samples?

Contributor

if you can do sequence [any where true] [any where true] | head 10 | tail 2 why can't you do something like sample [any where true] [any where true] | head 10 | tail 2

Contributor Author

I see HEAD and TAIL as very natural operators for sequences, which have well-defined semantics based on their order (e.g. TAIL 10: give me the last ten, i.e. most recent, events).
Samples, on the other hand, do not have a natural order: "the last ten samples" is not a clear concept by itself.
In this context, LIMIT/OFFSET is much more similar to the SQL equivalent, where you just need a way to retrieve the results in batches or to show them in a UI in pages.
I agree with @Luegg that we could support pipes here eventually, but we probably need some slightly different syntax, e.g. | LIMIT 10 OFFSET 20; or reuse HEAD with | OFFSET 20 | HEAD 10, though that seems less natural in this context.

Contributor

@Luegg, as @luigidellaquila well put it, re-using HEAD and TAIL doesn't reflect the nature of samples, which do not have a chronological ordering concept the way sequences do. The only type of ordering that makes sense is one on the join keys, but at that point the simple concepts of HEAD and TAIL are not enough to express the multitude of ordering options available (order by the first join key asc, by the second desc, by the third asc, and so on; or order by all keys asc or all keys desc, but then one would ask why one cannot order by each key individually, etc.). At this point in the evolution of samples, I believe having a circuit breaker and a simple way of limiting the number of results returned (without computing all the samples) is enough. With time, and more input from users, we can evaluate our options for providing something more advanced and more granular in terms of limiting the number of results and, at the same time, sorting them.

Contributor

I don't see it as reusing head and tail for sample. It's more like making sample work with the other language constructs, as sample is the only construct that cannot be fed into pipes. From this perspective, head can just fetch the first x results independently of how the order is specified. I agree though that tail might be a bit more work, because it requires reversing the sort order, and that's probably better kept out of this PR.

Contributor Author

@Luegg I agree with you, tail is definitely not straightforward at this stage.
Also, for a proper implementation of pipes on samples (and limit/offset in particular), I think we still need to define a few small details that don't really fit in the scope of this PR.
For this specific one, I'd suggest sticking to the original goal, while keeping the door open for further improvements.

@luigidellaquila
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

Contributor

@astefan astefan left a comment

There were two approaches in this PR regarding size:

  • a Sample having a size attribute, which I don't believe is something the sample node must have
  • a Limit logical plan with a Sample logical plan child node

In the latter case (the one in the PR at this point), transforming the logical plan into a physical one doesn't follow the usual steps. For example, the Mapper is the one that generates physical plans based on the logical ones and, according to what's already in there, a LimitWithOffset should map to a LimitWithOffsetExec. In your case the SampleExec physical plan includes a Limit, but it should in fact belong to a physical plan having a LimitWithOffsetExec as parent.
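
To illustrate the convention being referenced, a loose sketch of the mapping step; the node classes come from the discussion above, but the constructor signatures here are guessed:

```java
// Hypothetical Mapper fragment: a LimitWithOffset logical node maps to a
// LimitWithOffsetExec physical node that wraps its mapped child, so the
// limit lives in a parent node rather than inside SampleExec itself.
protected PhysicalPlan map(LogicalPlan plan) {
    if (plan instanceof LimitWithOffset limit) {
        return new LimitWithOffsetExec(limit.source(), map(limit.child()), limit.limit());
    }
    // ... mapping of Sample -> SampleExec and the other node types elided
    throw new UnsupportedOperationException("unmapped plan: " + plan);
}
```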

@@ -250,3 +250,61 @@ expected_event_ids = [18,11]
join_keys = ["doom","win10"]


[[queries]]
name = "size0"
size = 0
Contributor

Other EQL queries use size as well, but none of them require the TOML tests to have size in them. I'm wondering if there isn't any other way to test size, given that so far there was no need for size in the IT TOML tests.

Contributor Author

For example, the Mapper is the one that generates physical plans based on the logical ones and, according to what's already in there, a LimitWithOffset should map to a LimitWithOffsetExec. In your case the SampleExec physical plan includes a Limit, but it should in fact belong to a physical plan having a LimitWithOffsetExec as parent.

I'm not sure I got this comment: the LimitWithOffset+Sample logical plan is transformed to LimitWithOffsetExec+SampleExec, then the QueryFolder.FoldLimit rule applies the limit (from LimitWithOffsetExec) to SampleExec, which is then used to build the SampleIterator.
This is the exact same thing that happens for Sequence execution planning.
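
A condensed sketch of that folding step, assuming a rule shaped like the existing QueryFolder rules (method and accessor names approximated, not verbatim from the PR):

```java
// Push the limit from the parent LimitWithOffsetExec down into SampleExec,
// so the iterator executing the sample query knows when to stop.
protected PhysicalPlan rule(LimitWithOffsetExec limitExec) {
    if (limitExec.child() instanceof SampleExec sample) {
        return sample.with(new Limit(limitExec.limit(), limitExec.offset()));
    }
    return limitExec;
}
```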

About "size" in the tests, we have unit tests (mostly for the execution planning, plus some specific tests for the algorithms) in Java, but they do not fully execute any query (mocking the full execution is extremely convoluted due to the usage of internal - Server - components in the execution, that cannot be instantiated in our tests).
The only real execution tests we have are TOML tests, this is why I added a size attribute there. It allows us to test the size on a full execution path.
In addition, I think also Sequence tests can take advantage of it.

Contributor

@bpintea bpintea left a comment

lgtm

@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
idegtiarenko and others added 5 commits August 8, 2022 09:13
This is required in case the new shards allocator is more proactive with
rebalancing.
Clarify that the limits in the docs are absolute maxima that will avoid
things just breaking but won't necessarily give great performance.
If the date histogram interval is large and the 'fixed_interval'
parameter is very small, we might end up with a large number of
buckets in the resulting histogram, in case we also generate empty
buckets. As a result we might generate too many buckets:
(max date - min date) / fixed_interval > 65536 (roughly).

Here we set minDocCount to 1 so as to avoid generating empty buckets.
In the test the maximum value for 'docCount' is 9000, which means in
the worst case we generate 9000 documents, each belonging to a
different bucket. In this case we would have 9000 buckets at most,
which is well below the maximum number of buckets allowed by default.
mark-vieira and others added 21 commits August 25, 2022 00:14
Follow up to elastic#89517 to ensure we don't mistakenly pass the
`xpack.ml.enabled` flag on versions of ES that have no such setting.
We can use try-with-resources here, and remove a null check because IndexShard.get
never returns null.
For large snapshot clusters the final cleanup step that removes
previous shard metadata might have to delete tens of thousands of blobs.
This can take many seconds, during which the repository is needlessly blocked
from running subsequent finalizations or deletes.
Similar to how the final cleanup step was made async and taken out of the
finalization loop for deletes in elastic#86514 this PR moves the final
cleanup step for finalization async.
**The issue** The flaky test simulates the following:
- Create a shrink policy with an invalid target shard count
- Then change the policy to have a valid target shard count
- Expectation: the `check-target-shards-count` will return true and the shrink operation will be successful.

What was happening in some cases in the background:
- Create the shrink policy with an invalid target shard count
- The `check-target-shards-count` gets created and queued to be executed with the invalid target shards count
- The task doesn't get enough priority to be executed
- We change the policy to have a valid target shards count
- We execute the queued task, which still has the outdated target shard count

**Proof** We enriched the code with some extra logging to verify that
the above scenario is actually correct:

```
## Adding the check target shards to the executingTasks

[2022-08-08T18:02:52,824][INFO ][o.e.x.i.IndexLifecycleRunner] [javaRestTest-0] elastic#78460: Adding task to queue check if I can shrink to numberOfShards = 5
[2022-08-08T18:02:52,825][TRACE][o.e.x.i.h.ILMHistoryStore] [javaRestTest-0] queueing ILM history item for indexing [ilm-history-5]: [{"index":"index-zmmrkzfhht","policy":"policy-bEmKF","@timestamp":1659970972825,"index_age":12608,"success":true,"state":{"phase":"warm","phase_definition":"{\"policy\":\"policy-bEmKF\",\"phase_definition\":{\"min_age\":\"0ms\",\"actions\":{\"shrink\":{\"number_of_shards\":5}}},\"version\":1,\"modified_date_in_millis\":1659970962968}","action_time":"1659970968847","phase_time":"1659970966014","action":"shrink","step":"check-target-shards-count","creation_date":"1659970960217","step_time":"1659970972076"}}]

## We change the policy before the condition is even evaluated.

[2022-08-08T18:02:52,825][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [javaRestTest-0] updating index lifecycle policy [policy-bEmKF]
[2022-08-08T18:02:52,826][DEBUG][o.e.x.c.i.PhaseCacheManagement] [javaRestTest-0] [index-zmmrkzfhht] updated policy [policy-bEmKF] contains the same phase step keys and can be refreshed
[2022-08-08T18:02:52,826][TRACE][o.e.x.c.i.PhaseCacheManagement] [javaRestTest-0] [index-zmmrkzfhht] updating cached phase definition for policy [policy-bEmKF]
[2022-08-08T18:02:52,826][DEBUG][o.e.x.c.i.PhaseCacheManagement] [javaRestTest-0] refreshed policy [policy-bEmKF] phase definition for [1] indices

## We check the condition for the first time but the target shard count is already outdated

[2022-08-08T18:02:53,406][ERROR][o.e.x.c.i.CheckTargetShardsCountStep] [javaRestTest-0] elastic#78460: Policy has different target number of shards in cluster state 2 vs what will be executed 5.
[2022-08-08T18:02:53,441][DEBUG][o.e.x.c.i.CheckTargetShardsCountStep] [javaRestTest-0] lifecycle action of policy [policy-bEmKF] for index [index-zmmrkzfhht] cannot make progress because the target shards count [5] must be a factor of the source index's shards count [4]
```

**Impact** We do not think that the impact is that big for production
clusters because there are many more cluster state updates. However, it
might cause some inconvenience to users who have fixed a policy and do
not see the effect as soon as they could.

**The fix** Our proposed fix is to not provide the target shard count
upon task creation but to retrieve it from the cluster state. This way we
ensure it will have the newest value.

**Future work** Currently for every cluster state we go through all the
indices and we check if any step needs to be executed. This doesn't
scale well. We would like to try to switch to a more efficient model
potentially with a cluster state observer. Issue will be created soon.

Resolves elastic#78460
As per elastic#66908, the setting now defaults to `True`, but it's not shown in the doc. Can we please have the doc updated?
* [DOCS] Update add node section

* Include information for enrolling nodes
We never create a stored context that has transient headers to clear while at the same
time preserves response headers. Not supporting this path allows for splitting up
the logic into 3 distinct methods that are much easier to follow logically.

Also, this commit removes a few redundant threadlocal.get and set calls.
This commit changes the status code returned when the start
trained model deployment api times out from `500` to `408`.
In addition, we add validation that the timeout must be positive.

Relates elastic#89585
Member

@costin costin left a comment

LGTM with a small nitpick.

What happens if both size and limit are specified?

@@ -46,6 +46,7 @@ public abstract class BaseEqlSpecTestCase extends RemoteClusterAwareEqlRestTestC
* For now, every value will be converted to a String.
*/
private final String[] joinKeys;
private final Integer size;
Member

I have only one nitpick around the usage of Integer (object) instead of a primitive.
The sign doesn't seem to be used (head and tail take care of where the limit is applied), hence why not use a negative value as an indicator that the size is out of band and thus unspecified?
As an Integer, size has 3 states: valid if >= 0, unspecified if null, and invalid if < 0.
What I'm suggesting is using only 2 through a primitive (int): valid if >= 0, unspecified if < 0.
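
The two encodings being contrasted, in a self-contained sketch (class and method names are ad hoc, for illustration only):

```java
class SizeEncoding {
    // Boxed, three states: valid if >= 0, unspecified if null, invalid if < 0.
    static boolean isSpecified(Integer size) {
        return size != null && size >= 0;
    }

    // Primitive, two states: valid if >= 0, any negative value means "unspecified".
    static boolean isSpecified(int size) {
        return size >= 0;
    }
}
```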

Contributor Author

What happens if both size and limit are specified?

Do you mean something like ... | tail n with n different from size?
For samples, at this stage, pipes are not supported yet, so it cannot happen.
For sequences, both conditions are applied, so the smaller one prevails (e.g. a query with size: 10 and | tail 3 returns three results); this seems like reasonable behavior, so I'd say we could have the same for samples in the future.

ywangd and others added 3 commits August 26, 2022 14:18
When DLS/FLS is enabled on an index for a user, the user will not be
able to perform operations that can make data content accessible under a
different name. This is intentional. This PR makes it clear in the doc.

doc preview link:
https://elasticsearch_89606.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-limitations.html#field-document-limitations
@luigidellaquila luigidellaquila requested a review from a team as a code owner August 26, 2022 09:37
@luigidellaquila
Contributor Author

Sorry for the noise here; it seems that, after a merge from the main branch, GitHub is struggling to re-calculate the diff against the head branch feature/eql_samples.

The PR shows ~200 commits ahead of the feature branch, but a manual check makes it clear that all of them are already in the feature branch and that both the feature branch and the PR are up to date with main. This behavior is unexpected and most likely related to isaacs/github#750 and to our switch from the master branch to main.

I'll give it a few hours to settle and hopefully re-calculate the diff. If it doesn't fix itself, I'll close this and open a new PR.

Labels
:Analytics/EQL EQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team v8.5.0