
EQL: Optimize string retention #66207

Merged 2 commits into elastic:master on Dec 14, 2020

Conversation

@costin (Member) commented Dec 11, 2020

When iterating across search hits, common strings such as the index name or common keys get allocated as new strings. When dealing with a large number of potential keys, these add up and end up wasting memory even though their content is the same.
This commit introduces a simple LRU cache (up to 64 entries) to minimize the duplication.

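The approach described above can be sketched with nothing beyond the JDK (`StringCache` and `intern` are illustrative names, not the PR's actual class): a `LinkedHashMap` with `accessOrder=true` and an overridden `removeEldestEntry` forms a small LRU map, and looking a string up in it returns the first instance seen instead of keeping a fresh duplicate alive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch, not the PR's actual class
public class StringCache {
    private static final int CACHE_MAX_SIZE = 64; // per the PR description

    // accessOrder=true: iteration order is least-recently-accessed first,
    // so the eldest entry is the LRU candidate for eviction
    private final Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > CACHE_MAX_SIZE; // evict once the cap is exceeded
        }
    };

    // Return the cached instance of an equal string if one exists,
    // otherwise remember this instance for future hits
    public String intern(String s) {
        String cached = cache.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }
}
```

After the first occurrence of a given index name or key, every later equal string resolves to the same instance, so the duplicates become garbage immediately instead of being retained by the results.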
@elasticmachine (Collaborator) commented

Pinging @elastic/es-ql (Team:QL)

@costin (Member, Author) commented Dec 11, 2020

See the impact based on the MITRE dataset.
Before: (screenshot)
After: (screenshot)

@astefan (Contributor) left a comment


LGTM. Left two minor comments.

```java
} else {
public Object[] key(SearchHit hit) {
    Object[] key = null;
    if (keys.isEmpty() == false) {
        Object[] docKeys = new Object[keys.size()];
```

You could have used a variable initialized with keys.size() and use that in the Object[] initialization and further down in the for loop.
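The suggestion above, sketched against simplified stand-ins for the real types (`Extractor` and the `String` hit are placeholders for the actual `HitExtractor`/`SearchHit`): hoist `keys.size()` into a local and reuse it for both the array allocation and the loop bound.

```java
import java.util.List;

public class KeyBuilder {
    // Placeholder for the real per-key HitExtractor
    interface Extractor {
        Object extract(String hit);
    }

    private final List<Extractor> keys;

    public KeyBuilder(List<Extractor> keys) {
        this.keys = keys;
    }

    public Object[] key(String hit) {
        Object[] docKeys = null;
        int keySize = keys.size(); // computed once, reused below
        if (keySize > 0) {
            docKeys = new Object[keySize];
            for (int i = 0; i < keySize; i++) {
                docKeys[i] = keys.get(i).extract(hit);
            }
        }
        return docKeys;
    }
}
```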

```
@@ -47,8 +49,23 @@
 */
public class TumblingWindow implements Executable {

    private static final int CACHE_MAX_SIZE = 63;
```

The PR description mentioned 64.

@costin (Member, Author) replied:

I wanted to make sure the eviction occurs before the map gets resized. Checking the code, it looks like checking for equality with 64 should also work.
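For context, a generic sketch (not the PR's code) of the semantics being discussed: `removeEldestEntry` is invoked after the new entry has been inserted, so a `size() > max` check fires exactly when the map momentarily holds `max + 1` entries, which is why it is equivalent to an equality check against `max + 1` and caps the map at `max` entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedMapDemo {
    // Returns an access-ordered map that never holds more than `max` entries.
    // removeEldestEntry runs after each insertion, so size() > max fires
    // exactly when the (max + 1)-th entry has just gone in.
    static <K, V> Map<K, V> bounded(int max) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > max;
            }
        };
    }
}
```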

@matriv (Contributor) left a comment

LGTM

```java
/**
 * Simple cache for removing duplicate strings (such as index name or common keys).
 * Designed to be low-effort and thus optimistic in nature.
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
```

Since we have a Cache class in the common lib, which is more complex and supports concurrency, could we have a comment here that concurrency is not needed?

@costin costin merged commit 86ebfba into elastic:master Dec 14, 2020
@costin costin deleted the eql/remove-string-duplication branch December 14, 2020 08:59
costin added a commit that referenced this pull request Dec 14, 2020

(cherry picked from commit 86ebfba)
@palesz (Contributor) left a comment

LGTM too, only one thing I'd consider changing.

```java
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
 */
// start with the default size and allow growth until the max size
private final Map<String, String> stringCache = new LinkedHashMap<>(16, 0.75f, true) {
```

One thing to consider: start with CACHE_MAX_SIZE instead of the default size (16). The HashMap will have to grow anyway, unless you think the chances of having <= 32 different strings are high.
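A sketch of that suggestion (the sizing arithmetic is mine, not from the PR): `HashMap` resizes once its size exceeds `capacity * loadFactor`, so an initial capacity of at least `(maxSize + 1) / loadFactor` means the resize threshold is never crossed and the backing table is allocated once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PresizedCacheDemo {
    private static final int CACHE_MAX_SIZE = 63;

    // capacity >= (maxSize + 1) / loadFactor keeps the size below the resize
    // threshold, so the backing table never grows; the +1 covers the transient
    // state just before removeEldestEntry evicts the eldest entry
    private static final int INITIAL_CAPACITY =
        (int) Math.ceil((CACHE_MAX_SIZE + 1) / 0.75);

    private final Map<String, String> cache =
        new LinkedHashMap<>(INITIAL_CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > CACHE_MAX_SIZE;
            }
        };

    public String intern(String s) {
        String cached = cache.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }

    public int size() {
        return cache.size();
    }
}
```

The trade-off is exactly the one raised in the comment: pre-sizing wastes the full table up front when the working set of distinct strings is small.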


7 participants