
EQL: Optimize string retention #66207

Merged 2 commits into elastic:master on Dec 14, 2020

Conversation

@costin (Member) commented Dec 11, 2020

When iterating across search hits, common strings such as the index name or common keys get allocated as new strings. When dealing with a large number of potential keys, these add up and end up wasting memory even though their content is the same.
This commit introduces a simple LRU cache (up to 64 entries) to minimize the duplication.

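The approach described above can be sketched with nothing beyond the JDK (`StringCache` and `intern` are illustrative names, not the PR's actual class): a `LinkedHashMap` with `accessOrder=true` and an overridden `removeEldestEntry` forms a small LRU map, and looking a string up in it returns the first instance seen instead of keeping a fresh duplicate alive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch, not the PR's actual class
public class StringCache {
    private static final int CACHE_MAX_SIZE = 64; // per the PR description

    // accessOrder=true: iteration order is least-recently-accessed first,
    // so the eldest entry is the LRU candidate for eviction
    private final Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > CACHE_MAX_SIZE; // evict once the cap is exceeded
        }
    };

    // Return the cached instance of an equal string if one exists,
    // otherwise remember this instance for future hits
    public String intern(String s) {
        String cached = cache.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }
}
```

After the first occurrence of a given index name or key, every later equal string resolves to the same instance, so the duplicates become garbage immediately instead of being retained by the results.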
@elasticmachine (Collaborator) commented

Pinging @elastic/es-ql (Team:QL)

@costin (Member, Author) commented Dec 11, 2020

See the impact based on the MITRE dataset.
Before: (screenshot)
After: (screenshot)

@astefan (Contributor) left a comment


LGTM. Left two minor comments.

```java
} else {
public Object[] key(SearchHit hit) {
    Object[] key = null;
    if (keys.isEmpty() == false) {
        Object[] docKeys = new Object[keys.size()];
```

You could have used a variable initialized with keys.size() and use that in the Object[] initialization and further down in the for loop.
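The suggestion above, sketched against simplified stand-ins for the real types (`Extractor` and the `String` hit are placeholders for the actual `HitExtractor`/`SearchHit`): hoist `keys.size()` into a local and reuse it for both the array allocation and the loop bound.

```java
import java.util.List;

public class KeyBuilder {
    // Placeholder for the real per-key HitExtractor
    interface Extractor {
        Object extract(String hit);
    }

    private final List<Extractor> keys;

    public KeyBuilder(List<Extractor> keys) {
        this.keys = keys;
    }

    public Object[] key(String hit) {
        Object[] docKeys = null;
        int keySize = keys.size(); // computed once, reused below
        if (keySize > 0) {
            docKeys = new Object[keySize];
            for (int i = 0; i < keySize; i++) {
                docKeys[i] = keys.get(i).extract(hit);
            }
        }
        return docKeys;
    }
}
```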

```
@@ -47,8 +49,23 @@
 */
public class TumblingWindow implements Executable {

    private static final int CACHE_MAX_SIZE = 63;
```

The PR description mentioned 64.

@costin (Member, Author) replied:

I wanted to make sure the eviction occurs before the map gets resized. Checking the code, it looks like checking for equality with 64 should also work.
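For context, a generic sketch (not the PR's code) of the semantics being discussed: `removeEldestEntry` is invoked after the new entry has been inserted, so a `size() > max` check fires exactly when the map momentarily holds `max + 1` entries, which is why it is equivalent to an equality check against `max + 1` and caps the map at `max` entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedMapDemo {
    // Returns an access-ordered map that never holds more than `max` entries.
    // removeEldestEntry runs after each insertion, so size() > max fires
    // exactly when the (max + 1)-th entry has just gone in.
    static <K, V> Map<K, V> bounded(int max) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > max;
            }
        };
    }
}
```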

@matriv (Contributor) left a comment

LGTM

```java
/**
 * Simple cache for removing duplicate strings (such as index name or common keys).
 * Designed to be low-effort and thus optimistic in nature.
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
```

Since we have a Cache class in the common lib, which is more complex and supports concurrency, could we have a comment here that concurrency is not needed?

@costin costin merged commit 86ebfba into elastic:master Dec 14, 2020
@costin costin deleted the eql/remove-string-duplication branch December 14, 2020 08:59
costin added a commit that referenced this pull request Dec 14, 2020

(cherry picked from commit 86ebfba)
@palesz (Contributor) left a comment

LGTM too, only one thing I'd consider changing.

```java
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
 */
// start with the default size and allow growth until the max size
private final Map<String, String> stringCache = new LinkedHashMap<>(16, 0.75f, true) {
```

One thing to consider: start with CACHE_MAX_SIZE instead of the default size (16). The HashMap will have to grow anyway, unless you think the chances of having <= 32 different strings are high.
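A sketch of that suggestion (the sizing arithmetic is mine, not from the PR): `HashMap` resizes once its size exceeds `capacity * loadFactor`, so an initial capacity of at least `(maxSize + 1) / loadFactor` means the resize threshold is never crossed and the backing table is allocated once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PresizedCacheDemo {
    private static final int CACHE_MAX_SIZE = 63;

    // capacity >= (maxSize + 1) / loadFactor keeps the size below the resize
    // threshold, so the backing table never grows; the +1 covers the transient
    // state just before removeEldestEntry evicts the eldest entry
    private static final int INITIAL_CAPACITY =
        (int) Math.ceil((CACHE_MAX_SIZE + 1) / 0.75);

    private final Map<String, String> cache =
        new LinkedHashMap<>(INITIAL_CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > CACHE_MAX_SIZE;
            }
        };

    public String intern(String s) {
        String cached = cache.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }

    public int size() {
        return cache.size();
    }
}
```

The trade-off is exactly the one raised in the comment: pre-sizing wastes the full table up front when the working set of distinct strings is small.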


7 participants