EQL: Optimize string retention #66207
Conversation
When iterating across search hits, common strings such as the index name or common keys get allocated new strings. When dealing with a large number of potential keys, these add up and end up wasting memory even though their content is the same. This commit introduces a simple LRU cache (up to 64 entries) to minimize the duplication.
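For illustration, here is a minimal sketch of the approach in plain Java, assembled from the snippets quoted in the review below; the class name and the dedup helper are illustrative, and the eviction condition is the standard removeEldestEntry idiom rather than the PR's exact check:

import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a small LRU map that canonicalizes strings so that repeated
// values (index names, common keys) share a single instance.
class StringDedupCache {
    // upper bound; the 63-vs-64 choice is discussed in the review below
    private static final int CACHE_MAX_SIZE = 63;

    // access-ordered LinkedHashMap: recently used entries survive eviction
    private final Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            // called by put after insertion; drop the least recently used
            // entry once the cap is exceeded
            return size() > CACHE_MAX_SIZE;
        }
    };

    // return the canonical instance of s, remembering it if unseen
    String dedup(String s) {
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}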
Pinging @elastic/es-ql (Team:QL)
LGTM. Left two minor comments.
} else {
public Object[] key(SearchHit hit) {
    Object[] key = null;
    if (keys.isEmpty() == false) {
        Object[] docKeys = new Object[keys.size()];
You could have used a variable initialized with keys.size() and reused it in the Object[] initialization and further down in the for loop.
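A sketch of that suggestion applied to the snippet above; only the allocation appears in the diff, so the loop body and the extractKey helper here are hypothetical:

public Object[] key(SearchHit hit) {
    Object[] key = null;
    if (keys.isEmpty() == false) {
        int keySize = keys.size();              // hoisted once, as suggested
        Object[] docKeys = new Object[keySize]; // reused in the allocation...
        for (int i = 0; i < keySize; i++) {     // ...and in the for loop
            docKeys[i] = extractKey(keys.get(i), hit); // hypothetical helper
        }
        key = docKeys;
    }
    return key;
}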
@@ -47,8 +49,23 @@
 */
public class TumblingWindow implements Executable {

    private static final int CACHE_MAX_SIZE = 63;
The PR description mentioned 64.
I wanted to make sure the eviction occurs before the map gets resized. Checking the code, it looks like checking equality against 64 should work.
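For reference, LinkedHashMap invokes removeEldestEntry from put only after the new entry has been inserted, so the size check already sees the post-insertion count; a standalone illustration (the condition and constant are chosen for the example, not copied from the PR):

// evicts down to 64 entries: size() includes the entry just inserted,
// so the eldest is removed as soon as the map would hold 65
Map<String, String> lru = new LinkedHashMap<>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > 64;
    }
};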
LGTM
/**
 * Simple cache for removing duplicate strings (such as index name or common keys).
 * Designed to be low-effort and thus optimistic in nature.
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
Since we have a Cache class in the common lib, which is more complex and supports concurrency, could we have a comment here noting that concurrency is not needed?
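One possible wording for such a comment, extending the Javadoc quoted above (the single-threaded-access claim is exactly the assumption the comment would document):

/**
 * Simple cache for removing duplicate strings (such as index name or common keys).
 * Designed to be low-effort and thus optimistic in nature.
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
 * Unlike the Cache class in the common lib, this map is intentionally not
 * thread-safe: it is only accessed from a single thread, so concurrency
 * support is not needed.
 */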
When iterating across search hits, common strings such as the index name or common keys get allocated new strings. When dealing with a large number of potential keys, these add up and end up wasting memory even though their content is the same. This commit introduces a simple LRU cache (up to 64 entries) to minimize the duplication. (cherry picked from commit 86ebfba)
LGTM too; only one thing I'd consider changing.
 * Thus it has a small, upper limit so that it doesn't require any cleaning up.
 */
// start with the default size and allow growth until the max size
private final Map<String, String> stringCache = new LinkedHashMap<>(16, 0.75f, true) {
One thing to consider: start with CACHE_MAX_SIZE instead of the default size (16). The HashMap will have to grow anyway, unless you think that the chances of having <= 32 different strings are high.
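A sketch of that alternative; note that avoiding growth entirely also means accounting for the 0.75 load factor, since a HashMap resizes once its size exceeds capacity times load factor (the eviction condition below is illustrative):

// presize past the cap so the map never needs to grow; dividing by the
// load factor keeps the resize threshold above CACHE_MAX_SIZE
private final Map<String, String> stringCache =
    new LinkedHashMap<>((int) (CACHE_MAX_SIZE / 0.75f) + 1, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > CACHE_MAX_SIZE;
        }
    };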