ESQL: Page shouldn't close a block twice #100370
Page now takes into account that a block can be used in multiple positions (such as the same column aliased under multiple names). Relates elastic#100001 Fix elastic#100365 Fix elastic#100356
Pinging @elastic/es-ql (Team:QL)
Hi @costin, I've created a changelog YAML for you.
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)
@alex-spies should probably look at this too. It does get the job done. And we don't need to keep it forever. If we have reference counting we wouldn't need it. But we don't have it now.
@@ -222,6 +223,13 @@ public void writeTo(StreamOutput out) throws IOException {
     */
    public void releaseBlocks() {
        blocksReleased = true;
        Releasables.closeExpectNoException(blocks);
        // blocks can be used as multiple columns
        var set = new IdentityHashMap<Block, Object>();
Collections.newSetFromMap is the idiomatic way to do this. And all the cool kids want to be idiots, right?
:)
I actually tried that first, but it blew up: the Block equals implementation compares content, so different blocks with the same content look equal and are skipped as duplicates instead of being closed.
That causes a bunch of leaks, which is why I'm matching on identity alone.
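To illustrate the difference described above, here is a minimal sketch. `FakeBlock` is a hypothetical stand-in (not the real `Block` class) whose `equals` compares content; the sketch only shows how an equals-based set collapses two distinct instances while an identity-based set keeps them apart.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class IdentitySetDemo {
    /** Hypothetical stand-in for a block whose equals() compares content. */
    static final class FakeBlock {
        final int[] values;

        FakeBlock(int... values) {
            this.values = values;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeBlock && Arrays.equals(values, ((FakeBlock) o).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    /** Counts distinct blocks using equals()/hashCode(). */
    static int distinctByEquals(List<FakeBlock> blocks) {
        return new HashSet<>(blocks).size();
    }

    /** Counts distinct blocks using reference identity only. */
    static int distinctByIdentity(List<FakeBlock> blocks) {
        Set<FakeBlock> set = Collections.newSetFromMap(new IdentityHashMap<>());
        set.addAll(blocks);
        return set.size();
    }

    public static void main(String[] args) {
        FakeBlock a = new FakeBlock(1, 2, 3);
        FakeBlock b = new FakeBlock(1, 2, 3); // same content, different instance
        List<FakeBlock> page = List.of(a, b, a); // 'a' is used as two columns
        System.out.println(distinctByEquals(page));   // prints 1: b would never be closed
        System.out.println(distinctByIdentity(page)); // prints 2: a and b are each closed once
    }
}
```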
@ChrisHegarty also might have opinions on this.
     * That is, allows clean-up of the current page _after_ external manipulation of the blocks.
     * The current page should no longer be used and be considered closed.
     */
    public Page newPageAndRelease(Block... keep) {
I've introduced this method to handle the case of reusing some blocks from an old page in a new one.
This happens in ProjectExec and OutputExec (inside the planner). I've moved this method from Project into Page and cleaned it up a bit (the usual complexity reduction: comparing two lists is O(N*M), while a map lookup makes it O(N) at the cost of extra memory for the map).
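The keep-and-release idea can be sketched roughly as follows. This is not the PR's implementation: `FakeBlock` and `releaseExcept` are hypothetical names, and the sketch assumes the goal is to close every block of the old page that is not kept, exactly once, even when a block appears in several positions.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class KeepAndReleaseSketch {
    /** Hypothetical stand-in for a releasable block; it counts close() calls. */
    static final class FakeBlock implements AutoCloseable {
        int closeCount = 0;

        @Override
        public void close() {
            closeCount++;
        }
    }

    /**
     * Closes each block of oldBlocks that is not in keep, at most once.
     * One identity set serves both purposes: kept blocks are pre-marked as
     * seen, and released blocks are marked when first encountered, so the
     * whole pass is O(N) instead of scanning keep for every old block.
     */
    static void releaseExcept(FakeBlock[] oldBlocks, FakeBlock... keep) {
        Set<FakeBlock> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(seen, keep);
        for (FakeBlock b : oldBlocks) {
            if (seen.add(b)) { // true only for a not-kept block seen for the first time
                b.close();
            }
        }
    }

    public static void main(String[] args) {
        FakeBlock a = new FakeBlock();
        FakeBlock b = new FakeBlock();
        FakeBlock c = new FakeBlock();
        // 'a' is aliased into two columns; 'b' is carried over to the new page.
        releaseExcept(new FakeBlock[] { a, b, a, c }, b);
        System.out.println(a.closeCount + " " + b.closeCount + " " + c.closeCount); // prints 1 0 1
    }
}
```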
        for (int i = 0; i < toAdd.length; i++) {
            this.blocks[prev.blocks.length + i] = toAdd[i];
        }
        System.arraycopy(toAdd, 0, this.blocks, prev.blocks.length + 0, toAdd.length);
Small improvement.
        for (int i = 0; i < toAdd.length; i++) {
            this.blocks[prev.blocks.length + i] = toAdd[i];
        }
        System.arraycopy(toAdd, 0, this.blocks, prev.blocks.length, toAdd.length);
Small improvement.
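For reference, a small sketch of why the two forms in the diff above are interchangeable. It uses plain `String` arrays and hypothetical method names rather than the real `Page` fields; both helpers append `toAdd` after the elements of `prev`.

```java
import java.util.Arrays;

public class AppendBlocksSketch {
    /** Appends toAdd after prev with an explicit element-by-element loop. */
    static String[] appendWithLoop(String[] prev, String[] toAdd) {
        String[] out = Arrays.copyOf(prev, prev.length + toAdd.length);
        for (int i = 0; i < toAdd.length; i++) {
            out[prev.length + i] = toAdd[i];
        }
        return out;
    }

    /** Same result using a single bulk copy. */
    static String[] appendWithArraycopy(String[] prev, String[] toAdd) {
        String[] out = Arrays.copyOf(prev, prev.length + toAdd.length);
        // Arguments: source, source offset, destination, destination offset, length.
        System.arraycopy(toAdd, 0, out, prev.length, toAdd.length);
        return out;
    }

    public static void main(String[] args) {
        String[] prev = { "a", "b" };
        String[] toAdd = { "c", "d", "e" };
        System.out.println(Arrays.toString(appendWithArraycopy(prev, toAdd))); // prints [a, b, c, d, e]
    }
}
```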
@elasticsearchmachine run elasticsearch-ci/part-1
FTR, it looks like all the builders for part-1 end up being disabled...
LGTM as an intermediate fix.
IMO we're beginning to create a lot of workarounds to make up for a lack of ref counted shallow copies of blocks and that makes the code harder to reason about.
        var map = new IdentityHashMap<Block, Object>(mapSize(blocks.length));
        var DUMMY = new Object();
        for (Block b : blocks) {
            if (map.putIfAbsent(b, DUMMY) == null) {
                Releasables.closeExpectNoException(b);
            }
        }
I think this improves the situation, but there can still be problems because non-identical blocks can be backed by the same vector or array.
Additionally, we create and populate a new hash map on each page release; that's probably quite expensive.
We can merge this to fix the problem right now, but IMO we should mark it with a TODO comment to remove this logic once possible. What we want is ref counting, which would fix the problem more idiomatically.
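As a rough illustration of the ref-counting alternative mentioned above (purely a sketch, not an existing API in this code base): each column that shares a block takes a reference, and the underlying resource is freed only when the last reference is dropped. `RefCountedBlock`, `incRef`, `decRef`, and `isReleased` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {
    /** Hypothetical block that is released when its reference count reaches zero. */
    static final class RefCountedBlock {
        private final AtomicInteger refs = new AtomicInteger(1); // the creator holds one reference
        private volatile boolean released = false;

        /** Takes an extra reference, e.g. when the block is aliased into another column. */
        void incRef() {
            int prev = refs.getAndUpdate(r -> r > 0 ? r + 1 : r);
            if (prev <= 0) {
                throw new IllegalStateException("block already released");
            }
        }

        /** Drops one reference; returns true if this call released the block. */
        boolean decRef() {
            int now = refs.decrementAndGet();
            if (now < 0) {
                throw new IllegalStateException("block released too many times");
            }
            if (now == 0) {
                released = true; // a real block would free its memory here
                return true;
            }
            return false;
        }

        boolean isReleased() {
            return released;
        }
    }

    public static void main(String[] args) {
        RefCountedBlock block = new RefCountedBlock();
        block.incRef(); // same block used as a second column
        System.out.println(block.decRef()); // prints false: one reference remains
        System.out.println(block.decRef()); // prints true: last reference dropped
    }
}
```

With such a scheme, releasing a page could simply call `decRef` once per position, with no per-release map needed.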
> non-identical blocks can be backed by the same vector or array.
Why would they?
> we create and populate a new hash map on each page release; that's probably quite expensive.
I've added a check to skip the release once it's done; we can change that if we want a page to only be released once.
        Releasables.closeExpectNoException(blocks);
        // blocks can be used as multiple columns
        var map = new IdentityHashMap<Block, Object>(mapSize(blocks.length));
        var DUMMY = new Object();
That could be moved into a static final var.
Or...
var map = new IdentityHashMap<Block, Boolean>(mapSize(blocks.length));
for (Block b : blocks) {
if (map.putIfAbsent(b, Boolean.TRUE) == null) { ..
LGTM. I left a few small comments / suggestions.
Additionally, it would be good to have a small unit test in BasicPageTests for the Page additions.
        // create identity set
        for (Block b : keep) {
            map.putIfAbsent(b, DUMMY);
Subjective:
// create identity set
var set = Collections.newSetFromMap(new IdentityHashMap<Block, Boolean>(mapSize(keep.length)));
set.addAll(Arrays.asList(keep));
..
if (set.contains(b) == false) { ..
LGTM
    }

    static int mapSize(int expectedSize) {
        return expectedSize < 2 ? expectedSize + 1 : (int) (expectedSize / 0.75 + 1.0);
CollectionUtils.mapSize()?
Edit: not avail in compute package, disregard.
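For context, a worked example of the `mapSize` formula from the diff above. The formula is copied from the PR; the interpretation is an assumption: it appears to pick an initial capacity such that the expected number of entries stays under the 0.75 load factor that `HashMap` uses by default. Note that `IdentityHashMap`'s int constructor takes an expected maximum size rather than a raw capacity, so the padding is conservative there.

```java
public class MapSizeSketch {
    /** Same formula as in the diff: pad the expected size for a 0.75 load factor. */
    static int mapSize(int expectedSize) {
        return expectedSize < 2 ? expectedSize + 1 : (int) (expectedSize / 0.75 + 1.0);
    }

    public static void main(String[] args) {
        // prints: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 5, 12 -> 17 (one pair per line)
        for (int n : new int[] { 0, 1, 2, 3, 12 }) {
            System.out.println(n + " -> " + mapSize(n));
        }
    }
}
```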
💔 Backport failed
You can use sqren/backport to manually backport by running |
Page now takes into account that a block can be used in multiple positions (such as the same column aliased under multiple names). Introduce newPageAndRelease method that handles clean-up of blocks that are not-used when creating a new page Relates elastic#100001 Fix elastic#100365 Fix elastic#100356 (cherry picked from commit 44068cb)
* ESQL: Page shouldn't close a block twice (#100370) Page now takes into account that a block can be used in multiple positions (such as the same column aliased under multiple names). Introduce newPageAndRelease method that handles clean-up of blocks that are not-used when creating a new page Relates #100001 Fix #100365 Fix #100356 (cherry picked from commit 44068cb) * Fix order inside test