ESQL: Handle allocation errors inside topn #99931
This properly handles allocation errors inside of topn by making `Block.Builder` and `Vector.Builder` `Releasable`. The "new way" to deal with block factories is like this:
```
try (var b = IntBlock.builder(3, blockFactory)) {
    b.append(1);
    b.append(2);
    b.append(3);
    return b.build();
}
```
If anything goes wrong the builder's `close` method will be called by the `try` block and all of the circuit breaker memory that it reserved will be released.

For this all to work well `Block.Builder`s have to be one-shot. In other words, you can only call `.build` on them one time. That shifts the accounting from the builder into the block. It is an error to call `build` twice.
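To make the ownership hand-off concrete, here is a minimal, self-contained sketch of a one-shot releasable builder. Everything in it (`ToyBreaker`, `ToyIntBuilder`, `ToyIntBlock`, the doubling growth policy) is a hypothetical stand-in written for illustration, not the Elasticsearch classes; it only assumes the behavior described above: the builder reserves bytes while it grows, `close` gives back whatever it still owns, and `build` moves the reservation into the block.

```java
import java.util.Arrays;

/** Entry point for the sketch; every class in this file is hypothetical. */
final class OneShotBuilderDemo {
    /** Trips the breaker mid-build and reports what is still reserved afterwards. */
    static long usedAfterFailure() {
        ToyBreaker breaker = new ToyBreaker(16); // room for four ints
        try (ToyIntBuilder b = new ToyIntBuilder(2, breaker)) {
            for (int i = 0; i < 100; i++) {
                b.append(i); // the second growth step exceeds the limit and throws
            }
            b.build().close();
        } catch (IllegalStateException e) {
            // expected: the builder's close() already ran and released its reservation
        }
        return breaker.used();
    }

    /** Builds successfully; returns bytes reserved while the block lives and after it closes. */
    static long[] usedAfterSuccess() {
        ToyBreaker breaker = new ToyBreaker(1024);
        try (ToyIntBuilder b = new ToyIntBuilder(3, breaker)) {
            ToyIntBlock block = b.append(1).append(2).append(3).build();
            long whileBlockAlive = breaker.used(); // now owned by the block
            block.close();
            return new long[] { whileBlockAlive, breaker.used() };
        } // the builder's close() is a no-op here: build() took the reservation
    }

    public static void main(String[] args) {
        System.out.println(usedAfterFailure());
        long[] used = usedAfterSuccess();
        System.out.println(used[0] + " then " + used[1]);
    }
}

/** Toy circuit breaker: counts reserved bytes and refuses to go past a limit. */
final class ToyBreaker {
    private final long limit;
    private long used;

    ToyBreaker(long limit) { this.limit = limit; }

    void adjust(long bytes) {
        if (bytes > 0 && used + bytes > limit) {
            throw new IllegalStateException("toy breaker tripped");
        }
        used += bytes;
    }

    long used() { return used; }
}

/** Toy block: owns its reservation from build() until close(). */
final class ToyIntBlock implements AutoCloseable {
    final int[] values;
    private final ToyBreaker breaker;
    private long bytes;

    ToyIntBlock(int[] values, ToyBreaker breaker, long bytes) {
        this.values = values;
        this.breaker = breaker;
        this.bytes = bytes;
    }

    @Override
    public void close() {
        breaker.adjust(-bytes);
        bytes = 0;
    }
}

/** Toy one-shot builder: reserves as it grows, hands the reservation over in build(). */
final class ToyIntBuilder implements AutoCloseable {
    private final ToyBreaker breaker;
    private int[] values;
    private int count;
    private long reserved;
    private boolean built;

    ToyIntBuilder(int estimatedSize, ToyBreaker breaker) {
        this.breaker = breaker;
        int initial = Math.max(estimatedSize, 2);
        long bytes = (long) initial * Integer.BYTES;
        breaker.adjust(bytes); // reserve before allocating
        reserved = bytes;
        values = new int[initial];
    }

    ToyIntBuilder append(int v) {
        if (built) throw new IllegalStateException("already built");
        if (count == values.length) {
            long extra = (long) values.length * Integer.BYTES; // doubling the array
            breaker.adjust(extra); // may throw; close() releases what was reserved so far
            reserved += extra;
            values = Arrays.copyOf(values, values.length * 2);
        }
        values[count++] = v;
        return this;
    }

    ToyIntBlock build() {
        if (built) throw new IllegalStateException("build() called twice");
        built = true;
        long bytes = reserved;
        reserved = 0; // accounting moves from the builder into the block
        return new ToyIntBlock(Arrays.copyOf(values, count), breaker, bytes);
    }

    @Override
    public void close() {
        breaker.adjust(-reserved); // zero after a successful build()
        reserved = 0;
    }
}
```

Because `build` zeroes the builder's own reservation, closing the builder after a successful build releases nothing, which is why the `try` block can wrap both the success and the failure path.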
@@ -27,31 +29,32 @@ public class CrankyCircuitBreakerService extends CircuitBreakerService {
public static final String ERROR_MESSAGE = "cranky breaker";

private final CircuitBreaker breaker = new CircuitBreaker() {
@Override
public void circuitBreak(String fieldName, long bytesNeeded) {
private final AtomicLong used = new AtomicLong();
I'm modifying this so I can assert that we release all memory after we break.
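As a rough illustration of why tracking `used` matters, the sketch below pairs a randomly failing breaker with an `AtomicLong` counter so a test can check that everything reserved before the failure was given back. The class is a hypothetical toy, not the real `CrankyCircuitBreakerService`; the 1-in-20 failure rate and the seeded `Random` are assumptions made for the example. The method names mirror `addEstimateBytesAndMaybeBreak`, `addWithoutBreaking` and `getUsed` from the `CircuitBreaker` interface.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

final class CrankyBreakerDemo {
    /** Hypothetical toy breaker: fails at random but still counts what was reserved. */
    static final class CrankyBreaker {
        private final AtomicLong used = new AtomicLong();
        private final Random random;

        CrankyBreaker(long seed) {
            this.random = new Random(seed);
        }

        void addEstimateBytesAndMaybeBreak(long bytes) {
            if (random.nextInt(20) == 0) { // assumed failure rate for the sketch
                throw new IllegalStateException("cranky breaker");
            }
            used.addAndGet(bytes);
        }

        void addWithoutBreaking(long bytes) {
            used.addAndGet(bytes);
        }

        long getUsed() {
            return used.get();
        }
    }

    /** Reserves until the breaker trips, releases in finally, and reports what is left. */
    static long usedAfterBreak(long seed) {
        CrankyBreaker breaker = new CrankyBreaker(seed);
        long reserved = 0;
        try {
            for (int i = 0; i < 10_000; i++) {
                breaker.addEstimateBytesAndMaybeBreak(64);
                reserved += 64;
            }
        } catch (IllegalStateException e) {
            // expected at some random point
        } finally {
            breaker.addWithoutBreaking(-reserved); // give back exactly what was taken
        }
        return breaker.getUsed();
    }

    public static void main(String[] args) {
        System.out.println(usedAfterBreak(42L));
    }
}
```

Whatever the seed, a correct caller leaves the counter at zero; a caller that forgets the release leaves a positive value that an assertion can catch.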
@@ -20,7 +22,7 @@ final class BooleanBlockBuilder extends AbstractBlockBuilder implements BooleanB
BooleanBlockBuilder(int estimatedSize, BlockFactory blockFactory) {
super(blockFactory);
int initialSize = Math.max(estimatedSize, 2);
adjustBreaker(initialSize);
adjustBreaker(RamUsageEstimator.NUM_BYTES_ARRAY_HEADER + initialSize * elementSize());
These were pretty far off so I took the liberty of making them a bit more accurate.
Good catch. Thanks.
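For a feel of how far off the old numbers could be, here is a small arithmetic sketch comparing the two `adjustBreaker` arguments from the hunk above. The 16-byte array header is an assumption (a common value on 64-bit HotSpot with compressed oops; `RamUsageEstimator` works out the real one at runtime), and applying the same formula to an 8-byte element type is also an assumption, since only the boolean builder is shown here.

```java
final class BreakerEstimateDemo {
    // Assumption for illustration only; the real header size is JVM dependent.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16;

    /** Old accounting from the hunk: the element count was reserved as if it were bytes. */
    static long oldEstimate(int estimatedSize) {
        return Math.max(estimatedSize, 2);
    }

    /** New accounting from the hunk: array header plus element count times element size. */
    static long newEstimate(int estimatedSize, int elementSize) {
        long initialSize = Math.max(estimatedSize, 2);
        return ASSUMED_ARRAY_HEADER_BYTES + initialSize * elementSize;
    }

    public static void main(String[] args) {
        int estimatedSize = 1000;
        System.out.println("old:          " + oldEstimate(estimatedSize));
        System.out.println("new, 1 byte:  " + newEstimate(estimatedSize, 1));
        System.out.println("new, 8 bytes: " + newEstimate(estimatedSize, Long.BYTES));
    }
}
```

With one-byte elements the old figure only misses the header, but with eight-byte elements it would undercount by roughly a factor of eight.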
if (elementType == ElementType.UNKNOWN || elementType == ElementType.NULL || elementType == ElementType.DOC) {
continue;
}
params.add(new Object[] { elementType });
I moved this to parameterized tests so it'll pick up new elementTypes by default.
public static BlockFactory blockFactory(ByteSizeValue size) {
BigArrays bigArrays = new MockBigArrays(PageCacheRecycler.NON_RECYCLING_INSTANCE, size);
return new BlockFactory(bigArrays.breakerService().getBreaker(CircuitBreaker.REQUEST), bigArrays);
}
This just felt like a nice place to stick this so I could share it.
++
BytesRefBlock block1 = builder.build();
BytesRefBlock block2 = builder.build();
BytesRefBlock.Builder builder1 = BytesRefBlock.newBlockBuilder(grow ? 0 : positions);
BytesRefBlock.Builder builder2 = BytesRefBlock.newBlockBuilder(grow ? 0 : positions);
These happened because block builders are one-shot now.
This is actually more readable. 👍
public void close() {
while (page.hasNext()) {
page.next().releaseBlocks();
}
This lets me assert that we've closed all the input pages even if there's an error!
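A minimal sketch of that drain-on-close idea, using hypothetical `ToyPage` and `ToySource` classes rather than the real test operators: the source's `close` walks whatever pages were never handed out and releases them, so a failure halfway through still leaves every page released.

```java
import java.util.Iterator;
import java.util.List;

final class DrainOnCloseDemo {
    /** Hypothetical page that only remembers whether it was released. */
    static final class ToyPage {
        boolean released;

        void releaseBlocks() {
            released = true;
        }
    }

    /** Hypothetical source: hands out pages and releases the leftovers on close(). */
    static final class ToySource implements AutoCloseable {
        private final Iterator<ToyPage> page;

        ToySource(List<ToyPage> pages) {
            this.page = pages.iterator();
        }

        ToyPage next() {
            return page.hasNext() ? page.next() : null;
        }

        @Override
        public void close() {
            while (page.hasNext()) {
                page.next().releaseBlocks();
            }
        }
    }

    /** Consumes one page, fails, and reports whether every page ended up released. */
    static boolean allReleasedAfterFailure() {
        List<ToyPage> pages = List.of(new ToyPage(), new ToyPage(), new ToyPage());
        try (ToySource source = new ToySource(pages)) {
            ToyPage first = source.next();
            first.releaseBlocks(); // the consumer releases what it actually received
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            // expected: close() has already drained the two remaining pages
        }
        return pages.stream().allMatch(p -> p.released);
    }

    public static void main(String[] args) {
        System.out.println(allReleasedAfterFailure());
    }
}
```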
Pinging @elastic/es-ql (Team:QL)
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)
run elasticsearch-ci/part-2
There's some kind of double release going on, I think it's on blocks but I'm not sure.
CrankyBreaker is my hero!
OK! The problem is with the BytesRefBuilder. We "estimate" the bytes used, but they aren't really an estimate at all.
I got it! Fix incoming.
Almost there! BytesRefs weren't building in the same way as everything else but they sure tried to. It wasn't that big a deal that we were off before, but this should catch it! Lots more tests incoming too.
OK! That should work.
LGTM

@Override
public String toString() {
return "1gb";
Thank you. The toString is super helpful here. 👍
Yeah! I got the old error message from gradle and thought "I have no idea what this means". Easy enough fix!
* Memory used by the {@link BigArrays} portion of this {@link BytesRefArray}.
*/
public long bigArraysRamBytesUsed() {
return startOffsets.ramBytesUsed() + bytes.ramBytesUsed();
++
@@ -75,6 +75,7 @@ public BooleanBlock expand() {
public static long ramBytesEstimated(boolean[] values, int[] firstValueIndexes, BitSet nullsMask) {
return BASE_RAM_BYTES_USED + RamUsageEstimator.sizeOf(values) + BlockRamUsageEstimator.sizeOf(firstValueIndexes)
+ BlockRamUsageEstimator.sizeOfBitSet(nullsMask) + RamUsageEstimator.shallowSizeOfInstance(MvOrdering.class);
// TODO mvordering is shared
Ah yes, of course. 👍
UNKNOWN((estimatedSize, blockFactory) -> { throw new UnsupportedOperationException("can't build null blocks"); });

interface BuilderSupplier {
Block.Builder newBlockBuilder(int estimatedSize, BlockFactory blockFactory);
++
close.add(p::releaseBlocks);
}
Collections.addAll(close, builders);
Releasables.closeExpectNoException(Releasables.wrap(close));
This turns out to be not so bad. 👍

// Note the lack of try/finally here - we're asserting that when the driver throws an exception we clear the breakers.
assertThat(bigArrays.breakerService().getBreaker(CircuitBreaker.REQUEST).getUsed(), equalTo(0L));
assertThat(inputFactoryContext.breaker().getUsed(), equalTo(0L));
++
@After
public void allBreakersEmpty() {
for (CircuitBreaker breaker : breakers) {
assertThat(breaker.getUsed(), equalTo(0L));
This is going to be useful. 👍
Yeah. I think we should drag this into AnyOperatorTests, maybe pretty soon.
The part two errors look real! I'll take a look.
run elasticsearch-ci/part-3
This adds things like `IntVector.FixedBuilder`, which is slightly simpler to use than constructing the arrays by hand. It also measures bytes used up front in the circuit breaker. And it'll be easier to integrate into the framework happening over in elastic#99931 to handle errors in topn. This also uses it in `mv_` functions.
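The "measures bytes used up front" part can be sketched roughly like this. `ToyBreaker` and `ToyIntFixedBuilder` are hypothetical toys, not the real `IntVector.FixedBuilder`, and for brevity the toy keeps the reservation with the builder until `close` instead of handing it to a vector. The point is only that a fixed-size builder can ask the breaker for the whole array before allocating anything, so an oversized request fails without touching memory.

```java
final class FixedBuilderDemo {
    /** Hypothetical toy breaker: counts reserved bytes and refuses to pass a limit. */
    static final class ToyBreaker {
        private final long limit;
        private long used;

        ToyBreaker(long limit) {
            this.limit = limit;
        }

        void adjust(long bytes) {
            if (bytes > 0 && used + bytes > limit) {
                throw new IllegalStateException("toy breaker tripped");
            }
            used += bytes;
        }

        long used() {
            return used;
        }
    }

    /** Hypothetical fixed-size builder: the full size is reserved in the constructor. */
    static final class ToyIntFixedBuilder implements AutoCloseable {
        private final ToyBreaker breaker;
        private final int[] values;
        private long reserved;
        private int next;

        ToyIntFixedBuilder(int size, ToyBreaker breaker) {
            long bytes = (long) size * Integer.BYTES;
            breaker.adjust(bytes); // throws before the array is allocated
            this.breaker = breaker;
            this.reserved = bytes;
            this.values = new int[size];
        }

        ToyIntFixedBuilder appendInt(int v) {
            values[next++] = v;
            return this;
        }

        int[] build() {
            if (next != values.length) {
                throw new IllegalStateException("expected " + values.length + " values but got " + next);
            }
            return values;
        }

        @Override
        public void close() {
            breaker.adjust(-reserved);
            reserved = 0;
        }
    }

    /** An oversized request is rejected up front, so nothing stays reserved. */
    static long usedAfterOversizedRequest() {
        ToyBreaker breaker = new ToyBreaker(64);
        try (ToyIntFixedBuilder b = new ToyIntFixedBuilder(1000, breaker)) {
            b.appendInt(1);
        } catch (IllegalStateException e) {
            // expected: the constructor threw before allocating
        }
        return breaker.used();
    }

    /** A request that fits reserves size * Integer.BYTES before any value is appended. */
    static long usedWhileFilling() {
        ToyBreaker breaker = new ToyBreaker(64);
        try (ToyIntFixedBuilder b = new ToyIntFixedBuilder(4, breaker)) {
            b.appendInt(1).appendInt(2).appendInt(3).appendInt(4).build();
            return breaker.used();
        }
    }

    public static void main(String[] args) {
        System.out.println(usedAfterOversizedRequest());
        System.out.println(usedWhileFilling());
    }
}
```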