colmem: improve the behavior of ResetMaybeReallocate #81535
Conversation
Previously, `Allocator.ResetMaybeReallocate` would always grow the capacity of the batch until `coldata.BatchSize()` (unless the memory limit has been exceeded). This behavior allows us to have batches with dynamic size when we don't know how many rows we need to process (for example, in the `cFetcher` we start out with 1 row, then grow it exponentially - 2, 4, 8, etc). However, in some cases we know exactly how many rows we want to include in the batch, so that behavior can result in us re-allocating a batch needlessly when the old batch already has enough capacity. This commit improves the situation by adding a knob to indicate that if the desired capacity is satisfied by the old batch, then it should not be re-allocated. All callers have been audited accordingly. Release note: None
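As a rough illustration of the knob's intent (the `nextCapacity` helper and `batchSize` constant are made up here, and this sketch ignores the memory-limit dimension that the real `Allocator.ResetMaybeReallocate` also considers):

```go
package main

import "fmt"

const batchSize = 1024 // stand-in for coldata.BatchSize()

// nextCapacity sketches the capacity decision described in the commit
// message. oldCap == 0 means there is no old batch yet. Illustrative only,
// not the real colmem API.
func nextCapacity(oldCap, minDesired int, desiredCapacitySufficient bool) (newCap int, reuseOld bool) {
	if minDesired > batchSize {
		minDesired = batchSize
	}
	if oldCap == 0 {
		// No old batch: allocate exactly the desired capacity.
		return minDesired, false
	}
	if oldCap >= batchSize {
		// Already at maximum capacity: nothing to grow.
		return oldCap, true
	}
	if desiredCapacitySufficient && oldCap >= minDesired {
		// New behavior: the old batch already satisfies the desired
		// capacity, so don't reallocate.
		return oldCap, true
	}
	// Old behavior: double, then ratchet up to the desired capacity.
	newCap = oldCap * 2
	if newCap > batchSize {
		newCap = batchSize
	}
	if minDesired > newCap {
		newCap = minDesired
	}
	return newCap, false
}

func main() {
	// With the knob set, asking for 5 rows while holding a 10-row batch
	// reuses the old batch instead of allocating a 20-row one.
	c, reused := nextCapacity(10, 5, true)
	fmt.Println(c, reused) // 10 true
}
```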
Nice improvement! I have some questions about the surrounding code.
Reviewed 15 of 28 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/colexec/colexecjoin/crossjoiner.go
line 112 at r1 (raw file):
true, /* desiredCapacitySufficient */ )
if willEmit > c.output.Capacity() {
Will this condition ever trigger? It doesn't seem like it would now, but I'm not sure it would have before this change, either, since willEmit is the min desired capacity.
pkg/sql/colexec/colexecutils/deselector.go
line 62 at r1 (raw file):
// so we don't use a memory limit here. It is up to the wrapped operator to
// limit the size of batches based on the memory footprint.
const maxBatchMemSize = math.MaxInt64
Do you know why maxBatchMemSize is a MaxInt64? It seems high, and, looking at ResetMaybeReallocate, if this is the max batch size then we will never reuse a batch, since batches seem to be capped at coldata.BatchSize().
Also, when desiredCapacitySufficient is true, does it really matter what the max batch size is?
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rharding6373, and @yuzefovich)
pkg/sql/colmem/allocator.go
line 193 at r1 (raw file):
if desiredCapacitySufficient && oldBatch.Capacity() >= minDesiredCapacity { useOldBatch = true }
Why does this logic need `desiredCapacitySufficient`? Why can't it simply compare against `minDesiredCapacity`?
Code quote:
if desiredCapacitySufficient && oldBatch.Capacity() >= minDesiredCapacity {
useOldBatch = true
}
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/sql/colexec/colexecjoin/crossjoiner.go
line 112 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Will this condition ever trigger? It doesn't seem like it would now, but I'm not sure it would have before this change, either, since willEmit is the min desired capacity.
`willEmit` here can be arbitrarily large (meaning it can exceed `coldata.BatchSize()`), and `ResetMaybeReallocate` will not "honor" the "min desired capacity" and will reduce it to `coldata.BatchSize()`. This happens at the very top of the `ResetMaybeReallocate` function. In other words, although we ask for `willEmit` capacity, a smaller batch might be returned, so we have to reduce `willEmit` accordingly.
pkg/sql/colexec/colexecutils/deselector.go
line 62 at r1 (raw file):
There is a comment right above - `deselectorOp` relies on its input to produce reasonably sized batches, so if the input batch is of a small enough size when the selection vector is applied to it, then the "deselected" batch will also be of a small enough size (since the "deselected" batch is not larger in its footprint than the original batch with a selection vector).
> Also, when desiredCapacitySufficient is true, does it really matter what the max batch size is?
Yeah, it does matter, since these two knobs operate in "different dimensions". Imagine a scenario where we have a batch with 10 very wide rows, so its footprint exceeds `maxBatchMemSize`; then even if we ask for a capacity of 20 rows in `ResetMaybeReallocate`, we should still get the original batch with a capacity of 10, because the rows are wide and we don't want to exceed the memory target even more, regardless of the value of the `desiredCapacitySufficient` knob.
On the other hand, if our old batch has a capacity of 10, then asking for a capacity of 5 when `desiredCapacitySufficient` is `true` should give us the same old batch with a capacity of 10, regardless of the max batch mem size.
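The two "dimensions" can be sketched as follows (a hypothetical `shouldReuse` helper, not the real colmem code; the real method also handles allocating the replacement batch):

```go
package main

import "fmt"

// shouldReuse sketches the priority between the two knobs: the memory
// footprint check comes first, and only then does the capacity knob matter.
// Illustrative names, not the real colmem API.
func shouldReuse(oldFootprint, maxBatchMemSize int64, oldCap, minDesired int, desiredCapacitySufficient bool) bool {
	if oldFootprint >= maxBatchMemSize {
		// The old batch already exceeds the memory target (e.g. 10 very
		// wide rows): reuse it regardless of the desired capacity.
		return true
	}
	// Within the memory target: reuse only if the caller says the old
	// capacity is sufficient and it actually is.
	return desiredCapacitySufficient && oldCap >= minDesired
}

func main() {
	// Wide rows: footprint over the limit, so asking for 20 rows still
	// reuses the 10-row batch.
	fmt.Println(shouldReuse(1<<30, 1<<20, 10, 20, false)) // true
	// Narrow rows: capacity 10 satisfies a desired capacity of 5 when the
	// knob is set, no matter how large maxBatchMemSize is.
	fmt.Println(shouldReuse(100, 1<<20, 10, 5, true)) // true
}
```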
pkg/sql/colmem/allocator.go
line 193 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Why does this logic need
desiredCapacitySufficient
? Why can't it simply compare againstminDesiredCapacity
?
`ResetMaybeReallocate` with `desiredCapacitySufficient == false` (the only previous behavior) is our "dynamic batch size" mechanism. The idea is that some callers don't know how many rows they will add into the batch (e.g. the `cFetcher` in the absence of limits and limit hints), so they start out small (usually with a batch of 1 row), and `ResetMaybeReallocate` tries to find the best capacity by growing it exponentially.
Going back to the `cFetcher` example: imagine we have 1m rows in the table and we're performing a full table scan. The `cFetcher` will return a batch with 1 row first, then with 2, with 4, etc., until it gets to `coldata.BatchSize()` on the eleventh batch, from which point it will only be reusing that maximum batch. This ensures that we get to the maximum batch quickly. At the same time, such a strategy works well when the table is small - if we're performing a full scan of a table with 1 row, it'd be inefficient to allocate a batch for `coldata.BatchSize()` rows right away.
The dynamic sizing behavior provided by `Allocator.ResetMaybeReallocate` is our heuristic-based answer to these concerns. However, in some cases the callers do know how many rows they will add into the batch, and this PR makes the behavior in such a scenario more efficient.
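For concreteness, the doubling schedule can be checked with a few lines of Go (assuming the default `coldata.BatchSize()` of 1024, which is what makes the maximum land on the eleventh batch):

```go
package main

import "fmt"

func main() {
	// Simulate a caller starting at a 1-row batch and doubling each time,
	// as the cFetcher does in the absence of limits and limit hints.
	const batchSize = 1024 // default coldata.BatchSize()
	capacity, batches := 1, 1
	for capacity < batchSize {
		capacity *= 2
		batches++
	}
	fmt.Println(capacity, batches) // 1024 11
}
```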
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/sql/colexec/colexecjoin/crossjoiner.go
line 112 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
👍
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2, @rharding6373, and @yuzefovich)
pkg/sql/colmem/allocator.go
line 193 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Making sure I understand: so there are two modes: (a) doubling if below `coldata.BatchSize()` or (b) ensuring at least `minDesiredCapacity`. In mode (a) `minDesiredCapacity` only matters for the first call with `oldBatch == nil`. Is that right?
Are we ever going to call this with a dynamic value for `desiredCapacitySufficient`? If we're always going to hardcode either true or false, I think it would be simpler for readers (and simpler for compiler inlining) to use two functions instead of sharing code. Then we can give them descriptive names like "doubleEachTime" and "ensureAtLeastThisLarge".
The `cFetcher` example is confusing me. All other callers with `desiredCapacitySufficient == false` use `minDesiredCapacity == 1`. The `cFetcher`, on the other hand, seems to know what it's doing with `minDesiredCapacity` and yet it still uses the (a) mode. Why?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/sql/colmem/allocator.go
line 193 at r1 (raw file):
> so there are two modes: (a) doubling if below `coldata.BatchSize()` or (b) ensuring at least `minDesiredCapacity`. In mode (a) `minDesiredCapacity` only matters for the first call with `oldBatch == nil`. Is that right?

I think it's not quite right, but let me spell things out.
Goals for `ResetMaybeReallocate` in terms of capacity allocation, in priority order:
1. make batches not exceed memory limits (the `maxBatchMemSize` parameter)
2. grow the capacity of the batches exponentially, until the maximum
3. try to provide a capacity of at least `minDesiredCapacity`, subject to the memory limit though.

We achieve this with the following approach:
a) if there is no old batch, since we don't know the footprint of the rows that will be added to the batch, we just allocate a new batch of `minDesiredCapacity` (which is clamped to the `[1, coldata.BatchSize()]` range)
b) if there is an old batch, then:
b1) if the old batch has already exceeded the memory limit, we reuse it. This makes sure that we stay as close to goal 1 as possible.
b2) if the old batch is of the maximum capacity, we also reuse it. It doesn't matter whether we have reached the maximum memory size - we have nowhere to grow anyway, so goal 2 has been reached.
b3) if the old batch has neither exceeded the memory limit nor reached the maximum capacity, then according to goal 2 we want to grow the capacity exponentially, so we make the new capacity double the old one. Next, if `minDesiredCapacity` is still larger than double the old capacity, we ratchet the new capacity up to the desired capacity. The reasoning is that we're still within memory limits (goal 1 is good), we increased the capacity by a factor of more than 2 (goal 2 is good), and we got exactly the desired capacity (goal 3 is good).
(Side comment: note that since we learn the batch's memory footprint only after adding rows to it, we actually are expected to exceed the memory limit rather than stay under it, but once exceeded, the footprint will not grow further. We think this is acceptable given that we're still precisely accounting for the footprint with the memory accounting system.)
Now, if we examine the behavior in b3), we see that we always double the capacity for the new batch (subject to not exceeding `coldata.BatchSize()`) even if `minDesiredCapacity` was already satisfied by the old batch. This is the case this PR attempts to address. When `desiredCapacitySufficient` is `true`, goal 2 is effectively disabled and goal 3 rises in priority.
Looking over this description, I think the behavior is a bit more nuanced than the two modes you described.
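The a)/b1)/b2)/b3) steps above can be condensed into a sketch (illustrative names and a hypothetical `decide` helper; the real method also resets the batch and updates memory accounting):

```go
package main

import "fmt"

const maxCap = 1024 // stand-in for coldata.BatchSize()

// decide sketches the capacity-allocation steps a), b1), b2), b3).
// oldCap == 0 means there is no old batch.
func decide(oldCap int, oldFootprint, memLimit int64, minDesired int, desiredCapacitySufficient bool) (newCap int, reuse bool) {
	if minDesired < 1 {
		minDesired = 1
	}
	if minDesired > maxCap {
		minDesired = maxCap // clamp to [1, maxCap]
	}
	if oldCap == 0 {
		return minDesired, false // a) no old batch
	}
	if oldFootprint >= memLimit {
		return oldCap, true // b1) over the memory limit: reuse (goal 1)
	}
	if oldCap == maxCap {
		return oldCap, true // b2) nowhere left to grow: reuse (goal 2)
	}
	if desiredCapacitySufficient && oldCap >= minDesired {
		return oldCap, true // this PR: desired capacity already satisfied
	}
	newCap = 2 * oldCap // b3) double...
	if newCap > maxCap {
		newCap = maxCap
	}
	if minDesired > newCap {
		newCap = minDesired // ...then ratchet up to the desired capacity
	}
	return newCap, false
}

func main() {
	// Old capacity 4, desired 10: doubled to 8, then ratcheted to 10.
	fmt.Println(decide(4, 100, 1<<20, 10, false)) // 10 false
}
```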
> Are we ever going to call this with a dynamic value for `desiredCapacitySufficient`? If we're always going to hardcode either true or false, I think it would be simpler for readers (and simpler for compiler inlining) to use two functions instead of sharing code. Then we can give them descriptive names like "doubleEachTime" and "ensureAtLeastThisLarge".

I think we will always hardcode either `true` or `false` there, but to me it seems the two methods would be almost the same. For example, I think it's still beneficial to have the doubling behavior when `desiredCapacitySufficient` is `true` for the case when the old batch does not have enough capacity. For example, the deselector operator knows how many rows it will copy into the output batch from its current input batch, but the next input batch can be much larger - so if the old output batch doesn't have enough capacity for the current input batch, it is probably a good idea to allocate the new output batch with some extra space. That said, most callers (like the sorter or the hash aggregator) know the total number of rows they will add to the output batch throughout its lifetime, so for those we could have a much simpler function. I guess to me having a single function with an additional parameter still seems cleaner.
> The `cFetcher` example is confusing me. All other callers with `desiredCapacitySufficient == false` use `minDesiredCapacity == 1`. The `cFetcher`, on the other hand, seems to know what it's doing with `minDesiredCapacity` and yet it still uses the (a) mode. Why?

Yeah, maybe my choice of the `cFetcher` for the example was unfortunate, given that it wants the most complicated behavior. The idea there is that if we have a limit hint, the `cFetcher` will try to allocate the first batch according to that hint (so it specifies a `minDesiredCapacity` argument equal to the hint); however, if it turns out that the limit hint's number of rows was insufficient for the caller of the `cFetcher`, then the hint was wrong, and we fall back to the doubling behavior.
Similarly, when performing a full table scan, we use the estimated row count from the table stats to size the first batch for the `cFetcher`. Imagine we're performing a full table scan of a table with 10k rows but our stats are stale and say that the table only has 100 rows - in such a scenario we create the first batch with a capacity of 100. Then it turns out that there are still more rows to read, so for the second batch we use the same `minDesiredCapacity = 100` but with `desiredCapacitySufficient = false`, so we'll get the doubled capacity of 200, then 400, etc., eventually getting to `coldata.BatchSize()`.
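A quick simulation of that stale-stats scenario (illustrative, assuming the default `coldata.BatchSize()` of 1024):

```go
package main

import "fmt"

func main() {
	// Batch capacities for a scan of a 10k-row table when stale stats
	// claim only 100 rows: the first batch uses the estimate, and later
	// batches double up to the maximum.
	const batchSize = 1024 // default coldata.BatchSize()
	capacity := 100        // first batch sized from the (stale) estimate
	caps := []int{capacity}
	for capacity < batchSize {
		capacity *= 2
		if capacity > batchSize {
			capacity = batchSize
		}
		caps = append(caps, capacity)
	}
	fmt.Println(caps) // [100 200 400 800 1024]
}
```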
Wow, I wrote up a lot of words, and I hope they make sense. 😅 Curious to hear your opinion on this setup.
I'm just a cranky old man. This is good.
Reviewed 16 of 28 files at r1.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/sql/colmem/allocator.go
line 193 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Your words make sense. I still think two functions would be clearer, but that is only an opinion, and your point that they would have a lot in common is valid.
Thank you for pointing out that `maxBatchMemSize` is the highest-priority constraint. I was overlooking that.
And thank you for explaining the `cFetcher` motivation, that helps.
lol
TFTRs!
bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
Build succeeded.