colexec: fix sort chunks with disk spilling in very rare circumstances #80679
Conversation
This commit fixes a long-standing but very rare bug which could result in some rows being dropped when sort chunks ("segmented sort") spills to disk.

The root cause is that a deselector operator is placed on top of the input to the sort chunks op (because its "chunker" spooler assumes no selection vector on batches), and that deselector uses the same allocator as the sort chunks. If the allocator's budget is used up, an error is thrown, and it is caught by the disk-spilling infrastructure that wraps the whole `sort chunks -> chunker -> deselector` chain; the error is then suppressed, and spilling to disk occurs. However, crucially, it was always assumed that the error occurred in the `chunker`, so only that component knows how to properly perform the fallover. If the error occurs in the deselector, the deselector might end up losing a single input batch. We worked around this by making a fake allocation in the deselector before reading the input batch. However, if the stars align and the error occurs _after_ reading the input batch in the deselector, that input batch is lost, and we might get incorrect results.

For the bug to occur, a couple of conditions need to be met:
1. The "memory budget exceeded" error must occur for the sort chunks operation. It is far more likely to occur in the "chunker" because that component can buffer an arbitrarily large number of tuples and because of the fake allocation mentioned above.
2. The input operator to the chain must be producing batches with selection vectors on top - if this is not the case, then the deselector is a no-op. An example of such an input is a table reader with a filter on top.

The fix is quite simple - use a separate allocator with an unlimited budget for the deselector. This allows us to still properly track the memory usage of the extra batch created in the deselector without running into these difficulties with disk spilling. It also means that if a "memory budget exceeded" error does occur in the deselector (which is possible if `--max-sql-memory` has been used up), it will not be caught by the disk-spilling infrastructure and will be propagated to the user - which is the expected and desired behavior in such a scenario.

There is no explicit regression test for this since our existing unit tests already exercise this scenario once the fake allocation in the deselector is removed.

Fixes: #80645.

Release note (bug fix): Previously, in very rare circumstances CockroachDB could incorrectly evaluate queries with an ORDER BY clause when the prefix of the ordering was already provided by the index ordering of the scanned table.
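To make the failure mode and the fix concrete, here is a minimal self-contained Go sketch of the pattern described above. It is not the actual colexec code: the names (`budgetExceeded`, `Allocator`, `Deselector`, `runWithSpilling`) and the batch sizes are hypothetical, and the panic-catch wrapper merely stands in for the disk-spilling infrastructure.

```go
// Simplified illustration only; all names here are hypothetical and do not
// correspond to real CockroachDB APIs.
package main

import (
	"errors"
	"fmt"
)

// budgetExceeded stands in for the "memory budget exceeded" error that the
// disk-spilling infrastructure knows how to catch.
var budgetExceeded = errors.New("memory budget exceeded")

// Allocator tracks memory usage against an optional budget; a zero limit
// means unlimited in this toy model.
type Allocator struct {
	used, limit int64
}

// Grow registers an allocation and panics with budgetExceeded if a limited
// budget would be overshot, mirroring the panic-based error propagation of
// the vectorized engine.
func (a *Allocator) Grow(n int64) {
	a.used += n
	if a.limit > 0 && a.used > a.limit {
		panic(budgetExceeded)
	}
}

// Deselector consumes one input batch and then registers its size with its
// allocator. If the allocator panics after the batch has already been taken
// off the input, that batch is lost unless someone re-buffers it.
type Deselector struct {
	alloc *Allocator
	input []int64 // sizes (row counts) of the remaining input batches
}

func (d *Deselector) Next() (batch int64, ok bool) {
	if len(d.input) == 0 {
		return 0, false
	}
	batch, d.input = d.input[0], d.input[1:]
	d.alloc.Grow(batch) // may panic after the batch was already consumed
	return batch, true
}

// runWithSpilling mimics the disk-spilling wrapper: it catches budgetExceeded
// panics and "spills" (here: simply releases the tracked memory), but it has
// no way to recover a batch that the deselector was holding when the panic
// fired, so that batch is silently dropped.
func runWithSpilling(d *Deselector) (rows int64) {
	for {
		batch, ok := func() (b int64, more bool) {
			defer func() {
				if r := recover(); r != nil {
					if err, isErr := r.(error); !isErr || !errors.Is(err, budgetExceeded) {
						panic(r) // not a budget error: keep propagating
					}
					d.alloc.used = 0 // pretend the in-memory state moved to disk
					b, more = 0, true
				}
			}()
			return d.Next()
		}()
		if !ok {
			return rows
		}
		rows += batch
	}
}

func main() {
	batches := []int64{10, 10, 10}

	// Buggy wiring: the deselector shares the limited allocator, so the batch
	// it had just consumed when the budget error fired is dropped.
	shared := &Allocator{limit: 15}
	fmt.Println(runWithSpilling(&Deselector{alloc: shared, input: append([]int64(nil), batches...)})) // prints 20, not 30

	// Fixed wiring: the deselector gets its own unlimited allocator, so memory
	// is still tracked but the spilling path is never triggered from here.
	unlimited := &Allocator{}
	fmt.Println(runWithSpilling(&Deselector{alloc: unlimited, input: append([]int64(nil), batches...)})) // prints 30
}
```

In the first wiring the deselector shares the limited allocator, so the batch it has already consumed is dropped when the budget error fires; in the second, the deselector's own unlimited allocator keeps tracking memory without ever triggering the spilling path, which mirrors the fix.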
135fb77 to ef7d3ac
Nice sleuthing! I wonder if there's a way to combine metamorphic testing and code coverage to ensure all memory monitor error paths are hit, maybe combined with the new fuzzing stuff?
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek)
Nice work! Does this also need to be backported to 21.1? Or does the bug not exist there?
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @msirek)
I doubt it's possible. The way we propagate errors in the vectorized engine (by panic-catch mechanism) seems at odds with the code coverage idea. Probably the best we can do is attempt to randomize memory limits as much as possible when running as many of our tests as possible.
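For what it's worth, here is a rough sketch of what randomized memory limits could look like in a Go test; the names and the limit ranges are made up for illustration and are not an existing CockroachDB testing API.

```go
// Hypothetical sketch: pick the memory limit randomly once per test process so
// that repeated runs exercise different "memory budget exceeded" paths.
package colexec_test

import (
	"math/rand"
	"testing"
)

// testMemoryLimit is chosen at package init: sometimes a tiny 1-byte budget to
// force the error almost immediately, otherwise anything up to 64 KiB.
var testMemoryLimit = func() int64 {
	if rand.Intn(4) == 0 {
		return 1
	}
	return 1 + rand.Int63n(64<<10)
}()

func TestSortChunksWithRandomizedLimit(t *testing.T) {
	t.Logf("running with a memory limit of %d bytes", testMemoryLimit)
	// The operator under test would be constructed with testMemoryLimit as its
	// budget here; that wiring is intentionally omitted from this sketch.
}
```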
I thought we don't have any more 21.1.x releases scheduled (i.e. it is at end-of-life), so I didn't bother putting the corresponding backport label. Just checked the release schedule, and we do have 21.1.19 (the last one) scheduled for next week, so adding the label. The bug is definitely present on 21.1.
TFTRs!
bors r+
Build succeeded.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from ef7d3ac to blathers/backport-release-21.1-80679: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 21.1.x failed. See errors above.

error creating merge commit from ef7d3ac to blathers/backport-release-21.2-80679: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 21.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.