Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/diskmap: remove SortedDiskMapIterator.{Key,Value} #43145

Merged
merged 1 commit into from
Dec 17, 2019

Conversation

petermattis
Copy link
Collaborator

Remove SortedDiskMapIterator.{Key,Value} as these accessors are
horribly slow due to performing an allocation on every call. Change the
existing uses of these methods to Unsafe{Key,Value} adding copying
when necessary. Most of the use cases were easy to fix, though note that
diskRowIterator.Row() contains a TODO because the removal of the
allocation there caused many test failures.

The PebbleMapIteration benchmarks still show a regression in
comparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics which start
with a small memtable size and dynamically grow the size up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.

name                                     old time/op    new time/op    delta
RocksDBMapIteration/InputSize4096-16       1.18ms ± 3%    0.81ms ± 0%   -31.56%  (p=0.000 n=10+8)
RocksDBMapIteration/InputSize16384-16      5.83ms ± 1%    4.14ms ± 3%   -29.13%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize65536-16      24.8ms ± 1%    17.7ms ± 3%   -28.54%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize262144-16      137ms ± 0%     105ms ± 2%   -23.65%  (p=0.000 n=9+9)
RocksDBMapIteration/InputSize1048576-16     547ms ± 1%     430ms ± 1%   -21.44%  (p=0.000 n=8+9)
PebbleMapIteration/InputSize4096-16         594µs ± 1%     323µs ± 2%   -45.65%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize16384-16       3.29ms ± 1%    2.15ms ± 1%   -34.70%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize65536-16       16.0ms ± 7%    11.2ms ±11%   -30.26%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16      96.7ms ± 3%    76.5ms ± 5%   -20.88%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      267ms ± 0%     185ms ± 1%   -30.60%  (p=0.000 n=9+10)

name                                     old alloc/op   new alloc/op   delta
RocksDBMapIteration/InputSize4096-16        262kB ± 0%       0kB ± 0%   -99.97%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16      1.31MB ± 0%    0.00MB ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16      5.51MB ± 0%    0.00MB ± 3%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16     22.3MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16    89.4MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize4096-16         263kB ± 0%       0kB ± 0%   -99.91%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16       1.31MB ± 0%    0.00MB ± 0%   -99.98%  (p=0.000 n=10+8)
PebbleMapIteration/InputSize65536-16       5.50MB ± 0%    0.00MB ± 3%   -99.99%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize262144-16      22.3MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=10+7)
PebbleMapIteration/InputSize1048576-16     89.3MB ± 0%     0.0MB ±26%  -100.00%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
RocksDBMapIteration/InputSize4096-16        8.20k ± 0%     0.00k ± 0%   -99.96%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16       41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16        172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16       696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16     2.79M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize4096-16         8.20k ± 0%     0.01k ± 0%   -99.94%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16        41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize65536-16         172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16        696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      2.79M ± 0%     0.00M ± 9%  -100.00%  (p=0.000 n=10+10)

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @nvanbenschoten)


pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):

	k := r.UnsafeKey()
	v := r.UnsafeValue()
	// TODO(asubiotto): the "true ||" should not be necessary. We should be to

@asubiotto This deserves a look. Seems like a caller of Row() is misusing the API. If the results from Row() need to be stable, we could add something like a util/bufalloc.ByteAllocator here, though it would be better to track down the offending caller.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: other than the condition in diskRowIterator.Row. Thanks for tracking this down.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)


pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

@asubiotto This deserves a look. Seems like a caller of Row() is misusing the API. If the results from Row() need to be stable, we could add something like a util/bufalloc.ByteAllocator here, though it would be better to track down the offending caller.

I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. CopyRows of EncDatums are only implemented as shallow copies, so the reference to this reuse only applies to the EncDatumRow, but not to the encoded field, which is what is rewritten in this case. The encoded field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for Datums as well. I think we should use a ByteAllocator here with a TODO referencing #43182

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think what's going on here is that this type of contract (you may only reuse the row until the next call) is a bit unclear. CopyRows of EncDatums are only implemented as shallow copies, so the reference to this reuse only applies to the EncDatumRow, but not to the encoded field, which is what is rewritten in this case. The encoded field will not be copied, so I think the tests are probably failing due to that. This is unfortunate and there doesn't seem to be a good reason for it. To implement deep copies, we will have to implement deep copies for Datums as well. I think we should use a ByteAllocator here with a TODO referencing #43182

The comment above diskRowIterator.Row seems to make the contract for the lifetime of the result pretty explicit. Do you mind if I leave this TODO as-is for you (or someone else on the SQL exec team) to fix? Doing a copy here as this PR is doing, is equivalent to what the prior code was doing. In fact, this PR is slightly better as it is replacing 2 allocations with 1.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)


pkg/sql/rowcontainer/disk_row_container.go, line 311 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The comment above diskRowIterator.Row seems to make the contract for the lifetime of the result pretty explicit. Do you mind if I leave this TODO as-is for you (or someone else on the SQL exec team) to fix? Doing a copy here as this PR is doing, is equivalent to what the prior code was doing. In fact, this PR is slightly better as it is replacing 2 allocations with 1.

👍

Remove `SortedDiskMapIterator.{Key,Value}` as these accessors are
horribly slow due to performing an allocation on every call. Change the
existing uses of these methods to `Unsafe{Key,Value}` adding copying
when necessary. Most of the use cases were easy to fix, though note that
`diskRowIterator.Row()` contains a TODO because the removal of the
allocation there caused many test failures.

The `PebbleMapIteration` benchmarks still show a regression in
comparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics which start
with a small memtable size and dynamically grow the size up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.

name                                     old time/op    new time/op    delta
RocksDBMapIteration/InputSize4096-16       1.18ms ± 3%    0.81ms ± 0%   -31.56%  (p=0.000 n=10+8)
RocksDBMapIteration/InputSize16384-16      5.83ms ± 1%    4.14ms ± 3%   -29.13%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize65536-16      24.8ms ± 1%    17.7ms ± 3%   -28.54%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize262144-16      137ms ± 0%     105ms ± 2%   -23.65%  (p=0.000 n=9+9)
RocksDBMapIteration/InputSize1048576-16     547ms ± 1%     430ms ± 1%   -21.44%  (p=0.000 n=8+9)
PebbleMapIteration/InputSize4096-16         594µs ± 1%     323µs ± 2%   -45.65%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize16384-16       3.29ms ± 1%    2.15ms ± 1%   -34.70%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize65536-16       16.0ms ± 7%    11.2ms ±11%   -30.26%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16      96.7ms ± 3%    76.5ms ± 5%   -20.88%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      267ms ± 0%     185ms ± 1%   -30.60%  (p=0.000 n=9+10)

name                                     old alloc/op   new alloc/op   delta
RocksDBMapIteration/InputSize4096-16        262kB ± 0%       0kB ± 0%   -99.97%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16      1.31MB ± 0%    0.00MB ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16      5.51MB ± 0%    0.00MB ± 3%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16     22.3MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16    89.4MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize4096-16         263kB ± 0%       0kB ± 0%   -99.91%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16       1.31MB ± 0%    0.00MB ± 0%   -99.98%  (p=0.000 n=10+8)
PebbleMapIteration/InputSize65536-16       5.50MB ± 0%    0.00MB ± 3%   -99.99%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize262144-16      22.3MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=10+7)
PebbleMapIteration/InputSize1048576-16     89.3MB ± 0%     0.0MB ±26%  -100.00%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
RocksDBMapIteration/InputSize4096-16        8.20k ± 0%     0.00k ± 0%   -99.96%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16       41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16        172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16       696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16     2.79M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize4096-16         8.20k ± 0%     0.01k ± 0%   -99.94%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16        41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize65536-16         172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16        696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      2.79M ± 0%     0.00M ± 9%  -100.00%  (p=0.000 n=10+10)

Release note: None
@petermattis petermattis force-pushed the pmattis/map-iteration branch from c3108ed to ff890ab Compare December 16, 2019 17:13
@petermattis
Copy link
Collaborator Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 16, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Dec 16, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Dec 16, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Dec 16, 2019

Build failed

@petermattis
Copy link
Collaborator Author

4 different CI failures, 3 of which are unrelated to this PR. The 4th is out of space on make bench, which might be due to this PR touching the storage/engine package. Maybe. Retrying.

bors r+

craig bot pushed a commit that referenced this pull request Dec 17, 2019
43145: storage/diskmap: remove SortedDiskMapIterator.{Key,Value} r=petermattis a=petermattis

Remove `SortedDiskMapIterator.{Key,Value}` as these accessors are
horribly slow due to performing an allocation on every call. Change the
existing uses of these methods to `Unsafe{Key,Value}` adding copying
when necessary. Most of the use cases were easy to fix, though note that
`diskRowIterator.Row()` contains a TODO because the removal of the
allocation there caused many test failures.

The `PebbleMapIteration` benchmarks still show a regression in
comparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics which start
with a small memtable size and dynamically grow the size up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.

```
name                                     old time/op    new time/op    delta
RocksDBMapIteration/InputSize4096-16       1.18ms ± 3%    0.81ms ± 0%   -31.56%  (p=0.000 n=10+8)
RocksDBMapIteration/InputSize16384-16      5.83ms ± 1%    4.14ms ± 3%   -29.13%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize65536-16      24.8ms ± 1%    17.7ms ± 3%   -28.54%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize262144-16      137ms ± 0%     105ms ± 2%   -23.65%  (p=0.000 n=9+9)
RocksDBMapIteration/InputSize1048576-16     547ms ± 1%     430ms ± 1%   -21.44%  (p=0.000 n=8+9)
PebbleMapIteration/InputSize4096-16         594µs ± 1%     323µs ± 2%   -45.65%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize16384-16       3.29ms ± 1%    2.15ms ± 1%   -34.70%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize65536-16       16.0ms ± 7%    11.2ms ±11%   -30.26%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16      96.7ms ± 3%    76.5ms ± 5%   -20.88%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      267ms ± 0%     185ms ± 1%   -30.60%  (p=0.000 n=9+10)

name                                     old alloc/op   new alloc/op   delta
RocksDBMapIteration/InputSize4096-16        262kB ± 0%       0kB ± 0%   -99.97%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16      1.31MB ± 0%    0.00MB ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16      5.51MB ± 0%    0.00MB ± 3%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16     22.3MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16    89.4MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize4096-16         263kB ± 0%       0kB ± 0%   -99.91%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16       1.31MB ± 0%    0.00MB ± 0%   -99.98%  (p=0.000 n=10+8)
PebbleMapIteration/InputSize65536-16       5.50MB ± 0%    0.00MB ± 3%   -99.99%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize262144-16      22.3MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=10+7)
PebbleMapIteration/InputSize1048576-16     89.3MB ± 0%     0.0MB ±26%  -100.00%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
RocksDBMapIteration/InputSize4096-16        8.20k ± 0%     0.00k ± 0%   -99.96%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16       41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16        172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16       696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16     2.79M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize4096-16         8.20k ± 0%     0.01k ± 0%   -99.94%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16        41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize65536-16         172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16        696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      2.79M ± 0%     0.00M ± 9%  -100.00%  (p=0.000 n=10+10)
```

Release note: None

43207: roachprod/gce: use 2 SSDs for c2 machine types r=nvanbenschoten a=nvanbenschoten

This is required to spin up c2 instances, just as it is to spin up
n2 instances.

Release note: None

43214: sql: stop swallowing errors from UncachedPhysicalAccessor.IsValidSchema r=andreimatei a=ajwerner

Before this commit we'd swallow retriable errors during execution. This can
be very problematic as it can lead to lost writes and other general funkiness.
We're opting to not write a test for this specific case as there is ongoing
work to change interfaces to preclude this sort of bug.

Fixes #43067
Fixes #37883 (comment)

Release note: None

43219: pgwire/pgcode: fix DeprecatedInternalConnectionFailure r=nvanbenschoten a=nvanbenschoten

This was moved from "08006" in #41451, not "XXC03".

Release note: None

Co-authored-by: Peter Mattis <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig
Copy link
Contributor

craig bot commented Dec 17, 2019

Build succeeded

@craig craig bot merged commit ff890ab into cockroachdb:master Dec 17, 2019
@petermattis petermattis deleted the pmattis/map-iteration branch December 17, 2019 01:31
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Dec 1, 2022
This commit addresses an old TODO about figuring out why we cannot reuse
the same buffer in `diskRowIterator.Row` method. The contract of that
method was previously confusing - it says that the call to `Row`
invalidates the row returned on the previous call; however, the
important piece was missing - that the datums themselves cannot be
mutated (this is what we assume elsewhere and perform only the "shallow"
copies). This commit clarifies the contract of the method and explicitly
explains why we need to allocate fresh byte slices (which is done via
`ByteAllocator` to reduce the number of allocations).

Additional context can be found in cockroachdb#43145 which added this copy in the
first place. Here is a relevant quote from Alfonso:
```
I think what's going on here is that this type of contract (you may only
reuse the row until the next call) is a bit unclear. `CopyRow`s of
`EncDatum`s are only implemented as shallow copies, so the reference to
this reuse only applies to the `EncDatumRow`, but not to the `encoded`
field, which is what is rewritten in this case. The `encoded` field will
not be copied, so I think the tests are probably failing due to that.
This is unfortunate and there doesn't seem to be a good reason for it.
To implement deep copies, we will have to implement deep copies for
`Datum`s as well.
```
I think we were under the impression that we'd implement the "deep copy"
in `CopyRow`, but I highly doubt we'll do so given that we're mostly
investing in the columnar infrastructure nowadays, and the current way
of performing shallow copying has worked well long enough.

Epic: None

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Dec 1, 2022
This commit addresses an old TODO about figuring out why we cannot reuse
the same buffer in `diskRowIterator.Row` method. The contract of that
method was previously confusing - it says that the call to `Row`
invalidates the row returned on the previous call; however, the
important piece was missing - that the datums themselves cannot be
mutated (this is what we assume elsewhere and perform only the "shallow"
copies). This commit clarifies the contract of the method and explicitly
explains why we need to allocate fresh byte slices (which is done via
`ByteAllocator` to reduce the number of allocations).

Additional context can be found in cockroachdb#43145 which added this copy in the
first place. Here is a relevant quote from Alfonso:
```
I think what's going on here is that this type of contract (you may only
reuse the row until the next call) is a bit unclear. `CopyRow`s of
`EncDatum`s are only implemented as shallow copies, so the reference to
this reuse only applies to the `EncDatumRow`, but not to the `encoded`
field, which is what is rewritten in this case. The `encoded` field will
not be copied, so I think the tests are probably failing due to that.
This is unfortunate and there doesn't seem to be a good reason for it.
To implement deep copies, we will have to implement deep copies for
`Datum`s as well.
```
I think we were under the impression that we'd implement the "deep copy"
in `CopyRow`, but I highly doubt we'll do so given that we're mostly
investing in the columnar infrastructure nowadays, and the current way
of performing shallow copying has worked well long enough.

Epic: None

Release note: None
craig bot pushed a commit that referenced this pull request Dec 2, 2022
92713: sql: fix left semi and left anti virtual lookup joins r=yuzefovich a=yuzefovich

This commit fixes the execution of the left semi and left anti virtual
lookup joins. The bug was that we forgot to project away the looked up
columns (coming from the "right" side) which could then lead to wrong
columns being used higher up the tree. The bug was introduced during
22.1 release cycle where we added the optimizer support for generating
plans that could contain left semi and left anti virtual lookup joins.
This commit fixes that issue as well as the output columns of such joins
(I'm not sure whether there is a user facing impact of having incorrect
"output columns").

Additionally, this commit fixes the execution of these virtual lookup
joins to correctly return the input row only once. Previously, for left
anti joins we'd be producing an output row if there was a match (which
is wrong), and for both left semi and left anti we would emit an output
row every time there was a match (but this should be done only once).
(Although I'm not sure whether it is possible for virtual indexes to
result in multiple looked up rows.)

Also, as a minor simplification this commit makes it so that the output
rows are not added into the row container for left semi and left anti
and the container is not instantiated at all.

Fixes: #91012.
Fixes: #88096.

Release note (bug fix): CockroachDB previously could incorrectly
evaluate queries that performed left semi and left anti "virtual lookup"
joins on tables in `pg_catalog` or `information_schema`. These join types
can be planned when a subquery is used inside of a filter condition. The
bug was introduced in 22.1.0 and is now fixed.

92783: nogo: detect deprecated go standard library packages r=rickystewart a=healthy-pod

In #87327, we patched `staticcheck` to detect deprecated "objects" from the standard library. This patch ensures that we also detect deprecated "packages".

Closes #84877

Release note: None

92859: rowcontainer: address an old TODO r=yuzefovich a=yuzefovich

This commit addresses an old TODO about figuring out why we cannot reuse the same buffer in `diskRowIterator.Row` method. The contract of that method was previously confusing - it says that the call to `Row` invalidates the row returned on the previous call; however, the important piece was missing - that the datums themselves cannot be mutated (this is what we assume elsewhere and perform only the "shallow" copies). This commit clarifies the contract of the method and explicitly explains why we need to allocate fresh byte slices (which is done via `ByteAllocator` to reduce the number of allocations).

Additional context can be found in #43145 which added this copy in the first place. Here is a relevant quote from Alfonso:
```
I think what's going on here is that this type of contract (you may only
reuse the row until the next call) is a bit unclear. `CopyRow`s of
`EncDatum`s are only implemented as shallow copies, so the reference to
this reuse only applies to the `EncDatumRow`, but not to the `encoded`
field, which is what is rewritten in this case. The `encoded` field will
not be copied, so I think the tests are probably failing due to that.
This is unfortunate and there doesn't seem to be a good reason for it.
To implement deep copies, we will have to implement deep copies for
`Datum`s as well.
```
I think we were under the impression that we'd implement the "deep copy" in `CopyRow`, but I highly doubt we'll do so given that we're mostly investing in the columnar infrastructure nowadays, and the current way of performing shallow copying has worked well long enough.

Epic: None

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
yuzefovich added a commit that referenced this pull request Feb 14, 2023
This commit addresses an old TODO about figuring out why we cannot reuse
the same buffer in `diskRowIterator.Row` method. The contract of that
method was previously confusing - it says that the call to `Row`
invalidates the row returned on the previous call; however, the
important piece was missing - that the datums themselves cannot be
mutated (this is what we assume elsewhere and perform only the "shallow"
copies). This commit clarifies the contract of the method and explicitly
explains why we need to allocate fresh byte slices (which is done via
`ByteAllocator` to reduce the number of allocations).

Additional context can be found in #43145 which added this copy in the
first place. Here is a relevant quote from Alfonso:
```
I think what's going on here is that this type of contract (you may only
reuse the row until the next call) is a bit unclear. `CopyRow`s of
`EncDatum`s are only implemented as shallow copies, so the reference to
this reuse only applies to the `EncDatumRow`, but not to the `encoded`
field, which is what is rewritten in this case. The `encoded` field will
not be copied, so I think the tests are probably failing due to that.
This is unfortunate and there doesn't seem to be a good reason for it.
To implement deep copies, we will have to implement deep copies for
`Datum`s as well.
```
I think we were under the impression that we'd implement the "deep copy"
in `CopyRow`, but I highly doubt we'll do so given that we're mostly
investing in the columnar infrastructure nowadays, and the current way
of performing shallow copying has worked well long enough.

Epic: None

Release note: None
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Apr 3, 2023
This commit addresses an old TODO about figuring out why we cannot reuse
the same buffer in `diskRowIterator.Row` method. The contract of that
method was previously confusing - it says that the call to `Row`
invalidates the row returned on the previous call; however, the
important piece was missing - that the datums themselves cannot be
mutated (this is what we assume elsewhere and perform only the "shallow"
copies). This commit clarifies the contract of the method and explicitly
explains why we need to allocate fresh byte slices (which is done via
`ByteAllocator` to reduce the number of allocations).

Additional context can be found in cockroachdb#43145 which added this copy in the
first place. Here is a relevant quote from Alfonso:
```
I think what's going on here is that this type of contract (you may only
reuse the row until the next call) is a bit unclear. `CopyRow`s of
`EncDatum`s are only implemented as shallow copies, so the reference to
this reuse only applies to the `EncDatumRow`, but not to the `encoded`
field, which is what is rewritten in this case. The `encoded` field will
not be copied, so I think the tests are probably failing due to that.
This is unfortunate and there doesn't seem to be a good reason for it.
To implement deep copies, we will have to implement deep copies for
`Datum`s as well.
```
I think we were under the impression that we'd implement the "deep copy"
in `CopyRow`, but I highly doubt we'll do so given that we're mostly
investing in the columnar infrastructure nowadays, and the current way
of performing shallow copying has worked well long enough.

Epic: None

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants