
release-20.1: sql: fix pagination in UPSERT #51626

Merged 2 commits into cockroachdb:release-20.1 on Jul 21, 2020

Conversation

yuzefovich (Member)

Backport 2/2 commits from #51608.

/cc @cockroachdb/release


sql: minor cleanup around upsert

This commit removes a couple of duplicated "init" calls as well as some
unused parameters around upsert code.

Release note: None

sql: fix pagination in UPSERT

`optTableUpserter` incorrectly overrode the `curBatchSize` method to
return `insertRows.Len`, but that container is not actually used
anywhere. As a result, `curBatchSize` was always considered 0, and we
never performed pagination on UPSERTs. The bug was introduced in
#33339 (in 19.1.0).

Fixes: #51391.

Release note (bug fix): Previously, CockroachDB could hit a "command is
too large" error when performing an UPSERT operation with many values.
Internally, we attempt to perform such an operation by splitting it into
"batches", but the batching mechanism was broken.

@yuzefovich yuzefovich requested a review from RaduBerinde July 20, 2020 23:04
@cockroach-teamcity (Member)

This change is Reviewable

@RaduBerinde (Member) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)

@yuzefovich yuzefovich merged commit e69bc7b into cockroachdb:release-20.1 Jul 21, 2020
@yuzefovich yuzefovich deleted the backport20.1-51608 branch July 21, 2020 14:04
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Sep 16, 2020
In cockroachdb#51944 I fixed a bug but introduced an even worse one. The original
bug was that if an UPSERT has a RETURNING clause, we could have returned
incorrect values because internally `resultCount` and `rowsUpserted`
could fall out of sync. It was fixed by resetting `resultCount`;
however, the resetting was done unconditionally. This is incorrect
because the `resultCount` variable is used by `upsertNode.BatchedNext` to
indicate whether there is more work to do (and if `resultCount == 0`, then
we should finish). This bug would cause an UPSERT of more than 10k rows,
with or without a RETURNING clause, to actually process only 10k rows and
exit early. This is now fixed.

Relatedly, an UPSERT with a RETURNING clause would incorrectly return no
rows when it was processing more than 10k rows.

Additionally, cockroachdb#51626 fixed a bug with pagination of UPSERTs, which
exposed another bug when a RETURNING clause is present: we were clearing
`rowsUpserted` in `BatchedNext` (as part of the `flushAndStartNewBatch`
call), but that clear happens too early, because we access the container
via `BatchedValues` after `BatchedNext` returns. This would lead to an
index out of bounds crash. Before cockroachdb#51626 there was no pagination done, so we
always had a single batch and `flushAndStartNewBatch` was never called
(to reset the row container too early). This is also now fixed. Note
that this second bug was impossible to run into because of the first one
(we would never reach this code).

Release note (bug fix): CockroachDB in the 20.1.4 and 20.1.5 releases could
finish an UPSERT operation too early - namely, it would correctly insert
only up to 10000 rows and ignore the rest. Furthermore, an UPSERT with
a RETURNING clause in such a scenario would return no rows (it would only
process 10k rows but return 0 rows).