release-20.1: sql: fix large UPSERTs #54418
Conversation
Force-pushed from 3ffa6eb to 13895d2 (Compare)
Friendly ping - I'll be asking for an expedited release to include this fix.
Sorry for the delay, I missed the original email.
Can you improve the comment for `resultCount`, explaining the semantics more accurately? (e.g. it is not clear what "counted separately otherwise" means). And please update the comments for `batchedCount` / `batchedValues` to explain what they return (the current comments refer to an interface that no longer exists).
I can't say I understand this - don't we want `batchedCount()` to return the number of rows in the last batch (at most 10,000)? E.g. won't `upsertNode.BatchedNext` now return `true` even if the last batch happens to be empty?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
Force-pushed from 13895d2 to 486d4a2 (Compare)
I agree, it is confusing. I renamed the variable to be more explicit and refactored the code a bit; it should be clearer now.
Removed these methods entirely.
Good point, yes, I believe that is another bug that has been present for a long time. Fixed.
Correct (this is a bug), but it was hidden because we're setting
Hm, I am confused how
Yeah, it crashes on
20.2.0-beta.1 also crashes but 20.1.3 doesn't - I guess my refactor introduced another problem too 🤦
Maybe @knz can also take a look since he wrote some of the original code (I believe).
Force-pushed from 486d4a2 to e48836e (Compare)
I think I figured it out. The second issue, that UPSERTs with a RETURNING clause don't work correctly when we have multiple batches, was hidden by the fact that the pagination had been broken (which was fixed by #51626), so prior to that we always had a single batch and never ran into the problem that was exposed once the pagination was fixed. RFAL.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @yuzefovich)
pkg/sql/tablewriter_upsert_opt.go, line 153 at r1 (raw file):
```go
tu.resultCount = 0
if tu.collectRows {
	tu.rowsUpserted.Clear(ctx)
```
We did not need this Clear?
pkg/sql/upsert.go, line 189 at r1 (raw file):
```go
func (n *upsertNode) BatchedValues(rowIdx int) tree.Datums {
	if !n.run.tw.collectRows {
		panic("return row requested but collect rows was not set")
```
`errors.AssertionErrorf`
Force-pushed from b9f1db6 to ccc244d (Compare)
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz and @RaduBerinde)
pkg/sql/tablewriter_upsert_opt.go, line 153 at r1 (raw file):
Previously, RaduBerinde wrote…
We did not need this Clear?
We need it, but at a different time - it is done at the beginning of `BatchedNext` (similar to how other table writers do it).
Force-pushed from ccc244d to a1f667a (Compare)
didn't see anything wrong here! 💯
Reviewed 2 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
pkg/sql/tablewriter_upsert_opt.go, line 153 at r1 (raw file):
Previously, yuzefovich wrote…
We need it, but at a different time - it is done at the beginning of `BatchedNext` (similar to how other table writers do it).
Oh, sorry, missed that diff. LGTM
TFTRs!
In #51944 I fixed a bug but introduced an even worse one. The original bug was that if an UPSERT has a RETURNING clause, we could have returned incorrect values because internally `resultCount` and `rowsUpserted` could become unsynchronized. It was fixed by resetting `resultCount`; however, the reset was done unconditionally. This is incorrect because the `resultCount` variable is used by `upsertNode.BatchedNext` to indicate whether there is more work to do (and if `resultCount == 0`, then we should finish). This bug would cause an UPSERT of more than 10k rows, with or without a RETURNING clause, to actually process only 10k rows and exit early. This is now fixed.

Relatedly, an UPSERT with a RETURNING clause would incorrectly return no rows when it was processing more than 10k rows.

Additionally, #51626 fixed a bug with the pagination of UPSERTs which exposed another bug when a RETURNING clause is present: we were clearing `rowsUpserted` in `BatchedNext` (as part of the `flushAndStartNewBatch` call), but that clear happens too early, because we access the container via `BatchedValues` after `BatchedNext` returns. This would lead to an index-out-of-bounds crash. Before #51626 no pagination was done, so we always had a single batch and `flushAndStartNewBatch` was never called (and so never reset the row container too early). This is also now fixed. Note that this second bug was impossible to hit because of the first one (we would never reach this code).

Fixes: https://github.com/cockroachlabs/support/issues/598
Fixes: #54456.
Informs: #54465.

Release note (bug fix): In the 20.1.4 and 20.1.5 releases, CockroachDB could finish an UPSERT operation too early: it would correctly insert only up to 10,000 rows and ignore the rest. Furthermore, an UPSERT with a RETURNING clause in such a scenario would return no rows (it would process only 10k rows but return 0 rows).