-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: fix handling of write-only columns during update #46285
Conversation
This is mostly an optimizer change, but thought that others who commented on the original issue might want to review the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Will make another pass over it, but looks great so far!
[nit] can you add some information about the new trading queries in the commit message (or separate them in a second commit with a description)? It can just be a copy of the top-level file comment you have already. It will be nice to have a bit of information of them in the git history.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, @jordanlewis, @knz, and @RaduBerinde)
pkg/sql/opt/exec/execbuilder/testdata/schema_change_in_txn, line 26 at r1 (raw file):
ALTER TABLE t ADD COLUMN g INT NOT NULL DEFAULT(15) query T kvtrace(prefix=/Table/53/)
[nit] add a comment explaining that we expect default values for b,c, NULL value for e, and zero value for d.
pkg/sql/opt/optbuilder/testdata/update, line 1329 at r1 (raw file):
# ------------------------------------------------------------------------------ # Test update that doesn't require mutation column to be recalculated.
This comment doesn't look right anymore (and the one below)
pkg/sql/opt/optbuilder/testdata/upsert, line 1175 at r1 (raw file):
└── upsert_a:13 > 0 [as=check2:15] # Don't directly update mutation columns. However, computed columns do need to
Is this comment still accurate?
pkg/sql/sem/tree/datum.go, line 3901 at r1 (raw file):
return bitArrayZero, nil default: panic(fmt.Sprintf("unhandled type %v", t.SQLString()))
[nit] should this be an error given that we can return one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the PR into two commits.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, and @knz)
pkg/sql/opt/exec/execbuilder/testdata/schema_change_in_txn, line 26 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] add a comment explaining that we expect default values for b,c, NULL value for e, and zero value for d.
Done.
pkg/sql/opt/optbuilder/testdata/update, line 1329 at r1 (raw file):
Previously, RaduBerinde wrote…
This comment doesn't look right anymore (and the one below)
Done.
pkg/sql/opt/optbuilder/testdata/upsert, line 1175 at r1 (raw file):
Previously, RaduBerinde wrote…
Is this comment still accurate?
Done.
pkg/sql/sem/tree/datum.go, line 3901 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] should this be an error given that we can return one
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default datum values . Didn't look at anything else.
Reviewed 1 of 14 files at r1, 1 of 6 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, and @knz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add some comments before the buildScan
calls in buildInputForUpsert
and buildInputForUpdate
explaining why we are reading mutation values, and pointing out that we aren't (and shouldn't be) using them to calculate any "new" values.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, and @knz)
pkg/sql/opt/optbuilder/mutation_builder.go, line 509 at r2 (raw file):
// 4. If column is currently being added or dropped (i.e. a mutation column), // use a default value (0 for INT column, "" for STRING column, etc). //
[nit] add an explanation of why it's not ok to use the fetched values (as we've been doing so far).
pkg/sql/opt/xform/testdata/external/trading, line 544 at r3 (raw file):
# Find all cards that have been modified in the last 5 seconds. # # Problems:
These queries and analysis are very cool!
pkg/sql/opt/xform/testdata/external/trading-mutation, line 756 at r3 (raw file):
# # Problems: # 1. This query will not compile, due to #46180.
This should be fixed now.
A column in the write-only state has either been recently added or dropped from a table. If recently added, then it's currently being backfilled, and therefore may still have a NULL value. If NULL, then it's not valid to use this value during an UPDATE or UPSERT. However, this is what the previous code was doing; when a column value was updated, it would mistakenly incorporate the NULL value into the update. The fix is to *always* write the default or computed value of a column during an UPDATE or UPSERT of a write-only column. Here are the rules: 1. If column is computed, evaluate that expression as its value. 2. If column has a default value specified for it, use that as its value. 3. If column is nullable, use NULL as its value. 4. If column is currently being added or dropped (i.e. a mutation column), use a default value (0 for INT column, "" for STRING column, etc). One side effect of doing this is that NOT NULL columns can now be dropped without triggering null constraint violations when other transactions try to insert rows during the dropping process. Also, other transactions can observe the default values if they SELECT a column that is dropped at just the right time. Fixes cockroachdb#42459 Release justification: Fix for high-severity bug in existing functionality. This bug can result in NULL values being written to NOT NULL columns. It can also trigger unexpected errors during INSERT/UPDATE/UPSERT statements when schema changes are taking place. Release note (sql change): Columns in the process of being added or removed to a table are now always set to their default or computed value if another transaction concurrently inserts, updates, or upserts a row. This fixes an issue where a column being backfilled would not get properly set by concurrent transactions.
Add a new external test suite that is based on a business buying/selling online trading cards. There are two files. One lists the SELECT and mutation query plans when there are no schema changes running concurrently. The other shows how the query plans change when there are one or more column adds/drops running concurrently. The purpose of this is to ensure that in-progress schema changes do not materially change the plans. Release justification: Non-production code change. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments to buildScan
. We actually need those values for the Delete
case as well. The execution engine panics if they're not available, even for Delete
.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis and @knz)
pkg/sql/opt/optbuilder/mutation_builder.go, line 509 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] add an explanation of why it's not ok to use the fetched values (as we've been doing so far).
Done
pkg/sql/opt/xform/testdata/external/trading-mutation, line 756 at r3 (raw file):
Previously, RaduBerinde wrote…
This should be fixed now.
Done.
bors r+ |
Build succeeded |
pkg/sql/descriptor_mutation_test.go, line 806 at r1 (raw file):
Why did you change this to "a" (here and below)? (Sorry I forgot to publish this earlier) |
A column in the write-only state has either been recently added or dropped
from a table. If recently added, then it's currently being backfilled, and
therefore may still have a NULL value. If NULL, then it's not valid to use
this value during an UPDATE or UPSERT. However, this is what the previous
code was doing; when a column value was updated, it would mistakenly
incorporate the NULL value into the update.
The fix is to always write the default or computed value of a column
during an UPDATE or UPSERT of a write-only column. Here are the rules:
use a default value (0 for INT column, "" for STRING column, etc).
One side effect of doing this is that NOT NULL columns can now be dropped
without triggering null constraint violations when other transactions try
to insert rows during the dropping process. Also, other transactions can
observe the default values if they SELECT a column that is dropped at just
the right time.
Fixes #42459
Release justification: Fix for high-severity bug in existing functionality.
This bug can result in NULL values being written to NOT NULL columns. It
can also trigger unexpected errors during INSERT/UPDATE/UPSERT statements
when schema changes are taking place.
Release note (sql change): Columns in the process of being added or removed
to a table are now always set to their default or computed value if another
transaction concurrently inserts, updates, or upserts a row. This fixes an
issue where a column being backfilled would not get properly set by
concurrent transactions.