-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: Add optimizer support for UPSERT and INSERT..ON CONFLICT statements #33339
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, second and last commit LGTM modulo some nits and a missing release note.
There is a change in the left joiner exec code in the last commit which I do not understand - does it belong there? It probably merits explanation in the commit message.
Also I'd like to understand (preferably from the commit/PR description) what your view is on the transitory period where the two implementations of UPSERT have different behavior:
- different conflict handling
- different fast path
Which one is going to be the default? If the CBO becomes the default this needs to be communicated amply to beta users. If not we need to have a path to exercise it thoroughly.
Reviewed 10 of 10 files at r1, 8 of 8 files at r2, 26 of 27 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/plan.go, line 369 at r6 (raw file):
p.curPlan = planTop{AST: stmt.AST} //s := stmt.String()
nit: stray comments
pkg/sql/tablewriter_upsert_opt.go, line 214 at r6 (raw file):
rowIndex, ok = tu.ru.FetchColIDtoRowIndex[colID] if !ok { panic("no existing value is available for column")
return pgerror.NewAssertionErrorf(...)
(this will go to telemetry where we can pick it up without bothering users with dead nodes)
pkg/sql/logictest/testdata/logic_test/check_constraints, line 157 at r6 (raw file):
# only checks constraints if an insert or update actually occurs. In this case, # the DO NOTHING clause skips the update, so there is no need to check the # constraint.
-
Can you check whether pg does the same or different
-
In either case, document the behavior change in a release note.
pkg/sql/logictest/testdata/logic_test/computed, line 734 at r6 (raw file):
# Upsert -> update # NOTE: The CBO cannot show the name of the computed column in the error message
Please remove the explanatory comment from here and instead add it to #33345.
pkg/sql/logictest/testdata/logic_test/privileges_table, line 245 at r6 (raw file):
# Grant SELECT and INSERT privilege. # TODO(andyk): The SELECT privilege is required by the optimizer to fetch # existing columns. This matches PG behavior, but differs from planner behavior.
This explanatory comment is not needed because we'll fix #33346.
pkg/sql/logictest/testdata/logic_test/txn, line 942 at r6 (raw file):
UPDATE kv SET v = 'foo' # The planner uses INSERT, the optimizer uses UPSERT; accept both for now.
s/planner/heuristic planner/
pkg/sql/logictest/testdata/logic_test/upsert_opt, line 1 at r6 (raw file):
# LogicTest: local-opt fakedist-opt
Could you paste a diff of the two files in the commit message, so that our future selves can satisfy themselves nothing important was lost in the copy/paste.
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 1198 at r6 (raw file):
// // TODO(andyk): Using ensureColumns here can result in an extra Render. // Upgrade execution engine to not require this.
Can you spell out the required change? Better yet spell it out in a linked issue in the TODO text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Mostly minor comments below...
Reviewed 10 of 10 files at r1, 8 of 8 files at r2, 3 of 3 files at r3, 9 of 9 files at r4, 16 of 16 files at r5, 27 of 27 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt_catalog.go, line 509 at r2 (raw file):
} // IsIndex is part of the cat.Index interface.
IsIndex -> IsUnique
pkg/sql/logictest/testdata/logic_test/upsert_opt, line 1 at r6 (raw file):
Previously, knz (kena) wrote…
Could you paste a diff of the two files in the commit message, so that our future selves can satisfy themselves nothing important was lost in the copy/paste.
It would also help to add a comment in this file explaining what's different
pkg/sql/opt/ops/statement.opt, line 127 at r5 (raw file):
# into a target table. If a conflicting row already exists, then Upsert will # instead update the existing row. The Upsert operator uses insertion, existing, # and new values constructed by the input to insert rows, update indexes, and to
I think this sentence is trying to do a bit too much... it would help to break this up and explain the difference between an "insertion" value and a "new" value.
pkg/sql/opt/optbuilder/insert.go, line 32 at r5 (raw file):
// excludedTableName is the name of a special Upsert data source containing // columns that could not be inserted due to a conflict with an existing row:
do you mean "rows that could not be inserted..."?
pkg/sql/opt/optbuilder/insert.go, line 474 at r5 (raw file):
// 2. Assign name to each column // 3. Add id of each column to the insertColList mb.insertColList = make(opt.ColList, cap(mb.targetColList))
Wasn't this already done at the top of the function?
pkg/sql/opt/optbuilder/insert.go, line 533 at r5 (raw file):
// filter that discards rows that have a conflict (by checking a not-null table // column to see if it was null-extended by the left join). See the comment // header for buildInsert for an example.
It would be good to specify Builder.buildInsert
to differentiate from mutationBuilder.buildInsert
pkg/sql/opt/optbuilder/insert.go, line 538 at r5 (raw file):
var conflictIndex cat.Index if len(onConflict.Columns) != 0 { // Check that the ON CONFLICT columns reference at most one target row.
This is a bit confusing - how can columns reference at most one row?
pkg/sql/opt/optbuilder/insert.go, line 556 at r5 (raw file):
// If conflict columns were explicitly specified, then only check for a // conflict one a single index. Otherwise, check on all indexes.
one -> on
pkg/sql/opt/optbuilder/insert.go, line 580 at r5 (raw file):
// Remember the column ID of a scan column that is not null. This will be // used to detect whether a conflict was detected for a row. notNullColID := scanScope.cols[findNotNullIndexCol(index)].id
would be good to explain why this always exists and no canary col is necessary
pkg/sql/opt/optbuilder/insert.go, line 602 at r5 (raw file):
mb.outScope.expr = mb.b.factory.ConstructProject( mb.b.factory.ConstructSelect( mb.b.factory.ConstructLeftJoin(
Why do you need one left join for each unique index? It seems like you could get away with a single left join, with the join conditions for different indexes OR-ed together.
pkg/sql/opt/optbuilder/insert.go, line 644 at r5 (raw file):
} // Build the right side of the left outer join. Include mutation columns
Seems like a lot of this code is duplicated from the function above. Is there any way to consolidate?
pkg/sql/opt/optbuilder/insert.go, line 735 at r5 (raw file):
if len(insertCols) != 0 { for _, name := range insertCols { if ord := cat.FindTableColumnByName(mb.tab, name); ord != -1 {
do you need to throw an error if the column is not found?
pkg/sql/opt/optbuilder/insert.go, line 796 at r5 (raw file):
} panic(builderError{errors.New( "there is no unique or exclusion constraint matching the ON CONFLICT specification")})
Is there a pgcode for this error (since you said Postgres also requires this)?
pkg/sql/opt/optbuilder/testdata/upsert, line 258 at r5 (raw file):
└── const: 1 [type=int] # Use RETURNING INSERT..ON CONFLICT as a FROM clause.
This example doesn't have ON CONFLICT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the default, I already have a note in the commit message:
The CBO Upsert support is disabled by default behind a feature flag:
experimental_optimizer_updates.
I will only make it the default once the equivalent fast path(s) are in place. In addition, I believe your separate PR has changed the conflict handling to be equivalent, so that shouldn't be a problem either.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt_catalog.go, line 509 at r2 (raw file):
Previously, rytaft wrote…
IsIndex -> IsUnique
Done.
pkg/sql/tablewriter_upsert_opt.go, line 214 at r6 (raw file):
Previously, knz (kena) wrote…
return pgerror.NewAssertionErrorf(...)
(this will go to telemetry where we can pick it up without bothering users with dead nodes)
Ah, nice. Done.
pkg/sql/logictest/testdata/logic_test/check_constraints, line 157 at r6 (raw file):
Previously, knz (kena) wrote…
Can you check whether pg does the same or different
In either case, document the behavior change in a release note.
PG behaves the same way that the HP does. Note that I expect this difference in behavior to be temporary, until I'm able to incorporate check constraints into the CBO. I added a bit to the comment:
# TODO(andyk): The CBO behaves differently than the HP and PG in this case. It
# only checks constraints if an insert or update actually occurs. In this case,
# the DO NOTHING clause skips the update, so there is no need to check the
# constraint. Once the CBO incorporates check constraints, this difference in
# behavior should go away.
Since the CBO is off-by-default, there's no need for a release note.
pkg/sql/logictest/testdata/logic_test/computed, line 734 at r6 (raw file):
Previously, knz (kena) wrote…
Please remove the explanatory comment from here and instead add it to #33345.
How would that issue address this difference in behavior between the CBO and HP? The CBO implements computed columns as projections, and they get evaluated as part of the larger SQL expression. Adding context fields to the pgerror won't have any effect on how computed columns work in the CBO.
pkg/sql/logictest/testdata/logic_test/txn, line 942 at r6 (raw file):
Previously, knz (kena) wrote…
s/planner/heuristic planner/
Done.
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 1198 at r6 (raw file):
Previously, knz (kena) wrote…
Can you spell out the required change? Better yet spell it out in a linked issue in the TODO text.
I don't have more detail on the changes that will need to be made. The TODO is there to remind us to delve more deeply into this when we have time.
pkg/sql/opt/ops/statement.opt, line 127 at r5 (raw file):
Previously, rytaft wrote…
I think this sentence is trying to do a bit too much... it would help to break this up and explain the difference between an "insertion" value and a "new" value.
I got rid of the sentence altogether. There is a ton of other information already in MutationPrivate
comments, as well as Builder.buildInsert
comments.
pkg/sql/opt/optbuilder/insert.go, line 32 at r5 (raw file):
Previously, rytaft wrote…
do you mean "rows that could not be inserted..."?
I rewrote the sentence.
pkg/sql/opt/optbuilder/insert.go, line 474 at r5 (raw file):
Previously, rytaft wrote…
Wasn't this already done at the top of the function?
Good catch. Removed.
pkg/sql/opt/optbuilder/insert.go, line 533 at r5 (raw file):
Previously, rytaft wrote…
It would be good to specify
Builder.buildInsert
to differentiate frommutationBuilder.buildInsert
Done.
pkg/sql/opt/optbuilder/insert.go, line 538 at r5 (raw file):
Previously, rytaft wrote…
This is a bit confusing - how can columns reference at most one row?
Added a bit of extra text to clarify.
pkg/sql/opt/optbuilder/insert.go, line 556 at r5 (raw file):
Previously, rytaft wrote…
one -> on
Done.
pkg/sql/opt/optbuilder/insert.go, line 580 at r5 (raw file):
Previously, rytaft wrote…
would be good to explain why this always exists and no canary col is necessary
Done.
pkg/sql/opt/optbuilder/insert.go, line 602 at r5 (raw file):
Previously, rytaft wrote…
Why do you need one left join for each unique index? It seems like you could get away with a single left join, with the join conditions for different indexes OR-ed together.
It's possible to create a big predicate that does all the analysis, but then we'd select a bad plan for it. We'd end up just doing a primary index scan, and evaluate the big predicate on each row. Instead, we need to do a series of lookup joins (that's what the current Upsert code does).
pkg/sql/opt/optbuilder/insert.go, line 644 at r5 (raw file):
Previously, rytaft wrote…
Seems like a lot of this code is duplicated from the function above. Is there any way to consolidate?
There are definitely similarities, but there are also important differences that make it painful to consolidate. This is what I arrived at after trying out various ways of consolidating. I don't know how to make it better without making it harder to understand.
pkg/sql/opt/optbuilder/insert.go, line 735 at r5 (raw file):
Previously, rytaft wrote…
do you need to throw an error if the column is not found?
They should always exist, because the existence of insertCols
has already been checked. I changed the code to get rid of the unnecessary if
and added a comment.
pkg/sql/opt/optbuilder/insert.go, line 796 at r5 (raw file):
Previously, rytaft wrote…
Is there a pgcode for this error (since you said Postgres also requires this)?
Possibly, but I'm trying to match the behavior of the existing code as closely as possible, and it doesn't use a PG error code.
pkg/sql/opt/optbuilder/testdata/upsert, line 258 at r5 (raw file):
Previously, rytaft wrote…
This example doesn't have ON CONFLICT
Done.
pkg/sql/logictest/testdata/logic_test/privileges_table, line 245 at r6 (raw file):
Previously, knz (kena) wrote…
This explanatory comment is not needed because we'll fix #33346.
Done.
pkg/sql/logictest/testdata/logic_test/upsert_opt, line 1 at r6 (raw file):
Previously, rytaft wrote…
It would also help to add a comment in this file explaining what's different
The main reason for adding this file was due to differences in conflict handling between the CBO and HP, specifically in the case where the same row is inserted/updated multiple times in the same statement. However, after discussion, Raphael and I have agreed that we should not make any semantic guarantees in that case, and should instead require users to use DISTINCT ON
to remove any duplicates:
Therefore, I've updated the upsert
logictest to reflect this decision, and I've removed upsert_opt
altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a release note should also mention the existence of this flag (I don't recall seeing such a release note earlier in the first PR where you added it).
Reviewed 3 of 62 files at r7, 3 of 25 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/logictest/testdata/logic_test/check_constraints, line 157 at r6 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
PG behaves the same way that the HP does. Note that I expect this difference in behavior to be temporary, until I'm able to incorporate check constraints into the CBO. I added a bit to the comment:
# TODO(andyk): The CBO behaves differently than the HP and PG in this case. It # only checks constraints if an insert or update actually occurs. In this case, # the DO NOTHING clause skips the update, so there is no need to check the # constraint. Once the CBO incorporates check constraints, this difference in # behavior should go away.
Since the CBO is off-by-default, there's no need for a release note.
Yeah so the situation with release notes is that if you don't write something about this now, nobody will notice the change in behavior later when we switch the default. We're not creating release notes out of thin air about all the behavior that becomes different when a flag is switched. Someone has to write it, and I don't see any record of this behavior change anywhere for the future release note writer to pick up.
My recommendation would be for a release note that says "when the experimental flag is set, the behavior is ..."
pkg/sql/logictest/testdata/logic_test/computed, line 734 at r6 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
How would that issue address this difference in behavior between the CBO and HP? The CBO implements computed columns as projections, and they get evaluated as part of the larger SQL expression. Adding context fields to the pgerror won't have any effect on how computed columns work in the CBO.
-
I read your response to my review comment as an additional requirement on how we're going to produce meaningful contextual strings. It's a non-trivial problem but it's solvable.
-
no matter the outcome of that discussion, I don't think your comment in the test file here is doing any favor to future readers of this test. How does it help understand the test in any way?
Your comment is really a rationale for the change in the PR, not suitable documentation for the end result -- as such it should be part of the commit message, if anything.
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 1198 at r6 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't have more detail on the changes that will need to be made. The TODO is there to remind us to delve more deeply into this when we have time.
acknowledged.
This return value is not ever used by callers, so this commit removes it. Release note: None
The INSERT..ON CONFLICT statement needs to know if an index was declared as UNIQUE by the user. This commit adds an IsUnique() method to the cat.Index interface that exposes that information. It also adds a cat.FindTableColumnByName method, which will also be used by upsert. Release note: None
Move "ForInsert" and "ForUpdate" methods from mutationBuilder to the insert.go and update.go files. Leave shared utility methods in the mutation_builder.go file. This sets the pattern for locating Upsert and Delete specific files as well. Release note: None
Use the word "alias" rather than "label" more consistently, in order to match the new Metadata terminology. Release note: None
This commit adds support for three additional mutation operator cases: 1. UPSERT statement 2. INSERT..ON CONFLICT DO NOTHING 3. INSERT..ON CONFLICT..DO UPDATE Each of these has overlapping functionality with INSERT, UPDATE, and with each other, but also have important differences. Similar to the other mutation operators, the Upsert operator accepts a relational input expression that projects columns that will be used as part of executing the mutation: 1. Columns containing existing values fetched from the target table and used to detect conflicts and to formulate the key/value update commands. 2. Columns containing updated values to set when a conflict is detected, as specified by the user. 3. Computed columns which will be updated when a conflict is detected and that are dependent on one or more updated columns. For example, if this is the schema and INSERT..ON CONFLICT statement: CREATE TABLE abc (a INT PRIMARY KEY, b INT, c INT) INSERT INTO abc VALUES (1, 2) ON CONFLICT (a) DO UPDATE SET b=10 Then an input expression equivalent to this would be built: SELECT ins_a, ins_b, ins_c, fetch_a, fetch_b, fetch_c, 10 AS upd_b FROM (VALUES (1, 2, NULL)) AS ins(ins_a, ins_b, ins_c) LEFT OUTER JOIN abc AS fetch(fetch_a, fetch_b, fetch_c) ON ins_a = fetch_a At runtime, the Upsert execution operator will test the fetch_a column. If it is null, then there is no existing row, so the operator inserts a new row using the (ins_a, ins_b, ins_c) columns. Otherwise, the operator formulates an update using the (upd_b) column, along with any needed fetch columns. The CBO Upsert support is disabled by default behind a feature flag: experimental_optimizer_updates. Release note (sql change): The new experimental_optimizer_updates feature flag controls whether updates and upserts are planned by the cost-based optimizer rather than the heuristic planner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 62 files at r7, 3 of 8 files at r8, 9 of 9 files at r10, 15 of 16 files at r11, 22 of 25 files at r12.
Reviewable status: complete! 1 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 62 files at r7, 3 of 25 files at r12.
Reviewable status: complete! 1 of 0 LGTMs obtained
Build the upsertNode for the CBO Upsert operator. Also, create a new upsert tableWriter that is custom built for the CBO. The CBO can use a much simpler upserter because it incorporates conflict detection, update and computed column evaluation, and other upsert operations into the input query, rather than requiring the upserter to do it. Release note (sql change): when the experimental_optimizer_updates feature flag is set, check constraints are not checked for rows skipped by the INSERT..DO NOTHING clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I added a release note, though I question whether we should publish it, given that this flag will be removed before our final 2.2 release.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/logictest/testdata/logic_test/check_constraints, line 157 at r6 (raw file):
Previously, knz (kena) wrote…
Yeah so the situation with release notes is that if you don't write something about this now, nobody will notice the change in behavior later when we switch the default. We're not creating release notes out of thin air about all the behavior that becomes different when a flag is switched. Someone has to write it, and I don't see any record of this behavior change anywhere for the future release note writer to pick up.
My recommendation would be for a release note that says "when the experimental flag is set, the behavior is ..."
Done
pkg/sql/logictest/testdata/logic_test/computed, line 734 at r6 (raw file):
Previously, knz (kena) wrote…
I read your response to my review comment as an additional requirement on how we're going to produce meaningful contextual strings. It's a non-trivial problem but it's solvable.
no matter the outcome of that discussion, I don't think your comment in the test file here is doing any favor to future readers of this test. How does it help understand the test in any way?
Your comment is really a rationale for the change in the PR, not suitable documentation for the end result -- as such it should be part of the commit message, if anything.
I found it useful, but I don't really care much either way, so I removed it.
bors r+ |
Build failed |
bors r+ |
33339: opt: Add optimizer support for UPSERT and INSERT..ON CONFLICT statements r=andy-kimball a=andy-kimball This commit adds support for three additional mutation operator cases: 1. UPSERT statement 2. INSERT..ON CONFLICT DO NOTHING 3. INSERT..ON CONFLICT..DO UPDATE Each of these has overlapping functionality with INSERT, UPDATE, and with each other, but also have important differences. Similar to the other mutation operators, the Upsert operator accepts a relational input expression that projects columns that will be used as part of executing the mutation: 1. Columns containing existing values fetched from the target table and used to detect conflicts and to formulate the key/value update commands. 2. Columns containing updated values to set when a conflict is detected, as specified by the user. 3. Computed columns which will be updated when a conflict is detected and that are dependent on one or more updated columns. For example, if this is the schema and INSERT..ON CONFLICT statement: CREATE TABLE abc (a INT PRIMARY KEY, b INT, c INT) INSERT INTO abc VALUES (1, 2) ON CONFLICT (a) DO UPDATE SET b=10 Then an input expression equivalent to this would be built: SELECT ins_a, ins_b, ins_c, fetch_a, fetch_b, fetch_c, 10 AS upd_b FROM (VALUES (1, 2, NULL)) AS ins(ins_a, ins_b, ins_c) LEFT OUTER JOIN abc AS fetch(fetch_a, fetch_b, fetch_c) ON ins_a = fetch_a At runtime, the Upsert execution operator will test the fetch_a column. If it is null, then there is no existing row, so the operator inserts a new row using the (ins_a, ins_b, ins_c) columns. Otherwise, the operator formulates an update using the (upd_b) column, along with any needed fetch columns. The CBO Upsert support is disabled by default behind a feature flag: experimental_optimizer_updates. Co-authored-by: Andrew Kimball <[email protected]>
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't get a chance to review this in a timely manner. This is fantastic work! I am sure it was a huge effort to get all these details in place.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
A bit more work to address #3998. Summary of changes: - Edit the 'Supported statements' section of the ['Cost-based Optimizer'][1] page to add: - Add FILTER per cockroachdb/cockroach#34077 - Add DELETE per cockroachdb/cockroach#34522 - Add `INSERT .. ON CONFLICT` variants per cockroachdb/cockroach#33339 [1]: http://www.cockroachlabs.com/docs/v2.2/cost-based-optimizer.html
A bit more work to address #3998. Summary of changes: - Edit the 'Supported statements' section of the ['Cost-based Optimizer'][1] page as follows: - Add FILTER per cockroachdb/cockroach#34077 - Add DELETE per cockroachdb/cockroach#34522 - Add `INSERT .. ON CONFLICT` variants per cockroachdb/cockroach#33339 - Remove `experimental_optimizer_updates` cluster setting (can't find a commit for this, but I don't see it in `SHOW ALL` output on my local build of yesterday's `master`, version number is `v2.2.0-alpha.20181217-1096-gd104dcee69-dirty`. [1]: http://www.cockroachlabs.com/docs/v2.2/cost-based-optimizer.html
A bit more work to address #3998. Summary of changes: - Edit the 'Supported statements' section of the ['Cost-based Optimizer'][1] page as follows: - Add DELETE per cockroachdb/cockroach#34522 - Add `INSERT .. ON CONFLICT` variants per cockroachdb/cockroach#33339 - Add FILTER clause on aggregate functions per cockroachdb/cockroach#34077 - Remove `experimental_optimizer_updates` cluster setting (can't find a commit for this, but I don't see it in `SHOW ALL` output on my local build of yesterday's `master`, version number is `v2.2.0-alpha.20181217-1096-gd104dcee69-dirty`. [1]: http://www.cockroachlabs.com/docs/v2.2/cost-based-optimizer.html
A bit more work to address #3998. Summary of changes: - Edit the 'Supported statements' section of the ['Cost-based Optimizer'][1] page as follows: - Add DELETE per cockroachdb/cockroach#34522 - Add `INSERT .. ON CONFLICT` variants per cockroachdb/cockroach#33339 - Add `SELECT`, `VALUES`, and `UNION` statements that do not include window functions - Add FILTER clause on aggregate functions per cockroachdb/cockroach#34077 - Remove `experimental_optimizer_updates` cluster setting [1]: http://www.cockroachlabs.com/docs/v2.2/cost-based-optimizer.html
A bit more work to address #3998. Summary of changes: - Edit the 'Supported statements' section of the ['Cost-based Optimizer'][1] page as follows: - Add DELETE per cockroachdb/cockroach#34522 - Add `INSERT .. ON CONFLICT` variants per cockroachdb/cockroach#33339 - Add `SELECT`, `VALUES`, and `UNION` statements that do not include window functions - Add FILTER clause on aggregate functions per cockroachdb/cockroach#34077 - Remove `experimental_optimizer_updates` cluster setting [1]: http://www.cockroachlabs.com/docs/v2.2/cost-based-optimizer.html
51581: opt: improve error message for failing to force a partial index r=mgartner a=mgartner This commit expands the error message when a user tries to force a partial index that cannot satisfy the query. This error is emitted when the query filter does not imply the partial index predicate. Release note: None 51608: sql: fix pagination in UPSERT r=yuzefovich a=yuzefovich **sql: minor cleanup around upsert** This commit removes a couple of duplicated "init" calls as well as some unused parameters around upsert code. Release note: None **sql: fix pagination in UPSERT** `optTableUpserter` was incorrectly overriding `curBatchSize` method by returning `insertRows.Len`, but that container is not actually used anywhere. As a result, `curBatchSize` was always considered 0, and we didn't perform the pagination on the UPSERTs. The bug was introduced in #33339 (in 19.1.0). Fixes: #51391. Release note (bug fix): Previously, CockroachDB could hit a "command is too large" error when performing UPSERT operation with many values. Internally, we attempt to perform such operation by splitting it into "batches", but the batching mechanism was broken. Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
This commit adds support for three additional mutation operator cases:
Each of these has overlapping functionality with INSERT, UPDATE, and
with each other, but also have important differences. Similar to the
other mutation operators, the Upsert operator accepts a relational
input expression that projects columns that will be used as part of
executing the mutation:
Columns containing existing values fetched from the target table
and used to detect conflicts and to formulate the key/value update
commands.
Columns containing updated values to set when a conflict is
detected, as specified by the user.
Computed columns which will be updated when a conflict is detected
and that are dependent on one or more updated columns.
For example, if this is the schema and INSERT..ON CONFLICT statement:
CREATE TABLE abc (a INT PRIMARY KEY, b INT, c INT)
INSERT INTO abc VALUES (1, 2) ON CONFLICT (a) DO UPDATE SET b=10
Then an input expression equivalent to this would be built:
SELECT ins_a, ins_b, ins_c, fetch_a, fetch_b, fetch_c, 10 AS upd_b
FROM (VALUES (1, 2, NULL)) AS ins(ins_a, ins_b, ins_c)
LEFT OUTER JOIN abc AS fetch(fetch_a, fetch_b, fetch_c)
ON ins_a = fetch_a
At runtime, the Upsert execution operator will test the fetch_a column.
If it is null, then there is no existing row, so the operator inserts a
new row using the (ins_a, ins_b, ins_c) columns. Otherwise, the operator
formulates an update using the (upd_b) column, along with any needed
fetch columns.
The CBO Upsert support is disabled by default behind a feature flag:
experimental_optimizer_updates.