sql: `UPDATE .. FROM` distinct-on should apply before the `LIMIT` is applied #89823

faizaanmadhani · 2022-10-12T15:20:41Z

Describe the problem

When an ORDER BY clause is present in an UPDATE ... FROM statement, all the rows that are expected to be updated are not updated.

For example, in this logic test, it would be expected that the UPDATE statement would update both rows (1, 10) and (2, 20) to (1, 100) and (2, 100) respectively as both rows join on the first two rows of the VALUES v(i) table, however only one of the rows are updated.

statement ok
CREATE TABLE t (k int primary key, a int);
INSERT INTO t VALUES (1, 10), (2, 20);

statement ok
UPDATE t SET a = 100 FROM (VALUES (1),(2),(1)) v(i) WHERE k = v.i ORDER BY k LIMIT 2;

query II
SELECT * FROM t;
----
1  100
2  20

Additional data

Here is the query plan for the statement.

query T
EXPLAIN (OPT, VERBOSE) UPDATE t SET a = 100 FROM (VALUES (1),(2),(1)) v(i) WHERE k = v.i ORDER BY k LIMIT 2;
----
update t
 ├── columns: <none>
 ├── fetch columns: k:5 a:6
 ├── passthrough columns: column1:9
 ├── update-mapping:
 │    └── a_new:10 => a:2
 ├── cardinality: [0 - 0]
 ├── volatile, mutations
 ├── stats: [rows=0]
 ├── cost: 18.658453
 ├── distribution: test
 └── project
      ├── columns: a_new:10 k:5 a:6 column1:9
      ├── cardinality: [0 - 2]
      ├── stats: [rows=1.6151]
      ├── cost: 18.648453
      ├── key: (5)
      ├── fd: ()-->(10), (5)-->(6,9), (5)==(9), (9)==(5)
      ├── distribution: test
      ├── prune: (5,6,9,10)
      ├── distinct-on
      │    ├── columns: k:5 a:6 column1:9
      │    ├── grouping columns: k:5
      │    ├── internal-ordering: +(5|9)
      │    ├── cardinality: [0 - 2]
      │    ├── stats: [rows=1.6151, distinct(5)=1.6151, null(5)=0]
      │    ├── cost: 18.606151
      │    ├── key: (5)
      │    ├── fd: (5)-->(6,9), (5)==(9), (9)==(5)
      │    ├── distribution: test
      │    ├── top-k
      │    │    ├── columns: k:5 a:6 column1:9
      │    │    ├── internal-ordering: +(5|9)
      │    │    ├── k: 2
      │    │    ├── cardinality: [0 - 2]
      │    │    ├── stats: [rows=2, distinct(5)=1.6151, null(5)=0]
      │    │    ├── cost: 18.51
      │    │    ├── fd: (5)-->(6), (5)==(9), (9)==(5)
      │    │    ├── ordering: +(5|9) [actual: +5]
      │    │    ├── distribution: test
      │    │    ├── prune: (6)
      │    │    ├── interesting orderings: (+5)
      │    │    └── inner-join (lookup t)
      │    │         ├── columns: k:5 a:6 column1:9
      │    │         ├── key columns: [9] = [5]
      │    │         ├── lookup columns are key
      │    │         ├── cardinality: [0 - 3]
      │    │         ├── stats: [rows=3, distinct(5)=2, null(5)=0, distinct(9)=2, null(9)=0]
      │    │         ├── cost: 18.32
      │    │         ├── fd: (5)-->(6), (5)==(9), (9)==(5)
      │    │         ├── distribution: test
      │    │         ├── prune: (6)
      │    │         ├── interesting orderings: (+5)
      │    │         ├── values
      │    │         │    ├── columns: column1:9
      │    │         │    ├── cardinality: [3 - 3]
      │    │         │    ├── stats: [rows=3, distinct(9)=2, null(9)=0]
      │    │         │    ├── cost: 0.04
      │    │         │    ├── distribution: test
      │    │         │    ├── prune: (9)
      │    │         │    ├── (1,)
      │    │         │    ├── (2,)
      │    │         │    └── (1,)
      │    │         └── filters (true)
      │    └── aggregations
      │         ├── first-agg [as=a:6, outer=(6)]
      │         │    └── a:6
      │         └── first-agg [as=column1:9, outer=(9)]
      │              └── column1:9
      └── projections
           └── 100 [as=a_new:10]

Jira issue: CRDB-20457

The text was updated successfully, but these errors were encountered:

mgartner · 2022-10-12T16:02:14Z

The problem here is that the distinct-on operator is applied after the limit operator. Postgres does not support ORDER BY .. LIMIT in UPDATE statements, so there is no pre-existing behavior we must follow. To me, the behavior illustrated above is unexpected, but I'm curious if anyone has an argument to keep the existing behavior, i.e., apply the limit before the distinct-on.

msirek · 2022-10-15T03:05:59Z

Maybe other database vendors disallow this syntax because it's confusing.
It's easy to misunderstand what the SQL is supposed to do. The UPDATE is just doing a join, like a SELECT would, but it's using the join to qualify rows to update.

An equivalent SELECT would be:
SELECT * FROM (VALUES (1),(2),(1)) v(i), t WHERE k = v.i ORDER BY k LIMIT 2;

Looking at this form, the ORDER BY clearly applies to the result of the join relation and the first two rows in the ordered join relation only contain the first row from table t. If the goal is to apply the LIMIT to distinct values of k instead of all join rows, this SQL would work using standard ANSI SQL:

UPDATE t SET a = 100 WHERE EXISTS(SELECT DISTINCT(k) FROM (VALUES (1),(2),(1)) v(i) WHERE k = v.i ORDER BY k LIMIT 2);

I wouldn't mind doing away with support for LIMIT and ORDER BY in an UPDATE FROM statement to force users to write SQL which is easier to understand. If we didn't have to worry about Postgres compatibility, we could get rid of UPDATE FROM altogether. I didn't see it mentioned in the ANSI SQL spec.

https://sqlserverfast.com/blog/hugo/2008/03/lets-deprecate-update-from/

mgartner · 2022-10-17T18:39:32Z

I don't think we'll be able to get rid of UPDATE .. FROM for PG compatibility and because it is sometimes a lot more ergonomic than using a subquery or another alternative. But I suspect that the confusing behavior between UDPATE .. FROM and ORDER BY .. LIMIT is one of the reasons that PG doesn't support ORDER BY .. LIMIT in UPDATE statements at all.

msirek · 2023-09-28T17:05:30Z

The only major database I could find which supports ORDER BY and LIMIT in an UPDATE statement is MySQL. And they don't support the FROM clause, meaning the ORDER BY can only be applied to the columns of the target table of the UPDATE. A statement like this fails because the ORDER BY column is not in scope:

UPDATE t1 set a = t1.a+1 WHERE t1.a IN (SELECT a FROM t2) ORDER BY t2.b LIMIT 1;

So, either we should remove support for ORDER BY ... LIMIT from UPDATE ... FROM, or, support it, but apply distinct-on before the LIMIT (as this issue requests), and further restrict the ORDER BY to reference only target table columns. Ordering by a FROM clause column doesn't seem to provide much value, and is non-standard, unless an example of another database vendor providing similar support can be found.

The original issue for supporting UPDATE ... FROM seems to be doing so for Postgres compatibility, so it may have just been an oversight to allow both UPDATE ... FROM and ORDER BY ... LIMIT in the same statement (as Postgres doesn't support ORDER BY ... LIMIT).

mgartner · 2023-10-05T01:56:21Z

I agree—we should disallow ORDER BY/LIMIT with UPDATE ... FROM. I have a vague memory of this also affecting DELETE ... USING, so we should check that too.

faizaanmadhani added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team labels Oct 12, 2022

mgartner changed the title ~~sql: UPDATE FROM ... ORDER BY ... returns unexpected output when an ORDER BY clause is present~~ sql: UPDATE .. FROM distinct-on should apply before the LIMIT is applied Oct 12, 2022

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: `UPDATE .. FROM` distinct-on should apply before the `LIMIT` is applied #89823

sql: `UPDATE .. FROM` distinct-on should apply before the `LIMIT` is applied #89823

faizaanmadhani commented Oct 12, 2022 •

edited by cockroach-jira-scripts

Loading

mgartner commented Oct 12, 2022

msirek commented Oct 15, 2022

mgartner commented Oct 17, 2022

msirek commented Sep 28, 2023

mgartner commented Oct 5, 2023

sql: UPDATE .. FROM distinct-on should apply before the LIMIT is applied #89823

sql: UPDATE .. FROM distinct-on should apply before the LIMIT is applied #89823

Comments

faizaanmadhani commented Oct 12, 2022 • edited by cockroach-jira-scripts Loading

mgartner commented Oct 12, 2022

msirek commented Oct 15, 2022

mgartner commented Oct 17, 2022

msirek commented Sep 28, 2023

mgartner commented Oct 5, 2023

sql: `UPDATE .. FROM` distinct-on should apply before the `LIMIT` is applied #89823

sql: `UPDATE .. FROM` distinct-on should apply before the `LIMIT` is applied #89823

faizaanmadhani commented Oct 12, 2022 •

edited by cockroach-jira-scripts

Loading