Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql-queries: ERROR: could not decorrelate subquery since 20.2 #73573

Closed
fabiog1901 opened this issue Dec 7, 2021 · 3 comments · Fixed by #95234
Closed

sql-queries: ERROR: could not decorrelate subquery since 20.2 #73573

fabiog1901 opened this issue Dec 7, 2021 · 3 comments · Fixed by #95234
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@fabiog1901
Copy link
Contributor

fabiog1901 commented Dec 7, 2021

I'm running into a ERROR: could not decorrelate subquery when running the CTE which demonstrate an example on how you can re-write some stored procedures that use variables. These statements used to run up until 20.1, and return said error from 20.2. Where can I start please to rewrite the CTE, please?

create table transaction(
    transaction_uuid UUID PRIMARY KEY not null DEFAULT gen_random_uuid(),
    request_uuid UUID  not null DEFAULT gen_random_uuid() ,
    account_id STRING not null,
    debitCard_id STRING default null,
    transaction_am INT8 not null,
    transactionType_cd STRING not null,
    runningBalance_am INT8 not null default 0,
    reservation_uuid UUID default null,
    transactionMetaData_json JSONB default null,
    insert_tsz TIMESTAMPTZ not null default now()
);
CREATE INDEX ON transaction (request_uuid);
CREATE INDEX ON transaction (account_id);
create table account_balance(
    account_id STRING PRIMARY KEY not null,
    runningBalance_am INT8 not null default 0
);

-- LOAD some initial data
--
insert into account_balance select concat('A_', i::STRING), 100 from generate_series(1, 10) as i;
select * from account_balance order by 1;



-- the following withdraws $9... and updates account_balance and transaction marking it a valid txn…

WITH input_cte as (
   select column1 as aid, column2 as did, column3 as amt, column4 as ttype
   from (values('A_7', 'D_8', 9, 'DEBIT'))
),
update_acct as (
    UPDATE account_balance 
    SET runningBalance_am = 
       (select (CASE WHEN (runningBalance_am - (select amt from input_cte)) > 0 
                     THEN (select runningBalance_am - amt from input_cte) 
                     ELSE runningBalance_am 
                END) 
          from account_balance 
        where account_id= (select aid from input_cte))
        where account_id=(select aid from input_cte)
    returning (runningBalance_am)
)
    select
    aid,
    did,
    amt,
    ttype,
    runningBalance_am,
    (CASE WHEN (runningBalance_am - amt) >= 0 
            THEN '[{"VALID TXN}]' 
            ELSE '[{"BOGUS"}]' 
    END)
    from update_acct
    full outer join input_cte on (1=1)
;

Jira issue: CRDB-11651

@fabiog1901 fabiog1901 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team labels Dec 7, 2021
@ajwerner
Copy link
Contributor

ajwerner commented Dec 7, 2021

Is this a duplicate of #67951? Consider converting the CASE to an IF

@fabiog1901
Copy link
Contributor Author

Thanks Andrew, I tried as follows but it still errors out with the same message:

WITH input_cte AS (
   SELECT column1 AS aid, column2 AS did, column3 AS amt, column4 AS ttype
   FROM (values('A_7', 'A_8', 9, 'DEBIT'))
),
update_acct AS (
    UPDATE account_balance 
    SET runningBalance_am = (
            SELECT IF ((runningBalance_am - (SELECT amt FROM input_cte)) > 0, (SELECT runningBalance_am - amt FROM input_cte) , runningBalance_am) 
            FROM account_balance 
            WHERE account_id = (SELECT aid FROM input_cte)
        )
    WHERE account_id = (SELECT aid FROM input_cte)
    RETURNING (runningBalance_am)
)
    SELECT aid, did, amt, ttype, runningBalance_am, IF ((runningBalance_am - amt) >= 0, '[{"VALID TXN}]', '[{"BOGUS"}]' )
    FROM update_acct
    FULL OUTER JOIN input_cte ON (1=1)
;

@ajwerner
Copy link
Contributor

ajwerner commented Dec 8, 2021

Alright, well I'm out of my depth. Somebody from @cockroachdb/sql-queries will need to have a look.

@cucaroach cucaroach assigned cucaroach and mgartner and unassigned cucaroach Dec 14, 2021
@mgartner mgartner removed their assignment May 5, 2022
@msirek msirek self-assigned this May 14, 2022
@jlinder jlinder removed the sync-me-3 label May 27, 2022
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 13, 2023
Previously, the optimizer would error in rare cases when it was unable
to hoist correlated subqueries into apply-joins. Now, scalar, correlated
subqueries that aren't hoisted are executed successfully. There is
remaining work to apply the same method in this commit to `EXISTS` and
`<op> ANY` subqueries.

Hoisting correlated subqueries is not possible when a conditional
expression, like a `CASE`, wraps a subquery that is not leak-proof. One
of the effects of hoisting a subquery is that the subquery will be
unconditionally evaluated. For leak-proof subqueries, the worst case is
that unnecessary computation is performed. For non-leak-proof
subqueries, errors could originate from the subquery when it should have
never been evaluated because the corresponding conditional expression
was never true. So, in order to support these cases, we must be able to
execute a correlated subquery.

A correlated subquery can be thought of as a relational expression with
parameters that need to be filled in with constant value arguments for
each invocation. It is essentially a user-defined function with a single
statement in the function body. So, the `tree.RoutineExpr` machinery
that powers UDFs is easily repurposed to facilitate evaluation of
correlated subqueries.

Fixes cockroachdb#71908
Fixes cockroachdb#73573
Fixes cockroachdb#80169

Release note (sql change): Some queries which previously resulted in the
error "could not decorrelate subquery" now succeed.
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 13, 2023
Previously, the optimizer would error in rare cases when it was unable
to hoist correlated subqueries into apply-joins. Now, scalar, correlated
subqueries that aren't hoisted are executed successfully. There is
remaining work to apply the same method in this commit to `EXISTS` and
`<op> ANY` subqueries.

Hoisting correlated subqueries is not possible when a conditional
expression, like a `CASE`, wraps a subquery that is not leak-proof. One
of the effects of hoisting a subquery is that the subquery will be
unconditionally evaluated. For leak-proof subqueries, the worst case is
that unnecessary computation is performed. For non-leak-proof
subqueries, errors could originate from the subquery when it should have
never been evaluated because the corresponding conditional expression
was never true. So, in order to support these cases, we must be able to
execute a correlated subquery.

A correlated subquery can be thought of as a relational expression with
parameters that need to be filled in with constant value arguments for
each invocation. It is essentially a user-defined function with a single
statement in the function body. So, the `tree.RoutineExpr` machinery
that powers UDFs is easily repurposed to facilitate evaluation of
correlated subqueries.

Fixes cockroachdb#71908
Fixes cockroachdb#73573
Fixes cockroachdb#80169

Release note (sql change): Some queries which previously resulted in the
error "could not decorrelate subquery" now succeed.
craig bot pushed a commit that referenced this issue Jan 18, 2023
94670: metrics: add tenant name to _status/vars output r=knz a=dhartunian

This commit adds a `tenant` prometheus label to each metrics output via the
`_status/vars` HTTP endpoint. The label is populated with either "system" or
the name of the tenant generating the metric. When queried from the system
tenant's HTTP port or handler, the result will now also contain metrics from
all in-process tenants on the same node.

When initializing a tenant, an optional "parent" metrics recorder is passed
down allowing the tenant's recorder to be registered with the parent. When we
query the parent it iterates over all child recorders (much like we already do
for stores) and outputs all of their metrics as well.

Example:
```
sql_txn_commit_count{tenant="system"} 0
sql_txn_commit_count{tenant="demo-tenant"} 0
```

Resolves: #94663
Epic: CRDB-18798

Release note (ops change): Metrics output via `_status/vars` now contain
`tenant` labels allowing the user to distinguish between metrics emitted by
the system tenant vs other app tenants identified by their IDs.

Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>

95234: sql: evaluate correlated subqueries as routines r=mgartner a=mgartner

#### opt: create tree.Routine planning closure in helper function

The logic for creating a planning closure for a `tree.Routine` has been
moved to a helper function so that it can be reused in future commits.

Release note: None

#### sql: evaluate correlated subqueries as routines

Previously, the optimizer would error in rare cases when it was unable
to hoist correlated subqueries into apply-joins. Now, scalar, correlated
subqueries that aren't hoisted are executed successfully. There is
remaining work to apply the same method in this commit to `EXISTS` and
`<op> ANY` subqueries.

Hoisting correlated subqueries is not possible when a conditional
expression, like a `CASE`, wraps a subquery that is not leak-proof. One
of the effects of hoisting a subquery is that the subquery will be
unconditionally evaluated. For leak-proof subqueries, the worst case is
that unnecessary computation is performed. For non-leak-proof
subqueries, errors could originate from the subquery when it should have
never been evaluated because the corresponding conditional expression
was never true. So, in order to support these cases, we must be able to
execute a correlated subquery.

A correlated subquery can be thought of as a relational expression with
parameters that need to be filled in with constant value arguments for
each invocation. It is essentially a user-defined function with a single
statement in the function body. So, the `tree.RoutineExpr` machinery
that powers UDFs is easily repurposed to facilitate evaluation of
correlated subqueries.

Fixes #71908
Fixes #73573
Fixes #80169

Release note (sql change): Some queries which previously resulted in the
error "could not decorrelate subquery" now succeed.


95432: kvserver,kvstorage: move InitEngine r=pavelkalinnikov a=tbg

I'm working on a datadriven test for `kvstorage` and I need to be able
to initialize the engine there.

There's more that should move, but I'm taking it one step at a time.
As we build up more and more sophisticated datadriven tests in
`kvstorage` we can move whatever we need when we need it.

PS: I looked at the unit test being modified by the move and it can't
yet be moved to kvstorage - it also bootstraps the initial ranges, etc,
requiring a lot more code movement until we can move it.

100% mechanical movement via Goland.

Touches #93310.

Epic: CRDB-220
Release note: None


Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
@craig craig bot closed this as completed in 61233f0 Jan 18, 2023
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants