-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt/distsql: fix CTE semantics #31095
Comments
Clarifying from a conversation with @andy-kimball that it's important that the |
I think something got lost in translation:
Assuming the semantic check for point 2 is done properly, then as long as we only introduce spools for mutations then we don't have to care about CASE whatsoever. |
I think @andy-kimball's contention is that side-effecting (even without mutation) CTEs should be spooled. |
that's arbitrary and will hurt performance postgres does not do it |
hm, I don't think I realized that Postgres didn't do that:
|
I want to clarify my proposition here:
|
What about the case where a CTE is referenced multiple times?
In Postgres, this returns:
Which suggests the CTE is only executed once, meaning it spools. Are you absolutely sure that PG makes no guarantees about whether a CTE would spool? When I wrote the side-effect policy, you asked me to add CTE semantics to it, and so I added this:
If I understand your position, you're saying that that rule should only apply to mutations, but not to other kinds of side effects. |
As I wrote above in PG the spool is possible but not guaranteed, it can be introduced for optimisations but apps can't rely on it.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
Our side effect semantics are still tangled up. Say this is the query, and
Here, if we attempt to use a PG in this scenario does not trigger an error. If/when we support multiple CTE refs, we'll have to figure out how to handle this case, perhaps by using the |
Check out what happens in postgres:
As I have explained above, postgres refuses to hoist anything inside CASE to outside of it, and this includes both subqueries and CTEs. Independently from this observation, you can also confirm here that pg does not use a spool for the non-mutation CTE. |
Another note: I think the criteria in pg for blocking the propagation of something across CASE is not "it can produce errors" but instead "there is a subquery and its results cannot be determined reliably during planning". |
I think we're talking past each other. I don't know what PG calls it, but it's definitely caching/spooling the results of the CTE in that query. Take a look at this simplified query:
This only increments All I'm saying that we currently have no way in CockroachDB to duplicate those semantics when the CTE is nested within a |
Also, I think you're missing something about how the optimizer currently works. We never hoist up uncorrelated subqueries - we just leave them as-is. However, we're again unable to match PG semantics, because our subquery implementation always globalizes all subqueries, and executes them exactly once at the beginning of the query. Thus, it is SQL execution that "hoists" those queries, not the optimizer. This script shows the difference in behavior from PG, both in 2.0 and 2.1:
PG correctly returns |
Regarding your comment above in #31095 (comment) As I wrote before pg will introduce a spool in some cases as an optimizations when it decides that a sub-clause can be ran just once to avoid computing results multiple times. It will not do this because it feels semantically mandated to do so. Regarding your other comment #31095 (comment) I agree with your assessment "our current woes come from a limitation in the execution engine". The optimizer has to contend with the feature available in the back-end. However we need to make a strong case that the SQL execution machinery must be extended, and this case must be a semantic case to be compelling (and not just an optimization). |
To clarify: this is especially true when expressions may produce errors (e.g. 1/x). It's very clear throughout the pg docs and source code that potential errors do not influence optimizations. There may or may not be an analysis about the side effects of However I suspect the SQL standard has something to say about this, given that it defines SQL sequences. |
Moving to 19.2. |
@justinj is there anything else we want done here for 19.2? Do we want to re-instate the single-use inlining optimization to avoid regressions? |
Semantics here are ok now—I think the single-use optimization is probably important, I will put that on my list of things to do |
Single-use inlining was reinstated in #40673. Moving to 20.1 to track moving up all mutation WITHs to the top. |
The purpose of this issue is to summarize the expected semantics of CTEs and also document a proposed solution.
Today, we simply inline all CTEs. This is incorrect in the case where a CTE contains side effects. The heuristic planner partially solves this problem by introducting a
Spool
operator on top of any inlined CTE. This is still incorrect in the presence of CTEs which are never referenced in a query at all. Today, we error in such a case. This also prohibits referencing a given CTE's results more than once. In the short-term the optimizer will mimic this behaviour and simply insert a spool node (possibly eliminating the spool if no side effects are possible).The desired semantics of CTEs are this:
The proposed end-state is this:
opt and exec both introduce a
With
operator which has two inputs: avariable
input and an actualinput
input. It also contains a variable referenceref
. The semantics ofWith
are that thevariable
input is guaranteed to be run to completion before theinput
input begins running, and its results stored in some other place, referenced byref
.input
can then refer to those results as a table referenced byref
(ref
would be referenced ininput
by some other new operatorWithRef
).As an optimization, we can track the number of references to a given CTE, and if it has no side effects and exactly one reference we can inline it.
cc @andy-kimball @knz @jordanlewis
The text was updated successfully, but these errors were encountered: