RFCS: SQL query planning #19135
0ee7a09 to 3c1dee4
Cc @albler
Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.

docs/RFCS/20171008_sql_optimizer.md, line 129 at r1:
[nit] latter

docs/RFCS/20171008_sql_optimizer.md, line 155 at r1:
One important question here is whether nodes that are equivalent in terms of results but have different physical properties (e.g. ordering) are "equivalent" nodes. If so, this caveat about "equivalent" needs to be cleared up. This also introduces some difficulties during the search: if nodes were truly equivalent, we could always use the lowest-cost node in each MEMO group (e.g. when applying transformations). But if we can exploit different orderings, we may need to go through multiple nodes per MEMO group.

docs/RFCS/20171008_sql_optimizer.md, line 155 at r1:
I think this description overloads "logical node" with a couple of different meanings (or needs to be clarified). Perhaps "each group corresponds to a logical node in the query plan" and "each group is represented by ..". It would also help to have a clearer definition of what a physical node is and how it differs from a logical node.

Comments from Reviewable
3c1dee4 to 66343e3
Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.

docs/RFCS/20171008_sql_optimizer.md, line 129 at r1: Previously, RaduBerinde wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 155 at r1: Previously, RaduBerinde wrote…
I believe the equivalency groups are based on the logical properties. For example, the same equivalency group will hold

The Memo structure deserves its own RFC and, prior to that, more experimentation.

docs/RFCS/20171008_sql_optimizer.md, line 155 at r1: Previously, RaduBerinde wrote…
Adjusted the language here. I agree I was overloading "logical node".
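The question above, whether expressions with different physical properties belong to the same memo group, can be illustrated with a small Go sketch. It assumes, as the reply suggests, that groups are keyed by logical properties, so a single group may hold members with different orderings. The types `Expr` and `Group` and the method `Best` are hypothetical and are not taken from the RFC or from CockroachDB; ordering is reduced to a single column name for brevity.

```go
package main

// Expr is a hypothetical physical alternative inside a memo group.
// All members of a group produce the same rows (same logical properties).
type Expr struct {
	Op       string  // e.g. "scan", "index-scan"
	Ordering string  // column the output is sorted on; "" means unordered
	Cost     float64 // estimated cost of this alternative
}

// Group is a hypothetical memo group: one logical equivalence class whose
// members may differ in physical properties such as ordering.
type Group struct {
	Members []Expr
}

// Best returns the lowest-cost member that satisfies the required
// ordering. An empty requirement is satisfied by every member.
func (g *Group) Best(required string) (Expr, bool) {
	var best Expr
	found := false
	for _, e := range g.Members {
		if required != "" && e.Ordering != required {
			continue // this member cannot provide the required ordering
		}
		if !found || e.Cost < best.Cost {
			best, found = e, true
		}
	}
	return best, found
}
```

Because the cheapest member overall may not provide a required ordering, the search has to keep and consider more than one member per group, which is the difficulty raised in the first comment.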
I understand this document! Yay 🎆

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

docs/RFCS/20171008_sql_optimizer.md, line 15 at r2:
Technically, transforming a SQL query plan into a better plan.

docs/RFCS/20171008_sql_optimizer.md, line 115 at r2:
Define "attribute".

docs/RFCS/20171008_sql_optimizer.md, line 121 at r2:
Example needed here. Please clarify the bitmap story on

docs/RFCS/20171008_sql_optimizer.md, line 157 at r2:
nit: equivalency

docs/RFCS/20171008_sql_optimizer.md, line 240 at r2:
You should be talking about pruning in this section or below in Search.
66343e3 to 661477b
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

docs/RFCS/20171008_sql_optimizer.md, line 15 at r2: Previously, knz (kena) wrote…
Well, it does output a physical query plan, while the input might not be directly executable (i.e. not a plan at all).

docs/RFCS/20171008_sql_optimizer.md, line 115 at r2: Previously, knz (kena) wrote…
I've removed usage of the term "attribute". It is used in the literature I've been reading, but we use the terms "column" or "variable".

docs/RFCS/20171008_sql_optimizer.md, line 121 at r2: Previously, knz (kena) wrote…
That's a good question. My thinking is muddled. I need to work through a couple of examples. For the purposes of this RFC, I'm going to wave my hands wildly and note that a full RFC on this topic alone is merited.

docs/RFCS/20171008_sql_optimizer.md, line 157 at r2: Previously, knz (kena) wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 240 at r2: Previously, knz (kena) wrote…
Roger. Added a paragraph in Search. Note that Rewrite is a separate phase that occurs before Search.
Reviewed 1 of 1 files at r3.

docs/RFCS/20171008_sql_optimizer.md, line 15 at r2: Previously, petermattis (Peter Mattis) wrote…
This is a nit really, but the very notion of optimization implies that the optimization logic as a whole can be disabled and the rest can still function. If you want to paint a fuller picture, you can rename the entire RFC "SQL Query Planning" and then outline that the iteration of Rewrite and Search constitutes what we can call "optimization".

docs/RFCS/20171008_sql_optimizer.md, line 115 at r2: Previously, petermattis (Peter Mattis) wrote…
Ack.

docs/RFCS/20171008_sql_optimizer.md, line 240 at r2: Previously, petermattis (Peter Mattis) wrote…
I think you skipped a beat in the music. Search and Rewrite are coroutines. You can't pre-populate all the alternatives in Rewrite up front; there are simply too many (hundreds even with just the few rewrite rules we already know of, and more realistically thousands, as you already mentioned in writing). Instead, Search guides the generation of alternatives, each generated by the application of Rewrite, by avoiding the use of rewrite rules in some cases and discarding previously rewritten alternatives in others. Rewrite does not precede Search; it is subordinate to it.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.

docs/RFCS/20171008_sql_optimizer.md, line 15 at r2: Previously, knz (kena) wrote…
OK, I've adjusted per this suggestion.

docs/RFCS/20171008_sql_optimizer.md, line 240 at r2: Previously, knz (kena) wrote…
No beat skipped; we're at odds on terminology. My understanding is that between Prep and Search there is a second phase named Rewrite, where unconditional transformations are performed. These unconditional transformations are not costed or explored, but always applied, because they are always beneficial. De-correlation and predicate push-down are the two transformations I'm aware of that fall into this category. I need to go back and look at the papers to see if there are other transformations to include here.

Search iteratively applies transforms, costs the resulting plans, and prunes (or ignores) high-cost plans. So, in my usage of the terminology (which tries to match our recent learnings), Rewrite is independent of Search, though both phases apply transforms.

To reiterate, my understanding of the distinction between Rewrite and Search is that Rewrite doesn't bother to keep the alternatives around, or even to cost them, because the transformations it applies always produce better plans. It is an open question whether Rewrite should operate on top of Memo. It certainly doesn't require it.
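The distinction drawn above, between Rewrite (cost-agnostic transforms that are always applied and never costed) and Search (cost-based transforms whose results are costed and pruned), might be sketched as follows. This is a hypothetical one-level illustration: `Plan`, `Transform`, `Rewrite`, and `Search` are invented names, costs are supplied by the transforms themselves, and a real Search would explore alternatives recursively, likely on top of the Memo.

```go
package main

// Plan is a hypothetical, highly simplified plan representation.
type Plan struct {
	Desc string
	Cost float64
}

// Transform rewrites a plan; ok is false when the rule does not apply.
type Transform func(p Plan) (out Plan, ok bool)

// Rewrite applies cost-agnostic transforms unconditionally until none of
// them applies. No alternatives are kept and no costs are compared.
func Rewrite(p Plan, rules []Transform) Plan {
	for changed := true; changed; {
		changed = false
		for _, r := range rules {
			if out, ok := r(p); ok {
				p, changed = out, true
			}
		}
	}
	return p
}

// Search applies cost-based transforms to generate alternatives, compares
// their costs, and keeps only the cheapest; costlier alternatives are pruned.
func Search(p Plan, rules []Transform) Plan {
	best := p
	for _, r := range rules {
		if out, ok := r(p); ok && out.Cost < best.Cost {
			best = out
		}
	}
	return best
}
```

`Rewrite` assumes every rule eventually stops applying; a rule that always fires would loop forever, so such rules must make monotonic progress.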
661477b to 41bc813
Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

docs/RFCS/20171008_sql_optimizer.md, line 240 at r2: Previously, petermattis (Peter Mattis) wrote…
Okay, maybe the fact that both Rewrite and Search use rewrite rules, which we'll call "transforms", should be outlined in the text.
41bc813 to afbfaf0
Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.

docs/RFCS/20171008_sql_optimizer.md, line 240 at r2: Previously, knz (kena) wrote…
OK. I reworded the first paragraph of the Rewrite section to make this clear. A small clarification: the transform rules used by Rewrite are not the same as those used by Search. I think there will be some overlap, but most of the transforms used by Search will not be used by Rewrite.
Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.

docs/RFCS/20171008_sql_optimizer.md, line 121 at r2: Previously, petermattis (Peter Mattis) wrote…
My thinking is still muddled, but slightly clearer. We want to model the expression nodes using the relational algebra operators. Each node defines a relation, where a relation is a set of attribute names (i.e. column names). In your

The part I'm still muddled about is what happens if we perform a selection on the union:

Now it looks like we can push the selection through the union. I think what is missing here is a

Now if we want to push the selection down through the union, we have to substitute

Once again, I'm going to wave my hands wildly. I see the general outline of how this would work, but the devil is in the details, and those are still obscure. Getting those details right will require a full RFC and lots of experimentation. Let's move this discussion to a better forum (e.g. https://github.com/petermattis/opttoy).

PS: Apologies for falling back on the relation/attribute terminology, which might be confusing.
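Pushing a selection down through a union requires substituting column references: assuming each union input has its own column IDs, a predicate on a union output column must be rewritten to reference the corresponding column of each input before it is placed below the union. A minimal hypothetical Go sketch, limited to single-column equality predicates (`ColumnID`, `Eq`, `Union`, and `PushDown` are illustrative names, not part of the RFC):

```go
package main

// ColumnID is a hypothetical column identifier, unique across the query.
type ColumnID int

// Eq is a hypothetical predicate of the form "Col = Val".
type Eq struct {
	Col ColumnID
	Val int
}

// Union is a hypothetical union node. Output column Out[i] is produced
// from Left[i] on the left input and Right[i] on the right input.
type Union struct {
	Out, Left, Right []ColumnID
}

// PushDown turns a predicate over a union output column into one
// predicate per input by substituting the corresponding input column.
// ok is false if the predicate does not reference a union output column.
func (u Union) PushDown(p Eq) (left, right Eq, ok bool) {
	for i, c := range u.Out {
		if c == p.Col {
			return Eq{Col: u.Left[i], Val: p.Val}, Eq{Col: u.Right[i], Val: p.Val}, true
		}
	}
	return Eq{}, Eq{}, false
}
```

The rewrite relies on selection distributing over union: filtering the union's output is equivalent to filtering each input with the substituted predicate and then taking the union.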
Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.

docs/RFCS/20171008_sql_optimizer.md, line 121 at r2: Previously, petermattis (Peter Mattis) wrote…
Solution here, I think:
This description matches my understanding from the sessions.

Reviewed 1 of 1 files at r4.
I added the glossary of terms as we discussed. I also identified properties as a module and created a dedicated section for them. PTAL
Reviewed 1 of 1 files at r4, 1 of 1 files at r5.
The additions look great.

Review status: all files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.

docs/RFCS/20171008_sql_optimizer.md, line 67 at r5:
How do you feel about the term "cost-agnostic transformations"? It would let us distinguish them from "cost-based transformations".

docs/RFCS/20171008_sql_optimizer.md, line 110 at r5:
Perhaps "a.k.a. unnesting".

docs/RFCS/20171008_sql_optimizer.md, line 133 at r5:
Perhaps "a.k.a. decorrelating".

docs/RFCS/20171008_sql_optimizer.md, line 181 at r5:
Perhaps

docs/RFCS/20171008_sql_optimizer.md, line 187 at r5:
While it is correct that the functional dependencies form a graph, I haven't found that property useful so far. Have you?

docs/RFCS/20171008_sql_optimizer.md, line 220 at r5:
I believe

docs/RFCS/20171008_sql_optimizer.md, line 301 at r5:
I'm finding this sentence a bit awkward due to

docs/RFCS/20171008_sql_optimizer.md, line 526 at r5:
Did you intend for there to be a blank line before this line? I've noticed a few instances of odd spacing and line wrapping in the additions.
Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.

docs/RFCS/20171008_sql_optimizer.md, line 67 at r5: Previously, petermattis (Peter Mattis) wrote…
👍 - updated

docs/RFCS/20171008_sql_optimizer.md, line 110 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 133 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 181 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 187 at r5: Previously, petermattis (Peter Mattis) wrote…
The fact that it is a graph is not used directly in the code; however, it is a graph, where the vertices are the variables and the edges are the "dependency info" that the code does compute. So in memory you end up with graph vertices and edges. If it quacks like a duck... I think it is useful for the human reader that the prose explanation points to the graph and says "look, this is really what's happening here".

docs/RFCS/20171008_sql_optimizer.md, line 220 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 301 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.

docs/RFCS/20171008_sql_optimizer.md, line 526 at r5: Previously, petermattis (Peter Mattis) wrote…
In my patch I tried as much as possible not to re-justify paragraphs, so that the line diff would be minimal.
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions.

docs/RFCS/20171008_sql_optimizer.md, line 187 at r5: Previously, knz (kena) wrote…
My point is that mentioning that the functional dependencies form a graph provides no benefit to me. Perhaps it does for some readers, but I'd rather call out what the functional dependencies are. Also, the first sentence implies that the input variables are the functional dependencies, but there are other dependencies that we'll be maintaining. For example, we'll likely be tracking the "keys" for each expression by propagating the nullability of input variables and "candidate keys" from input expressions. Concretely, here is my suggestion:

The functional dependencies for an expression are constraints between two sets of columns. Specific examples of functional dependencies are the projections, where one or more input variables determine an output variable, and "keys", which are sets of columns such that no two rows output by the expression are equal after projection onto that set (e.g. a unique index for a table where all of the columns are NOT NULL). Conceptually, the functional dependencies form a graph, though they are not represented as such in code.
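The suggested definition of a key (a set of columns on which no two output rows are equal) can be checked directly against a concrete set of rows. The optimizer would derive keys statically from schema information and input expressions rather than by scanning data, so this Go sketch is only meant to pin the definition down. `IsKey` is a hypothetical helper; rows are modeled as NOT NULL integer columns addressed by ordinal position.

```go
package main

import "fmt"

// IsKey reports whether cols form a key for rows: no two rows are equal
// after projection onto cols. Rows are assumed to contain no NULLs, in
// line with the NOT NULL caveat in the suggested definition.
func IsKey(rows [][]int, cols []int) bool {
	seen := make(map[string]bool, len(rows))
	for _, row := range rows {
		// Project the row onto the candidate key columns.
		proj := make([]int, len(cols))
		for i, c := range cols {
			proj[i] = row[c]
		}
		k := fmt.Sprint(proj) // e.g. "[10 1]"; unambiguous for int slices
		if seen[k] {
			return false // two rows agree on every key column
		}
		seen[k] = true
	}
	return true
}
```

Any superset of a key is also a key, which is one reason an optimizer tends to track candidate keys (minimal sets) rather than every qualifying column set.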
Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

docs/RFCS/20171008_sql_optimizer.md, line 187 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.
I have collated the 3 documents in this PR as discussed. |
Review status: 0 of 1 files reviewed at latest revision, 20 unresolved discussions.

docs/RFCS/sql_query_planning.md, line 777 at r16: Previously, knz (kena) wrote…
Done.

docs/RFCS/sql_query_planning.md, line 1031 at r19: Previously, knz (kena) wrote…
I'm not sure some of these are problematic. For example, comparing two subqueries in scalar context presumably requires that the subqueries return a single row. Regardless, in my opinion we don't need to be exhaustive here. I've added some more text.

docs/RFCS/sql_query_planning.md, line 1024 at r20: Previously, knz (kena) wrote…
Done.
46e1afb to 15e024d
@knz The latest commit removes the

Review status: 0 of 1 files reviewed at latest revision, 20 unresolved discussions, all commit checks successful.
Thanks for documenting all of this!

Reviewed 1 of 2 files at r11, 1 of 3 files at r21.

docs/RFCS/sql_query_planning.md, line 802 at r21:
Maybe update this section to include our new understanding about passing histograms up the query plan?

docs/RFCS/sql_query_planning.md, line 921 at r21:
best on generality -> based on generality
Review status: 0 of 1 files reviewed at latest revision, 22 unresolved discussions.

docs/RFCS/sql_query_planning.md, line 802 at r21: Previously, rytaft wrote…
Good idea. I've added a sentence about propagating histograms up through the intermediate nodes. More detail than that is beyond my knowledge. Let me know if you have anything to add.
Reviewed 1 of 2 files at r11, 1 of 1 files at r22.

docs/RFCS/sql_query_planning.md, line 802 at r21: Previously, petermattis (Peter Mattis) wrote…
LGTM
This change brings in a subset of https://github.com/petermattis/opttoy/tree/master/v3

This change introduces:
- the expr tree: Cascades-style optimizers operate on expression trees that can represent both scalar and relational expressions; this is a departure from the way we represent expressions and statements (sem/tree), so we need a new tree structure.
- scalar operators: initially, we focus only on scalar expressions.
- building an expr tree from a sem/tree.TypedExpr.
- an opt version of the logic tests.

See the RFC in cockroachdb#19135 for more context on the optimizer.

This is the first step of an initial project related to the optimizer: generating index constraints from scalar expressions. This will be a rewrite of the current index constraint generation code (which has many problems; see cockroachdb#6346). Roughly, the existing `makeIndexConstraints` will call into the optimizer with a `TypedExpr`, and the optimizer will return index constraints.

Release note: None
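The expr tree described above represents scalar and relational expressions with a single node type. A minimal hypothetical Go sketch of that idea follows; the operator set, the `Expr` fields, and the helper methods are assumptions for illustration only and are not taken from CockroachDB or opttoy.

```go
package main

// Operator is a hypothetical operator tag shared by all expressions.
type Operator int

const (
	ScanOp     Operator = iota // relational: reads a table
	SelectOp                   // relational: filters child 0 using scalar child 1
	VariableOp                 // scalar: a column reference
	ConstOp                    // scalar: a constant value
	EqOp                       // scalar: equality comparison
)

// Expr is a hypothetical tree node that can be scalar or relational.
type Expr struct {
	Op       Operator
	Children []*Expr
	Private  interface{} // operator-specific data: table, column, or constant
}

// IsRelational reports whether the node produces rows rather than a value.
func (e *Expr) IsRelational() bool {
	return e.Op == ScanOp || e.Op == SelectOp
}

// CountScalar returns the number of scalar nodes in the tree rooted at e.
func (e *Expr) CountScalar() int {
	n := 0
	if !e.IsRelational() {
		n = 1
	}
	for _, c := range e.Children {
		n += c.CountScalar()
	}
	return n
}
```

For example, `SELECT * FROM t WHERE a = 1` could be modeled as a `SelectOp` node whose children are a `ScanOp` of `t` and an `EqOp` over a `VariableOp` and a `ConstOp`, so one tree holds both kinds of expression.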
fcb79fc to c29d5ec
High-level modules of next-generation SQL query planning, including a full-featured optimizer.
c29d5ec to 684d868
I put metaphorical pen-to-paper this weekend and sketched out the
high-level modules for a SQL optimizer. This overlaps with Raphael's SQL
changes document (#18977), but has a more singular focus on SQL
optimization. I consider the documents complementary.