-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC [WIP] an intermediate representation for SQL #10055
Conversation
Review status: 0 of 1 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
Thanks for your input. Do you know of anything close to term rewriting systems / libraries available in Go already? |
I'd love more discussion about why we need to write the IR definition in ML, versus writing it in Go. Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful. docs/RFCS/sql-ir.md, line 15 at r1 (raw file):
how about an extra one - to allow the distsql logical planner to recognize subtrees that it can compile? Or more generally, to allow different compilation "backends". docs/RFCS/sql-ir.md, line 17 at r1 (raw file):
can you please expand this bullet? docs/RFCS/sql-ir.md, line 19 at r1 (raw file):
is this one a legitimate need? I think we can transfer the filter expressions just fine today, as the string from the current Expression Node docs/RFCS/sql-ir.md, line 23 at r1 (raw file):
I'm not sure I follow these constraints... Is this assuming that we're gonna save IR to use later/on a different node? Or is this about saving views? docs/RFCS/sql-ir.md, line 117 at r1 (raw file):
on the left hand side you could also have structures (struct literals), not strings. docs/RFCS/sql-ir.md, line 128 at r1 (raw file):
I'm not sure why/if we need to generate these specialized matchers/"actionfns". Can't we write a generic matching algorithm? Comments from Reviewable |
I also don't quite understand why we need to start with a new language. I thought that the idea is to start with the current In particular, I don't see the benefit for expressions. We already have all the machinery to use, analyze, normalize (and even a simple way to serialize/deserialize) SQL expressions, why maintain two different languages/grammars/etc? It would mean that whenever we want to add some feature and support something we would need to mess with both. In the IR, how do you express that you are using a certain index? The IR needs to allow representing this right? (In fact, I can't even find how the general idea of reading from a table is represented, IOW what IR "node" would correspond to the current |
8b8717b
to
06718b9
Compare
Guys I have completed the 2nd iteration, the text is rather .. different now. PTAL. Note it's probably not complete yet, we likely want to be a bit more specific as to what we want as attributes in the IR nodes in this text (i.e. before they are implemented), but I'd like your opinion about the overall shape of the work before we dive into specifics. cc @cuongdo @arjunravinarayan @irfansharif @petermattis @nvanbenschoten @a-robinson |
Review status: 0 of 1 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. docs/RFCS/sql-ir.md, line 429 at r2 (raw file):
This doc mixes tabs and spaces in a few places; I don't know what the markdown renderer will do with that. Replace all tabs with spaces. docs/RFCS/sql-ir.md, line 476 at r2 (raw file):
Yeah, I think the RFC is the appropriate place to define names. Comments from Reviewable |
Great stuff, thanks for all the additions! Overall shape/direction is great! Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed. docs/RFCS/sql-ir.md, line 437 at r2 (raw file):
Note that the "join by lookup" method could be used for other joins, not just index join. I agree it's somewhat of a hybrid though. Comments from Reviewable |
What I'd like a bit more detail on is how exactly the "backends" are going to interact with the IR tree. Are we going to pass the IR to DistSQL, which then has an opportunity to do more transformations? And then DistSQL is going to delegate the parts that it doesn't want to deal with to another "backend" that does further transformations and then produces planNodes? Review status: 0 of 1 files reviewed at latest revision, 18 unresolved discussions, some commit checks failed. docs/RFCS/sql-ir.md, line 117 at r1 (raw file):
|
but is this IR essentially going to serve as the abstract syntax tree too? The phases of the compiler that check for errors (name resolution, type checking) need to operate on something close to the surface syntax to give good error messages, but subsequent phases are simpler if there's a desugaring step that reduces the number of constructs that they must consider. Review status: 0 of 1 files reviewed at latest revision, 18 unresolved discussions, some commit checks failed. Comments from Reviewable |
@andreimatei: I have added a paragraph in the summary to explain the interaction. @eisenstatdavid: I have added a paragraph to clarify that pretty-printing back to SQL will only be possible up to semantic analysis and that we may lose that opportunity after that, and instead we can keep track of the text location that the IR node originates from in the input for error reporting. Review status: 0 of 1 files reviewed at latest revision, 18 unresolved discussions, some commit checks failed. docs/RFCS/sql-ir.md, line 15 at r1 (raw file):
|
@knz it doesn't look like you pushed the changes you made in response to everyone's comments? Review status: 0 of 1 files reviewed at latest revision, 18 unresolved discussions, some commit checks failed. Comments from Reviewable |
Oh indeed, my bad. Done. |
Thanks for writing this all up! Reviewed 1 of 1 files at r3. docs/RFCS/sql-ir.md, line 51 at r3 (raw file):
I understand the importance of keeping things simple and completely agree with this decision, but things like showing a view definition to the user after a bunch of the tables/columns it uses have been renamed will unfortunately require more logic than just keeping a copy of the original string. Comments from Reviewable |
Review status: all files reviewed at latest revision, 19 unresolved discussions, some commit checks failed. docs/RFCS/sql-ir.md, line 51 at r3 (raw file):
|
Review status: all files reviewed at latest revision, 19 unresolved discussions, some commit checks failed. docs/RFCS/sql-ir.md, line 51 at r3 (raw file):
|
396617f
to
589ae68
Compare
Reminder for the discussion in #11736 (comment) about "ghost columns partcipating in ordering" |
I had another iteration on this PR after recent developments:
The big next steps at this point are:
Review status: 0 of 1 files reviewed at latest revision, 19 unresolved discussions. docs/RFCS/sql-ir.md, line 128 at r1 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Removed this section. Will address in a later / separate RFC. docs/RFCS/sql-ir.md, line 476 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Will need to iterate a little bit before this merges though. docs/RFCS/sql-ir.md, line 639 at r2 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I removed this section - I have a solution, will write a separate RFC. docs/RFCS/sql-ir.md, line 51 at r3 (raw file): Previously, a-robinson (Alex Robinson) wrote…
I added a qualifier sub-clause in the sentence. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. docs/RFCS/sql-ir.md, line 389 at r4 (raw file):
If we're going by some of the terminology we've learned recently, I think the above we all be considered "prep". docs/RFCS/sql-ir.md, line 530 at r4 (raw file):
I'm still fuzzy on how the pattern matching would ease the writing of transformations. That is, I can imagine this at a high-level, but the specifics are where the devil is. I think it would be worthwhile, perhaps in a companion RFC, to give an example of 2 transformations in the current code and how they would be simplified. Two good examples would be expression normalization and propagating filters across inner joins. Comments from Reviewable |
This RFC has been in Review status: 0 of 1 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful. docs/RFCS/sql-ir.md, line 238 at r4 (raw file):
As my understanding of the needs of this IR have become clearer, I'm not sure if auto-generated code is necessary, at least for the IR node structure. As an alternative, I think a single node structure used for both relational and scalar expressions can satisfy our needs and enable easy transformations. What seems to be important for the IR structure is to be able to easily determine and update logical properties about a node, such as the input and output variables. This could be achieved by making the IR nodes implement an interface ( There is still an exploration to be made about whether transformations should be represented in a higher-level language or directly in Go. The latter certainly gives the most flexibility, but perhaps at the expense of expressiveness. docs/RFCS/sql-ir.md, line 361 at r4 (raw file):
This approach to using the IR sounds like it is trying to replace the AST nodes. An alternative would be to leave the AST nodes in place and provide a transformation from the AST to the IR. The advantage of this approach is that we can choose when we perform the transformation. For example, we can leave all of the existing name resolution, type checking and normalization code in place for the time being and use the IR for higher-level transformations and query planning. Over time, we can move the conversion from the AST to the IR closer to the raw AST. Comments from Reviewable |
Superseded by #19135. |
Reaction to last week's meeting (with @irfansharif @arjunravinarayan @petermattis @RaduBerinde @andreimatei)
@eisenstatdavid I'd like you to help me shape this up before we submit it for discussion to a larger group. Some special questions for you:
This change is