-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: Support for non-materialized views (WIP) #9045
Conversation
54210af
to
5d9d86e
Compare
Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. docs/RFCS/views.md, line 104 [r1] (raw file):
It's straightforward to see why you need the viewDescriptor to reference the tables/views it depends on, but why do the underlying tables/views need backlinks? I suspect that its to support updating the views when there are schema changes made to the underlying tables, but a little bit more fleshing out (perhaps with an example) could be useful here. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion. docs/RFCS/views.md, line 104 [r1] (raw file):
|
docs/RFCS/views.md, line 104 [r1] (raw file):
|
cc @nvanbenschoten too since this is probably even more similar to information schema than it is to fks and interleaved As far as I'm concerned, the direction seems good and you could feel free to start working out the details Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. docs/RFCS/views.md, line 80 [r2] (raw file):
My instinct is that we should reuse TableDescriptor if it doesn't make things too gross. IIRC, This is what we do with information_schema so some of the patterns are already established. Reasons for this:
docs/RFCS/views.md, line 109 [r2] (raw file):
Yup. sgtm Comments from Reviewable |
Thanks for the review, I'm getting to work on the code now. Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. docs/RFCS/views.md, line 80 [r2] (raw file):
|
[`DROP VIEW`](https://www.postgresql.org/docs/9.1/static/sql-dropview.html). | ||
While supporting `CASCADE` as well would be nice, it can implemented | ||
separately at a later time, as we | ||
[chose to do with foreign keys](fk.md#cascade-and-other-behaviors). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, FKs do support CASCADE for DROP (it just drops the FK constraint), just not for DELETE/TRUNCATE (where it'd need to find and delete actual rows)
It would be good to talk a bit with @knz regarding query rewriting. The main difficulty is that when we do something like Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed. docs/RFCS/views.md, line 21 [r3] (raw file):
table stakes? I had to look that up, very managerial :) docs/RFCS/views.md, line 60 [r3] (raw file):
Agreed. Comments from Reviewable |
Yeah, I would hope that query optimization could handle that kind of case no matter whether it's caused by a view or just by the user doing something silly in a non-view query, but will chat with @knz about it. Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions. docs/RFCS/views.md, line 21 [r3] (raw file):
|
[Postgres supports](https://www.postgresql.org/docs/9.1/static/sql-createview.html) | ||
the `CREATE OR REPLACE` statement so long as the replacement query | ||
outputs the same columns as the original query in the same order | ||
(i.e. it can only add new columns to the end). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this limitation important? It does not look to me we are bound to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only justification I've been able to find or come up with is that it's effectively just a simplification of implementation to ensure that when a view is "replaced" the system doesn't have to deal with updating everything that depends on that view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some level, it can provide certain assurances to DBAs that they won't accidentally make unsafe changes, as mentioned in this email discussion:
http://comments.gmane.org/gmane.comp.db.postgresql.general/197711
Ok I like that you're considering how views interact with the rest of the schema. The main question with regards to design is whether you envision views to be stored syntactically or semantically. Syntactically means like we store the SQL string that defines the view in the schema. This is what we currently do for default expressions and check expressions in table descriptors. When you store syntactically, the SQL string is re-parsed and re-analyzed every time a query that uses the view is executed, and the result of compiling the SQL string is inserted in-place in the reuse context for the purpose of optimization and execution. Storing syntactically makes it rather easy to implement in our current code base. You already highlighted that there's a challenge with dealing with schema compatibility when columns are added or removed. With a syntactic view definition you can deal with this by attempting to re-parse and re-check all associated views whenever you change the schema for a table. There's a little bit of code involved but it's not conceptually hard. However the real challenge with a syntactic definition, and that is how to deal with renames of tables and columns. It's actually not so trivial to rewrite the SQL syntax of a view definition whenever a referenced table or column is renamed. (Think about nested JOINs and AS clauses. Anytime you see a name in a query that looks like a table name it might be something else entirely.) It's algorithmically hard. The other approach is storing a view semantically. This means that we define an encoding in the descriptor tables for the abstract semantic tree of a query, and we store encoded abstract trees instead. These trees contain nodes for things like table and column references. Probably the encoding for a table reference contains its table ID, and for a column reference to the column ID; no names. A semantic encoding makes all the semantic checks trivial and makes the view automatically compatible with any renames without any effort. The difficulty with semantic storage is that we would need to define a database encoding for our abstract trees. (Technically this probably means we need to give them protobufs and integrate the corresponding generated marshalling/unmarshalling code.) That's a lot more plumbing and scaffolding work. I personally think that CockroachDB is bound to grow a semantic encoding for this stuff at some point in the future anyways; when that happens, we can then use it for default expressions, check expressions, prepared statements, views, stored procedures and probably a cached database of optimized queries. The strategic question is whether views are the right conduit to get started on this, so early. I honestly don't know and I encourage you to think about it and contribute your opinion. Nathan will definitely have a say on the topic too I think. For the sake of keeping your starter project tractable, I might still want to recommend a syntactic approach first, and simply state that at this point we will not support renames. Then you would implement a check in ALTER that would just refuse to rename things that are referred to by stored views. That would enable you to get something functional in a shorter time span. |
Regarding query optimization. We absolutely need to start thinking soon about pushing filters down queries. Both from the use of a view down into the view's select node, and from a select node down into its JOIN operands. However this is a separate issue orthogonal to the design of views. |
As you mentioned, I don't know that I want to take on creating a semantic encoding for our AST quite yet. There may be some pain down the road in making the change, particularly given our goal of supporting any format written to disk by our beta releases, but if we're already going to have to do it for the other resources you mention then adding one more isn't terrible. My naive expectation is that we'll want to switch to the semantic encoding before too long, but I'll do some more thinking about it. |
+1 to semantic encoding of query ASTs being way out of scope here -- there's precedent for just storing strings, there are other ways to optimize reparsing at runtime and reasonably easy ways to detect if a potential rename breaks things and block it for now. |
Even if they are out of scope I think it's important to mention this discussion and the choice in the text of the RFC. |
4f5e28e
to
c358339
Compare
I've added a section discussing the encoding of the view definition, cribbing heavily from your great description of the situation. Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending. Comments from Reviewable |
Reviewed 1 of 1 files at r7. docs/RFCS/views.md, line 75 at r7 (raw file):
Agreed. docs/RFCS/views.md, line 127 at r7 (raw file):
I'm not so clear on how to check this properly, unfortunately. We may need to forbid all renames on a table that has any dependent views. Comments from Reviewable |
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. docs/RFCS/views.md, line 127 at r7 (raw file):
|
This is a somewhat bare-bones outline of a proposal for #2971, made without a great understanding of how everything works today. Any feedback that you have for me before I get too far into working on this would be great!
cc @dt and @paperstreet given your work on foreign keys and interleaved tables, which both deal with similar cross-resource dependencies
This change is