rfc: Support for non-materialized views (WIP) #9045

Merged
merged 2 commits into from
Sep 12, 2016

Conversation

a-robinson
Contributor

@a-robinson a-robinson commented Sep 1, 2016

This is a somewhat bare-bones outline of a proposal for #2971, made without a great understanding of how everything works today. Any feedback that you have for me before I get too far into working on this would be great!

cc @dt and @paperstreet given your work on foreign keys and interleaved tables, which both deal with similar cross-resource dependencies


This change is Reviewable

@a-robinson a-robinson force-pushed the views branch 2 times, most recently from 54210af to 5d9d86e Compare September 1, 2016 21:15
@rjnn
Contributor

rjnn commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


docs/RFCS/views.md, line 104 [r1] (raw file):

We can take a similar approach for view relationships, meaning that a
`ViewDescriptor` will reference the tables/views it depends on, and each
of the tables/views that it depends on will maintain state referring back

It's straightforward to see why you need the viewDescriptor to reference the tables/views it depends on, but why do the underlying tables/views need backlinks? I suspect that it's to support updating the views when schema changes are made to the underlying tables, but a little more fleshing out (perhaps with an example) could be useful here.


Comments from Reviewable

@a-robinson
Contributor Author

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


docs/RFCS/views.md, line 104 [r1] (raw file):

Previously, arjunravinarayan (Arjun Narayan) wrote…

It's straightforward to see why you need the viewDescriptor to reference the tables/views it depends on, but why do the underlying tables/views need backlinks? I suspect that it's to support updating the views when schema changes are made to the underlying tables, but a little more fleshing out (perhaps with an example) could be useful here.

Added a couple sentences to the start of this section. Make sense?

Comments from Reviewable

@rjnn
Contributor

rjnn commented Sep 2, 2016

docs/RFCS/views.md, line 104 [r1] (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Added a couple sentences to the start of this section. Make sense?

Yup! Thanks.
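
For readers following the backlink question above, here is a minimal sketch of why references in both directions help. It assumes a hypothetical `Descriptor` record with `depends_on` and `depended_on_by` ID lists; these names are invented for illustration and are not the RFC's actual fields. With the backlink in place, deciding whether a table can be dropped is a local check rather than a scan over every view descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical stand-in for a table or view descriptor."""
    id: int
    name: str
    depends_on: list = field(default_factory=list)      # forward links (views only)
    depended_on_by: list = field(default_factory=list)  # backlinks


def create_view(catalog, view_id, name, source_ids):
    """Register a view and record the dependency in both directions."""
    view = Descriptor(view_id, name, depends_on=list(source_ids))
    catalog[view_id] = view
    for src in source_ids:
        catalog[src].depended_on_by.append(view_id)
    return view


def drop(catalog, desc_id):
    """Drop a table or view, refusing if anything still depends on it."""
    desc = catalog[desc_id]
    if desc.depended_on_by:
        names = [catalog[i].name for i in desc.depended_on_by]
        raise ValueError(f"cannot drop {desc.name}: depended on by {names}")
    # Clean up the backlinks this descriptor left on its sources.
    for src in desc.depends_on:
        catalog[src].depended_on_by.remove(desc_id)
    del catalog[desc_id]


catalog = {1: Descriptor(1, "users")}
create_view(catalog, 2, "active_users", [1])
```

In this sketch, dropping `users` is rejected while `active_users` exists, and succeeds once the view has been dropped first.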

Comments from Reviewable

@danhhz
Contributor

danhhz commented Sep 2, 2016

cc @nvanbenschoten too since this is probably even more similar to information schema than it is to fks and interleaved

As far as I'm concerned, the direction seems good and you could feel free to start working out the details


Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


docs/RFCS/views.md, line 80 [r2] (raw file):

## Storing view descriptors

We can create a new ViewDescriptor protocol buffer type to parallel the

My instinct is that we should reuse TableDescriptor if it doesn't make things too gross. IIRC, this is what we do with information_schema, so some of the patterns are already established.

Reasons for this:

  • Not really sure what the separate proto gets us
  • Fundamentally, a view behaves like a table in many ways
  • There are probably tons of methods that already take one that we'd have to rewrite in terms of some new interface to give views their own proto. This is fine but unfortunate
  • The number of fields that'll overlap between the two feels like a code smell

docs/RFCS/views.md, line 109 [r2] (raw file):

that refer back to the relevant tables and columns in both directions.

We can take a similar approach for view relationships, meaning that a

Yup. sgtm


Comments from Reviewable

@a-robinson
Contributor Author

Thanks for the review, I'm getting to work on the code now.


Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


docs/RFCS/views.md, line 80 [r2] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

My instinct is that we should reuse TableDescriptor if it doesn't make things too gross. IIRC, this is what we do with information_schema, so some of the patterns are already established.

Reasons for this:

  • Not really sure what the separate proto gets us
  • Fundamentally, a view behaves like a table in many ways
  • There are probably tons of methods that already take one that we'd have to rewrite in terms of some new interface to give views their own proto. This is fine but unfortunate
  • The number of fields that'll overlap between the two feels like a code smell

Thanks, that all makes sense. Updated.

Comments from Reviewable

[`DROP VIEW`](https://www.postgresql.org/docs/9.1/static/sql-dropview.html).
While supporting `CASCADE` as well would be nice, it can be implemented
separately at a later time, as we
[chose to do with foreign keys](fk.md#cascade-and-other-behaviors).
Member

fwiw, FKs do support CASCADE for DROP (it just drops the FK constraint), just not for DELETE/TRUNCATE (where it'd need to find and delete actual rows)

@RaduBerinde
Member

:lgtm:

It would be good to talk a bit with @knz regarding query rewriting. The main difficulty is that when we do something like SELECT a,b,c FROM SomeView WHERE <expr>, we don't want to run the subquery for the view (which may be an entire table, or even worse a join) and then apply the WHERE filter on all the results. We would want the WHERE to make it inside the view's query if at all possible.
This may not be doable in the current codebase, but @knz has been thinking about query optimization so he may have insights on what the best direction would be to make it possible in the future.
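
A toy sketch of the concern above, not a description of how CockroachDB plans queries: the view is modeled as a predicate over a base table, and the two hypothetical functions differ only in where the outer `WHERE` is applied. The second return value counts the rows produced by the view's query before the outer filter, which is the cost being discussed. A real engine would also gain from index selection, which this sketch ignores.

```python
def scan(table, predicate=lambda row: True):
    """Yield rows of a base table that satisfy the predicate."""
    for row in table:
        if predicate(row):
            yield row


def query_view_naive(table, view_pred, where):
    """Run the view's query in full, then filter its results."""
    view_rows = list(scan(table, view_pred))  # whole view materialized
    return [r for r in view_rows if where(r)], len(view_rows)


def query_view_pushdown(table, view_pred, where):
    """Fold the outer WHERE into the view's own scan."""
    def combined(row):
        return view_pred(row) and where(row)

    rows = list(scan(table, combined))
    return rows, len(rows)


# Hypothetical data: the view selects even rows, the query wants a < 10.
table = [{"a": i, "b": i % 2} for i in range(1000)]
view_pred = lambda r: r["b"] == 0
where = lambda r: r["a"] < 10

naive_rows, naive_intermediate = query_view_naive(table, view_pred, where)
pushed_rows, pushed_intermediate = query_view_pushdown(table, view_pred, where)
```

Both strategies return the same rows; only the size of the intermediate result differs.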


Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


docs/RFCS/views.md, line 21 [r3] (raw file):

[SQL](https://msdn.microsoft.com/en-us/library/ms187956.aspx)
[databases](http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/intro/src/tpc/db2z_views.html).
In a sense, they're table stakes. Views are used for a number of reasons,

table stakes? I had to look that up, very managerial :)


docs/RFCS/views.md, line 60 [r3] (raw file):

  on various other `DROP` commands as well.
* Postgres notably does not support inserts/updates through views. I
  propose that we don't either for now.

Agreed.


Comments from Reviewable

@a-robinson
Contributor Author

Yeah, I would hope that query optimization could handle that kind of case no matter whether it's caused by a view or just by the user doing something silly in a non-view query, but will chat with @knz about it.


Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions.


docs/RFCS/views.md, line 21 [r3] (raw file):

Previously, RaduBerinde wrote…

table stakes? I had to look that up, very managerial :)

Whoops, my Google background may be coming through ;)

docs/RFCS/views.md, line 64 [r4] (raw file):

Previously, dt (David Taylor) wrote…

fwiw, FKs do support CASCADE for DROP (it just drops the FK constraint), just not for DELETE/TRUNCATE (where it'd need to find and delete actual rows)

Ah, thanks for clarifying! I've clarified the text.

Comments from Reviewable

[Postgres supports](https://www.postgresql.org/docs/9.1/static/sql-createview.html)
the `CREATE OR REPLACE` statement so long as the replacement query
outputs the same columns as the original query in the same order
(i.e. it can only add new columns to the end).
Contributor

Is this limitation important? It does not look to me like we are bound to it.

Contributor Author

The only justification I've been able to find or come up with is that it's effectively just a simplification of implementation to ensure that when a view is "replaced" the system doesn't have to deal with updating everything that depends on that view.

http://dba.stackexchange.com/a/62817/105457

Contributor Author

At some level, it can provide certain assurances to DBAs that they won't accidentally make unsafe changes, as mentioned in this email discussion:

http://comments.gmane.org/gmane.comp.db.postgresql.general/197711
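
The Postgres `CREATE OR REPLACE` rule discussed in this thread amounts to a prefix check on the output columns. The sketch below is only an illustration: `can_replace_view` is a hypothetical helper, and columns are modeled as `(name, type)` tuples.

```python
def can_replace_view(old_columns, new_columns):
    """Return True if new_columns begins with old_columns, in the same order.

    Each column is a (name, type) tuple. Extra columns are allowed only
    at the end of the new list.
    """
    old = list(old_columns)
    new = list(new_columns)
    return new[:len(old)] == old


old = [("id", "INT"), ("email", "STRING")]
appended = old + [("created_at", "TIMESTAMP")]
reordered = [("email", "STRING"), ("id", "INT")]
retyped = [("id", "STRING"), ("email", "STRING")]
```

Anything that already depends on the view keeps seeing the same leading columns, which is the simplification mentioned above.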

@knz
Contributor

knz commented Sep 8, 2016

Ok I like that you're considering how views interact with the rest of the schema.
However if you go this way then there's some more homework to do.

The main question with regards to design is whether you envision views to be stored syntactically or semantically.

Syntactically means like we store the SQL string that defines the view in the schema. This is what we currently do for default expressions and check expressions in table descriptors. When you store syntactically, the SQL string is re-parsed and re-analyzed every time a query that uses the view is executed, and the result of compiling the SQL string is inserted in-place in the reuse context for the purpose of optimization and execution.

Storing syntactically makes it rather easy to implement in our current code base.

You already highlighted that there's a challenge with dealing with schema compatibility when columns are added or removed. With a syntactic view definition you can deal with this by attempting to re-parse and re-check all associated views whenever you change the schema for a table. There's a little bit of code involved but it's not conceptually hard.

However, the real challenge with a syntactic definition is how to deal with renames of tables and columns. It's actually not so trivial to rewrite the SQL syntax of a view definition whenever a referenced table or column is renamed. (Think about nested JOINs and AS clauses. Anytime you see a name in a query that looks like a table name, it might be something else entirely.) It's algorithmically hard.

The other approach is storing a view semantically. This means that we define an encoding in the descriptor tables for the abstract semantic tree of a query, and we store encoded abstract trees instead. These trees contain nodes for things like table and column references. Probably the encoding for a table reference contains its table ID, and the encoding for a column reference contains its column ID; no names. A semantic encoding makes all the semantic checks trivial and makes the view automatically compatible with any renames without any effort.
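
As a rough illustration of the ID-based idea, assuming a made-up catalog and a hypothetical `ColumnRef` node (neither reflects CockroachDB's actual descriptors): the stored view body holds only IDs, and names are resolved when the query text is needed, so a rename never touches the stored view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical AST node: refers to a column by IDs, never by name."""
    table_id: int
    column_id: int


# Hypothetical catalog: IDs are stable, names are mutable.
table_names = {1: "users"}
column_names = {(1, 1): "id", (1, 2): "email"}

# The stored view body references IDs only.
view_columns = [ColumnRef(1, 1), ColumnRef(1, 2)]


def render_select(columns):
    """Resolve IDs to their current names to produce SQL text."""
    tables = sorted({table_names[c.table_id] for c in columns})
    cols = [
        table_names[c.table_id] + "." + column_names[(c.table_id, c.column_id)]
        for c in columns
    ]
    return "SELECT " + ", ".join(cols) + " FROM " + ", ".join(tables)


before = render_select(view_columns)

# Rename the table and one column; the stored view is left untouched.
table_names[1] = "accounts"
column_names[(1, 2)] = "contact"
after = render_select(view_columns)
```

Here `before` uses the old names and `after` picks up the new ones, with `view_columns` unchanged in between.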

The difficulty with semantic storage is that we would need to define a database encoding for our abstract trees. (Technically this probably means we need to give them protobufs and integrate the corresponding generated marshalling/unmarshalling code.) That's a lot more plumbing and scaffolding work.

I personally think that CockroachDB is bound to grow a semantic encoding for this stuff at some point in the future anyways; when that happens, we can then use it for default expressions, check expressions, prepared statements, views, stored procedures and probably a cached database of optimized queries. The strategic question is whether views are the right conduit to get started on this, so early.

I honestly don't know and I encourage you to think about it and contribute your opinion. Nathan will definitely have a say on the topic too I think.

For the sake of keeping your starter project tractable, I might still want to recommend a syntactic approach first, and simply state that at this point we will not support renames. Then you would implement a check in ALTER that would just refuse to rename things that are referred to by stored views. That would enable you to get something functional in a shorter time span.
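
A sketch of what such a check could look like under the syntactic approach, using invented names throughout (`views`, `dependent_views`, `columns`, `rename_column`, `RenameBlockedError`). It takes the conservative route of refusing any column rename on a table that has dependent views, since working out exactly which names a stored SQL string uses is the hard part.

```python
class RenameBlockedError(Exception):
    """Raised when a rename could invalidate a stored view definition."""


# Hypothetical catalog: view definitions are kept as plain SQL strings,
# and each table records the names of the views that depend on it.
views = {"active_users": "SELECT id, email FROM users WHERE active"}
dependent_views = {"users": ["active_users"], "orders": []}
columns = {"users": ["id", "email", "active"], "orders": ["id", "total"]}


def rename_column(table, old, new):
    """Rename a column unless any stored view depends on the table."""
    if dependent_views[table]:
        raise RenameBlockedError(
            f"cannot rename {table}.{old}: table is used by views "
            f"{dependent_views[table]}"
        )
    cols = columns[table]
    cols[cols.index(old)] = new


rename_column("orders", "total", "amount")  # allowed: no dependent views
```

Renaming a column of `users` raises `RenameBlockedError` in this sketch, and the stored SQL string never needs rewriting.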

@knz
Contributor

knz commented Sep 8, 2016

Regarding query optimization. We absolutely need to start thinking soon about pushing filters down queries. Both from the use of a view down into the view's select node, and from a select node down into its JOIN operands. However this is a separate issue orthogonal to the design of views.

@a-robinson
Contributor Author

As you mentioned, I don't know that I want to take on creating a semantic encoding for our AST quite yet. There may be some pain down the road in making the change, particularly given our goal of supporting any format written to disk by our beta releases, but if we're already going to have to do it for the other resources you mention then adding one more isn't terrible.

My naive expectation is that we'll want to switch to the semantic encoding before too long, but I'll do some more thinking about it.

@dt
Member

dt commented Sep 9, 2016

+1 to semantic encoding of query ASTs being way out of scope here -- there's precedent for just storing strings, there are other ways to optimize reparsing at runtime and reasonably easy ways to detect if a potential rename breaks things and block it for now.

@knz
Contributor

knz commented Sep 10, 2016

Even if they are out of scope I think it's important to mention this discussion and the choice in the text of the RFC.

@a-robinson a-robinson force-pushed the views branch 2 times, most recently from 4f5e28e to c358339 Compare September 12, 2016 15:14
@a-robinson
Contributor Author

I've added a section discussing the encoding of the view definition, cribbing heavily from your great description of the situation.


Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


Comments from Reviewable

@knz
Contributor

knz commented Sep 12, 2016

:lgtm:


Reviewed 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


docs/RFCS/views.md, line 75 at r7 (raw file):

* [For historical reasons](https://www.postgresql.org/docs/9.4/static/sql-alterview.html),
  Postgres allows for `ALTER TABLE` to be used on views, but I propose that
  we avoid supporting that for as long as possible.

Agreed.


docs/RFCS/views.md, line 127 at r7 (raw file):

definition whenever a referenced table or column is renamed. To get
around this, we could reject attempts to rename things that are depended
on by a view.

I'm not so clear on how to check this properly, unfortunately. We may need to forbid all renames on a table that has any dependent views.


Comments from Reviewable

@a-robinson
Contributor Author

Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


docs/RFCS/views.md, line 127 at r7 (raw file):

Previously, knz (kena) wrote…

I'm not so clear on how to check this properly, unfortunately. We may need to forbid all renames on a table that has any dependent views.

Sorry, my attempt to be concise cut out some meaning. Fixed.

Comments from Reviewable

@a-robinson a-robinson merged commit fee48b6 into cockroachdb:develop Sep 12, 2016
@benesch benesch added the first-pr Use to mark the first PR sent by a contributor / team member. Reviewers should be mindful of this. label May 23, 2017