rfc: Support for non-materialized views (WIP) #9045

a-robinson · 2016-09-01T21:14:51Z

This is a somewhat bare-bones outline of a proposal for #2971, made without a great understanding of how everything works today. Any feedback that you have for me before I get too far into working on this would be great!

cc @dt and @paperstreet given your work on foreign keys and interleaved tables, which both deal with similar cross-resource dependencies


rjnn · 2016-09-02T15:55:48Z

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

docs/RFCS/views.md, line 104 [r1] (raw file):

We can take a similar approach for view relationships, meaning that a
`ViewDescriptor` will reference the tables/views it depends on, and each
of the tables/views that it depends on will maintain state referring back

It's straightforward to see why you need the viewDescriptor to reference the tables/views it depends on, but why do the underlying tables/views need backlinks? I suspect that it's to support updating the views when schema changes are made to the underlying tables, but a little more fleshing out (perhaps with an example) could be useful here.
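To make the backlink question concrete, here is a minimal sketch of why a table might want to know which views read from it. Everything in it (`Descriptor`, `Catalog`, `AddView`, `DropColumn`) is a hypothetical simplification for illustration, not the RFC's or CockroachDB's actual types; it also assumes the most conservative policy, where any dependent view blocks the schema change.

```go
package main

import "fmt"

// ID is a hypothetical descriptor identifier.
type ID uint32

// Descriptor is a simplified stand-in for a table or view descriptor.
// It is an assumption for illustration, not the real TableDescriptor.
type Descriptor struct {
	ID           ID
	Name         string
	Columns      []string
	DependsOn    []ID // forward links: relations this view reads from
	DependedOnBy []ID // back links: views that read from this relation
}

// Catalog maps IDs to descriptors.
type Catalog map[ID]*Descriptor

// AddView registers a view and records a back link on each dependency.
func (c Catalog) AddView(v *Descriptor) error {
	for _, dep := range v.DependsOn {
		if _, ok := c[dep]; !ok {
			return fmt.Errorf("view %q depends on unknown descriptor %d", v.Name, dep)
		}
	}
	for _, dep := range v.DependsOn {
		c[dep].DependedOnBy = append(c[dep].DependedOnBy, v.ID)
	}
	c[v.ID] = v
	return nil
}

// DropColumn refuses to drop a column while any view depends on the table.
// The back links let it decide this without scanning every view.
func (c Catalog) DropColumn(table ID, col string) error {
	t, ok := c[table]
	if !ok {
		return fmt.Errorf("unknown descriptor %d", table)
	}
	if len(t.DependedOnBy) > 0 {
		return fmt.Errorf("cannot drop column %q: %d view(s) depend on table %q",
			col, len(t.DependedOnBy), t.Name)
	}
	for i, name := range t.Columns {
		if name == col {
			t.Columns = append(t.Columns[:i], t.Columns[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("column %q not found in %q", col, t.Name)
}
```

Without `DependedOnBy`, the same check would have to load and inspect every view in the database on each schema change.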

Comments from Reviewable

a-robinson · 2016-09-02T16:53:59Z

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.

docs/RFCS/views.md, line 104 [r1] (raw file):

Previously, arjunravinarayan (Arjun Narayan) wrote…

It's straightforward to see why you need the viewDescriptor to reference the tables/views it depends on, but why do the underlying tables/views need backlinks? I suspect that it's to support updating the views when schema changes are made to the underlying tables, but a little more fleshing out (perhaps with an example) could be useful here.

Added a couple sentences to the start of this section. Make sense?


rjnn · 2016-09-02T18:26:06Z

docs/RFCS/views.md, line 104 [r1] (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Added a couple sentences to the start of this section. Make sense?

Yup! Thanks.


danhhz · 2016-09-02T20:24:15Z

cc @nvanbenschoten too since this is probably even more similar to information schema than it is to fks and interleaved

As far as I'm concerned, the direction seems good and you could feel free to start working out the details

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

docs/RFCS/views.md, line 80 [r2] (raw file):

## Storing view descriptors

We can create a new ViewDescriptor protocol buffer type to parallel the

My instinct is that we should reuse TableDescriptor if it doesn't make things too gross. IIRC, this is what we do with information_schema, so some of the patterns are already established.

Reasons for this:

* Not really sure what the separate proto gets us
* Fundamentally, a view behaves like a table in many ways
* There are probably tons of methods that already take one that we'd have to rewrite in terms of some new interface to give views their own proto. This is fine but unfortunate
* The number of fields that'll overlap between the two feels like a code smell

docs/RFCS/views.md, line 109 [r2] (raw file):

that refer back to the relevant tables and columns in both directions.

We can take a similar approach for view relationships, meaning that a

Yup. sgtm


a-robinson · 2016-09-06T13:40:17Z

Thanks for the review, I'm getting to work on the code now.

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

docs/RFCS/views.md, line 80 [r2] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

My instinct is that we should reuse TableDescriptor if it doesn't make things too gross. IIRC, this is what we do with information_schema, so some of the patterns are already established.

Reasons for this:

* Not really sure what the separate proto gets us
* Fundamentally, a view behaves like a table in many ways
* There are probably tons of methods that already take one that we'd have to rewrite in terms of some new interface to give views their own proto. This is fine but unfortunate
* The number of fields that'll overlap between the two feels like a code smell

Thanks, that all makes sense. Updated.


dt · 2016-09-07T17:23:19Z

docs/RFCS/views.md

+  [`DROP VIEW`](https://www.postgresql.org/docs/9.1/static/sql-dropview.html).
+  While supporting `CASCADE` as well would be nice, it can be implemented
+  separately at a later time, as we
+  [chose to do with foreign keys](fk.md#cascade-and-other-behaviors).


fwiw, FKs do support CASCADE for DROP (it just drops the FK constraint), just not for DELETE/TRUNCATE (where it'd need to find and delete actual rows)

RaduBerinde · 2016-09-08T19:37:46Z

It would be good to talk a bit with @knz regarding query rewriting. The main difficulty is that when we do something like `SELECT a,b,c FROM SomeView WHERE <expr>`, we don't want to run the subquery for the view (which may be an entire table, or even worse a join) and then apply the `WHERE` filter on all the results. We would want the `WHERE` to make it inside the view's query if at all possible.
This may not be doable in the current codebase, but @knz has been thinking about query optimization so he may have insights on what the best direction would be to make it possible in the future.
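As a toy illustration of the concern above, the sketch below contrasts running the view's query in full and filtering afterwards with folding the outer filter into the view's own scan. The names (`Row`, `View`, `QueryNaive`, `QueryPushedDown`) are hypothetical and have nothing to do with the real planner; the view is modeled as a simple filter over an in-memory table.

```go
package main

// Row is a hypothetical two-column row.
type Row struct{ A, B int }

// Pred is a filter predicate over rows.
type Pred func(Row) bool

// Scan returns the rows that satisfy pred.
func Scan(rows []Row, pred Pred) []Row {
	var out []Row
	for _, r := range rows {
		if pred(r) {
			out = append(out, r)
		}
	}
	return out
}

// View stands in for a stored view: a filter over a base table.
type View struct {
	Base   []Row
	Filter Pred
}

// QueryNaive runs the view's query in full, then applies the outer WHERE.
// It also reports how many intermediate rows the view produced.
func QueryNaive(v View, where Pred) (result []Row, intermediate int) {
	rows := Scan(v.Base, v.Filter)
	return Scan(rows, where), len(rows)
}

// QueryPushedDown folds the outer WHERE into the view's own scan, so no
// intermediate result set is built.
func QueryPushedDown(v View, where Pred) []Row {
	return Scan(v.Base, func(r Row) bool { return v.Filter(r) && where(r) })
}
```

Both functions return the same rows; the difference is only in how much intermediate data the naive version builds, which is what gets expensive when the view is a whole table or a join.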

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.

docs/RFCS/views.md, line 21 [r3] (raw file):

[SQL](https://msdn.microsoft.com/en-us/library/ms187956.aspx)
[databases](http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/intro/src/tpc/db2z_views.html).
In a sense, they're table stakes. Views are used for a number of reasons,

table stakes? I had to look that up, very managerial :)

docs/RFCS/views.md, line 60 [r3] (raw file):

  on various other `DROP` commands as well.
* Postgres notably does not support inserts/updates through views. I
  propose that we don't either for now.

Agreed.


a-robinson · 2016-09-08T20:07:15Z

Yeah, I would hope that query optimization could handle that kind of case no matter whether it's caused by a view or just by the user doing something silly in a non-view query, but will chat with @knz about it.

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions.

docs/RFCS/views.md, line 21 [r3] (raw file):

Previously, RaduBerinde wrote…

table stakes? I had to look that up, very managerial :)

Whoops, my Google background may be coming through ;)

docs/RFCS/views.md, line 64 [r4] (raw file):

Previously, dt (David Taylor) wrote…

fwiw, FKs do support CASCADE for DROP (it just drops the FK constraint), just not for DELETE/TRUNCATE (where it'd need to find and delete actual rows)

Ah, thanks for clarifying! I've clarified the text.


knz · 2016-09-08T22:43:28Z

docs/RFCS/views.md

+  [Postgres supports](https://www.postgresql.org/docs/9.1/static/sql-createview.html)
+  the `CREATE OR REPLACE` statement so long as the replacement query
+  outputs the same columns as the original query in the same order
+  (i.e. it can only add new columns to the end).


Is this limitation important? It does not look to me like we are bound to it.

The only justification I've been able to find or come up with is that it's effectively just a simplification of implementation to ensure that when a view is "replaced" the system doesn't have to deal with updating everything that depends on that view.

http://dba.stackexchange.com/a/62817/105457

At some level, it can provide certain assurances to DBAs that they won't accidentally make unsafe changes, as mentioned in this email discussion:

http://comments.gmane.org/gmane.comp.db.postgresql.general/197711
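For reference, the rule quoted from the Postgres docs (same columns in the same order, new columns only at the end) amounts to a prefix check. A minimal sketch, assuming columns are compared by name only; `CheckReplace` is a hypothetical helper, not an existing function:

```go
package main

import "fmt"

// CheckReplace reports whether a view whose query outputs newCols may
// replace one that outputs oldCols: the old columns must appear first, in
// the same order, and any new columns may only be appended at the end.
// Comparing by name only is a simplification made for this sketch.
func CheckReplace(oldCols, newCols []string) error {
	if len(newCols) < len(oldCols) {
		return fmt.Errorf("cannot drop columns from view: had %d, got %d",
			len(oldCols), len(newCols))
	}
	for i, name := range oldCols {
		if newCols[i] != name {
			return fmt.Errorf("cannot change column %d from %q to %q",
				i+1, name, newCols[i])
		}
	}
	return nil
}
```

With this check in place, anything that depends on the view's existing columns keeps working after a replace, which matches the "simplification of implementation" justification above.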

knz · 2016-09-08T23:04:51Z

Ok I like that you're considering how views interact with the rest of the schema.
However if you go this way then there's some more homework to do.

The main question with regards to design is whether you envision views to be stored syntactically or semantically.

Syntactically means that we store the SQL string that defines the view in the schema. This is what we currently do for default expressions and check expressions in table descriptors. When you store syntactically, the SQL string is re-parsed and re-analyzed every time a query that uses the view is executed, and the result of compiling the SQL string is inserted in place in the query that uses the view, for the purpose of optimization and execution.

Storing syntactically makes it rather easy to implement in our current code base.

You already highlighted that there's a challenge with dealing with schema compatibility when columns are added or removed. With a syntactic view definition you can deal with this by attempting to re-parse and re-check all associated views whenever you change the schema for a table. There's a little bit of code involved but it's not conceptually hard.

However, the real challenge with a syntactic definition is how to deal with renames of tables and columns. It's actually not so trivial to rewrite the SQL syntax of a view definition whenever a referenced table or column is renamed. (Think about nested JOINs and AS clauses. Anytime you see a name in a query that looks like a table name, it might be something else entirely.) It's algorithmically hard.

The other approach is storing a view semantically. This means that we define an encoding in the descriptor tables for the abstract semantic tree of a query, and we store encoded abstract trees instead. These trees contain nodes for things like table and column references. Probably the encoding for a table reference contains its table ID, and the encoding for a column reference contains its column ID; no names. A semantic encoding makes all the semantic checks trivial and makes the view automatically compatible with any renames without any effort.
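A toy sketch of the difference between the two encodings, under heavy simplification: the "semantic" form here is just a table ID plus column IDs rather than a real encoded query tree, and all of the types (`Catalog`, `SemanticView`, `SyntacticView`) are hypothetical.

```go
package main

import "strings"

// TableID and ColumnID are hypothetical stable identifiers.
type TableID int
type ColumnID int

// Catalog holds the current names for each ID; renames only change this.
type Catalog struct {
	TableNames  map[TableID]string
	ColumnNames map[ColumnID]string
}

// SemanticView stores IDs only, so a rename never has to touch it.
type SemanticView struct {
	From    TableID
	Columns []ColumnID
}

// SQL renders the view using whatever names the catalog holds right now.
func (v SemanticView) SQL(c Catalog) string {
	names := make([]string, len(v.Columns))
	for i, id := range v.Columns {
		names[i] = c.ColumnNames[id]
	}
	return "SELECT " + strings.Join(names, ", ") + " FROM " + c.TableNames[v.From]
}

// SyntacticView stores the SQL text verbatim; after a rename the text still
// contains the old name unless something rewrites it.
type SyntacticView struct {
	Query string
}
```

Renaming a table in the catalog changes what `SemanticView.SQL` renders, while a `SyntacticView` created earlier keeps the stale name in its `Query` string.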

The difficulty with semantic storage is that we would need to define a database encoding for our abstract trees. (Technically this probably means we need to give them protobufs and integrate the corresponding generated marshalling/unmarshalling code.) That's a lot more plumbing and scaffolding work.

I personally think that CockroachDB is bound to grow a semantic encoding for this stuff at some point in the future anyways; when that happens, we can then use it for default expressions, check expressions, prepared statements, views, stored procedures and probably a cached database of optimized queries. The strategic question is whether views are the right conduit to get started on this, so early.

I honestly don't know and I encourage you to think about it and contribute your opinion. Nathan will definitely have a say on the topic too I think.

For the sake of keeping your starter project tractable, I might still want to recommend a syntactic approach first, and simply state that at this point we will not support renames. Then you would implement a check in ALTER that would just refuse to rename things that are referred to by stored views. That would enable you to get something functional in a shorter time span.
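A minimal sketch of the kind of check suggested above, assuming the most conservative policy: refuse to rename any table that a stored view refers to. `Table`, `DependentViews`, and `RenameTable` are hypothetical names for illustration, not existing code.

```go
package main

import "fmt"

// Table is a hypothetical, simplified table record.
type Table struct {
	Name string
	// DependentViews lists the views whose stored SQL refers to this table.
	DependentViews []string
}

// RenameTable renames a table unless a view depends on it. With views
// stored as SQL strings, a rename would leave their text pointing at a
// name that no longer exists, so the rename is rejected instead.
func RenameTable(tables map[string]*Table, oldName, newName string) error {
	t, ok := tables[oldName]
	if !ok {
		return fmt.Errorf("table %q does not exist", oldName)
	}
	if _, exists := tables[newName]; exists {
		return fmt.Errorf("table %q already exists", newName)
	}
	if len(t.DependentViews) > 0 {
		return fmt.Errorf("cannot rename table %q: view %q depends on it",
			oldName, t.DependentViews[0])
	}
	delete(tables, oldName)
	t.Name = newName
	tables[newName] = t
	return nil
}
```

A finer-grained version could track which columns each view uses and allow renames of the others, at the cost of more bookkeeping.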

knz · 2016-09-08T23:05:59Z

Regarding query optimization: we absolutely need to start thinking soon about pushing filters down queries, both from the use of a view down into the view's select node, and from a select node down into its JOIN operands. However, this is a separate issue, orthogonal to the design of views.

a-robinson · 2016-09-09T14:17:39Z

As you mentioned, I don't know that I want to take on creating a semantic encoding for our AST quite yet. There may be some pain down the road in making the change, particularly given our goal of supporting any format written to disk by our beta releases, but if we're already going to have to do it for the other resources you mention then adding one more isn't terrible.

My naive expectation is that we'll want to switch to the semantic encoding before too long, but I'll do some more thinking about it.

dt · 2016-09-09T14:35:19Z

+1 to semantic encoding of query ASTs being way out of scope here -- there's precedent for just storing strings, there are other ways to optimize reparsing at runtime and reasonably easy ways to detect if a potential rename breaks things and block it for now.

knz · 2016-09-10T11:17:34Z

Even if they are out of scope I think it's important to mention this discussion and the choice in the text of the RFC.

a-robinson · 2016-09-12T15:14:40Z

I've added a section discussing the encoding of the view definition, cribbing heavily from your great description of the situation.

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


knz · 2016-09-12T15:25:01Z

Reviewed 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.

docs/RFCS/views.md, line 75 at r7 (raw file):

* [For historical reasons](https://www.postgresql.org/docs/9.4/static/sql-alterview.html),
  Postgres allows for `ALTER TABLE` to be used on views, but I propose that
  we avoid supporting that for as long as possible.

Agreed.

docs/RFCS/views.md, line 127 at r7 (raw file):

definition whenever a referenced table or column is renamed. To get
around this, we could reject attempts to rename things that are depended
on by a view.

I'm not so clear on how to check this properly, unfortunately. We may need to forbid all renames on a table that has any dependent views.


a-robinson · 2016-09-12T15:59:51Z

Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.

docs/RFCS/views.md, line 127 at r7 (raw file):

Previously, knz (kena) wrote…

I'm not so clear on how to check this properly, unfortunately. We may need to forbid all renames on a table that has any dependent views.

Sorry, my attempt to be concise cut out some meaning. Fixed.


a-robinson assigned dt Sep 1, 2016

a-robinson force-pushed the views branch 2 times, most recently from 54210af to 5d9d86e Compare September 1, 2016 21:15

a-robinson force-pushed the views branch from 5d9d86e to 6fbbd24 Compare September 2, 2016 16:53

a-robinson force-pushed the views branch from 6fbbd24 to c384ce5 Compare September 2, 2016 19:27

a-robinson force-pushed the views branch from c384ce5 to 6466511 Compare September 6, 2016 14:03

dt reviewed Sep 7, 2016

a-robinson force-pushed the views branch from bb6f0fa to a4c4d93 Compare September 8, 2016 20:07

knz reviewed Sep 8, 2016

a-robinson force-pushed the views branch from a4c4d93 to b30a343 Compare September 9, 2016 13:54

a-robinson force-pushed the views branch 2 times, most recently from 4f5e28e to c358339 Compare September 12, 2016 15:14

a-robinson force-pushed the views branch from c358339 to 37a22ad Compare September 12, 2016 15:59

a-robinson force-pushed the views branch from 37a22ad to bf1e678 Compare September 12, 2016 16:57

a-robinson added 2 commits September 12, 2016 12:58

rfc: Support for non-materialized views (WIP)

16cbfb5

rfc: Clarify support for a couple minor view features

450d3a2

a-robinson force-pushed the views branch from bf1e678 to 450d3a2 Compare September 12, 2016 16:58

a-robinson merged commit fee48b6 into cockroachdb:develop Sep 12, 2016

a-robinson mentioned this pull request Oct 14, 2016

sql: Restrict renames of objects that views depend on #9988

Merged

benesch added the first-pr label May 23, 2017
