rfc: Support for non-materialized views (WIP)

cockroachdb · Sep 12, 2016 · 16cbfb5
1 parent 20f9815
diff --git a/docs/RFCS/views.md b/docs/RFCS/views.md
@@ -0,0 +1,177 @@
+- Feature Name: Non-Materialized Views
+- Status: draft
+- Start Date: 2016-09-01
+- Authors: Alex Robinson
+- RFC PR: [#9045](https://github.com/cockroachdb/cockroach/pull/9045)
+- Cockroach Issue: [#2971](https://github.com/cockroachdb/cockroach/issues/2971)
+
+# Summary
+
+Add support for non-materialized views to our SQL dialect.
+Materialized views are explicitly out of scope.
+
+# Motivation
+
+[Views](https://en.wikipedia.org/wiki/View_(SQL)) are a widely-supported
+feature across
+[all](https://www.postgresql.org/docs/9.1/static/sql-createview.html)
+[major](http://dev.mysql.com/doc/refman/5.7/en/views.html)
+[SQL](https://msdn.microsoft.com/en-us/library/ms187956.aspx)
+[databases](http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/intro/src/tpc/db2z_views.html).
+In a sense, they're table stakes. Views are used for a number of reasons,
+including aliasing complex queries, limiting access to underlying data, or
+maintaining compatibility with legacy code as changes are made to the underlying
+database schema.
+
+# Scope
+
+As a bare minimum, we need to support creating views, referencing views
+in queries, and dropping views. We should probably also support altering
+views, although it would be possible to get good use out of views
+without that.
+
+Beyond the basics, though, different major SQL databases offer differing
+features around views. Some allow writing to underlying tables through views
+and checking the integrity of such updates. Some support a
+`CREATE OR REPLACE` statement to change a view's definition in a single
+command or idempotently create a view. Some have special restrictions on
+the `CREATE OR REPLACE` command. Some allow additional options on views,
+such as whether they're only temporary for the current session.
+
+Given our goal of PostgreSQL compatibility, it makes sense to support what
+Postgres supports unless we have reason not to.
+
+* Even though it isn't part of the SQL standard,
+  [Postgres supports](https://www.postgresql.org/docs/9.1/static/sql-createview.html)
+  the `CREATE OR REPLACE` statement so long as the replacement query
+  outputs the same columns as the original query, with the same names and
+  data types and in the same order (i.e., it can only add new columns to
+  the end).
+* We should also support the applicable limited `ALTER VIEW` options that
+  [Postgres offers](https://www.postgresql.org/docs/9.1/static/sql-alterview.html).
+* We should support the `RESTRICT` option as the default on
+  [`DROP VIEW`](https://www.postgresql.org/docs/9.1/static/sql-dropview.html).
+  While supporting `CASCADE` as well would be nice, it can be implemented
+  separately at a later time, as we
+  [chose to do with foreign keys](fk.md#cascade-and-other-behaviors).
+* Now that there will be references to underlying tables and indexes, we
+  will have to support the `RESTRICT` option (and eventually `CASCADE`)
+  on various other `DROP` commands as well.
+* Postgres notably does not support inserts/updates through views. I
+  propose that we don't either for now.
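+
+The `CREATE OR REPLACE` compatibility rule above can be checked by
+comparing the old and new output columns position by position. The
+following is a minimal Go sketch, not CockroachDB code. `Column` and
+`checkReplaceCompatible` are hypothetical names, and the sketch assumes
+each output column is described only by a name and a type string.
+
+```go
+package main
+
+import "fmt"
+
+// Column is a hypothetical, simplified description of a view output column.
+type Column struct {
+	Name string
+	Type string
+}
+
+// checkReplaceCompatible applies the Postgres CREATE OR REPLACE VIEW rule:
+// every existing column must keep its name, type, and position, and new
+// columns may only be appended after them.
+func checkReplaceCompatible(oldCols, newCols []Column) error {
+	if len(newCols) < len(oldCols) {
+		return fmt.Errorf("cannot drop columns from view (had %d, now %d)",
+			len(oldCols), len(newCols))
+	}
+	for i, old := range oldCols {
+		if newCols[i] != old {
+			return fmt.Errorf("cannot change view column %d from %s %s to %s %s",
+				i+1, old.Name, old.Type, newCols[i].Name, newCols[i].Type)
+		}
+	}
+	// Any columns beyond len(oldCols) are new and therefore allowed.
+	return nil
+}
+```
+
+A reordered, renamed, retyped, or dropped column is rejected. Appending a
+column such as `email` to a view over `(id, name)` is accepted.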
+
+# Detailed design
+
+TODO(a-robinson): Flesh this out as needed after getting initial feedback.
+
+The major problems we have to solve to support the in-scope features are
+validating new views, storing view definitions somewhere, tracking
+their dependencies to enable `RESTRICT` behavior when schemas are
+changed, and rewriting queries that refer to views.
+
+## Validating new views
+
+Without having dug very far into the code yet, I'd expect to be able to
+reuse existing query validation functionality pretty directly for this.
+There may be some differences (e.g. not allowing `ORDER BY`), but hopefully
+not too many.
+
+## Storing view descriptors
+
+We can reuse the
+[`TableDescriptor`](https://github.com/cockroachdb/cockroach/blob/develop/sql/sqlbase/structured.proto#L244)
+protocol buffer type to represent views. Only a small amount of
+modification will be needed to support the needs of views, and reusing
+the same descriptor will remove the need to duplicate most of the fields
+in the proto and much of the code that processes the proto. Tables and
+views are typically used in the same ways, so it isn't much of a stretch
+to share the underlying descriptor, which is also what we do to support
+the information_schema tables.
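+
+To show how little needs to change, here is a hypothetical Go sketch in
+which a single optional field distinguishes views from tables. The real
+`TableDescriptor` is generated from protobuf and has many more fields.
+The `ViewQuery` field and both helper methods are assumptions made for
+illustration.
+
+```go
+package main
+
+// TableDescriptor is a heavily simplified, hypothetical stand-in for the
+// protobuf-generated descriptor; only fields needed for the idea are shown.
+type TableDescriptor struct {
+	ID      uint32
+	Name    string
+	Columns []string
+	// ViewQuery is a hypothetical new field: empty for ordinary tables,
+	// and holding the defining query for views.
+	ViewQuery string
+}
+
+// IsView reports whether the descriptor describes a view rather than a table.
+func (d *TableDescriptor) IsView() bool {
+	return d.ViewQuery != ""
+}
+
+// IsWritable reports whether INSERT/UPDATE/DELETE may target the descriptor.
+// Views are read-only, since writes through views are out of scope.
+func (d *TableDescriptor) IsWritable() bool {
+	return !d.IsView()
+}
+```
+
+Code paths that only read descriptors could then treat tables and views
+uniformly, branching on `IsView()` only where behavior actually differs.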
+
+## Storing view queries
+
+The more interesting question around storage is how we will store
+the underlying query that defines a view. We could choose to do so either
+*syntactically* or *semantically*. We're going to start with a syntactic
+encoding for the reasons outlined below.
+
+### Syntactically
+
+Storing the query syntactically would mean that we simply store the SQL
+string that defines the intended query. At query time, the string would
+be re-parsed and re-analyzed on every use, and the result substituted
+wherever the view name appears. This is what we currently do for default
+expressions and check expressions in table descriptors, and it is fairly
+simple to implement.
+
+There will be some trickiness in handling compatibility in the face of
+schema changes. For adding/removing columns or removing tables, we can
+do proper verification of schema changes against all dependent views as
+they're made.
+
+The trickier thing to handle will be renames of tables and columns. It
+is quite difficult to rewrite the SQL string of an arbitrary view
+definition whenever a referenced table or column is renamed. To get
+around this, we could reject attempts to rename anything in a table that
+is depended on by a view.
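+
+The rename restriction could be enforced with a guard like the one below.
+This is a minimal Go sketch that assumes each table descriptor carries a
+list of the views that depend on it, as proposed under dependency
+tracking below. `Descriptor` and `renameTable` are hypothetical names.
+
+```go
+package main
+
+import "fmt"
+
+// Descriptor is a hypothetical, simplified descriptor. DependedOnBy lists
+// the IDs of views whose stored SQL text refers to this table by name.
+type Descriptor struct {
+	ID           uint32
+	Name         string
+	DependedOnBy []uint32
+}
+
+// renameTable applies a rename unless some view depends on the table. With
+// a syntactic encoding, the view's stored SQL string would still contain
+// the old name, so the rename is rejected instead of rewriting that string.
+func renameTable(desc *Descriptor, newName string) error {
+	if len(desc.DependedOnBy) > 0 {
+		return fmt.Errorf("cannot rename %q: depended on by view(s) %v",
+			desc.Name, desc.DependedOnBy)
+	}
+	desc.Name = newName
+	return nil
+}
+```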
+
+### Semantically
+
+Storing the query semantically would mean that we define an encoding for
+our abstract semantic query trees and store those instead of the query
+string.
+The query trees would presumably use IDs rather than names to refer
+to columns and tables, obviating the renaming problem of a syntactic
+representation.
+
+The difficulty of this approach is primarily the work involved in
+defining and maintaining a consistent encoding for our abstract trees.
+While we think this is something we're likely to do in the future as
+it would help with many features (e.g. default expressions, check
+expressions, prepared statements, stored procedures, etc.), it's a
+very large project, and now might not be the right time to take it on.
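+
+To make the contrast concrete, the hypothetical Go sketch below stores a
+trivial view, `SELECT <column> FROM <table>`, as a node holding only IDs.
+It recovers names from the current catalog when the view is rendered.
+None of these types exist in CockroachDB. They only show why an ID-based
+encoding is unaffected by renames.
+
+```go
+package main
+
+import "fmt"
+
+// ScanNode is a hypothetical semantic encoding of
+// "SELECT <column> FROM <table>" that stores IDs instead of names.
+type ScanNode struct {
+	TableID  uint32
+	ColumnID uint32
+}
+
+// Table holds a table's current name and column names keyed by column ID.
+type Table struct {
+	Name    string
+	Columns map[uint32]string
+}
+
+// Catalog maps table IDs to their current descriptors.
+type Catalog map[uint32]*Table
+
+// Format renders the stored node as SQL using the names the referenced
+// table and column have now, so a rename never leaves it stale.
+func (n ScanNode) Format(c Catalog) (string, error) {
+	t, ok := c[n.TableID]
+	if !ok {
+		return "", fmt.Errorf("table %d not found", n.TableID)
+	}
+	col, ok := t.Columns[n.ColumnID]
+	if !ok {
+		return "", fmt.Errorf("column %d not found in %s", n.ColumnID, t.Name)
+	}
+	return fmt.Sprintf("SELECT %s FROM %s", col, t.Name), nil
+}
+```
+
+Renaming `users` to `people` and `name` to `full_name` changes the
+rendered SQL without touching the stored node.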
+
+## Tracking view dependencies
+
+In order to maintain consistency within a database, we need to prevent
+a table (or view) that a view relies on in its query from being deleted out
+from underneath the view, or from being modified in a way that makes it
+incompatible with the view. Thus, upon a request to delete or update a
+table/view, we have to know whether or not some view depends on its
+existence.
+
+While some other databases (e.g.
+[PostgreSQL](https://www.postgresql.org/docs/8.4/static/catalog-pg-depend.html)
+and [SQL Server](https://msdn.microsoft.com/en-us/library/bb677315.aspx))
+use dedicated system tables for tracking dependencies between database
+entities, CockroachDB has so far taken the approach of maintaining
+dependency information denormalized in the underlying descriptor tables.
+For example, foreign key and interleaved table relationships are tracked
+by storing `ForeignKeyReference` protocol buffers in index descriptors
+that refer back to the relevant tables and columns in both directions.
+
+We can take a similar approach for view relationships, meaning that a
+view's descriptor will reference the tables/views it depends on, and each
+of the tables/views that it depends on will maintain state referring back
+to it. As with foreign key constraints, the overhead of maintaining state
+in both places should be negligible due to the infrequency of schema updates.
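+
+A minimal Go sketch of this bidirectional bookkeeping and of
+`DROP ... RESTRICT` follows. It uses a hypothetical in-memory map of
+descriptors rather than the real descriptor table, and it ignores
+transactions and schema-change leases entirely. `Desc`, `Descs`, and the
+method names are illustrative only.
+
+```go
+package main
+
+import "fmt"
+
+// Desc is a hypothetical descriptor for a table or view. As with foreign
+// keys, each relationship is recorded on both ends.
+type Desc struct {
+	ID           uint32
+	Name         string
+	DependsOn    []uint32 // for views: tables/views referenced by the query
+	DependedOnBy []uint32 // views that reference this table/view
+}
+
+type Descs map[uint32]*Desc
+
+// createView records the view and adds a back-reference to each dependency.
+// A real implementation would write all of these descriptors in one txn.
+func (ds Descs) createView(view *Desc) error {
+	for _, dep := range view.DependsOn {
+		if _, ok := ds[dep]; !ok {
+			return fmt.Errorf("view %q depends on unknown descriptor %d", view.Name, dep)
+		}
+	}
+	for _, dep := range view.DependsOn {
+		ds[dep].DependedOnBy = append(ds[dep].DependedOnBy, view.ID)
+	}
+	ds[view.ID] = view
+	return nil
+}
+
+// dropRestrict refuses to drop anything that is still depended on. Otherwise
+// it removes the back-references the dropped descriptor left on its own
+// dependencies, then deletes the descriptor.
+func (ds Descs) dropRestrict(id uint32) error {
+	d, ok := ds[id]
+	if !ok {
+		return fmt.Errorf("descriptor %d not found", id)
+	}
+	if len(d.DependedOnBy) > 0 {
+		return fmt.Errorf("cannot drop %q: depended on by %v", d.Name, d.DependedOnBy)
+	}
+	for _, dep := range d.DependsOn {
+		ds[dep].DependedOnBy = removeID(ds[dep].DependedOnBy, id)
+	}
+	delete(ds, id)
+	return nil
+}
+
+// removeID filters id out of ids in place.
+func removeID(ids []uint32, id uint32) []uint32 {
+	out := ids[:0]
+	for _, x := range ids {
+		if x != id {
+			out = append(out, x)
+		}
+	}
+	return out
+}
+```
+
+Dropping `users` fails while a view still references it. Once the view
+is dropped, its back-reference is removed and the table can be dropped.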
+
+## Handling schema updates
+
+I expect that schema changes to views will mostly mirror how we handle
+schema updates to tables today, but with the added need to verify the
+validity of changes (to tables, indexes, and views) against referenced
+or dependent descriptors. As mentioned in the discussion of a syntactic
+vs semantic representation above, this will mean restricting renames and
+removals of columns and tables depended on by views.
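+
+For column-level changes, one option is for each back-reference to record
+which columns the dependent view actually uses, so that changes to other
+columns remain allowed. The Go sketch below assumes such a hypothetical
+`ViewRef` record. Whether to track dependencies per column or only per
+table is an open design choice, not something this RFC has settled.
+
+```go
+package main
+
+import "fmt"
+
+// ViewRef is a hypothetical back-reference stored on a table descriptor:
+// which view depends on the table, and which of its column IDs it uses.
+type ViewRef struct {
+	ViewName  string
+	ColumnIDs []uint32
+}
+
+// validateColumnChange rejects dropping or renaming (op) a column that any
+// dependent view references. Columns no dependent view uses may change.
+func validateColumnChange(refs []ViewRef, colID uint32, op string) error {
+	for _, ref := range refs {
+		for _, id := range ref.ColumnIDs {
+			if id == colID {
+				return fmt.Errorf("cannot %s column %d: used by view %q",
+					op, colID, ref.ViewName)
+			}
+		}
+	}
+	return nil
+}
+```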
+
+## Query rewriting
+
+Similar to validating new views, this should mostly be manageable just
+by reusing existing code. For example, it's easy to imagine extending
+the logic for looking up a table descriptor to also handle view
+descriptors, then inserting (and processing) the subquery from the view
+in its place.
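+
+The lookup-and-substitute step might look roughly like the following Go
+sketch. `desc`, `dataSource`, and `resolveDataSource` are hypothetical
+names. For readability the view's stored query is substituted as text,
+whereas an actual implementation would parse it and plan it as a
+subquery.
+
+```go
+package main
+
+import "fmt"
+
+// desc is a hypothetical, simplified descriptor; a non-empty ViewQuery
+// marks it as a view.
+type desc struct {
+	Name      string
+	ViewQuery string
+}
+
+// dataSource is what a FROM-clause name resolves to: either a plain table
+// or a subquery taken from a view's stored definition.
+type dataSource struct {
+	Table    string // set for plain tables
+	Subquery string // set for views
+	Alias    string
+}
+
+// resolveDataSource looks up tables and views the same way, but returns the
+// view's stored query as a subquery in place of a view.
+func resolveDataSource(catalog map[string]desc, name string) (dataSource, error) {
+	d, ok := catalog[name]
+	if !ok {
+		return dataSource{}, fmt.Errorf("relation %q does not exist", name)
+	}
+	if d.ViewQuery != "" {
+		return dataSource{Subquery: d.ViewQuery, Alias: name}, nil
+	}
+	return dataSource{Table: name, Alias: name}, nil
+}
+
+// String renders the data source as it would appear in a FROM clause.
+func (s dataSource) String() string {
+	if s.Subquery != "" {
+		return fmt.Sprintf("(%s) AS %s", s.Subquery, s.Alias)
+	}
+	return s.Table
+}
+```
+
+With a view `user_names` defined as `SELECT name FROM users`, the name
+`user_names` resolves to `(SELECT name FROM users) AS user_names`, while
+`users` resolves to the table itself.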
+
+# Unresolved questions
+
+I imagine some questions will come up as I get a little deeper into the
+system, but none at the moment. I don't expect there to be any major
+obstacles to supporting views.