diff --git a/docs/RFCS/20220628_invisible_index.md b/docs/RFCS/20220628_invisible_index.md new file mode 100644 index 000000000000..cf19a895bbf8 --- /dev/null +++ b/docs/RFCS/20220628_invisible_index.md @@ -0,0 +1,295 @@ +- Feature Name: Invisible Index +- Status: draft +- Start Date: 2022-06-28 +- Authors: Wenyi Hu +- RFC PR: (TODO (wenyihu6): link PR later) +- Cockroach Issue: https://github.com/cockroachdb/cockroach/issues/72576, + https://github.com/cockroachdb/cockroach/issues/82363 + +# Summary + +This new feature introduces the option to make an index become invisible. An +invisible index is an index that is up-to-date but is ignored by the optimizer +unless explicitly specified with [index +hinting](https://www.cockroachlabs.com/docs/v22.1/table-expressions#force-index-selection). +Users can create an index as invisible or alter an index to be invisible after +its initialization. As for now, primary indexes cannot be invisible. But unique +indexes can still be invisible. Specifically, the unique constraint still prevents +insertion of duplicates into a column regardless of whether the inedx is invisible. +But the index will be ignored by the optimizer for queries. + +The main purpose of this RFC is to introduce the feature, to document different +choices of potential SQL syntaxes, and ultimately to justify the decision. + +# Motivation + +Currently, users are not able to observe the impact of removing an index without +risking the cost of rebuilding the index. This new feature would allow users to +validate whether an index should be dropped by changing it to invisible first. +If a drop in query performance is observed, the index can be quickly toggled +back to visible without rebuilding the index. + +Similarly, this new feature would also allow users to roll out new indexes with +more confidence. Currently, some users with large production scales are +concerned about the impact of introducing new indexes and potentially affecting +their applications significantly. With this feature, users can create new +indexes and easily toggle it back to invisible without the cost of dropping the +index. + +This new feature would also be useful if we want to set an index to be visible +only to specific queries. By using index hinting, users can force an invisible +index to be visible to parts of their applications without affecting the rest of +the application. + +# Technical design + +## SQL Syntax + +This following section will discuss different SQL syntax choices. PostgreSQL +does not support invisible indexes yet. We will be using MySQL and Oracle SQL as +a reference for the standardized way to support invisible index syntax. The +points below outline different choices and their use examples. + +Just for reference, the following section shows how SQL syntax now looks like. +The parts surrounded by *** [] *** propose different options that we can consider. + +[//]: # (CREATE INDEX) +1. `CREATE INDEX` +- Create an invisible index by using `CREATE INDEX` in index definition. +- Create an unique invisible index by using `CREATE UNIQUE INDEX` in index definition. +```sql +CREATE [UNIQUE | INVERTED] INDEX [CONCURRENTLY] [IF NOT EXISTS] [] + ON ( [ASC | DESC] [, ...] ) + [USING HASH] [STORING ( )] + [PARTITION BY ] + [WITH ] [WHERE ] + *** [INVISIBLE | NOT VISIBLE | VISIBLE | HIDDEN] *** +``` + +```sql +CREATE INDEX a ON b.c (d) VISIBLE +CREATE INDEX a ON b.c (d) INVISIBLE +CREATE INDEX a ON b.c (d) HIDDEN +CREATE INDEX a ON b.c (d) NOT VISIBLE + +CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) VISIBLE +CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) INVISIBLE +CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) HIDDEN +CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) NOT VISIBLE + +CREATE INDEX geom_idx ON t USING GIST(geom) WITH (s2_max_cells = 20, s2_max_level = 12, s2_level_mod = 3) HIDDEN +CREATE INDEX geom_idx ON t USING GIST(geom) WITH (s2_max_cells = 20, s2_max_level = 12, s2_level_mod = 3) INVISIBLE +CREATE INDEX geom_idx ON t USING GIST(geom) WITH (s2_max_cells = 20, s2_max_level = 12, s2_level_mod = 3) NOT VISIBLE + +CREATE UNIQUE INDEX IF NOT EXISTS a ON b (c) WHERE d > 3 HIDDEN +CREATE UNIQUE INDEX IF NOT EXISTS a ON b (c) WHERE d > 3 INVISIBLE +CREATE UNIQUE INDEX IF NOT EXISTS a ON b (c) WHERE d > 3 NOT VISIBLE +``` +[//]: # (CREATE TABLE) +2. `CREATE TABLE` +- Create an invisible index by adding an index in a `CREATE TABLE...(INDEX)` definition. +- Create an unique invisible index by adding an index in a `CREATE TABLE...(UNIQUE INDEX)` definition. +- Create an unique invisible index by adding an unique constraint within the table constraint definition of a `CREATE TABLE ...(CONSTRAINT ...)` +- Create an unique invisible index by adding an unique constraint within the column definition of a `CREATE TABLE ...(UNIQUE...)` definition. + +```sql + CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE [IF NOT EXISTS] [table_element_list] [] +``` + +
+ table_element_list + +
+ Index Definition + +```sql +[UNIQUE | INVERTED] INDEX [] ( [ASC | DESC] [, ...] + [USING HASH] [{STORING | INCLUDE | COVERING} ( )] + [PARTITION BY ] + [WITH ] [WHERE ] + *** [INVISIBLE | NOT VISIBLE | VISIBLE | HIDDEN] *** +``` + +```sql +CREATE TABLE a (b INT8, c STRING, INDEX (b ASC, c DESC) STORING (c) INVISIBLE) +CREATE TABLE a (b INT8, c STRING, INDEX (b ASC, c DESC) STORING (c) NOT VISIBLE) + +CREATE TABLE a (b INT, UNIQUE INDEX foo (b) WHERE c > 3 INVISIBLE) +CREATE TABLE a (b INT, UNIQUE INDEX foo (b) WHERE c > 3 NOT VISIBLE) +``` + +
+ +
+ Column Constraint Definition + +```sql +[CONSTRAINT ] + { NULL | NOT NULL | NOT VISIBLE | + UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX ***] + PRIMARY KEY *** [WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX | WITHOUT VISIBLE INDEX] *** | CHECK () | DEFAULT | ON UPDATE | GENERATED { ALWAYS | BY DEFAULT } + AS IDENTITY [( )] } +-- Note: primary index cannot be invisible. In this case, the rule is introduced only to throw a semantic error later on. +``` + +```sql +CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITHOUT INDEX) +CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITH INVISIBLE INDEX) +CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITH HIDDEN INDEX) +CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITHOUT VISIBLE INDEX) +CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITH NOT VISIBLE INDEX) + +CREATE TABLE a (b INT8 CONSTRAINT c PRIMARY KEY INVISIBLE) --/ semantic error +CREATE TABLE a (b INT8 CONSTRAINT c PRIMARY KEY NOT VISIBLE) -- semantic error +CREATE TABLE a (b INT8 CONSTRAINT c PRIMARY KEY HIDDEN) -- semantic error +``` +
+ +
+ Table Constraint Definition + +```sql + UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX ***] ( ) [{STORING | INCLUDE | COVERING} ( )] + PRIMARY KEY ( ) [USING HASH] *** [WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX] *** -- Note: primary index cannot be invisible. In this case, the rule is introduced only to throw a semantic error later on. +``` +```sql +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITH INVISIBLE INDEX (b, c)) +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITH HIDDEN INDEX (b, c)) +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITH NOT VISIBLE INDEX (b, c)) +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITHOUT VISIBLE INDEX (b, c)) + +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE (b) INVISIBLE INDEX) +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE (b) HIDDEN INDEX) +CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE (b) NOT VISIBLE INDEX) + +CREATE TABLE a (b INT8, c STRING, PRIMARY KEY (b, c, "0") INVISIBLE) -- semantic error +CREATE TABLE a (b INT8, c STRING, PRIMARY KEY (b, c, "0") HIDDEN) -- semantic error +CREATE TABLE a (b INT8, c STRING, PRIMARY KEY (b, c, "0") NOT VISIBLE) -- semantic error + +``` +
+ +
+ +[//]: # (Alter Table) +3. `Alter Table` +- Create an unique invisible index by adding unique constraint within the table constraint definition of an `ALTER TABLE ADD CONSTRAINT ...` +- Create an unique invisible index by adding unique constraint within the column definition of `ALTER TABLE ADD `, `ALTER TABLE ADD IF NOT EXISTS `, `ALTER TABLE ADD COLUMN `, `ALTER TABLE ADD COLUMN IF NOT EXISTS `. + +```sql + ALTER TABLE ... ADD [COLUMN] [IF NOT EXISTS] [] + ALTER TABLE ... ADD + ALTER TABLE ... ALTER PRIMARY KEY USING INDEX -- Note: primary index cannot be invisible. In this case, the rule is introduced only to throw a semantic error later on. +``` + +
+constraint + +```sql +[CONSTRAINT ] {NULL | NOT NULL | UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX *** ]| PRIMARY KEY [*** WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX *** ]| CHECK () | DEFAULT } +``` + +```sql +ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITH INVISIBLE INDEX (a) +ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITH HIDDEN INDEX (a) +ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITHOUT VISIBLE INDEX (a) +ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITH NOT VISIBLE INDEX (a) + +ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITH INVISIBLE INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITH INVISIBLE INDEX (a) +ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITH HIDDEN INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITH HIDDEN INDEX (a) +ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITH NOT VISIBLE INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITH NOT VISIBLE INDEX (a) +ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITHOUT VISIBLE INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITHOUT VISIBLE INDEX (a) +``` +
+ +[//]: # (Alter Index) +4. `Alter Index` +```sql +ALTER INDEX [IF EXISTS] [INVISIBLE | NOT VISIBLE | HIDDEN] +``` + +```sql +ALTER INDEX a@b INVSIBLE +ALTER INDEX a@b NOT VISIBLE +ALTER INDEX a@b HIDDEN +``` + +[//]: # (Show Constraint) +5. `Show Constraint` +```sql +SHOW CONSTRAINT FROM table_name with_comment +``` + +```sql +table_name constraint_name constraint_type details(*** add invisible index details ***) validated +unique_without_index my_partial_unique_f UNIQUE UNIQUE WITH INVISIBLE INDEX (f) WHERE (f > 0) true +unique_without_index my_partial_unique_f UNIQUE UNIQUE WITH HIDDEN INDEX (f) WHERE (f > 0) true +unique_without_index my_partial_unique_f UNIQUE UNIQUE WITH NOT VISIBLE INDEX (f) WHERE (f > 0) true +``` + +[//]: # (Show Index) +6. A new column needs to be added to `crdb_internal.table_indexes`. +``` +descriptor_id descriptor_name index_id index_name index_type is_unique is_inverted is_sharded ***is_hidden*** shard_bucket_count created_at + ***is_invisible*** + ***is_not_invisible*** + ***visibility*** +``` + +7. A new column needs to be added to the output of following SQL statements: +```sql +SHOW INDEX FROM (table_name) +SHOW INDEXES FROM(table_name) +SHOW KEYS FROM (table_name) + +SHOW INDEX FROM DATABASE(database_name) +SHOW INDEXES FROM DATABASE (database_name) +SHOW KEYS FROM DATABASE (database_name) +``` + +```sql +table_name index_name non_unique seq_in_index column_name direction storing implicit ***is_hidden*** + ***is_invisible*** + ***is_not_invisible*** + ***visibility*** +``` + +7. Note that `CREATE CONSTRAINT` and `ALTER CONSTRAINT` are both not supported by the parser. + +### Discussion +CockroachDB currently supports invisible column feature. For this +feature, `NOT VISIBLE` is used for its SQL statement, and `is_hidden` is used +for the new column added to `SHOW INDEX`. It would be nice to stay consistent. +But [MySQL](https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html) and +[Oracle](https://oracle-base.com/articles/11g/invisible-indexes-11gr1) both +support `INVISIBLE` for invisible indexes. [MySQL](https://dev.mysql.com/doc/refman/8.0/en/invisible-columns.html), [Oracle](https://oracle-base.com/articles/12c/invisible-columns-12cr1) +also use `INVISIBLE` for the invisible columns. +[MySQL](https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html) use +`is_visible` for the new column added to `SHOW INDEX`. +[Oracle](http://www.dba-oracle.com/t_11g_new_index_features.htm) uses +`VISIBILITY` for the new column added to `SHOW INDEX`. + +I was wondering why `NOT VISIBLE` was chosen for the invisible column feature. +I tried changing it to `INVISIBLE` in `sql.y`, and it caused conflicts in the +grammar. I'm not sure if this was the reason why we chose `NOT VISIBLE`. +PostgreSQL currently doesn't support invisible index or invisible column +feature. We can also try supporting both `NOT VISIBLE` and `INVISIBLE`, but more +work would be needed to find a grammar rule that allows both. + +## Later Discussion: Fine-Grained Control of Index Visibility +Later on, we want to extend this feature and allow a more fine-grained control +of index visibility by introducing the following two features. + +As of now, the plan is to introduce the general feature of invisible index +first. The design and implementation details for fine-grained control of index +visibility will be added later on. + +1. Indexes are not restricted to just being visible or invisible; users can experiment + with different levels of visibility. In other words, instead of using a boolean + invisible flag, users can set a float invisible flag between 0.0 and 1.0. The + index would be made invisible only to a corresponding fraction of queries. + Related: https://github.com/cockroachdb/cockroach/issues/72576#issuecomment-1034301996 + +2. Different sessions of a certain user or application can set different index + visibilities for indexes. + Related: https://github.com/cockroachdb/cockroach/issues/82363