Skip to content

Latest commit

 

History

History
295 lines (241 loc) · 14.3 KB

20220628_invisible_index.md

File metadata and controls

295 lines (241 loc) · 14.3 KB
  • Feature Name: Invisible Index
  • Status: draft
  • Start Date: 2022-06-28
  • Authors: Wenyi Hu
  • RFC PR: (TODO (wenyihu6): link PR later)
  • Cockroach Issue: #72576, #82363

Summary

This new feature introduces the option to make an index become invisible. An invisible index is an index that is up-to-date but is ignored by the optimizer unless explicitly specified with index hinting. Users can create an index as invisible or alter an index to be invisible after its initialization. As for now, primary indexes cannot be invisible. But unique indexes can still be invisible. Specifically, the unique constraint still prevents insertion of duplicates into a column regardless of whether the inedx is invisible. But the index will be ignored by the optimizer for queries.

The main purpose of this RFC is to introduce the feature, to document different choices of potential SQL syntaxes, and ultimately to justify the decision.

Motivation

Currently, users are not able to observe the impact of removing an index without risking the cost of rebuilding the index. This new feature would allow users to validate whether an index should be dropped by changing it to invisible first. If a drop in query performance is observed, the index can be quickly toggled back to visible without rebuilding the index.

Similarly, this new feature would also allow users to roll out new indexes with more confidence. Currently, some users with large production scales are concerned about the impact of introducing new indexes and potentially affecting their applications significantly. With this feature, users can create new indexes and easily toggle it back to invisible without the cost of dropping the index.

This new feature would also be useful if we want to set an index to be visible only to specific queries. By using index hinting, users can force an invisible index to be visible to parts of their applications without affecting the rest of the application.

Technical design

SQL Syntax

This following section will discuss different SQL syntax choices. PostgreSQL does not support invisible indexes yet. We will be using MySQL and Oracle SQL as a reference for the standardized way to support invisible index syntax. The points below outline different choices and their use examples.

Just for reference, the following section shows how SQL syntax now looks like. The parts surrounded by *** [] *** propose different options that we can consider.

  1. CREATE INDEX
  • Create an invisible index by using CREATE INDEX in index definition.
  • Create an unique invisible index by using CREATE UNIQUE INDEX in index definition.
CREATE [UNIQUE | INVERTED] INDEX [CONCURRENTLY] [IF NOT EXISTS] [<idxname>]
        ON <tablename> ( <colname> [ASC | DESC] [, ...] )
        [USING HASH] [STORING ( <colnames...> )]
        [PARTITION BY <partition params>]
        [WITH <storage_parameter_list>] [WHERE <where_conds...>]
        *** [INVISIBLE | NOT VISIBLE | VISIBLE | HIDDEN] *** 
CREATE INDEX a ON b.c (d) VISIBLE
CREATE INDEX a ON b.c (d) INVISIBLE
CREATE INDEX a ON b.c (d) HIDDEN
CREATE INDEX a ON b.c (d) NOT VISIBLE

CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) VISIBLE
CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) INVISIBLE
CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) HIDDEN
CREATE INDEX a ON b (c) WITH (fillfactor = 100, y_bounds = 50) NOT VISIBLE

CREATE INDEX geom_idx ON t USING GIST(geom) WITH (s2_max_cells = 20, s2_max_level = 12, s2_level_mod = 3) HIDDEN
CREATE INDEX geom_idx ON t USING GIST(geom) WITH (s2_max_cells = 20, s2_max_level = 12, s2_level_mod = 3) INVISIBLE
CREATE INDEX geom_idx ON t USING GIST(geom) WITH (s2_max_cells = 20, s2_max_level = 12, s2_level_mod = 3) NOT VISIBLE

CREATE UNIQUE INDEX IF NOT EXISTS a ON b (c) WHERE d > 3 HIDDEN
CREATE UNIQUE INDEX IF NOT EXISTS a ON b (c) WHERE d > 3 INVISIBLE
CREATE UNIQUE INDEX IF NOT EXISTS a ON b (c) WHERE d > 3 NOT VISIBLE

2. CREATE TABLE

  • Create an invisible index by adding an index in a CREATE TABLE...(INDEX) definition.
  • Create an unique invisible index by adding an index in a CREATE TABLE...(UNIQUE INDEX) definition.
  • Create an unique invisible index by adding an unique constraint within the table constraint definition of a CREATE TABLE ...(CONSTRAINT ...)
  • Create an unique invisible index by adding an unique constraint within the column definition of a CREATE TABLE ...(UNIQUE...) definition.
 CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE [IF NOT EXISTS] <tablename> [table_element_list] [<on_commit>]
table_element_list
Index Definition
[UNIQUE | INVERTED] INDEX [<name>] ( <colname> [ASC | DESC] [, ...] 
        [USING HASH] [{STORING | INCLUDE | COVERING} ( <colnames...> )]
        [PARTITION BY <partition params>]
        [WITH <storage_parameter_list>] [WHERE <where_conds...>] 
        *** [INVISIBLE | NOT VISIBLE | VISIBLE | HIDDEN] *** 
CREATE TABLE a (b INT8, c STRING, INDEX (b ASC, c DESC) STORING (c) INVISIBLE)
CREATE TABLE a (b INT8, c STRING, INDEX (b ASC, c DESC) STORING (c) NOT VISIBLE)

CREATE TABLE a (b INT, UNIQUE INDEX foo (b) WHERE c > 3 INVISIBLE)
CREATE TABLE a (b INT, UNIQUE INDEX foo (b) WHERE c > 3 NOT VISIBLE)
Column Constraint Definition
[CONSTRAINT <constraintname>] 
          { NULL | NOT NULL | NOT VISIBLE | 
           UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX  | WITH NOT VISIBLE INDEX ***]
           PRIMARY KEY *** [WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX | WITHOUT VISIBLE INDEX] *** | CHECK (<expr>) | DEFAULT <expr> | ON UPDATE <expr> | GENERATED { ALWAYS | BY DEFAULT } 
           AS IDENTITY [( <opt_sequence_option_list> )] }
-- Note: primary index cannot be invisible. In this case, the rule is introduced only to throw a semantic error later on.
CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITHOUT INDEX)
CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITH INVISIBLE INDEX)
CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITH HIDDEN INDEX)
CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITHOUT VISIBLE INDEX)
CREATE TABLE a (b INT8 CONSTRAINT c UNIQUE WITH NOT VISIBLE INDEX)

CREATE TABLE a (b INT8 CONSTRAINT c PRIMARY KEY INVISIBLE)  --/ semantic error
CREATE TABLE a (b INT8 CONSTRAINT c PRIMARY KEY NOT VISIBLE)  -- semantic error
CREATE TABLE a (b INT8 CONSTRAINT c PRIMARY KEY HIDDEN)  -- semantic error
Table Constraint Definition
    UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX  | WITH NOT VISIBLE INDEX ***] ( <colnames...> ) [{STORING | INCLUDE | COVERING} ( <colnames...> )]
    PRIMARY KEY ( <colnames...> ) [USING HASH] *** [WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX] ***  -- Note: primary index cannot be invisible. In this case, the rule is introduced only to throw a semantic error later on.
CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITH INVISIBLE INDEX (b, c))
CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITH HIDDEN INDEX (b, c))
CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITH NOT VISIBLE INDEX (b, c))
CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE WITHOUT VISIBLE INDEX (b, c))

CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE (b) INVISIBLE INDEX)
CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE (b) HIDDEN INDEX)
CREATE TABLE a (b INT8, c STRING, CONSTRAINT d UNIQUE (b) NOT VISIBLE INDEX)

CREATE TABLE a (b INT8, c STRING, PRIMARY KEY (b, c, "0") INVISIBLE) -- semantic error
CREATE TABLE a (b INT8, c STRING, PRIMARY KEY (b, c, "0") HIDDEN) -- semantic error
CREATE TABLE a (b INT8, c STRING, PRIMARY KEY (b, c, "0") NOT VISIBLE) -- semantic error

3. Alter Table

  • Create an unique invisible index by adding unique constraint within the table constraint definition of an ALTER TABLE <name> ADD CONSTRAINT ...
  • Create an unique invisible index by adding unique constraint within the column definition of ALTER TABLE <name> ADD <coldef>, ALTER TABLE <name> ADD IF NOT EXISTS <coldef>, ALTER TABLE <name> ADD COLUMN <coldef>, ALTER TABLE <name> ADD COLUMN IF NOT EXISTS <coldef>.
   ALTER TABLE ... ADD [COLUMN] [IF NOT EXISTS] <colname> <type> [<constraint...>]
   ALTER TABLE ... ADD <constraint>
   ALTER TABLE ... ALTER PRIMARY KEY USING INDEX <name> -- Note: primary index cannot be invisible. In this case, the rule is introduced only to throw a semantic error later on.
constraint
[CONSTRAINT <constraintname>] {NULL | NOT NULL | UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX  | WITH NOT VISIBLE INDEX *** ]| PRIMARY KEY [*** WITH INVISIBLE INDEX  | WITH NOT VISIBLE INDEX *** ]| CHECK (<expr>) | DEFAULT <expr>}
ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITH INVISIBLE INDEX (a)
ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITH HIDDEN INDEX (a)
ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITHOUT VISIBLE INDEX (a)
ALTER TABLE a ADD CONSTRAINT a_idx UNIQUE WITH NOT VISIBLE INDEX (a)

ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITH INVISIBLE INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITH INVISIBLE INDEX (a)
ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITH HIDDEN INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITH HIDDEN INDEX (a)
ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITH NOT VISIBLE INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITH NOT VISIBLE INDEX (a)
ALTER TABLE IF EXISTS a ADD COLUMN b INT8 UNIQUE WITHOUT VISIBLE INDEX, ADD CONSTRAINT a_no_idx UNIQUE WITHOUT VISIBLE INDEX (a)

4. Alter Index

ALTER INDEX [IF EXISTS] <idxname> [INVISIBLE | NOT VISIBLE | HIDDEN]
ALTER INDEX a@b INVSIBLE
ALTER INDEX a@b NOT VISIBLE
ALTER INDEX a@b HIDDEN

5. Show Constraint

SHOW CONSTRAINT FROM table_name with_comment
table_name            constraint_name             constraint_type  details(*** add invisible index details ***)      validated
unique_without_index  my_partial_unique_f         UNIQUE           UNIQUE WITH INVISIBLE INDEX (f) WHERE (f > 0)     true
unique_without_index  my_partial_unique_f         UNIQUE           UNIQUE WITH HIDDEN INDEX (f) WHERE (f > 0)        true
unique_without_index  my_partial_unique_f         UNIQUE           UNIQUE WITH NOT VISIBLE INDEX (f) WHERE (f > 0)   true

6. A new column needs to be added to crdb_internal.table_indexes.

descriptor_id  descriptor_name  index_id  index_name  index_type  is_unique  is_inverted  is_sharded  ***is_hidden***       shard_bucket_count created_at
                                                                                                      ***is_invisible*** 
                                                                                                      ***is_not_invisible*** 
                                                                                                      ***visibility*** 
  1. A new column needs to be added to the output of following SQL statements:
SHOW INDEX FROM (table_name) 
SHOW INDEXES FROM(table_name) 
SHOW KEYS FROM (table_name) 

SHOW INDEX FROM DATABASE(database_name) 
SHOW INDEXES FROM DATABASE (database_name) 
SHOW KEYS FROM DATABASE (database_name) 
table_name  index_name  non_unique  seq_in_index  column_name  direction  storing  implicit ***is_hidden***
                                                                                            ***is_invisible***
                                                                                            ***is_not_invisible*** 
                                                                                            ***visibility***
  1. Note that CREATE CONSTRAINT and ALTER CONSTRAINT are both not supported by the parser.

Discussion

CockroachDB currently supports invisible column feature. For this feature, NOT VISIBLE is used for its SQL statement, and is_hidden is used for the new column added to SHOW INDEX. It would be nice to stay consistent. But MySQL and Oracle both support INVISIBLE for invisible indexes. MySQL, Oracle also use INVISIBLE for the invisible columns. MySQL use is_visible for the new column added to SHOW INDEX. Oracle uses VISIBILITY for the new column added to SHOW INDEX.

I was wondering why NOT VISIBLE was chosen for the invisible column feature. I tried changing it to INVISIBLE in sql.y, and it caused conflicts in the grammar. I'm not sure if this was the reason why we chose NOT VISIBLE. PostgreSQL currently doesn't support invisible index or invisible column feature. We can also try supporting both NOT VISIBLE and INVISIBLE, but more work would be needed to find a grammar rule that allows both.

Later Discussion: Fine-Grained Control of Index Visibility

Later on, we want to extend this feature and allow a more fine-grained control of index visibility by introducing the following two features.

As of now, the plan is to introduce the general feature of invisible index first. The design and implementation details for fine-grained control of index visibility will be added later on.

  1. Indexes are not restricted to just being visible or invisible; users can experiment with different levels of visibility. In other words, instead of using a boolean invisible flag, users can set a float invisible flag between 0.0 and 1.0. The index would be made invisible only to a corresponding fraction of queries. Related: #72576 (comment)

  2. Different sessions of a certain user or application can set different index visibilities for indexes. Related: #82363