-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: add rfc for invisible index feature #83531
Conversation
4e2c797
to
cca54f3
Compare
cca54f3
to
7197fd8
Compare
7197fd8
to
f669071
Compare
I noticed that Oliver did a lot of pr review related for invisible column feature, so I reached out to him for some context here.
|
|
||
### Discussion | ||
CockroachDB currently supports invisible column feature. For this | ||
feature, `NOT VISIBLE` is used for its SQL statement, and `is_hidden` is used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i wonder if we should change is_hidden
to not_visible
to be consistent (this is probably my bad when i added the feature).
|
||
I was wondering why `NOT VISIBLE` was chosen for the invisible columns feature. | ||
I tried changing it to `INVISIBLE` in `sql.y`, and it caused conflicts in the | ||
grammar. I'm not sure if this was the reason why we chose `NOT VISIBLE`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned here it has to become a reserved keyword, which is why we went with NOT VISIBLE. Adding INVISIBLE to reserved keywords is potentially problematic as people who have defined columns like INVISIBLE bool
now have to quote it, which makes it backwards incompatible.
[MySQL](https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html) use | ||
`is_visible` for the new column added to `SHOW INDEX`. | ||
[Oracle](http://www.dba-oracle.com/t_11g_new_index_features.htm) use | ||
`VISIBILITY` for the new column added to `SHOW INDEX`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i kind of wonder whether INVISIBLE / NOT VISIBLE is the right way to "express" this. My understanding is you still have to pay the write penalty of the indexes, it just doesn't get "used" (which also makes me wonder whether this feature is useful in the ADD case - see my motivation comment above). NOT USED
sounds a bit better semantically to me. but other databases have made it INVISIBLE so maybe we're ok.
this is not a blocking comment and i can easily be convinced otherwise.
If a drop in query performance is observed, the index can be quickly toggled | ||
back to visible without rebuilding the index. | ||
|
||
Similarly, this new feature would also allow users to roll out new indexes with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure whether this fully solves this portion of the motivation. You still need to do a backfill to introduce the index, and then pay the price on every write. I do buy that it does let you experiment with seeing if READ latencies improve with the index though (just wonder if it's worth the write cost!). Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you are right. We are just marking the index as invisible for queries like SELECT. The indexes are maintained up to date with every writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any time the set of indexes changes, query plans can change too. It's easy to imagine cases where dropping an index negatively affects query plans. It's harder to imagine cases where adding an index leads to worse query plans, but it's possible. This paragraph can be more clear that this RFC is solving for the impact of adding an index on query plans, not the impact on index backfills and maintenance during mutations.
f669071
to
465364c
Compare
465364c
to
098a989
Compare
<details> | ||
<summary>Table Constraint Definition</summary> | ||
|
||
UNIQUE [WITHOUT INDEX | *** WITH INVISIBLE INDEX | WITH NOT VISIBLE INDEX ***] ( <colnames...> ) [{STORING | INCLUDE | COVERING} ( <colnames...> )] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to consider how foreign keys constraints interact with unique invisible indexes.
You need a unique constraint or index on the reference column to create a foreign key:
defaultdb> CREATE TABLE p (id INT PRIMARY KEY, a INT);
CREATE TABLE
defaultdb> CREATE TABLE c (id INT PRIMARY KEY, p_a INT REFERENCES p(a));
ERROR: there is no unique constraint matching given keys for referenced table p
SQLSTATE: 23503
defaultdb> CREATE UNIQUE INDEX p_a_key ON p(a);
CREATE INDEX
defaultdb> CREATE TABLE c (id INT PRIMARY KEY, p_a INT REFERENCES p(a));
CREATE TABLE
This is a requirement to make foreign key checks efficient. For example, when inserting into c
we must verify that the new value for p_a
exists in column a
of table p
. Without an index, searching for that value in p
would be expensive. You can see in the query plan of an INSERT
that p_a_key
is used in the anti-join (lookup p@p_a_key)
within f-k-check
.
defaultdb> EXPLAIN (OPT, VERBOSE) INSERT INTO c VALUES (1, 10);
info
------------------------------------------------------------------------------------
insert c
├── columns: <none>
├── insert-mapping:
│ ├── column1:5 => c.id:1
│ └── column2:6 => c.p_a:2
├── input binding: &1
├── cardinality: [0 - 0]
├── volatile, mutations
├── stats: [rows=0]
├── cost: 6.1179112
├── distribution: us-east1
├── values
│ ├── columns: column1:5 column2:6
│ ├── cardinality: [1 - 1]
│ ├── stats: [rows=1, distinct(6)=1, null(6)=0, avgsize(6)=4]
│ ├── cost: 0.02
│ ├── key: ()
│ ├── fd: ()-->(5,6)
│ ├── distribution: us-east1
│ ├── prune: (5,6)
│ └── (1, 10)
└── f-k-checks
└── f-k-checks-item: c(p_a) -> p(a)
└── anti-join (lookup p@p_a_key)
├── columns: p_a:7
├── key columns: [7] = [9]
├── lookup columns are key
├── cardinality: [0 - 1]
├── stats: [rows=1e-10]
├── cost: 6.0879112
├── key: ()
├── fd: ()-->(7)
├── cte-uses
│ └── &1: count=1 used-columns=(6)
├── with-scan &1
│ ├── columns: p_a:7
│ ├── mapping:
│ │ └── column2:6 => p_a:7
│ ├── cardinality: [1 - 1]
│ ├── stats: [rows=1, distinct(7)=1, null(7)=0, avgsize(7)=4]
│ ├── cost: 0.01
│ ├── key: ()
│ ├── fd: ()-->(7)
│ └── cte-uses
│ └── &1: count=1 used-columns=(6)
└── filters (true)
So what happens if we run ALTER INDEX p_a_key NOT VISIBLE
? Does the insert continue to read from the invisible p_a_key
so that the fk check is efficient? Does the insert stop reading from p_a_key
so that the fk check is inefficient? Or does the ALTER INDEX
statement error because a FK depends on that index (and can we even tell if a FK depends on a particular index - multiple overlapping unique indexes that support an FK could exist)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking unique constraint is imposed regardless of whether the index is invisible, and insert can still read from the unique index. The index is ignored just for queries like SELECT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the unique constraint must be imposed because invisible indexes are maintained just like visible indexes. But should they be read for FK checks? Probably, because the alternative is full table scan which would cause abysmal INSERT performance. But we should call out this case, and other like it, because it'll be tricky to get right in the implementation and it'll require thoughtful tests.
And I'm skeptical it's as simple as ignoring only for SELECT
queries. Secondary indexes are read from in UPDATE
and DELETE
statements too. For example:
defaultdb> CREATE TABLE t (k INT PRIMARY KEY, a INT, INDEX (a));
CREATE TABLE
defaultdb> EXPLAIN (OPT, VERBOSE) UPDATE t SET a = 2 WHERE a = 1;
info
---------------------------------------------------------------------------
update t
├── columns: <none>
├── fetch columns: k:5 a:6
├── update-mapping:
│ └── a_new:9 => a:2
├── cardinality: [0 - 0]
├── volatile, mutations
├── stats: [rows=0]
├── cost: 25.0500001
├── distribution: us-east1
└── project
├── columns: a_new:9 k:5 a:6
├── stats: [rows=10]
├── cost: 25.0400001
├── key: (5)
├── fd: ()-->(6,9)
├── distribution: us-east1
├── prune: (5,6,9)
├── scan t@t_a_idx
│ ├── columns: k:5 a:6
│ ├── constraint: /6/5: [/1 - /1]
│ ├── stats: [rows=10, distinct(6)=1, null(6)=0, avgsize(6)=4]
│ ├── cost: 24.8200001
│ ├── key: (5)
│ ├── fd: ()-->(6)
│ └── distribution: us-east1
└── projections
└── 2 [as=a_new:9]
(28 rows)
And what about cases where there is a SELECT
subquery in an INSERT
, UPDATE
, DELETE
, or UPSERT
statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Copying my Reviewable comment from below] because these FK indexes can't be dropped without the CASCADE
option, I think we should disallow making them invisible. (Too hard to add a CASCADE
option to ALTER INDEX SET NOT VISIBLE
.)
But in general, since our goal is to demonstrate the consequences of dropping an index, I don't think we should be scared of causing full table scans. After all, dropping an index likely leads to full table scans as well.
81fd5d5
to
3effe7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this!
It might help to add a short subsection to Technical design
explaining what will happen when an index is invisible, and calling out the restrictions (e.g. no invisible primary indexes).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @devadvocado, @knz, @mgartner, @otan, @postamar, @rhu713, @vy-ton, and @wenyihu6)
docs/RFCS/20220628_invisible_index.md
line 176 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
hmm, i wonder if we should change
is_hidden
tonot_visible
to be consistent (this is probably my bad when i added the feature).
Can we change it now that SHOW COLUMNS
has been released in the wild? Seems like customers could have written queries using the name, e.g. SELECT is_hidden FROM [SHOW COLUMNS FROM p]
.
docs/RFCS/20220628_invisible_index.md
line 187 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
i kind of wonder whether INVISIBLE / NOT VISIBLE is the right way to "express" this. My understanding is you still have to pay the write penalty of the indexes, it just doesn't get "used" (which also makes me wonder whether this feature is useful in the ADD case - see my motivation comment above).
NOT USED
sounds a bit better semantically to me. but other databases have made it INVISIBLE so maybe we're ok.this is not a blocking comment and i can easily be convinced otherwise.
Oracle distinguishes between "unusable" indexes (not maintained and not used by reads) and "invisible" indexes (maintained but not used by reads). I think for better or for worse these have become terms of art, and we should follow them, at least somewhat.
docs/RFCS/20220628_invisible_index.md
line 106 at r2 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
I was thinking unique constraint is imposed regardless of whether the index is invisible, and insert can still read from the unique index. The index is ignored just for queries like SELECT.
I guess we disallow dropping these indexes (unless the CASCADE
option is used, which then also drops the FK reference in the other table):
[email protected]:26257/defaultdb> DROP INDEX p_a_key;
ERROR: "p_a_key" is referenced by foreign key from table "c"
[email protected]:26257/defaultdb> DROP INDEX p_a_key CASCADE;
DROP INDEX
[email protected]:26257/defaultdb> SHOW CREATE TABLE p;
table_name | create_statement
-------------+---------------------------------------------
p | CREATE TABLE public.p (
| id INT8 NOT NULL,
| a INT8 NULL,
| CONSTRAINT p_pkey PRIMARY KEY (id ASC)
| )
(1 row)
[email protected]:26257/defaultdb> SHOW CREATE TABLE c;
table_name | create_statement
-------------+---------------------------------------------
c | CREATE TABLE public.c (
| id INT8 NOT NULL,
| p_a INT8 NULL,
| CONSTRAINT c_pkey PRIMARY KEY (id ASC)
| )
(1 row)
I don't think we would want to implement a CASCADE
option for invisible indexes, so I propose disallowing making these FK indexes invisible as well.
docs/RFCS/20220628_invisible_index.md
line 18 at r3 (raw file):
its initialization. As for now, primary indexes cannot be invisible. But unique indexes can still be invisible. Specifically, the unique constraint still prevents insertion of duplicates into a column regardless of whether the inedx is invisible.
nit: "index" instead of "inedx"
docs/RFCS/20220628_invisible_index.md
line 49 at r3 (raw file):
This following section will discuss different SQL syntax choices. PostgreSQL does not support invisible indexes yet. We will be using MySQL and Oracle SQL as
nit: It would be good to link to MySQL docs and Oracle docs here.
docs/RFCS/20220628_invisible_index.md
line 66 at r3 (raw file):
[PARTITION BY <partition params>] [WITH <storage_parameter_list>] [WHERE <where_conds...>] *** [INVISIBLE | NOT VISIBLE | VISIBLE | HIDDEN] ***
It would be nice to be compatible with MySQL and Oracle, but I guess we don't really promise that in general. Being internally consistent with our own ALTER COLUMN SET NOT VISIBLE
/ SET VISIBLE
seems more important.
8337897
to
9db8bf6
Compare
e04c59e
to
aaec24a
Compare
aaec24a
to
a37e744
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @otan, and @wenyihu6)
docs/RFCS/20220628_invisible_index.md
line 29 at r6 (raw file):
With invisible indexes, you can introduce the index as invisible first. In a new session, you could give a workout and observe the impact of the new index by turning `optimizer_use_invisible_indexes` on or with index hinting. If this
nit: this seems to have been renamed to optimizer_use_not_visible_indexes
.
You are right! Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Some nits from me that I noticed when reading through the RFC.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @otan, @wenyihu6, and @yuzefovich)
docs/RFCS/20220628_invisible_index.md
line 84 at r6 (raw file):
- Force index or index hinting with invisible index is allowed and will override the invisible index feature. - If the index is dropped instead, this will throw an error. - Invisible indexes will be treated as while policing unique or foreign key constraints. In other words, we will temporarily disable the invisible index feature during any constraint check.
nit: probably s/treated as while/treated as visible while/
.
docs/RFCS/20220628_invisible_index.md
line 141 at r6 (raw file):
[errors](https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html). Instead, MySQL supports index hinting with the session variable [optimizer_use_not_visible_indexes](https://dev.mysql.com/doc/refman/8.0/en/switchable-optimizations.html#optflag_use-invisible-indexes)
nit: missing period.
docs/RFCS/20220628_invisible_index.md
line 228 at r6 (raw file):
feature and focus on how the optimizer will ignore invisible indexes in general. During exploration, the optimizer will explore every possible query plans using
nit: s/plans/plan/
.
docs/RFCS/20220628_invisible_index.md
line 233 at r6 (raw file):
`pkg/sql/opt/xform/scan_index_iter.go`. This is where we can hide the index away from the optimizer. While enumerating every index, the optimizer can check if the index is invisible and ignore if it is. optimizer is effectively ignoring
nit: s/optimizer/The optimizer/
.
docs/RFCS/20220628_invisible_index.md
line 239 at r6 (raw file):
Second, let’s think about what happens when force index is used with invisible index. Force index will override the invisible index feature. We will just need to check if the flag for force index is set before ignore invisible indexes
nit: s/before ignore/before ignoring/
docs/RFCS/20220628_invisible_index.md
line 427 at r6 (raw file):
Related: https://github.com/cockroachdb/cockroach/issues/82363 3. e can consider introducing another session variable or another type of
nit: s/e can/We can/
This commit adds an RFC for the invisible index feature. Related issue: cockroachdb#72576, cockroachdb#82363 Release justification: low risk to the existing functionality; this commit just adds rfc. Release Note: none
a37e744
to
8c4d31f
Compare
Thanks for looking at this pr as part of the QA! |
Should the invisible indexes still influence the decision which columns we collect statistics on? I believe so, and I think we can get this behavior already but thought it'd be worth to confirm with folks. (For context, we collect stats on all columns in the secondary indexes which is determined in |
I think so. Stats are collected on invisible indexes as well. And this aligns with what MySQL and Oracle do. This is not officially documented on their websites; the link for Oracle was found on a blog post. I played around with MySQL(see screenshot below); stats from invisible indexes are still collected to determine cardinality estimates. |
TFTRs! bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 4 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 5 of 0 LGTMs obtained
This commit adds an RFC for the invisible index feature.
Related issue: #72576, #82363
Release justification: low risk to the existing functionality; this commit just
adds rfc.
Release Note: none