Partial indexes #8242

ericharmeling · 2020-09-03T19:52:47Z

Added a new page for Partial Indexes.
Updated CREATE INDEX syntax diagram and parameters.

This PR will likely fix some future "opt" release note issues.

cockroach-teamcity · 2020-09-03T19:52:54Z

This change is

cockroach-teamcity · 2020-09-03T19:56:22Z

Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/e489fe226aaf345d86646d69a0acbf3ac47981e2/

Edited pages:

cockroach-teamcity · 2020-09-08T20:24:18Z

Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/7a895d983a575bbac5158731a94f0002991c02f1/

Edited pages:

mgartner · 2020-09-08T20:35:30Z

Not related to your changes, but I noticed that it isn't mentioned that inverted indexes can be created on ARRAY types here:

mgartner

Looks great!!!

I left a few comments. Other than those, I think the only other thing potentially missing is that the optimizer is not perfect in proving that some query filters imply partial index predicates. From the RFC:

Note that CRDB, like Postgres, will perform a best-effort attempt to prove that a query filter expression implies a partial index predicate. It is not guaranteed to prove implication of arbitrarily complex expressions.

In other words, false negatives are possible (where a filter theoretically implies a predicate, but cannot be proven by the optimizer in practice). It should be very unlikely, but it is possible, and calling it out in our docs may help prevent confusion. Here's one example:

[email protected]:58391/defaultdb> CREATE TABLE t (a INT, b INT, c INT, INDEX (a) WHERE b = 1 OR c = 2 OR b = 3);
CREATE TABLE

Server Execution Time: 4.515ms
Network Latency: 1.406ms

[email protected]:58391/defaultdb> EXPLAIN SELECT a FROM t WHERE b IN (1, 3) OR c = 2;
    tree    |     field     |       description
------------+---------------+---------------------------
            | distribution  | full
            | vectorized    | false
  filter    |               |
   │        | filter        | (b IN (1, 3)) OR (c = 2)
   └── scan |               |
            | missing stats |
            | table         | t@primary
            | spans         | FULL SCAN
(8 rows)

mgartner · 2020-09-08T20:48:33Z

v20.2/sql-feature-support.md

 Multi-column indexes | ✓ | Common Extension | We do not limit on the number of columns indexes can include
 Covering indexes | ✓ | Common Extension | [Storing Columns documentation](create-index.html#store-columns)
 Inverted indexes | ✓ | Common Extension | [Inverted Indexes documentation](inverted-indexes.html)
+ Partial indexes | ✓ | Common Extension | [Partial indexes documentation](partial-indexes.html)
 Multiple indexes per query | Planned | Common Extension | Use multiple indexes to filter the table's values for a single query


mgartner · 2020-09-08T20:59:37Z

v20.2/partial-indexes.md

+
+- They contain fewer rows than full indexes, making them less expensive to create and store on a cluster.
+- Read queries on rows included in a partial index only scan the rows in the partial index. This contrasts with queries on columns in full indexes, which must scan all rows in the indexed column.
+- Write queries on rows implied by a partial index only modify rows in the partial index. This contrasts with write queries on columns in full indexes, which must modify the larger set of rows that make up a full-column index.


I think this last bullet point is confusing. The advantage of partial indexes with regard to writes is that the overhead of writing to an index is only incurred for rows that must be added to or removed from the partial index, whereas a non-partial index incurs this overhead for every row. For example, if we have an INDEX (a) WHERE b = 'foo', and we INSERT INTO t (a, b) VALUES (1, 'bar'), there is no overhead of writing to the partial index because that row does not belong to it.

Would something like below be more clear?

With a partial index, write queries only incur the overhead of an index write when the row satisfies the predicate. This contrasts with full indexes, which incur the overhead of an index write for all rows when the indexed column is modified.
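The write behavior described in this comment can be sketched in SQL. This is a minimal sketch, not output from a real cluster: the table `t`, the index definition, and the first `INSERT` come from the comment above, while the column types and the last two statements are assumptions added for illustration.

```sql
-- Column types are assumed; the comment only names the columns and the predicate.
CREATE TABLE t (a INT, b STRING, INDEX (a) WHERE b = 'foo');

-- Row does not satisfy the predicate, so no partial index write is incurred.
INSERT INTO t (a, b) VALUES (1, 'bar');

-- Row satisfies the predicate, so it must be added to the partial index.
INSERT INTO t (a, b) VALUES (2, 'foo');

-- Row stops satisfying the predicate, so it must be removed from the partial index.
UPDATE t SET b = 'bar' WHERE a = 2;
```

With a non-partial `INDEX (a)`, both inserts would incur an index write.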

mgartner · 2020-09-08T21:03:09Z

v20.2/partial-indexes.md

+- [Functions](functions-and-operators.html) used in predicates must be immutable. For example, the `now()` function is not allowed in predicates because its value depends on more than its arguments.
+
+{{site.data.alerts.callout_info}}
+Partial indexes cannot be created at [table creation](create-table.html).


It should be possible to create them in a CREATE TABLE statement. Let me know if you ran into a case that didn't work.

[email protected]:58391/defaultdb> create table t (a int, index (a) where a > 0);
CREATE TABLE

Server Execution Time: 3.143ms
Network Latency: 998µs

[email protected]:58391/defaultdb> show create table t;
  table_name |               create_statement
-------------+------------------------------------------------
  t          | CREATE TABLE public.t (
             |     a INT8 NULL,
             |     INDEX t_a_idx (a ASC) WHERE a > 0:::INT8,
             |     FAMILY "primary" (a, rowid)
             | )
(1 row)

Server Execution Time: 5.263ms
Network Latency: 231µs

mgartner · 2020-09-08T21:08:19Z

v20.2/partial-indexes.md

+{{site.data.alerts.end}}
+
+{{site.data.alerts.callout_info}}
+CockroachDB returns an error if there are multiple unique or exclusion constraints matching the `ON CONFLICT` specification. See [tracking issue](https://github.com/cockroachdb/cockroach/issues/53170).


This is only the case for ON CONFLICT ... DO UPDATE, but not for ON CONFLICT ... DO NOTHING. There should be no issues with ON CONFLICT ... DO NOTHING.

We'll probably also want to document the new WHERE clause syntax in the INSERT ON CONFLICT statement. There are some examples here. This is particularly confusing, so I'm happy to explain more.

Ahh I see now that you have this correct below in "Known Limitations". I think this Note should include the "DO UPDATE" clarification or be removed.

mgartner · 2020-09-08T21:45:08Z

v20.2/partial-indexes.md

+
+{% include copy-clipboard.html %}
+~~~ sql
+> CREATE INDEX ON rides (city, revenue) WHERE revenue > 80;


[nit] there is no query plan in this section that takes advantage of revenue as an indexed column. If you added another example like SELECT * FROM rides WHERE city = 'new york' AND revenue >= 100 AND revenue < 150, the query plan should be a constrained scan over the partial index, rather than a FULL SCAN.

I'm not sure it's necessary but it might be a nice example to highlight.

ericharmeling

TFTR, @mgartner!

I think I addressed all of your comments. I also added a note about the false negatives. I don't think we need to point out an example, but I agree that adding a disclaimer note will be helpful. Those changes are all in the "mgartner feedback" commit.

re: docs unrelated to partial indexes, I've added some simple updates to separate commits. I'd prefer to have separate PRs for unrelated docs updates, especially if they are more involved, but separate commits for these small updates should be fine.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)

v20.2/partial-indexes.md, line 19 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think this last bullet point is confusing. The advantage of partial indexes with regard to writes is that the overhead of writing to an index is only incurred for rows that must be added to or removed from the partial index, whereas a non-partial index incurs this overhead for every row. For example, if we have an INDEX (a) WHERE b = 'foo', and we INSERT INTO t (a, b) VALUES (1, 'bar'), there is no overhead of writing to the partial index because that row does not belong to it.

Would something like below be more clear?

With a partial index, write queries only incur the overhead of an index write when the row satisfies the predicate. This contrasts with full indexes, which incur the overhead of an index write for all rows when the indexed column is modified.

Gotcha. I rewrote that last bullet, using a lot of your wording.

v20.2/partial-indexes.md, line 58 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It should be possible to create them in a CREATE TABLE statement. Let me know if you ran into a case that didn't work.

[email protected]:58391/defaultdb> create table t (a int, index (a) where a > 0);
CREATE TABLE

Server Execution Time: 3.143ms
Network Latency: 998µs

[email protected]:58391/defaultdb> show create table t;
  table_name |               create_statement
-------------+------------------------------------------------
  t          | CREATE TABLE public.t (
             |     a INT8 NULL,
             |     INDEX t_a_idx (a ASC) WHERE a > 0:::INT8,
             |     FAMILY "primary" (a, rowid)
             | )
(1 row)

Server Execution Time: 5.263ms
Network Latency: 231µs

Ah! Okay. Removed this note. Didn't run into any issues.

v20.2/partial-indexes.md, line 80 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Ahh I see now that you have this correct below in "Known Limitations". I think this Note should include the "DO UPDATE" clarification or be removed.

I updated the Known Limitations bullet and removed the note.

v20.2/partial-indexes.md, line 149 at r1 (raw file):

there is no query plan in this section that takes advantage of revenue as an indexed column.

I'm not sure I understand what you mean by this. All of the queries filter on revenue. Do you mean "takes advantage of city as an indexed column"?

I added an example that includes the city in the filter clause.

v20.2/sql-feature-support.md, line 80 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Unrelated to partial indexes but I noticed this "Multiple indexes per query" is marked planned. Since cockroachdb/cockroach#2142 was fixed by cockroachdb/cockroach#47094, there is a case where a single query can use multiple indexes (example below). This will be new in 20.2.

[email protected]:58391/defaultdb> create table t (k int primary key, a int, b int, index a_idx (a), index b_idx (b));
CREATE TABLE

Server Execution Time: 2.755ms
Network Latency: 810µs

[email protected]:58391/defaultdb> explain select k from t where a = 10 or b = 20;
          tree         |     field     | description
-----------------------+---------------+--------------
                       | distribution  | local
                       | vectorized    | false
  distinct             |               |
   │                   | distinct on   | k
   └── union all       |               |
        ├── index join |               |
        │    │         | table         | t@primary
        │    └── scan  |               |
        │              | missing stats |
        │              | table         | t@a_idx
        │              | spans         | [/10 - /10]
        └── index join |               |
             │         | table         | t@primary
             └── scan  |               |
                       | missing stats |
                       | table         | t@b_idx
                       | spans         | [/20 - /20]
(17 rows)

Server Execution Time: 154µs
Network Latency: 311µs

I don't see a docs issue opened for this, so I'll just sneak in an update to this table into this PR, making this support "partial" (in a separate commit).

I'd rather document these updates fully in separate PRs. I opened an issue for this: #8260.

cockroach-teamcity · 2020-09-09T19:45:19Z

Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/4bbb4fb4eb6136b134cdbb3f763edac719e35802/

Edited pages:

mgartner

Reviewed 2 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)

v20.2/partial-indexes.md, line 58 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Ah! Okay. Removed this note. Didn't run into any issues.

Should the CREATE TABLE docs be updated so that the index_def syntax graph also has an opt_where_clause at the end?

v20.2/partial-indexes.md, line 149 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

there is no query plan in this section that takes advantage of revenue as an indexed column.

I'm not sure I understand what you mean by this. All of the queries filter on revenue. Do you mean "takes advantage of city as an indexed column"?

I added an example that includes the city in the filter clause.

Sorry for creating confusion.

This is a step in the right direction. Before, all the spans were FULL SCANs meaning that they scan the entire partial index. This is a useful case, but another useful example that I wanted to highlight would be a scan over just a small part of the partial index.

Now that you've added city = 'new york', you can see that the span is constrained to [/'new york' - /'new york']. ✔️

If you want to take it a step further to show all the indexed columns being used to constrain the spans, you can change the query filter to WHERE city = 'new york' AND revenue >= 100 AND revenue < 150.

Because revenue >= 100 AND revenue < 150 implies revenue > 80, the partial index can be used. But it will still need to apply the revenue filter to remove rows where revenue is between 80 and 99 or between 150 and +infinity. Luckily, the revenue column is the second indexed column, and the first indexed column, city, is constrained to a single value by the city = 'new york' filter. So the scan over the partial index would constrain both indexed columns, city and revenue, with the span [/'new york'/100 - /'new york'/149].

I think this better shows the full potential of a 2-column partial index. But it's up to you if you want to include it. You may want to reword (or remove?) the EXPLAIN SELECT city, revenue FROM rides WHERE revenue > 95; example since that is similar to my suggested example, but it does a FULL SCAN because the first indexed column, city, is not constrained by the query filter—a crucial difference that may be worth calling out if both examples are on the page.
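The contrast between the two examples can be sketched as a pair of `EXPLAIN` statements. This is a minimal sketch that assumes the `rides` table from the docs example already exists with `city` and `revenue` columns. The plan notes in the comments restate the description above rather than a real run, so no output is shown.

```sql
-- Partial index from the docs example under review.
CREATE INDEX ON rides (city, revenue) WHERE revenue > 80;

-- The filter implies revenue > 80 but leaves city, the first indexed column,
-- unconstrained: described above as a FULL SCAN of the partial index.
EXPLAIN SELECT city, revenue FROM rides WHERE revenue > 95;

-- The filter implies revenue > 80 and pins city to a single value, so both
-- indexed columns can constrain the span of the scan.
EXPLAIN SELECT * FROM rides
  WHERE city = 'new york' AND revenue >= 100 AND revenue < 150;
```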

ericharmeling

@mgartner Thanks for iterating on this! Just updated the PR again. See the latest commit.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)

v20.2/partial-indexes.md, line 58 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Should the CREATE TABLE docs be updated so that the index_def syntax graph also has an opt_where_clause at the end?

Good catch! Just added it.

v20.2/partial-indexes.md, line 149 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Sorry for creating confusion.

This is a step in the right direction. Before, all the spans were FULL SCANs meaning that they scan the entire partial index. This is a useful case, but another useful example that I wanted to highlight would be a scan over just a small part of the partial index.

Now that you've added city = 'new york', you can see that the span is constrained to [/'new york' - /'new york']. ✔️

If you want to take it a step further to show all the indexed columns being used to constrain the spans, you can change the query filter to WHERE city = 'new york' AND revenue >= 100 AND revenue < 150.

Because revenue >= 100 AND revenue < 150 implies revenue > 80, the partial index can be used. But it will still need to apply the revenue filter to remove rows where revenue is between 80 and 99 or between 150 and +infinity. Luckily, the revenue column is the second indexed column, and the first indexed column, city, is constrained to a single value by the city = 'new york' filter. So the scan over the partial index would constrain both indexed columns, city and revenue, with the span [/'new york'/100 - /'new york'/149].

I think this better shows the full potential of a 2-column partial index. But it's up to you if you want to include it. You may want to reword (or remove?) the EXPLAIN SELECT city, revenue FROM rides WHERE revenue > 95; example since that is similar to my suggested example, but it does a FULL SCAN because the first indexed column, city, is not constrained by the query filter—a crucial difference that may be worth calling out if both examples are on the page.

Ahh. I see. I've updated the example again!

cockroach-teamcity · 2020-09-11T20:38:33Z

Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/871302997d05831a6e0c244e8d6b226a2a72d6b0/

Edited pages:

mgartner

This looks great!

Reviewed 4 of 4 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained

v20.2/partial-indexes.md, line 149 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Ahh. I see. I've updated the example again!

Looks great!

lnhsingh

Just a couple of nits. Nice job!

Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @ericharmeling and @lnhsingh)

v20.2/partial-indexes.md, line 3 at r3 (raw file):

---
title: Partial Indexes
summary: Partial indexes

nit: Add a more descriptive summary. A variation of the first sentence or two of the doc is good

v20.2/partial-indexes.md, line 7 at r3 (raw file):

---

<span class="version-tag">New in v20.2:</span> Partial indexes allow you to specify a subset of rows and columns to add to an [index](indexes.html). Partial indexes include the subset of rows in a table that evaluate to true on a boolean *predicate expression* (i.e. a `WHERE` filter) defined at [index creation](#creation).

nit: add , after i.e.

v20.2/partial-indexes.md, line 186 at r3 (raw file):

~~~

Note that query's `SELECT` statement queries all columns in the `rides` table, not just the indexed columns. As a result, an "index join" is required on both the primary index and the partial index.

Note that query's > Note that the query's

cockroach-teamcity · 2020-09-14T23:12:10Z

Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/c378ac44c26267ec5907c1aa80f1683e4a902176/

Edited pages:

ericharmeling requested a review from mgartner September 3, 2020 19:52

ericharmeling force-pushed the partial-indexes branch from e489fe2 to 7a895d9 Compare September 8, 2020 20:20

mgartner reviewed Sep 8, 2020

ericharmeling commented Sep 9, 2020

mgartner reviewed Sep 10, 2020

ericharmeling force-pushed the partial-indexes branch from 4bbb4fb to 8713029 Compare September 11, 2020 20:35

ericharmeling commented Sep 11, 2020

mgartner approved these changes Sep 11, 2020

ericharmeling requested a review from lnhsingh September 14, 2020 01:15

lnhsingh reviewed Sep 14, 2020

Partial indexes

c378ac4

ericharmeling force-pushed the partial-indexes branch from 8713029 to c378ac4 Compare September 14, 2020 23:08

ericharmeling merged commit ca166f2 into master Sep 14, 2020

ericharmeling deleted the partial-indexes branch September 14, 2020 23:13
