-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfcs: partial indexes #48557
rfcs: partial indexes #48557
Conversation
4e6b352
to
06129d5
Compare
37e6732
to
1e4dc25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice RFC! I didn't carefully review, but the execution portions look good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Avoid overhead of updating an index for rows that are mutated and don't | ||
satisfy the predicate. | ||
- Reduce the number of rows examined when scanning. | ||
- Provide a mechanism for ensuring uniqueness on a subset of rows in a table, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I hadn't thought of this benefit.
|
||
("=>" means "implies") | ||
|
||
atom A => atom B if: A contains B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we use constraint.Span
to do atom "contains" comparisons? Seems like the Span
might be good way to represent atoms, even if we can't reuse the other objects. This also has a ton of overlap with Sumeer's inverted index representation, which also has spans as "atoms", and then allows AND/OR to arbitrarily compose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I was originally thinking of using constraint.Constraint
, but we only need constraint.Span
if we determine the columns being operated on are the same. I'll update the RFC.
However, I'm just now realizing there are some tricky cases involving NULL
that we may have to handle without the aid of constraint.Span
.
For example, imagine these predicates:
a IS NOT NULL
Postgres proves that the following filters imply this:
a IS NOT NULL
a = 1
a > 3
(a, b) IS NULL
Postgres proves that the following filters imply this:
(a, b) IS NULL
a IS NULL and b IS NULL
(a, b) IS NOT NULL
Postgres proves that the following filters imply this:
(a, b) IS NOT NULL
a IS NOT NULL AND b IS NOT NULL
a = 1 AND b = 2
a > 5 and b < 7
For the tuple cases, it might be pragmatic to only handle cases when an atom matches exactly (pointer equality), and leave it at that until we hear from a customer. Heck, it might even be good to push users towards using an equivalent predicate expression that's easier to comprehend, like a IS NOT NULL AND b IS NOT NULL
.
Also, the PR for fixing <tuple> IS NULL
logic may make (2) and (3) easier to deal with, if we normalize-away the tuples: #48299 (review)
Postgres's [predtest library](https://github.com/postgres/postgres/blob/c9d29775195922136c09cc980bb1b7091bf3d859/src/backend/optimizer/util/predtest.c#L251-L287) | ||
uses this method to determine if a partial index can be used to satisfy a query. | ||
The logic Postgres uses for testing implication of conjunctions, disjunctions, | ||
and "atoms" (anything that is not an `AND` or `OR`) is as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does PG's predtest library handle tuple comparisons, like (a,b)>(1,2)
? In particular, I'm wondering if it handles atoms having multiple columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like they only handle the case when the tuple "atom" expression is an exact match, except for <tuple> IS NULL
and <tuple> IS NOT NULL
mentioned in my other comment.
marcus=# create table t (k int, a int, b int);
CREATE TABLE
marcus=# create index i on t (k) where (a, b) >= (1, 2);
CREATE INDEX
marcus=# explain select k from t where (a, b) >= (1, 2) and k = 1;
QUERY PLAN
----------------------------------------------------------------
Bitmap Heap Scan on t (cost=4.13..11.26 rows=3 width=4)
Recheck Cond: ((k = 1) AND (ROW(a, b) >= ROW(1, 2)))
-> Bitmap Index Scan on i (cost=0.00..4.13 rows=3 width=0)
Index Cond: (k = 1)
(4 rows)
marcus=# explain select k from t where (a, b) > (1, 2) and k = 1;
QUERY PLAN
--------------------------------------------------
Seq Scan on t (cost=0.00..45.70 rows=3 width=4)
Filter: ((k = 1) AND (ROW(a, b) > ROW(1, 2)))
(2 rows)
marcus=# explain select k from t where a >= 1 and b >= 2 and k = 1;
QUERY PLAN
--------------------------------------------------
Seq Scan on t (cost=0.00..45.70 rows=1 width=4)
Filter: ((a >= 1) AND (b >= 2) AND (k = 1))
(2 rows)
marcus=# explain select k from t where a > 3 and b > 4 and k = 1;
QUERY PLAN
--------------------------------------------------
Seq Scan on t (cost=0.00..45.70 rows=1 width=4)
Filter: ((a > 3) AND (b > 4) AND (k = 1))
(2 rows)
|
||
## Mutation | ||
|
||
Partial indexes only index rows that satisfy the partial index's predicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One tricky thing to make sure you consider is the effect of a partial index that is currently being added or removed and/or if an existing partial index is having columns added or removed. It should follow the same patterns as regular indexes, but make a special point of carefully testing those kinds of variations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @awoods187, @mgartner, and @RaduBerinde)
docs/RFCS/20200507_partial_indexes.md, line 66 at r1 (raw file):
1. They must result in a boolean. 2. They can only refer to columns in the table being indexed. 3. Functions used within predicates cannot be impure. For example, `now()` is
[nit] since we've moved from impure to immutable/stable/volatile, maybe this should be changed to "must be immutable"?
docs/RFCS/20200507_partial_indexes.md, line 171 at r1 (raw file):
In addition, we'll need to update exploration rules for zig-zag joins and inverted index scans.
What about lookup joins? Is it possible to use a partial index as the lookup index?
docs/RFCS/20200507_partial_indexes.md, line 190 at r1 (raw file):
The statistics builder must take into account the predicate expression, in addition to the filter expression, when generating statistics for partial index
[nit] a partial index scan
docs/RFCS/20200507_partial_indexes.md, line 221 at r1 (raw file):
columns both in the partial index column set and in the partial index predicate. In a series of examples, this proved to have an insignificant effect on the resulting estimate.
I think we may want to try to combine the predicates before calculating selectivity to get a more accurate estimate. For example, if the index is on units_sold > 1000
but the filter is on units_sold > 1200
, we probably want to calculate the selectivity of units sold > 1200
only, not (selectivity of units_sold > 1000
) * (selectivity of units_sold > 1200
) . Since we currently use constraints for this calculation, this would be as simple as performing an intersection of the constraints before calculating the selectivity.
docs/RFCS/20200507_partial_indexes.md, line 228 at r1 (raw file):
expression. In order to maintain this property, `INSERT`s, `UPDATE`s, and `DELETE`s to a table must update the partial index in the event that they change the candidacy of a row.
[nit] it's not just candidacy, right? You still need to update the row even if it remains in the index.
docs/RFCS/20200507_partial_indexes.md, line 301 at r1 (raw file):
- [ ] Update the statistics builder to account for the selectivity of the partial index predicate. - [ ] Add more advance implication logic for filter and predicate expressions.
advance -> advanced
docs/RFCS/20200507_partial_indexes.md, line 303 at r1 (raw file):
- [ ] Add more advance implication logic for filter and predicate expressions. - [ ] Add support in other index exploration rules: - [ ] GenerateInvertedIndexScans
Are there any special considerations needed for inverted indexes?
1e4dc25
to
cefd371
Compare
cefd371
to
33c0e7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @rytaft)
docs/RFCS/20200507_partial_indexes.md, line 66 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] since we've moved from impure to immutable/stable/volatile, maybe this should be changed to "must be immutable"?
Done.
docs/RFCS/20200507_partial_indexes.md, line 190 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] a partial index scan
Done.
docs/RFCS/20200507_partial_indexes.md, line 301 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
advance -> advanced
Done.
33c0e7b
to
1606d7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @rytaft)
docs/RFCS/20200507_partial_indexes.md, line 228 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] it's not just candidacy, right? You still need to update the row even if it remains in the index.
Yes you're right. But only in the case that the value of the index columns changes.
I added a comment here and rewrote the Update section below to make this more clear.
1606d7c
to
2f9ce42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @rytaft)
docs/RFCS/20200507_partial_indexes.md, line 303 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Are there any special considerations needed for inverted indexes?
I don't foresee needing special considerations, at the moment. But there could very well be some gotchas. AFAIK As long as the filter matches the predicate, the partial inverted index can be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @mgartner, @RaduBerinde, and @rytaft)
docs/RFCS/20200507_partial_indexes.md, line 16 at r2 (raw file):
Partial indexes are a common feature in RDBMSs. They can be beneficial in multiple use-cases. Partial indexes can:
might be worth upleveling the overview. at a high level, partial indexes let you get the same read performance as regular indexes with improve write performance provided you only need strong read performance on predicate values
docs/RFCS/20200507_partial_indexes.md, line 22 at r2 (raw file):
- Avoid overhead of updating an index for rows that are mutated and don't satisfy the predicate. - Reduce the number of rows examined when scanning.
this shouldnt be any different when compared to a normal index though right?
docs/RFCS/20200507_partial_indexes.md, line 52 at r2 (raw file):
Only the last query can utilize
popular_products
. Its filter expression,
utilize the partial index popular_product
docs/RFCS/20200507_partial_indexes.md, line 58 at r2 (raw file):
expression. Attempting to force a partial index to be used for a query that does not imply
it might be good to introduce the concept that the optimizer will attempt to use the partial index by default but explain when you may need to force a partial index. Then you can mention that you can force a partial index like a normal index using the hint syntax @. Although to be honest I'm struggling to think of a case in which you would want to force a partial index that wouldn't be identifiable by the CBO
docs/RFCS/20200507_partial_indexes.md, line 77 at r2 (raw file):
## Parsing In order to ensure that predicates are valid (e.g. they result in booleans and
nit e.g.,
docs/RFCS/20200507_partial_indexes.md, line 87 at r2 (raw file):
query must _imply_ that the partial index predicate is true. If the predicate is not provably true, the rows to be returned may not exist in the partial index, and it cannot be used.
its implied and kinda obvious but perhaps worth noting the CBO will try to use another index and if none are available the PK
docs/RFCS/20200507_partial_indexes.md, line 176 at r2 (raw file):
The time complexity of this check is `O(P * F)`, where `P` is the number of nodes in the predicate expression and `F` is the number of nodes in the filter expression.
is there ever the case where a partial index is more expensive than a full index for reads if this time complexity check is too long?
docs/RFCS/20200507_partial_indexes.md, line 242 at r2 (raw file):
predicate. In a series of examples, this proved to have an insignificant effect on the resulting estimate.
I assume this would extensible in the future if we found uses cases in which we needed the additional complexity?
docs/RFCS/20200507_partial_indexes.md, line 292 at r2 (raw file):
value of the indexed columns is not changing, there is no need to update the index.
What happens when a PK change occurs? I assume partial indexes are like normal secondary indexes and get re-written but its more mentioning.
Similarly, is there any interaction with partitioning? I assume you can partition a partial index just like a secondary index?
Can you set up a partial index with a where clause that has the same locality as the compound primary key in a partitioned or replicated table? If so would this be an index this is only built/stored in that locality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @mgartner, @RaduBerinde, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 303 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I don't foresee needing special considerations, at the moment. But there could very well be some gotchas. AFAIK As long as the filter matches the predicate, the partial inverted index can be used.
We'll probably need to think about the interaction of partial inverted indexes with the new invertedexpr.SpanExpression
that @sumeerbhola is building. For example, even if we can confirm that the filter condition implies the index predicate, we might need to remove any sub-expressions from the filter that aren't part of the partial index. Probably not necessary to add details to this RFC, but it's something that I'm sure we'll have to wrestle with a bit at some point...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @mgartner, @RaduBerinde, and @rytaft)
docs/RFCS/20200507_partial_indexes.md, line 303 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
We'll probably need to think about the interaction of partial inverted indexes with the new
invertedexpr.SpanExpression
that @sumeerbhola is building. For example, even if we can confirm that the filter condition implies the index predicate, we might need to remove any sub-expressions from the filter that aren't part of the partial index. Probably not necessary to add details to this RFC, but it's something that I'm sure we'll have to wrestle with a bit at some point...
I think the decomposition rules stated earlier are general enough to apply to any type of data. So we would use the same logic to decide whether
user expression => inverted index predicate
What is special about the inverted index is the row structure, which is orthogonal.
docs/RFCS/20200507_partial_indexes.md, line 147 at r2 (raw file):
("=>" means "implies") atom A => atom B if: A contains B
Is an atom
a single column set, or a multi-column set -- I was imagining single column, but the mention of constraint.Span
which allows multi-column made me less sure.
docs/RFCS/20200507_partial_indexes.md, line 248 at r2 (raw file):
expression. In order to maintain this property, `INSERT`s, `UPDATE`s, and `DELETE`s to a table must update the partial index in the event that they change the candidacy of a row. Partial indexes must also be updated if an `UPDATE`d row
(just trying to make sure I understood this correctly)
for UPDATE, unlike a regular index update, that only happens if one of the indexed columns is changed, here if any of the columns involved in a partial index predicate changes we need to read all the columns needed for that predicate computation (potentially from different column families) and then compute the predicate and if changed from false => true write to the index (which may actually be a different set of columns from the ones updated or needed for the predicate computation)? So a user could construct partial index predicates that are actually quite expensive in terms of the read cost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @bdarnell, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 58 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Brilliant. I've added this approach and why its required to the doc.
SGTM
docs/RFCS/20200507_partial_indexes.md, line 176 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Scanning a partial index is not guaranteed to always be more efficient than scanning a full index. Here's an example:
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT, price INT) CREATE INDEX orders_user_id ON orders (user_id) CREATE INDEX orders_price_20s ON orders (id) WHERE price >= 20 and price < 30 SELECT COUNT(*) FROM orders WHERE price >= 20 AND price < 30 AND user_id = 123
In this case, it'd likely be more selective to use the
orders_user_id
to retrieve the orders for user 123 and then filter by price afterwords, rather than usingorders_price_20s
and retrieving all the orders from all users in the price range, and then filtering by user id.However, I expect this check to be very fast. It's proportional to the number of expressions in the predicate and filter expressions. I doubt that performing this check would ever tip the scales such that the cost of this check plus scanning the partial index become more costly than no-check (ignore partial indexes) plus a full index. In other words, the cost of this check should always be very low, and the potential gains of using a partial index vs a full index could be very high.
that makes sense--that's helpful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @bdarnell, @mgartner, @RaduBerinde, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 221 at r1 (raw file):
just for my curiosity, is that the correlated columns and FD_strength_scaled commit?
It's actually this one: 25d515d (the second commit in that PR).
because the column is completely correlated with itself, so one can't multiply selectivities, which requires independence, yes?
Yea, that's a good way to look at it.
e2d8f17
to
836e598
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 147 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Good idea. I was originally thinking of using
constraint.Constraint
, but we only needconstraint.Span
if we determine the columns being operated on are the same. I'll update the RFC.However, I'm just now realizing there are some tricky cases involving
NULL
that we may have to handle without the aid ofconstraint.Span
.For example, imagine these predicates:
a IS NOT NULL
Postgres proves that the following filters imply this:
a IS NOT NULL
a = 1
a > 3
(a, b) IS NULL
Postgres proves that the following filters imply this:
(a, b) IS NULL
a IS NULL and b IS NULL
(a, b) IS NOT NULL
Postgres proves that the following filters imply this:
(a, b) IS NOT NULL
a IS NOT NULL AND b IS NOT NULL
a = 1 AND b = 2
a > 5 and b < 7
For the tuple cases, it might be pragmatic to only handle cases when an atom matches exactly (pointer equality), and leave it at that until we hear from a customer. Heck, it might even be good to push users towards using an equivalent predicate expression that's easier to comprehend, like
a IS NOT NULL AND b IS NOT NULL
.Also, the PR for fixing
<tuple> IS NULL
logic may make (2) and (3) easier to deal with, if we normalize-away the tuples: #48299 (review)
We talked more and multi-column spans should help with some of this, but there may be special considerations for null cases or others. I've added a sentence or two about this to the RFC.
docs/RFCS/20200507_partial_indexes.md, line 171 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
What about lookup joins? Is it possible to use a partial index as the lookup index?
Yes it should be possible, as long as we can prove that every row being looked up will exist in the partial index. This could be difficult for more complex join expressions.
For example:
CREATE TABLE ab (a INT PRIMARY KEY, b INT)
CREATE TABLE xy (x INT PRIMARY KEY, y INT, z INT)
CREATE INDEX i ON xy (y) WHERE z >10
The following join and filter make it fairly obvious to see that the index i
could be used for a lookup-join.
SELECT * FROM ab JOIN xy ON a = y WHERE z = 10
But for this query, it may be harder to prove that i
can be used:
SELECT * FROM ab JOIN xy ON a = y AND b = z WHERE b = 10
I've added a work item to the list for utilizing partial indexes during lookup joins.
docs/RFCS/20200507_partial_indexes.md, line 147 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I was imagining single columns originally, but in order to support more complex tuple comparisons, we will need multi-column sets. The being said, Postgres's support for proving containment
Looks like my comment got cut off. I believe I wrote something like "The being said, Postgres's support for proving containment for multi-column atoms is fairly simple. It doesn't support complex proofs of implication for them."
docs/RFCS/20200507_partial_indexes.md, line 292 at r2 (raw file):
What happens when a PK change occurs? I assume partial indexes are like normal secondary indexes and get re-written but its more mentioning.
Correct. I've added a section mentioning this so that it is explicit.
Similarly, is there any interaction with partitioning? I assume you can partition a partial index just like a secondary index?
Can you set up a partial index with a where clause that has the same locality as the compound primary key in a partitioned or replicated table? If so would this be an index this is only built/stored in that locality?
That should be possible, yes. I need to do some more research and figure out if there are special considerations that need to be made in order to support partial partitioned indexes.
docs/RFCS/20200507_partial_indexes.md, line 83 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Probably -- we produce an error in other cases when the hinted plan is not possible. For example, if a lookup join was hinted but not possible, we return the following error: "ERROR: could not produce a query plan conforming to the LOOKUP JOIN hint".
I wasn't aware that an error was produce in that case. Thanks for pointing it out.
In those lookup join hint cases, is it possible that placeholder values can change whether the error is produced or not?
That is what persuaded this decision for partial indexes. It seemed odd to have a query statement with some placeholders that could error or not depending on those values. It puts the responsibility on the user to catch the error in the application and retry without the hint, or prove that the partial index can be used in the application layer before adding the hint. Both of those seem non-ideal.
But I'm not sure how index hints are used in the real world. Are they used in production applications, or for testing query plans and indexes?
@RaduBerinde and @awoods187 looping you in from the discussion we had on this elsewhere. Any new thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 171 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Yes it should be possible, as long as we can prove that every row being looked up will exist in the partial index. This could be difficult for more complex join expressions.
For example:
CREATE TABLE ab (a INT PRIMARY KEY, b INT) CREATE TABLE xy (x INT PRIMARY KEY, y INT, z INT) CREATE INDEX i ON xy (y) WHERE z >10
The following join and filter make it fairly obvious to see that the index
i
could be used for a lookup-join.SELECT * FROM ab JOIN xy ON a = y WHERE z = 10
But for this query, it may be harder to prove that
i
can be used:SELECT * FROM ab JOIN xy ON a = y AND b = z WHERE b = 10
I've added a work item to the list for utilizing partial indexes during lookup joins.
Sounds good.
docs/RFCS/20200507_partial_indexes.md, line 83 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I wasn't aware that an error was produce in that case. Thanks for pointing it out.
In those lookup join hint cases, is it possible that placeholder values can change whether the error is produced or not?
That is what persuaded this decision for partial indexes. It seemed odd to have a query statement with some placeholders that could error or not depending on those values. It puts the responsibility on the user to catch the error in the application and retry without the hint, or prove that the partial index can be used in the application layer before adding the hint. Both of those seem non-ideal.
But I'm not sure how index hints are used in the real world. Are they used in production applications, or for testing query plans and indexes?
@RaduBerinde and @awoods187 looping you in from the discussion we had on this elsewhere. Any new thoughts?
I think placeholders could theoretically change whether or not the error is produced for lookup joins, since we don't trigger that error until execbuilder time (i.e., after placeholders have been substituted). I don't know of an example off hand where the placeholder value would sometimes error and sometimes not, though....
Ideally index hints would just be used for testing, but there are definitely some customers that use them in production if the optimizer would otherwise choose the wrong plan.
docs/RFCS/20200507_partial_indexes.md, line 324 at r4 (raw file):
### Primary Key Changes If a primary key change occurs, the partial index will need to be rewritten to
[nit] to so -> so
836e598
to
efe6123
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 324 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] to so -> so
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 83 at r3 (raw file):
A placeholder-dependent error wouldn't be a great experience, and I think I'd prefer to issue the error unless the query can be statically determined to be satisfiable with the partial index independent of any placeholder value.
I'm assuming it would always be possible to write the query to do this, such as WHERE col > $1 AND col > 100
. This would trade the error for possibly missing results, but at least it would make things explicit for the user.
But I'm not sure how index hints are used in the real world. Are they used in production applications, or for testing query plans and indexes?
I have no real-world data, but they're generally used by users who (rightly or wrongly) believe they can plan their query better than the optimizer (I'm one of these users; given the choice I'd rather write my own plan tree than a SQL statement). In OLAP situations, this could mean that our optimizer isn't choosing the right plan without the hint and this puts it on the right track. In OLTP applications, it's more of an assertion about worst-case running time ("this page has an SLA to return in 250ms, so I know that this query must hit an index and be processed in O(1) time, and I'd rather get an error than have it start down an O(n) path").
With that in mind, errors that are dependent on placeholder values aren't great, but they're not the worst outcome. The hint could then be a promise from the user to use a placeholder value that makes the partial index work, and it's ok to return an error if they give a value out of bounds.
6c93ce6
to
ed78903
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 221 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
just for my curiosity, is that the correlated columns and FD_strength_scaled commit?
It's actually this one: 25d515d (the second commit in that PR).
because the column is completely correlated with itself, so one can't multiply selectivities, which requires independence, yes?
Yea, that's a good way to look at it.
I've rewritten this section so that the exact statistics calculations are TBD. I've also included a few sentences and examples about the special case mentioned by @rytaft above where the predicate includes indexed columns, so that it is not forgotten.
docs/RFCS/20200507_partial_indexes.md, line 292 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
What happens when a PK change occurs? I assume partial indexes are like normal secondary indexes and get re-written but its more mentioning.
Correct. I've added a section mentioning this so that it is explicit.
Similarly, is there any interaction with partitioning? I assume you can partition a partial index just like a secondary index?
Can you set up a partial index with a where clause that has the same locality as the compound primary key in a partitioned or replicated table? If so would this be an index this is only built/stored in that locality?
That should be possible, yes. I need to do some more research and figure out if there are special considerations that need to be made in order to support partial partitioned indexes.
I've added a work item to add full support for partitioned partial indexes, so we don't lose track of it.
ed78903
to
59df835
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RaduBerinde @ajwerner I added a work item to "consider using partial indexes for auto-generated indexes used for foreign keys.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 319 at r6 (raw file):
column, the selectivity of each is _not_ independent. The number of rows examined is approximately the total number of rows in the table, multiplied by the selectivity of `[/201 - ]`. It would be inaccurate to naively to instead
[nit] to naively to instead
59df835
to
9e2469f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 319 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] to naively to instead
Done.
9e2469f
to
9e2abb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 292 at r2 (raw file): Previously, mgartner (Marcus Gartner) wrote…
awesome! im excited about this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, and @sumeerbhola)
9949abe
to
3788946
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @awoods187, @mgartner, @RaduBerinde, @rytaft, and @sumeerbhola)
docs/RFCS/20200507_partial_indexes.md, line 83 at r3 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
A placeholder-dependent error wouldn't be a great experience, and I think I'd prefer to issue the error unless the query can be statically determined to be satisfiable with the partial index independent of any placeholder value.
I'm assuming it would always be possible to write the query to do this, such as
WHERE col > $1 AND col > 100
. This would trade the error for possibly missing results, but at least it would make things explicit for the user.But I'm not sure how index hints are used in the real world. Are they used in production applications, or for testing query plans and indexes?
I have no real-world data, but they're generally used by users who (rightly or wrongly) believe they can plan their query better than the optimizer (I'm one of these users; given the choice I'd rather write my own plan tree than a SQL statement). In OLAP situations, this could mean that our optimizer isn't choosing the right plan without the hint and this puts it on the right track. In OLTP applications, it's more of an assertion about worst-case running time ("this page has an SLA to return in 250ms, so I know that this query must hit an index and be processed in O(1) time, and I'd rather get an error than have it start down an O(n) path").
With that in mind, errors that are dependent on placeholder values aren't great, but they're not the worst outcome. The hint could then be a promise from the user to use a placeholder value that makes the partial index work, and it's ok to return an error if they give a value out of bounds.
I've decided for now on using the hinted partial index if the filter implies the predicate, and erring if the filter does not provably imply the predicate.
Release note: None
3788946
to
41a1f4b
Compare
bors r+ |
Build succeeded |
Release note: None