Change comparisons in cursor-based pagination #4287
Conversation
query plan still dum
Thanks so much for your effort. Did you try the tuple comparison? I don't think it should need many new indexes, because almost all the sorts are purely descending.
Query plan not dum. Me dum. I increased the number of posts, and now it uses the index for all pagination filters (except post_id, which is not in the index). I tried tuple comparison on one of the sorts, and that makes it use 1 index scan instead of 3, which is simpler, so it might be somewhat faster, but it would not affect the number of index-scanned rows. I don't think it's worth the trouble of doing the ugly stuff needed for tuple comparisons with mixed sorting directions, but I could make the pagination library I'm working on use tuple comparison automatically whenever possible. I tried it again without this PR but still with many posts, and confirmed that it previously couldn't use the index in this way, so this PR will improve performance a lot.
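For illustration, a tuple (row-value) comparison for a purely descending sort might look like the sketch below. Table, column, and index names are assumed for the example, not taken from the actual schema; the point is that `(published, id) < (x, y)` expands to `published < x OR (published = x AND id < y)`, and a matching btree index typically lets PostgreSQL evaluate it as a single range scan.

```sql
-- Hypothetical keyset-pagination query using a row-value comparison.
-- A btree index matching the sort lets the whole cursor condition be
-- evaluated as one range scan instead of several separate ones.
CREATE INDEX IF NOT EXISTS idx_post_published_id
    ON post (published DESC, id DESC);

SELECT *
FROM post
WHERE (published, id) < ('2024-01-01 00:00:00', 1234)
ORDER BY published DESC, id DESC
LIMIT 20;
```

With mixed ascending/descending sort directions the row-value form no longer maps directly onto one index scan, which is the "ugly stuff" referred to above.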
Just FYI, a bitmap index scan is not a "real" index scan (depending on definitions) and can be much worse than a normal index scan. "The bitmap is one bit per heap page. The bitmap index scan sets the bits based on the heap page address that the index entry points to. So when it goes to do the bitmap heap scan, it just does a linear table scan, reading the bitmap to see whether it should bother with a particular page or seek over it." https://dba.stackexchange.com/questions/119386/understanding-bitmap-heap-scan-and-bitmap-index-scan To test, you should probably also put at least 1 million rows in your table, because otherwise the results may be completely different from the real world (PG knows that scanning through 1000 rows doesn't matter, so it just does it).
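A rough sketch of that kind of test setup, using an assumed stand-alone table (`post_test`) rather than the real schema, might be:

```sql
-- Create and fill a test table with ~1 million rows so the planner's
-- choices resemble production rather than a toy data set.
CREATE TABLE IF NOT EXISTS post_test (
    id           bigserial PRIMARY KEY,
    community_id int         NOT NULL,
    published    timestamptz NOT NULL
);

INSERT INTO post_test (community_id, published)
SELECT (random() * 100)::int,
       now() - g * interval '1 second'
FROM generate_series(1, 1000000) AS g;

ANALYZE post_test;

-- BUFFERS shows how many pages were actually touched, which helps tell a
-- plain index scan apart from a bitmap index/heap scan over many pages.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM post_test
WHERE community_id = 2
ORDER BY published DESC
LIMIT 10;
```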
The index scans are oddly inconsistent. This is with 999999 posts, each with different timestamps:
The command used (#4285):
An index scan with only the community id in the index condition is still really bad, because it means reading all of the community's rows into memory. The index condition should be doing 99% of the filtering; the filter should ideally be empty. You can tell by the "rows removed by filter" count being the number of posts in community 2 (the one you're looking at) minus 10. That was exactly why I added that prefetch/upper bound function: PG was not able to understand the hot posts query well enough, and the index filters were making it fetch all of the community's posts into RAM (should be in the discussion in #3872). Also remember that the expensive queries are the ones looking at Subscribed, where it has to be able to filter by multiple community IDs simultaneously.
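To make the condition-vs-filter distinction concrete, here is a hedged sketch against the `post_test` table from the earlier example (the index and column names are again assumptions, not the real schema):

```sql
-- Hypothetical index covering a per-community, published-descending sort.
CREATE INDEX IF NOT EXISTS idx_post_test_community_published
    ON post_test (community_id, published DESC);

-- What to look for in the plan:
--  * Good case: both community_id = 2 and the cursor comparison appear under
--    "Index Cond", and the scan stops after roughly LIMIT rows.
--  * Bad case: the cursor comparison appears under "Filter", every row of the
--    community is read, and "Rows Removed by Filter" is roughly the number of
--    posts in the community minus the LIMIT.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM post_test
WHERE community_id = 2
  AND published < now() - interval '1 day'
ORDER BY published DESC
LIMIT 10;
```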
Replaced by #4320
Currently, posts are compared against the cursor with a condition like this:
This PR changes it to:
This should make better use of the index. At worst, it should result in 3 very efficient index scans that are combined together. The 3 comparison groups closely resemble the examples of index-friendly conditions in the PostgreSQL manual. Because the groups are combined with OR, any scanning of items that don't end up in the final result should only be caused by other filters that don't use the index, such as deleted = false.
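As a generic sketch of that shape (hypothetical sort columns `score`, `published`, `id` and illustrative cursor values; this is not the PR's exact code), a cursor comparison written as three OR-combined, index-friendly groups looks roughly like:

```sql
-- Cursor values taken from the last row of the previous page (illustrative).
-- Each OR branch is an equality prefix plus a single range comparison, the
-- index-friendly shape from the PostgreSQL manual, so a btree index on
-- (score, published, id) can serve each branch with an efficient scan and the
-- planner can combine the scans (e.g. via BitmapOr). Together the branches
-- are equivalent to (score, published, id) < (100, '2024-01-01 00:00:00', 1234).
SELECT *
FROM post
WHERE deleted = false   -- example of a filter that doesn't use the index
  AND (
        score < 100
     OR (score = 100 AND published < '2024-01-01 00:00:00')
     OR (score = 100 AND published = '2024-01-01 00:00:00' AND id < 1234)
  )
ORDER BY score DESC, published DESC, id DESC
LIMIT 20;
```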