[YSQL] Parallelize Bitmap Index Scans #21037
Labels: area/ysql (Yugabyte SQL (YSQL)), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue)
Comments
timothy-e added the area/ysql (Yugabyte SQL (YSQL)) and status/awaiting-triage (Issue awaiting triage) labels on Feb 12, 2024
yugabyte-ci added the kind/enhancement (This is an enhancement of an existing feature) and priority/medium (Medium priority issue) labels on Feb 12, 2024
timothy-e added a commit that referenced this issue on Mar 12, 2024:
Summary:
== Bitmap Scans ==

Bitmap Scans use multiple indexes to answer a query, with only one scan of the main table. Each index produces a "bitmap" indicating which rows of the main table are interesting. Multiple bitmaps can be combined with AND or OR operators to create a final bitmap, which is used to collect rows from the main table.

In this diff, initial support for Yugabyte bitmap scans is added. Yugabyte bitmap scans:

* **have two nodes:** `Bitmap Index Scan`, shared with PG, and `YB Bitmap Table Scan`.
* **support exact results only**: when PG approaches work_mem, it switches to a lossy bitmap that identifies specific pages of the heap relation to access. When YB approaches work_mem, it simply switches to a full sequential scan.
* **support arbitrary nesting of logical operands and index conditions**: assuming the same indexes and support for conditions, any arrangement of ANDs, ORs, and index scans that Postgres supports is also supported by Yugabyte, because the code is reused. Note that Yugabyte is currently extremely unlikely to use an `AND` condition, because the planner doesn't account for each bitmap index access costing half as much as a normal index access. (Bitmap index scans only need to access the ybctids, while normal index scans access the ybctids and then use them to query the main table.)
* **allow conditions with `AND` to be used as a filter**: this currently happens for all `AND` conditions (see the previous bullet point), but with a hack to the planner, `Bitmap And` and filters are both valid options.
* **are controlled by `enable_bitmapscan`**.
* **are controlled by pg_hint_plan's `/*+ BitmapScan(tab) */` syntax**.
* **only print the recheck condition when recheck is required**: this is contrary to PG behavior, but since fetching unnecessary rows has a higher cost in YB, it's better to communicate that clearly.
* **show distributed storage stats when `EXPLAIN (ANALYZE, DIST)` is run**: `Storage Index Read Requests` are shown under `Bitmap Index Scans` on a secondary index; `Storage Table Read Requests` are shown under `Bitmap Index Scans` on a primary index and under `YB Bitmap Table Scans`.
* **do not use unbounded memory**: similar to Postgres, each bitmap is bounded by `work_mem` (although PG follows this bound loosely, i.e. if it can't lossify the bitmap enough it will exceed it). In the current draft, if Yugabyte exceeds work_mem, we stop collecting ybctids and run a full table scan. The explain plan indicates when this occurred.
* **output the average ybctid size when `EXPLAIN (ANALYZE, DEBUG)` is run**: this can be used to estimate how much memory a bitmap scan might take up.

Yugabyte bitmap scans do **not** currently:

* **have an accurate cost model**: #20573
* **spill to disk when exceeding work_mem**: #20576
* **parallelise bitmap table scans**: #20575
* **support GIN indexes**: #20574
* **skip fetching ybctids from YB Bitmap Table Scan**: #21036
* **parallelise bitmap index scans**: #21037

See the design document here: https://docs.google.com/document/d/1pRUsXZDywzrZTQpTfXZNTZckh-8tTtkfOcurUoybTr0/edit?usp=sharing

Jira: DB-2547

Test Plan:

```lang=sh
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
```

Stress test on a 10 million row table:

```lang=sql
CREATE TABLE large (a INT, b INT);
CREATE INDEX ON large (a ASC);
CREATE INDEX ON large (b ASC);
INSERT INTO large SELECT i, i FROM generate_series(1, 10000000) i;
```

Bitmap Scan:

```
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
NOTICE:  exceeded work_mem, switching to full table scan
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=233.920..10990.324 rows=10000000 loops=1)
   Storage Table Read Requests: 9767
   Storage Table Read Execution Time: 6288.534 ms
   Storage Table Rows Scanned: 10000000
   Exceeded work_mem: true
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=231.783..231.783 rows=0 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=231.771..231.771 rows=0 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 171
               Storage Index Read Execution Time: 199.258 ms
               Storage Index Rows Scanned: 175104
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: (b > 0)
 Planning Time: 10.735 ms
 Execution Time: 15493.938 ms
 Storage Read Requests: 9938
 Storage Read Execution Time: 6487.792 ms
 Storage Rows Scanned: 10175104
 Storage Write Requests: 0
 Catalog Read Requests: 8
 Catalog Read Execution Time: 6.969 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 6494.761 ms
 Peak Memory Usage: 100 kB
(25 rows)
```

Sequential Scan:

```
EXPLAIN (ANALYZE, DIST) SELECT * FROM large WHERE a > 0 OR b > 0;
                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on large  (cost=0.00..105.00 rows=1000 width=8) (actual time=2.721..9896.584 rows=10000000 loops=1)
   Storage Filter: ((a > 0) OR (b > 0))
   Storage Table Read Requests: 4885
   Storage Table Read Execution Time: 7578.189 ms
   Storage Table Rows Scanned: 10000000
 Planning Time: 0.082 ms
 Execution Time: 14737.837 ms
 Storage Read Requests: 4885
 Storage Read Execution Time: 7578.189 ms
 Storage Rows Scanned: 10000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 7578.189 ms
 Peak Memory Usage: 24 kB
(16 rows)
```

Bitmap scan with high work_mem:

```
SET work_mem TO '1GB';
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
                                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=76779.066..162561.046 rows=10000000 loops=1)
   Storage Table Read Requests: 9766
   Storage Table Read Execution Time: 78226.718 ms
   Storage Table Rows Scanned: 10000000
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=73603.559..73603.559 rows=10000000 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36774.895..36774.895 rows=10000000 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11136.148 ms
               Storage Index Rows Scanned: 10000000
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36828.655..36828.655 rows=10000000 loops=1)
               Index Cond: (b > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11419.616 ms
               Storage Index Rows Scanned: 10000000
 Planning Time: 0.153 ms
 Execution Time: 169818.045 ms
 Storage Read Requests: 29298
 Storage Read Execution Time: 100782.482 ms
 Storage Rows Scanned: 30000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 100782.482 ms
 Peak Memory Usage: 1217840 kB
(26 rows)
```

Reviewers: tnayak, amartsinchyk

Reviewed By: tnayak

Subscribers: smishra, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D32015
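To make the `work_mem` behaviour above concrete, here is a rough back-of-envelope estimate. The average ybctid size used below is an assumed figure for illustration, not one reported in the commit message; `EXPLAIN (ANALYZE, DEBUG)` reports the real average.

```sql
-- Illustrative estimate only; the ~40-byte average ybctid size is assumed.
--   matching rows        : 10,000,000
--   avg ybctid size      : ~40 bytes
--   bitmap memory needed : 40 B * 10,000,000 = ~400 MB
-- The default work_mem (4 MB in PostgreSQL) cannot hold this, so the first
-- plan above falls back to a full table scan, while SET work_mem TO '1GB' in
-- the third plan keeps every ybctid in memory
-- (Peak Memory Usage: 1217840 kB, roughly 1.2 GB).
```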
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue on Mar 18, 2024 (same commit message as above).
Jira Link: DB-10006
Description
In a PG bitmap scan, the only node parallelised by Postgres parallelism is the Bitmap Heap Scan; the Bitmap Index Scans beneath it execute sequentially, one after another.
In Yugabyte, because we are a distributed system, we could instead launch requests to each of the indexes up front, and then go back and collect the results from each index as they arrive. See the sketch below.
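As an illustrative sketch of the sequential behaviour this issue targets (the table, columns, and indexes are hypothetical, not from the issue): a query ORing predicates on three separately indexed columns produces three `Bitmap Index Scan` nodes under a `BitmapOr`, and today each index scan runs to completion before the next one starts.

```sql
-- Hypothetical schema: three independently indexed columns.
CREATE TABLE t (a INT, b INT, c INT);
CREATE INDEX ON t (a ASC);
CREATE INDEX ON t (b ASC);
CREATE INDEX ON t (c ASC);

-- The plan for this query contains three Bitmap Index Scans under a BitmapOr.
-- Today the three index scans execute one after another; this issue proposes
-- issuing the three index requests concurrently and gathering their results.
EXPLAIN /*+ BitmapScan(t) */
SELECT * FROM t WHERE a = 1 OR b = 2 OR c = 3;
```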
Issue Type
kind/enhancement