[YSQL] Support Parallel YB Bitmap Table Scans #20575

Open
1 task done
timothy-e opened this issue Jan 11, 2024 · 0 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@timothy-e
Contributor

timothy-e commented Jan 11, 2024

Jira Link: DB-9576

Description

PG supports parallel bitmap scans (only the heap portion is parallelized, not the index access).

YB's initial implementation does not support parallelism for bitmap scans.
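To make the requested split concrete, here is a toy Python sketch of the PG-style division of labor: the bitmap is built serially (the index access), and only the table-fetch half is spread across workers. All names (`fetch_rows`, `parallel_bitmap_table_scan`, the toy `table`) are hypothetical illustrations, not YB internals.

```python
# Toy model: build the bitmap serially, parallelize only the table fetch,
# mirroring how PG parallelizes the heap portion of a bitmap scan.
from concurrent.futures import ThreadPoolExecutor

def fetch_rows(table, row_ids):
    """Serial fetch of the given rows from the main table."""
    return [table[r] for r in row_ids]

def parallel_bitmap_table_scan(table, bitmap, workers=2):
    """Split the final bitmap (a set of row ids) across workers."""
    row_ids = sorted(bitmap)
    chunks = [row_ids[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda chunk: fetch_rows(table, chunk), chunks)
    return [row for part in parts for row in part]

table = {"r1": (1, 2), "r2": (5, 4), "r3": (9, 0)}
rows = parallel_bitmap_table_scan(table, {"r1", "r3"})
```

The sketch glosses over everything that makes the real feature hard (shared state between workers, tablet-aware chunking, cost modeling); it only illustrates which half of the scan this issue proposes to parallelize.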

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@timothy-e timothy-e added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jan 11, 2024
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Jan 11, 2024
@timothy-e timothy-e self-assigned this Feb 22, 2024
@timothy-e timothy-e changed the title [YSQL] Support Parallel Bitmap Scans [YSQL] Support Parallel YB Bitmap Table Scans Feb 22, 2024
timothy-e added a commit that referenced this issue Mar 12, 2024
Summary:
== Bitmap Scans ==
Bitmap Scans use multiple indexes to answer a query, with only one scan of the main table. Each index produces a “bitmap” indicating which rows of the main table are of interest. Multiple bitmaps can be combined with AND or OR operators into a final bitmap, which is then used to collect rows from the main table.
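The combination step above can be sketched in a few lines of Python, modeling each bitmap as a set of row identifiers (in YB, ybctids). The index layout and all function names here are hypothetical simplifications, not actual executor code.

```python
# Each "bitmap" is modeled as a set of row ids; indexes are (value, row_id) pairs.

def bitmap_index_scan(index, predicate):
    """Return the set of row ids whose indexed value satisfies the predicate."""
    return {row_id for value, row_id in index if predicate(value)}

def bitmap_and(*bitmaps):
    return set.intersection(*bitmaps)

def bitmap_or(*bitmaps):
    return set.union(*bitmaps)

# Two single-column indexes over the same three-row table.
idx_a = [(1, "r1"), (5, "r2"), (9, "r3")]
idx_b = [(2, "r1"), (4, "r2"), (0, "r3")]

matches_a = bitmap_index_scan(idx_a, lambda v: v > 4)   # rows where a > 4
matches_b = bitmap_index_scan(idx_b, lambda v: v > 1)   # rows where b > 1

# Final bitmap for "a > 4 OR b > 1"; these rows are then fetched from the table.
final = bitmap_or(matches_a, matches_b)
```

The real implementation works with ybctids rather than Python sets, but the AND/OR algebra over per-index results is the same idea.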

In this diff, initial support for Yugabyte bitmap scans is added. Yugabyte bitmap scans:
* **have two nodes:** `Bitmap Index Scan` shared with PG, and `YB Bitmap Table Scan`.
* **support exact results only**: When PG approaches work_mem, it switches to a lossy bitmap that identifies specific pages of the heap relation to access. When YB approaches work_mem, it simply switches to a full sequential scan.
* **support arbitrary nesting of logical operands and index conditions**. Assuming the same indexes and support for conditions, any arrangement of ANDs, ORs, and index scans that Postgres supports is also supported by Yugabyte, because the code is reused. Note that Yugabyte is currently extremely unlikely to choose an `AND` condition, because the planner doesn't account for each bitmap index access costing roughly half as much as a normal index access. (Bitmap index scans only need to fetch the ybctids, while normal index scans fetch ybctids and then use them to query the main table.)
* **allow conditions with `AND` to be used as a filter**. This currently happens for all `AND` conditions, see the above bullet point. But with a hack to the planner, `Bitmap And` and filters are both valid options.
* **are controlled by `enable_bitmapscan`**
* **are controlled by pg_hint_plan's `/*+ BitmapScan(tab) */` syntax**
* **only print the recheck condition when recheck is required**. This is contrary to PG behavior, but since fetching unnecessary rows has a higher cost in YB, it’s better to communicate that clearly.
* **show distributed storage stats when `EXPLAIN (ANALYZE, DIST)` is run**. `Storage Index Read Requests` are shown under `Bitmap Index Scans` on a secondary index; `Storage Table Read Requests` are shown under `Bitmap Index Scans` on a primary index or under `YB Bitmap Table Scans`.
* **do not use unbounded memory**. Similar to Postgres, each bitmap is bounded by `work_mem` (although PG follows this bound loosely, i.e. if it can't lossify the bitmap enough it will exceed it). In the current draft, if Yugabyte exceeds work_mem, we stop collecting ybctids and run a full table scan. There is some output in the explain plan to indicate this occurred.
* **output average ybctid size when `EXPLAIN (ANALYZE, DEBUG)` is run**. This can be used to get an estimate of how much memory a bitmap scan might take up.
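The work_mem fallback described above can be sketched as a simple budget check while collecting ybctids: once the estimated bitmap size exceeds the budget, the bitmap is abandoned and the caller falls back to a full table scan. This is a minimal sketch under assumed semantics; `collect_ybctids` and the per-entry size estimate are hypothetical, not the actual executor logic.

```python
# Hedged sketch of the work_mem fallback: collect ybctids until the estimated
# bitmap size exceeds the budget, then signal "run a full table scan instead".

def collect_ybctids(ybctid_stream, work_mem_bytes):
    """Return (ybctids, exceeded); exceeded=True means fall back to a seq scan."""
    ybctids, used = set(), 0
    for ybctid in ybctid_stream:
        used += len(ybctid)           # crude per-entry size estimate
        if used > work_mem_bytes:
            return set(), True        # abandon the bitmap; caller scans the table
        ybctids.add(ybctid)
    return ybctids, False

small, exceeded = collect_ybctids([b"k1", b"k2"], work_mem_bytes=64)
```

This also shows why the average ybctid size from `EXPLAIN (ANALYZE, DEBUG)` is useful: multiplying it by the expected row count estimates whether a bitmap will fit in `work_mem` before the fallback triggers.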

Yugabyte bitmap scans do **not** currently:
* **have an accurate cost model:** [[ #20573 | #20573 ]]
* **spill to disk when exceeding work_mem:** [[ #20576 | #20576 ]]
* **parallelise bitmap table scans**: [[ #20575 | #20575 ]]
* **support GIN indexes**: [[ #20574 | #20574 ]]
* **skip fetching ybctids from YB Bitmap Table Scan**: [[ #21036 | #21036 ]]
* **parallelise bitmap index scans**: [[ #21037 | #21037 ]]

See the design document here: https://docs.google.com/document/d/1pRUsXZDywzrZTQpTfXZNTZckh-8tTtkfOcurUoybTr0/edit?usp=sharing
Jira: DB-2547

Test Plan:
```lang=sh
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
```

Stress test on a 10 million row table:
```lang=sql
CREATE TABLE large (a INT, b INT);
CREATE INDEX ON large (a ASC);
CREATE INDEX ON large (b ASC);
INSERT INTO large SELECT i, i FROM generate_series(1, 10000000) i;
```

Bitmap Scan:
```
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
NOTICE:  exceeded work_mem, switching to full table scan
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=233.920..10990.324 rows=10000000 loops=1)
   Storage Table Read Requests: 9767
   Storage Table Read Execution Time: 6288.534 ms
   Storage Table Rows Scanned: 10000000
   Exceeded work_mem: true
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=231.783..231.783 rows=0 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=231.771..231.771 rows=0 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 171
               Storage Index Read Execution Time: 199.258 ms
               Storage Index Rows Scanned: 175104
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: (b > 0)
 Planning Time: 10.735 ms
 Execution Time: 15493.938 ms
 Storage Read Requests: 9938
 Storage Read Execution Time: 6487.792 ms
 Storage Rows Scanned: 10175104
 Storage Write Requests: 0
 Catalog Read Requests: 8
 Catalog Read Execution Time: 6.969 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 6494.761 ms
 Peak Memory Usage: 100 kB
(25 rows)
```

Sequential Scan:
```
EXPLAIN (ANALYZE, DIST) SELECT * FROM large WHERE a > 0 OR b > 0;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on large  (cost=0.00..105.00 rows=1000 width=8) (actual time=2.721..9896.584 rows=10000000 loops=1)
   Storage Filter: ((a > 0) OR (b > 0))
   Storage Table Read Requests: 4885
   Storage Table Read Execution Time: 7578.189 ms
   Storage Table Rows Scanned: 10000000
 Planning Time: 0.082 ms
 Execution Time: 14737.837 ms
 Storage Read Requests: 4885
 Storage Read Execution Time: 7578.189 ms
 Storage Rows Scanned: 10000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 7578.189 ms
 Peak Memory Usage: 24 kB
(16 rows)
```

Bitmap scan with high work_mem:
```
SET work_mem TO '1GB';
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=76779.066..162561.046 rows=10000000 loops=1)
   Storage Table Read Requests: 9766
   Storage Table Read Execution Time: 78226.718 ms
   Storage Table Rows Scanned: 10000000
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=73603.559..73603.559 rows=10000000 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36774.895..36774.895 rows=10000000 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11136.148 ms
               Storage Index Rows Scanned: 10000000
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36828.655..36828.655 rows=10000000 loops=1)
               Index Cond: (b > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11419.616 ms
               Storage Index Rows Scanned: 10000000
 Planning Time: 0.153 ms
 Execution Time: 169818.045 ms
 Storage Read Requests: 29298
 Storage Read Execution Time: 100782.482 ms
 Storage Rows Scanned: 30000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 100782.482 ms
 Peak Memory Usage: 1217840 kB
(26 rows)
```

Reviewers: tnayak, amartsinchyk

Reviewed By: tnayak

Subscribers: smishra, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D32015
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024