Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL][Query Diagnostics] Handle intermediate flushing for explain plans #23720

Open
1 task done
IshanChhangani opened this issue Aug 29, 2024 · 0 comments
Open
1 task done
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue

Comments

@IshanChhangani
Copy link
Contributor

IshanChhangani commented Aug 29, 2024

Jira Link: DB-12628

Description

Currently, when a query's explain plan exceeds the shared memory allocated for explain plans (YB_QD_MAX_EXPLAIN_PLAN_LEN), the output is trimmed, leading to potential data loss. We aim to improve this behavior to ensure no data is lost.

Currently, the logic copies as much of the plan as possible, even if the data is incomplete. We want to change this to copy the plan only if the entire plan fits within the buffer, similar to how bind variables are handled in this scenario.

Additionally, we propose the following improvements:

GUC Configuration: Convert the hard limits on buffer sizes into configurable GUCs (Grand Unified Configuration) for more flexibility.
User Warnings: Print warnings if a user exceeds the buffer limit within a single background worker interval.
Overflow Flag: Introduce a flag to indicate if data was missed due to buffer overflow. If this flag is set, trigger a buffer flush, regardless of how much the buffer is filled.
These changes will prevent data loss and provide better control over buffer management for explain plans.

Issue Type

kind/new-feature

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@IshanChhangani IshanChhangani added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Aug 29, 2024
@IshanChhangani IshanChhangani self-assigned this Aug 29, 2024
@yugabyte-ci yugabyte-ci added kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Aug 29, 2024
IshanChhangani added a commit that referenced this issue Sep 5, 2024
…queryDiagnostics

Summary:
- This diff adds functionality for `explain_analyze` and `explain_dist` params.
- Note: This diff does not include code for supporting EXPLAIN(ANALYZE, DIST, DEBUG)
- Explain.txt looks like:-

```
duration: 6.407 ms
plan:
Sort  (cost=550.02..550.85 rows=333 width=116) (actual time=6.390..6.390 rows=0 loops=1)
  Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC
  Sort Method: quicksort  Memory: 25kB
  ->  GroupAggregate  (cost=496.90..536.07 rows=333 width=116) (actual time=6.377..6.377 rows=0 loops=1)
        Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
        Filter: (count(DISTINCT o.order_id) > 0)
        ->  Sort  (cost=496.90..499.40 rows=1000 width=178) (actual time=6.374..6.374 rows=0 loops=1)
              Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
              Sort Method: quicksort  Memory: 25kB
              ->  YB Batched Nested Loop Left Join  (cost=117.50..447.07 rows=1000 width=178) (actual time=6.362..6.362 rows=0 loops=1)
                    Join Filter: (oi.product_id = p.product_id)
                    ->  YB Batched Nested Loop Join  (cost=117.50..328.19 rows=1000 width=268) (actual time=6.359..6.359 rows=0 loops=1)
                          Join Filter: (o.customer_id = c.customer_id)
                          ->  Hash Right Join  (cost=117.50..225.69 rows=1000 width=32) (actual time=6.354..6.355 rows=0 loops=1)
                                Hash Cond: (oi.order_id = o.order_id)
                                ->  Seq Scan on order_items oi  (cost=0.00..100.00 rows=1000 width=12) (never executed)
                                ->  Hash  (cost=105.00..105.00 rows=1000 width=24) (actual time=6.348..6.348 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on orders o  (cost=0.00..105.00 rows=1000 width=24) (actual time=6.345..6.345 rows=0 loops=1)
                                            Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date))
                                            Storage Table Read Requests: 1
                                            Storage Table Read Execution Time: 0.000 ms
                          ->  Index Scan using customers_pkey on customers c  (cost=0.00..0.12 rows=1 width=240) (never executed)
                                Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023]))
                                Storage Filter: (registration_date >= '2023-01-01'::date)
                    ->  Index Scan using products_pkey on products p  (cost=0.00..0.11 rows=1 width=122) (never executed)
                          Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047]))

duration: 6.559 ms
plan:
Sort  (cost=550.02..550.85 rows=333 width=116) (actual time=6.537..6.537 rows=0 loops=1)
  Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC
  Sort Method: quicksort  Memory: 25kB
  ->  GroupAggregate  (cost=496.90..536.07 rows=333 width=116) (actual time=6.526..6.526 rows=0 loops=1)
        Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
        Filter: (count(DISTINCT o.order_id) > 0)
        ->  Sort  (cost=496.90..499.40 rows=1000 width=178) (actual time=6.523..6.523 rows=0 loops=1)
              Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
              Sort Method: quicksort  Memory: 25kB
              ->  YB Batched Nested Loop Left Join  (cost=117.50..447.07 rows=1000 width=178) (actual time=6.515..6.515 rows=0 loops=1)
                    Join Filter: (oi.product_id = p.product_id)
                    ->  YB Batched Nested Loop Join  (cost=117.50..328.19 rows=1000 width=268) (actual time=6.511..6.511 rows=0 loops=1)
                          Join Filter: (o.customer_id = c.customer_id)
                          ->  Hash Right Join  (cost=117.50..225.69 rows=1000 width=32) (actual time=6.506..6.506 rows=0 loops=1)
                                Hash Cond: (oi.order_id = o.order_id)
                                ->  Seq Scan on order_items oi  (cost=0.00..100.00 rows=1000 width=12) (never executed)
                                ->  Hash  (cost=105.00..105.00 rows=1000 width=24) (actual time=6.497..6.497 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on orders o  (cost=0.00..105.00 rows=1000 width=24) (actual time=6.494..6.494 rows=0 loops=1)
                                            Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date))
                                            Storage Table Read Requests: 1
                                            Storage Table Read Execution Time: 0.000 ms
                          ->  Index Scan using customers_pkey on customers c  (cost=0.00..0.12 rows=1 width=240) (never executed)
                                Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023]))
                                Storage Filter: (registration_date >= '2023-01-01'::date)
                    ->  Index Scan using products_pkey on products p  (cost=0.00..0.11 rows=1 width=122) (never executed)
                          Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047]))

```
Jira: DB-11078

Test Plan:
./yb_build.sh --java-test TestYbQueryDiagnostics#checkExplainData

Testing with a large query that generates an EXPLAIN plan exceeding 16KB, causes output termination and loss of remaining data. This is expected and will be handled later  [[ #23720 | #23720 ]]

Reviewers: asaha, hbhanawat, telgersma

Reviewed By: telgersma

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35566
IshanChhangani added a commit that referenced this issue Sep 12, 2024
…ANALYZE, DIST) support for queryDiagnostics

Summary:
No merge conflict
- yb_query_diagnostics.c
  - YbQueryDiagnostics_ExecutorStart
    - Resolved compile errors by replacing `MAX_REMOVE_VALUE` with pg15 counterpart function `pg_prng_double(&pg_global_prng_state)`. Upstream PG 3804539e48e794781c6145c7f988f5d507418fa8 replaced random function with pg_prng_double.  YB master 40689bc
  - Imported this function `include "common/pg_prng.h"`
    - same

original summary
- This diff adds functionality for `explain_analyze` and `explain_dist` params.
- Note: This diff does not include code for supporting EXPLAIN(ANALYZE, DIST, DEBUG)
- Explain.txt looks like:-

```
duration: 6.407 ms
plan:
Sort  (cost=550.02..550.85 rows=333 width=116) (actual time=6.390..6.390 rows=0 loops=1)
  Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC
  Sort Method: quicksort  Memory: 25kB
  ->  GroupAggregate  (cost=496.90..536.07 rows=333 width=116) (actual time=6.377..6.377 rows=0 loops=1)
        Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
        Filter: (count(DISTINCT o.order_id) > 0)
        ->  Sort  (cost=496.90..499.40 rows=1000 width=178) (actual time=6.374..6.374 rows=0 loops=1)
              Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
              Sort Method: quicksort  Memory: 25kB
              ->  YB Batched Nested Loop Left Join  (cost=117.50..447.07 rows=1000 width=178) (actual time=6.362..6.362 rows=0 loops=1)
                    Join Filter: (oi.product_id = p.product_id)
                    ->  YB Batched Nested Loop Join  (cost=117.50..328.19 rows=1000 width=268) (actual time=6.359..6.359 rows=0 loops=1)
                          Join Filter: (o.customer_id = c.customer_id)
                          ->  Hash Right Join  (cost=117.50..225.69 rows=1000 width=32) (actual time=6.354..6.355 rows=0 loops=1)
                                Hash Cond: (oi.order_id = o.order_id)
                                ->  Seq Scan on order_items oi  (cost=0.00..100.00 rows=1000 width=12) (never executed)
                                ->  Hash  (cost=105.00..105.00 rows=1000 width=24) (actual time=6.348..6.348 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on orders o  (cost=0.00..105.00 rows=1000 width=24) (actual time=6.345..6.345 rows=0 loops=1)
                                            Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date))
                                            Storage Table Read Requests: 1
                                            Storage Table Read Execution Time: 0.000 ms
                          ->  Index Scan using customers_pkey on customers c  (cost=0.00..0.12 rows=1 width=240) (never executed)
                                Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023]))
                                Storage Filter: (registration_date >= '2023-01-01'::date)
                    ->  Index Scan using products_pkey on products p  (cost=0.00..0.11 rows=1 width=122) (never executed)
                          Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047]))

duration: 6.559 ms
plan:
Sort  (cost=550.02..550.85 rows=333 width=116) (actual time=6.537..6.537 rows=0 loops=1)
  Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC
  Sort Method: quicksort  Memory: 25kB
  ->  GroupAggregate  (cost=496.90..536.07 rows=333 width=116) (actual time=6.526..6.526 rows=0 loops=1)
        Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
        Filter: (count(DISTINCT o.order_id) > 0)
        ->  Sort  (cost=496.90..499.40 rows=1000 width=178) (actual time=6.523..6.523 rows=0 loops=1)
              Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text))
              Sort Method: quicksort  Memory: 25kB
              ->  YB Batched Nested Loop Left Join  (cost=117.50..447.07 rows=1000 width=178) (actual time=6.515..6.515 rows=0 loops=1)
                    Join Filter: (oi.product_id = p.product_id)
                    ->  YB Batched Nested Loop Join  (cost=117.50..328.19 rows=1000 width=268) (actual time=6.511..6.511 rows=0 loops=1)
                          Join Filter: (o.customer_id = c.customer_id)
                          ->  Hash Right Join  (cost=117.50..225.69 rows=1000 width=32) (actual time=6.506..6.506 rows=0 loops=1)
                                Hash Cond: (oi.order_id = o.order_id)
                                ->  Seq Scan on order_items oi  (cost=0.00..100.00 rows=1000 width=12) (never executed)
                                ->  Hash  (cost=105.00..105.00 rows=1000 width=24) (actual time=6.497..6.497 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on orders o  (cost=0.00..105.00 rows=1000 width=24) (actual time=6.494..6.494 rows=0 loops=1)
                                            Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date))
                                            Storage Table Read Requests: 1
                                            Storage Table Read Execution Time: 0.000 ms
                          ->  Index Scan using customers_pkey on customers c  (cost=0.00..0.12 rows=1 width=240) (never executed)
                                Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023]))
                                Storage Filter: (registration_date >= '2023-01-01'::date)
                    ->  Index Scan using products_pkey on products p  (cost=0.00..0.11 rows=1 width=122) (never executed)
                          Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047]))

```
Jira: DB-11078

Original commit: 40689bc / D35566

Test Plan:
./yb_build.sh --java-test TestYbQueryDiagnostics#checkExplainData

Testing with a large query that generates an EXPLAIN plan exceeding 16KB, causes output termination and loss of remaining data. This is expected and will be handled later  [[ #23720 | #23720 ]]

Reviewers: jason, tfoucher

Reviewed By: jason

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D37980
@IshanChhangani IshanChhangani changed the title [YSQL][Query Diagnostics] Handle oversized explain plans [YSQL][Query Diagnostics] Handle intermediate flushing for explain plans Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants