[YSQL][Query Diagnostics] Handle intermediate flushing for explain plans #23720

IshanChhangani · 2024-08-29T10:58:02Z

Description

Currently, when a query's explain plan exceeds the shared memory allocated for explain plans (YB_QD_MAX_EXPLAIN_PLAN_LEN), the output is trimmed, leading to potential data loss. We aim to improve this behavior to ensure no data is lost.

Currently, the logic copies as much of the plan as possible, even if the data is incomplete. We want to change this to copy the plan only if the entire plan fits within the buffer, similar to how bind variables are handled in this scenario.

Additionally, we propose the following improvements:

GUC Configuration: Convert the hard limits on buffer sizes into configurable GUCs (Grand Unified Configuration) for more flexibility.
User Warnings: Print warnings if a user exceeds the buffer limit within a single background worker interval.
Overflow Flag: Introduce a flag to indicate if data was missed due to buffer overflow. If this flag is set, trigger a buffer flush, regardless of how much the buffer is filled.
These changes will prevent data loss and provide better control over buffer management for explain plans.

Issue Type

kind/new-feature

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

…queryDiagnostics Summary: - This diff adds functionality for `explain_analyze` and `explain_dist` params. - Note: This diff does not include code for supporting EXPLAIN(ANALYZE, DIST, DEBUG) - Explain.txt looks like:- ``` duration: 6.407 ms plan: Sort (cost=550.02..550.85 rows=333 width=116) (actual time=6.390..6.390 rows=0 loops=1) Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC Sort Method: quicksort Memory: 25kB -> GroupAggregate (cost=496.90..536.07 rows=333 width=116) (actual time=6.377..6.377 rows=0 loops=1) Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Filter: (count(DISTINCT o.order_id) > 0) -> Sort (cost=496.90..499.40 rows=1000 width=178) (actual time=6.374..6.374 rows=0 loops=1) Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Sort Method: quicksort Memory: 25kB -> YB Batched Nested Loop Left Join (cost=117.50..447.07 rows=1000 width=178) (actual time=6.362..6.362 rows=0 loops=1) Join Filter: (oi.product_id = p.product_id) -> YB Batched Nested Loop Join (cost=117.50..328.19 rows=1000 width=268) (actual time=6.359..6.359 rows=0 loops=1) Join Filter: (o.customer_id = c.customer_id) -> Hash Right Join (cost=117.50..225.69 rows=1000 width=32) (actual time=6.354..6.355 rows=0 loops=1) Hash Cond: (oi.order_id = o.order_id) -> Seq Scan on order_items oi (cost=0.00..100.00 rows=1000 width=12) (never executed) -> Hash (cost=105.00..105.00 rows=1000 width=24) (actual time=6.348..6.348 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on orders o (cost=0.00..105.00 rows=1000 width=24) (actual time=6.345..6.345 rows=0 loops=1) Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date)) Storage Table Read Requests: 1 Storage Table Read Execution Time: 0.000 ms -> Index Scan using customers_pkey on customers c (cost=0.00..0.12 rows=1 width=240) (never executed) Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023])) Storage Filter: (registration_date >= '2023-01-01'::date) -> Index Scan using products_pkey on products p (cost=0.00..0.11 rows=1 width=122) (never executed) Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047])) duration: 6.559 ms plan: Sort (cost=550.02..550.85 rows=333 width=116) (actual time=6.537..6.537 rows=0 loops=1) Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC Sort Method: quicksort Memory: 25kB -> GroupAggregate (cost=496.90..536.07 rows=333 width=116) (actual time=6.526..6.526 rows=0 loops=1) Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Filter: (count(DISTINCT o.order_id) > 0) -> Sort (cost=496.90..499.40 rows=1000 width=178) (actual time=6.523..6.523 rows=0 loops=1) Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Sort Method: quicksort Memory: 25kB -> YB Batched Nested Loop Left Join (cost=117.50..447.07 rows=1000 width=178) (actual time=6.515..6.515 rows=0 loops=1) Join Filter: (oi.product_id = p.product_id) -> YB Batched Nested Loop Join (cost=117.50..328.19 rows=1000 width=268) (actual time=6.511..6.511 rows=0 loops=1) Join Filter: (o.customer_id = c.customer_id) -> Hash Right Join (cost=117.50..225.69 rows=1000 width=32) (actual time=6.506..6.506 rows=0 loops=1) Hash Cond: (oi.order_id = o.order_id) -> Seq Scan on order_items oi (cost=0.00..100.00 rows=1000 width=12) (never executed) -> Hash (cost=105.00..105.00 rows=1000 width=24) (actual time=6.497..6.497 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on orders o (cost=0.00..105.00 rows=1000 width=24) (actual time=6.494..6.494 rows=0 loops=1) Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date)) Storage Table Read Requests: 1 Storage Table Read Execution Time: 0.000 ms -> Index Scan using customers_pkey on customers c (cost=0.00..0.12 rows=1 width=240) (never executed) Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023])) Storage Filter: (registration_date >= '2023-01-01'::date) -> Index Scan using products_pkey on products p (cost=0.00..0.11 rows=1 width=122) (never executed) Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047])) ``` Jira: DB-11078 Test Plan: ./yb_build.sh --java-test TestYbQueryDiagnostics#checkExplainData Testing with a large query that generates an EXPLAIN plan exceeding 16KB, causes output termination and loss of remaining data. This is expected and will be handled later [[ #23720 | #23720 ]] Reviewers: asaha, hbhanawat, telgersma Reviewed By: telgersma Subscribers: yql, ybase Differential Revision: https://phorge.dev.yugabyte.com/D35566

…ANALYZE, DIST) support for queryDiagnostics Summary: No merge conflict - yb_query_diagnostics.c - YbQueryDiagnostics_ExecutorStart - Resolved compile errors by replacing `MAX_REMOVE_VALUE` with pg15 counterpart function `pg_prng_double(&pg_global_prng_state)`. Upstream PG 3804539e48e794781c6145c7f988f5d507418fa8 replaced random function with pg_prng_double. YB master 40689bc - Imported this function `include "common/pg_prng.h"` - same original summary - This diff adds functionality for `explain_analyze` and `explain_dist` params. - Note: This diff does not include code for supporting EXPLAIN(ANALYZE, DIST, DEBUG) - Explain.txt looks like:- ``` duration: 6.407 ms plan: Sort (cost=550.02..550.85 rows=333 width=116) (actual time=6.390..6.390 rows=0 loops=1) Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC Sort Method: quicksort Memory: 25kB -> GroupAggregate (cost=496.90..536.07 rows=333 width=116) (actual time=6.377..6.377 rows=0 loops=1) Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Filter: (count(DISTINCT o.order_id) > 0) -> Sort (cost=496.90..499.40 rows=1000 width=178) (actual time=6.374..6.374 rows=0 loops=1) Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Sort Method: quicksort Memory: 25kB -> YB Batched Nested Loop Left Join (cost=117.50..447.07 rows=1000 width=178) (actual time=6.362..6.362 rows=0 loops=1) Join Filter: (oi.product_id = p.product_id) -> YB Batched Nested Loop Join (cost=117.50..328.19 rows=1000 width=268) (actual time=6.359..6.359 rows=0 loops=1) Join Filter: (o.customer_id = c.customer_id) -> Hash Right Join (cost=117.50..225.69 rows=1000 width=32) (actual time=6.354..6.355 rows=0 loops=1) Hash Cond: (oi.order_id = o.order_id) -> Seq Scan on order_items oi (cost=0.00..100.00 rows=1000 width=12) (never executed) -> Hash (cost=105.00..105.00 rows=1000 width=24) (actual time=6.348..6.348 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on orders o (cost=0.00..105.00 rows=1000 width=24) (actual time=6.345..6.345 rows=0 loops=1) Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date)) Storage Table Read Requests: 1 Storage Table Read Execution Time: 0.000 ms -> Index Scan using customers_pkey on customers c (cost=0.00..0.12 rows=1 width=240) (never executed) Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023])) Storage Filter: (registration_date >= '2023-01-01'::date) -> Index Scan using products_pkey on products p (cost=0.00..0.11 rows=1 width=122) (never executed) Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047])) duration: 6.559 ms plan: Sort (cost=550.02..550.85 rows=333 width=116) (actual time=6.537..6.537 rows=0 loops=1) Sort Key: (sum(o.total_amount)) DESC, (count(DISTINCT o.order_id)) DESC Sort Method: quicksort Memory: 25kB -> GroupAggregate (cost=496.90..536.07 rows=333 width=116) (actual time=6.526..6.526 rows=0 loops=1) Group Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Filter: (count(DISTINCT o.order_id) > 0) -> Sort (cost=496.90..499.40 rows=1000 width=178) (actual time=6.523..6.523 rows=0 loops=1) Sort Key: c.customer_id, ((((c.first_name)::text || ' '::text) || (c.last_name)::text)) Sort Method: quicksort Memory: 25kB -> YB Batched Nested Loop Left Join (cost=117.50..447.07 rows=1000 width=178) (actual time=6.515..6.515 rows=0 loops=1) Join Filter: (oi.product_id = p.product_id) -> YB Batched Nested Loop Join (cost=117.50..328.19 rows=1000 width=268) (actual time=6.511..6.511 rows=0 loops=1) Join Filter: (o.customer_id = c.customer_id) -> Hash Right Join (cost=117.50..225.69 rows=1000 width=32) (actual time=6.506..6.506 rows=0 loops=1) Hash Cond: (oi.order_id = o.order_id) -> Seq Scan on order_items oi (cost=0.00..100.00 rows=1000 width=12) (never executed) -> Hash (cost=105.00..105.00 rows=1000 width=24) (actual time=6.497..6.497 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on orders o (cost=0.00..105.00 rows=1000 width=24) (actual time=6.494..6.494 rows=0 loops=1) Storage Filter: ((order_date >= '2023-04-01'::date) AND (order_date <= '2023-05-31'::date)) Storage Table Read Requests: 1 Storage Table Read Execution Time: 0.000 ms -> Index Scan using customers_pkey on customers c (cost=0.00..0.12 rows=1 width=240) (never executed) Index Cond: (customer_id = ANY (ARRAY[o.customer_id, $1, $2, ..., $1023])) Storage Filter: (registration_date >= '2023-01-01'::date) -> Index Scan using products_pkey on products p (cost=0.00..0.11 rows=1 width=122) (never executed) Index Cond: (product_id = ANY (ARRAY[oi.product_id, $1025, $1026, ..., $2047])) ``` Jira: DB-11078 Original commit: 40689bc / D35566 Test Plan: ./yb_build.sh --java-test TestYbQueryDiagnostics#checkExplainData Testing with a large query that generates an EXPLAIN plan exceeding 16KB, causes output termination and loss of remaining data. This is expected and will be handled later [[ #23720 | #23720 ]] Reviewers: jason, tfoucher Reviewed By: jason Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D37980

IshanChhangani added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Aug 29, 2024

IshanChhangani self-assigned this Aug 29, 2024

yugabyte-ci added kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Aug 29, 2024

IshanChhangani changed the title ~~[YSQL][Query Diagnostics] Handle oversized explain plans~~ [YSQL][Query Diagnostics] Handle intermediate flushing for explain plans Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL][Query Diagnostics] Handle intermediate flushing for explain plans #23720

[YSQL][Query Diagnostics] Handle intermediate flushing for explain plans #23720

IshanChhangani commented Aug 29, 2024 •

edited

Loading

[YSQL][Query Diagnostics] Handle intermediate flushing for explain plans #23720

[YSQL][Query Diagnostics] Handle intermediate flushing for explain plans #23720

Comments

IshanChhangani commented Aug 29, 2024 • edited Loading

Description

Issue Type

Warning: Please confirm that this issue does not contain any sensitive information

IshanChhangani commented Aug 29, 2024 •

edited

Loading