explain: Clarify meaning of distribution full vs local #10020

sheaffej · 2021-03-19T18:52:12Z

It would be helpful to clarify the meaning distribution in explain plans. The possible values appear to be full and local.

On the page: https://www.cockroachlabs.com/docs/v20.2/explain#default-query-plans

It states:

distribution:full
The query plan will be distributed across all nodes on a distributed cluster.

However, it does not explain what local means. And it is not clear what distribution is referring to.

Reference internal Slack:
https://cockroachlabs.slack.com/archives/CHKQGKYEM/p1616110866018700
and follow-on
https://cockroachlabs.slack.com/archives/CHKQGKYEM/p1616110946019100

Summarizing the slack conversation.

distribution: full
Means that the planner chose a distributed execution plan, where execution of the query is performed by multiple nodes in parallel. Such as a distributed join or aggregation, where each node performs a portion of the work in parallel, and then final or downstream results are returned by the gateway node.

distribution: local
Means the planner chose a local execution plan, where execution of the query is performed only on the local node (aka gateway node). It is important to note that rows may be fetched by one or more remote nodes, but the processing of those rows is performed only by the local node.

Therefore "distribution" refers more to the style of processing of the records during query execution (e.g. distributed processing, or local processing), vs the fetching of the rows that will be processed by the query.

Examples:

distribution: local

[email protected]:59856/movr> explain 
select u.name, u.id
from users u
where u.city = 'Amsterdam' and u.id = 'b3333333-3333-4000-8000-000000000023';
  tree |        field        |                                                 description
-------+---------------------+--------------------------------------------------------------------------------------------------------------
       | distribution        | local
       | vectorized          | false
  scan |                     |
       | estimated row count | 1
       | table               | users@primary
       | spans               | [/'Amsterdam'/'b3333333-3333-4000-8000-000000000023' - /'Amsterdam'/'b3333333-3333-4000-8000-000000000023']
(6 rows)

Notice that all query processing occurs on node 1.

distribution: full

[email protected]:59856/movr> explain 
select u.city, count(*) as num
from users u
where u.city = 'Amsterdam' group by u.city;
    tree    |        field        |          description
------------+---------------------+--------------------------------
            | distribution        | full
            | vectorized          | false
  group     |                     |
   └── scan |                     |
            | estimated row count | 0
            | table               | users@primary
            | spans               | [/'Amsterdam' - /'Amsterdam']
(7 rows)

Notice that query processing involves node 1 and node5.

Therefore it is important to note the full does not mean that processing happens on ALL nodes, but that it happens on multiple nodes simultaneously.

The text was updated successfully, but these errors were encountered:

Fixes #10020, clarify full vs local distributed execution plans.

…0240) * New EXPLAIN ANALYZE PLAN statement. * Pass to update all EXPLAIN output. * Fixes #10127 estimated row count in all operators of EXPLAIN. Fixes #10020, clarify full vs local distributed execution plans. * Updates to EXPLAIN ANALYZE. * Adding new demo workload include. * Adding new statistics, and renaming KV tuple -> KV row. * Fixes #9582. * Fixes #9154. * Fixes #9109. * Backport fix for #10020. * Removing insecure performance tuning tutorial. * Addressed Radu's review comments. * Fixes #9553. * Refactoring to use demo cluster with locality flags. * Incorporated Andy's review comments. * Incorporated Eric's feedback. * Generated new `EXPLAIN ANALYZE` diagram. * Fixes #10262

sheaffej added A-sql C-doc-improvement labels Mar 19, 2021

sheaffej assigned ianjevans Mar 19, 2021

ianjevans pushed a commit that referenced this issue Apr 5, 2021

Fixes #10127 estimated row count in all operators of EXPLAIN.

b93e69d

Fixes #10020, clarify full vs local distributed execution plans.

ianjevans pushed a commit that referenced this issue Apr 5, 2021

Backport fix for #10020.

bd1284c

ianjevans mentioned this issue Apr 5, 2021

Update EXPLAIN and EXPLAIN ANALYZE topics and output for 21.1 #10240

Merged

ianjevans closed this as completed in #10240 Apr 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

explain: Clarify meaning of distribution full vs local #10020

explain: Clarify meaning of distribution full vs local #10020

sheaffej commented Mar 19, 2021

explain: Clarify meaning of distribution full vs local #10020

explain: Clarify meaning of distribution full vs local #10020

Comments

sheaffej commented Mar 19, 2021

Examples:

distribution: local

distribution: full