[compute] map LIR to dataflow #29848

mgree · 2024-10-03T20:51:25Z

This PR introduces two new introspection sources and two new introspection views. These novel forms of introspection allow us to map LIR operators down to dataflow operators; we can now attribute existing introspection data about dataflows to LIR operators.

Introspection

The two new sources are ComputeLog sources that run per worker.

`mz_introspection.mz_compute_dataflow_global_ids_per_worker`

Maps dataflow identifiers to the global IDs used internally for the objects that get built as dataflows.

   name    | nullable | type  | comment
-----------+----------+-------+---------
 id        | f        | uint8 | dataflow ID
 worker_id | f        | uint8 |
 global_id | f        | text  |

`mz_introspection.mz_compute_lir_mapping_per_worker`

Tracks attribution information for LIR terms, as represented in FlatPlan.

       name        | nullable | type  | comment
-------------------+----------+-------+---------
 global_id         | f        | text  |
 lir_id            | f        | uint8 | AST node number
 worker_id         | f        | uint8 |
 operator          | f        | text  | rendered string
 parent_lir_id     | t        | uint8 | parent AST node number
 nesting           | f        | uint2 | nesting (used for indentation)
 operator_id_start | t        | uint8 | first dataflow operator (inclusive)
 operator_id_end   | t        | uint8 | last dataflow operator (exclusive)
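
A minimal sketch of how the half-open span [operator_id_start, operator_id_end) can be read, written in Python rather than SQL so it runs standalone. The rows, the operator IDs, and the helper name `lir_ids_for_operator` are hypothetical; only the column names come from the table above. Rows with a null span are assumed to cover no dataflow operators.

```python
from typing import NamedTuple, Optional


class LirMapping(NamedTuple):
    lir_id: int
    operator: str
    operator_id_start: Optional[int]  # inclusive; nullable per the table above
    operator_id_end: Optional[int]    # exclusive; nullable per the table above


def lir_ids_for_operator(rows, operator_id):
    """Return the lir_ids whose [start, end) span contains operator_id."""
    return [
        r.lir_id
        for r in rows
        if r.operator_id_start is not None
        and r.operator_id_end is not None
        and r.operator_id_start <= operator_id < r.operator_id_end
    ]


# Hypothetical rows: the operator IDs are invented for illustration.
rows = [
    LirMapping(0, "Get::Collection u1", 10, 14),
    LirMapping(1, "Arrange 0", 14, 20),
    LirMapping(2, "Get::PassArrangements u2", None, None),
]

print(lir_ids_for_operator(rows, 13))  # 13 is the last ID inside lir_id 0's span
print(lir_ids_for_operator(rows, 14))  # the end is exclusive, so 14 belongs to lir_id 1
```

Because the end is exclusive, adjacent spans can share a boundary value without any dataflow operator being attributed twice.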

Views

We use two introspection views to work with these per-worker sources. It ought to be the case that all workers agree about this metadata (though they may not agree on, say, the amount of memory a dataflow operator is using!).

So: these are just views that restrict the per-worker sources to worker_id = 0 and drop the worker_id column.
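
A rough model of that idea, using Python's built-in sqlite3 module as a stand-in for Materialize. The table name `global_ids_per_worker`, the view name `dataflow_global_ids`, and every row are hypothetical; this is not the view definition from the PR, only a sketch of filtering on worker 0 and projecting away worker_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the per-worker source; every worker reports the same mapping.
    CREATE TABLE global_ids_per_worker (id INTEGER, worker_id INTEGER, global_id TEXT);
    INSERT INTO global_ids_per_worker VALUES
        (7, 0, 'u2'), (7, 1, 'u2'),
        (9, 0, 'u3'), (9, 1, 'u3');

    -- Keep only worker 0's copy and project away worker_id.
    CREATE VIEW dataflow_global_ids AS
        SELECT id, global_id
        FROM global_ids_per_worker
        WHERE worker_id = 0;
    """
)

deduplicated = conn.execute(
    "SELECT id, global_id FROM dataflow_global_ids ORDER BY id"
).fetchall()
print(deduplicated)  # one row per dataflow instead of one per worker
```

Picking a single worker only gives the right answer under the assumption stated above, namely that all workers agree on this metadata.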

`mz_introspection.mz_dataflow_global_ids`

   name    | nullable | type  | comment
-----------+----------+-------+---------
 id        | f        | uint8 |
 global_id | f        | text  |

`mz_introspection.mz_lir_mapping`

       name        | nullable | type  | comment
-------------------+----------+-------+---------
 global_id         | f        | text  |
 lir_id            | f        | uint8 |
 operator          | f        | text  |
 parent_lir_id     | t        | uint8 |
 nesting           | f        | uint2 |
 operator_id_start | t        | uint8 |
 operator_id_end   | t        | uint8 |

Attributing to LIR

We can see a sample interaction as follows:

CREATE TABLE t(x INT NOT NULL, y INT, z TEXT);
CREATE VIEW v AS
  SELECT t1.x AS x, t1.z AS z1, t2.z AS z2
  FROM t AS t1, t AS t2
  WHERE t1.x = t2.y;
CREATE INDEX v_idx_x ON v(x);

\! sleep 1

SELECT global_id, lir_id, REPEAT(' ', MAX(nesting) * 2) || operator AS operator, SUM(duration_ns) AS duration, SUM(count) AS count
    FROM           mz_introspection.mz_lir_mapping mlm
         LEFT JOIN mz_introspection.mz_compute_operator_durations_histogram mcodh
         ON (mlm.operator_id_start <= mcodh.id AND mcodh.id < mlm.operator_id_end)
GROUP BY global_id, lir_id, operator
ORDER BY global_id, lir_id DESC;

which yields an output like:

 global_id | lir_id |          operator          | duration | count
-----------+--------+----------------------------+----------+-------
 u2        | 4      | Join::Differential 1 » 3   |  1261568 |    17
 u2        | 3      |   Arrange 2                |   466944 |    16
 u2        | 2      |     Get::Collection u1     |    69632 |     7
 u2        | 1      |   Arrange 0                |   417792 |    16
 u2        | 0      |     Get::Collection u1     |    73728 |     7
 u3        | 6      | Arrange 5                  |   454656 |    17
 u3        | 5      |   Get::PassArrangements u2 |          |

A place to store our bicycles

~~Should these be mz_internal or mz_introspection?~~ It needs to be mz_introspection to pass tests.
Should I just do the indentation up front?
~~Should I track other metadata (e.g., parents/children of LIR nodes?)~~ I added parent_lir_id to allow for more complex reconstructions.
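
As an illustration of the kind of reconstruction parent_lir_id allows, here is a small Python sketch that rebuilds the indented plan for u2 from (lir_id, parent_lir_id, operator) triples. The parent IDs below are inferred from the indentation in the sample output, not read from a real system, and `render_tree` is a hypothetical helper.

```python
from collections import defaultdict

# (lir_id, parent_lir_id, operator); parent IDs are an assumption inferred
# from the indentation of the sample output for u2.
rows = [
    (0, 1, "Get::Collection u1"),
    (1, 4, "Arrange 0"),
    (2, 3, "Get::Collection u1"),
    (3, 4, "Arrange 2"),
    (4, None, "Join::Differential 1 » 3"),
]


def render_tree(rows):
    """Return the plan as indented lines, roots first, using only parent links."""
    children = defaultdict(list)
    operators = {}
    for lir_id, parent, operator in rows:
        operators[lir_id] = operator
        children[parent].append(lir_id)

    lines = []

    def visit(lir_id, depth):
        lines.append("  " * depth + operators[lir_id])
        # Visit higher lir_ids first to mirror ORDER BY lir_id DESC.
        for child in sorted(children[lir_id], reverse=True):
            visit(child, depth + 1)

    # Rows with a null parent are the roots of the plan.
    for root in sorted(children[None], reverse=True):
        visit(root, 0)
    return lines


print("\n".join(render_tree(rows)))
```

Here the depth is derived from the parent links alone, so the nesting column is not needed for this particular rendering.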

Motivation

This PR adds a known-desirable feature.

The first step in attribution/plan profiling. https://github.com/MaterializeInc/database-issues/issues/6551

Tips for reviewer

I've tried to break things down so that each commit is sensible, but there's a little bit of back-and-forth in exactly what I store in the views.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

sthm · 2024-10-28T14:28:20Z

I'd need to play around with it to get a better intuition for how this works in practice. But from the description above, it seems to provide an easy way to map operator IDs back to object IDs in a way that preserves the hierarchy and dependencies of operators.

The example shows how this mapping can be used to attach operator-level statistics from mz_compute_operator_durations_histogram to something that looks like a query plan. From a quick skim of the documentation, it seems like it would also work for mz_arrangement_sizes and even for more complicated queries, like the operator skew query from the dataflow troubleshooting guide, which is great.

antiguru

Looks good, thank you! Left some inline comments.

src/compute/src/logging/compute.rs

src/compute/src/render.rs

antiguru · 2024-11-06T08:14:06Z

test/sqllogictest/introspection/attribution_sources.slt

Do you want to include your 'Attributing to LIR' example here, without duration/count?

mgree · 2024-12-03T20:50:49Z

Turned back on in #30692.

mgree force-pushed the lir-to-dataflow-address-mapping branch 7 times, most recently from 1c15839 to 146a170 Compare October 16, 2024 15:15

mgree force-pushed the lir-to-dataflow-address-mapping branch 4 times, most recently from 002cd64 to a1447ef Compare October 23, 2024 18:07

antiguru self-requested a review October 23, 2024 18:34

mgree force-pushed the lir-to-dataflow-address-mapping branch 2 times, most recently from b42194c to d5ffe7b Compare October 28, 2024 19:38

antiguru requested a review from teskje October 29, 2024 08:50

mgree force-pushed the lir-to-dataflow-address-mapping branch 13 times, most recently from 2eb9ff2 to a79f9e6 Compare October 31, 2024 01:08

mgree marked this pull request as ready for review October 31, 2024 01:44

mgree added 9 commits November 4, 2024 09:23

use timely-differential#593 to use operator spans instead of addresses

ceb1f29

track nesting (to support indentation in SQL), remove newlines from operator rendering

7353e3b

move metadata into a separate struct, add parent LirId node information so we can reconstruct the tree in SQL if need be

6914058

renumber global IDs in test (due to rebase over new builtins)

2018597

(rebased to use now merged timely 0.13.0)

satisfy linter

d8cb1dd

fixup tests with correct oids

2cb99e2

fix docs lint, make everything internal (for now, at least)

9ef067a

move back to mz_introspection, per @antiguru

c4466bb

newtype LirId; shrink ComputeEvent::LirMapping per @antiguru

cce4908

mgree force-pushed the lir-to-dataflow-address-mapping branch 5 times, most recently from 7d11230 to 4ac784a Compare November 4, 2024 19:32

address feedback (other than LirMapping fix)

eeccbbf

mgree force-pushed the lir-to-dataflow-address-mapping branch from 4ac784a to eeccbbf Compare November 4, 2024 19:33

batch lir metadata mapping, shrink ComputeEvent, and avoid work when not logging, per @antiguru

e94055a

mgree force-pushed the lir-to-dataflow-address-mapping branch from 67b5f26 to e94055a Compare November 4, 2024 20:28

remove oid from test, per @def-

1b111af

mgree force-pushed the lir-to-dataflow-address-mapping branch from 21589bc to 1b111af Compare November 4, 2024 21:43

mgree requested a review from antiguru November 4, 2024 22:37

antiguru approved these changes Nov 6, 2024

add attribution queries, fix race in SLT, @antiguru's nits

6364363

mgree merged commit c094caf into MaterializeInc:main Nov 6, 2024
84 checks passed

mgree deleted the lir-to-dataflow-address-mapping branch November 6, 2024 18:14

def- added a commit to def-/materialize that referenced this pull request Nov 7, 2024

testdrive: Adapt/remove old kafka src syntax files

3540ecf

Follow-up to MaterializeInc#29848

teskje mentioned this pull request Nov 8, 2024

Disable the new instrospection sources #30393

Merged

5 tasks

teskje mentioned this pull request Nov 18, 2024

compute: replace FlatPlan with RenderPlan #30500

Merged

5 tasks

mgree mentioned this pull request Dec 23, 2024

[docs] document LIR attribution #30899

Merged

5 tasks
