[compute] map LIR to dataflow #29848
I'd need to play around with it to get a better intuition for how this works in practice. But from the description above, it seems to provide an easy way to map operator ids back to object ids in a way that preserves the hierarchy and dependencies of operators. The example shows how this mapping can be used to add operator-level statistics.
…perator rendering
…on so we can reconstruct the tree in SQL if need be
(rebased to use now merged timely 0.13.0)
Looks good, thank you! Left some inline comments.
Do you want to include your 'Attribution to Lir' example here, without duration/count?
Good idea!
Turned back on in #30692.
This PR introduces two new introspection sources and two new introspection views. These novel forms of introspection allow us to map LIR operators down to dataflow operators; we can now attribute existing introspection data about dataflows to LIR operators.
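To make the attribution idea concrete, here is a minimal Python sketch (not code from this PR). It assumes each LIR mapping row carries a half-open range of dataflow operator IDs; the field names `operator_id_start` and `operator_id_end`, the sample IDs, and the elapsed-time numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LirMapping:
    """One hypothetical LIR mapping row (field names are assumptions)."""
    global_id: str
    lir_id: int
    operator: str
    operator_id_start: int  # assumed inclusive
    operator_id_end: int    # assumed exclusive


def attribute_to_lir(mappings, per_operator):
    """Sum a per-dataflow-operator metric over each LIR operator's ID range."""
    totals = {}
    for m in mappings:
        totals[(m.global_id, m.lir_id)] = sum(
            value
            for op_id, value in per_operator.items()
            if m.operator_id_start <= op_id < m.operator_id_end
        )
    return totals


# Made-up data: two LIR operators belonging to one object "u1".
mappings = [
    LirMapping("u1", 1, "Get", 10, 12),
    LirMapping("u1", 2, "Arrange", 12, 15),
]
# Made-up elapsed time per dataflow operator ID; 20 belongs to no LIR operator.
elapsed_ns = {10: 5, 11: 7, 12: 1, 13: 2, 14: 3, 20: 100}

totals = attribute_to_lir(mappings, elapsed_ns)
for (global_id, lir_id), total in sorted(totals.items()):
    print(global_id, lir_id, total)
```

In SQL the same shape would presumably be a join between the mapping and an existing per-operator introspection relation, followed by a group-by on the LIR identifier.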
Introspection
The two new sources are ComputeLog sources that run per worker.
mz_introspection.mz_compute_dataflow_global_ids_per_worker
Maps dataflow identifiers to the global IDs used internally for things that get built as dataflows.
mz_introspection.mz_compute_lir_mapping_per_worker
Tracks attribution information for LIR terms (in terms of FlatPlan).
Views
We use two introspection views to work with these per-worker sources. It ought to be the case that all workers agree about this metadata (though they may not agree on, say, the amount of memory a dataflow operator is using!).
So: these are just views that set worker_id = 0.
mz_introspection.mz_dataflow_global_ids
mz_introspection.mz_lir_mapping
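As a rough sketch of what "set worker_id = 0" amounts to, the Python below keeps only the rows reported by worker 0 and drops the worker column. This is an illustration, not the view definition from the PR; the column names `id` and `global_id` and the sample rows are hypothetical.

```python
def collapse_to_worker_zero(rows):
    """Keep rows from worker 0 only, and drop the worker_id column.

    This relies on the assumption stated above: every worker reports
    the same metadata, so any single worker is representative.
    """
    return [
        {key: value for key, value in row.items() if key != "worker_id"}
        for row in rows
        if row["worker_id"] == 0
    ]


# Made-up per-worker rows: both workers report the same mapping.
per_worker_rows = [
    {"worker_id": 0, "id": 5, "global_id": "u1"},
    {"worker_id": 1, "id": 5, "global_id": "u1"},
]

collapsed = collapse_to_worker_zero(per_worker_rows)
print(collapsed)
```

Picking a single worker avoids duplicated rows, but it would silently hide any disagreement between workers if the assumption ever failed to hold.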
Attributing to LIR
We can see a sample interaction as follows:
which yields an output like:
A place to store our bicycles
Should these be mz_internal or mz_introspection? It needs to be mz_introspection to pass tests.
Should I just do the indentation up front?
Should I track other metadata (e.g., parents/children of LIR nodes)? I added parent_lir_id to allow for more complex reconstructions.
Motivation
The first step in attribution/plan profiling. https://github.com/MaterializeInc/database-issues/issues/6551
Tips for reviewer
I've tried to break things down so that each commit is sensible, but there's a little bit of back-and-forth in exactly what I store in the views.
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.