sql, distsql: planning for interleave joins between ancestor and descendant #19853
Great stuff overall!

Review status: 0 of 27 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

pkg/sql/distsql_join.go, line 28 at r2 (raw file):
You should move the existing code in a separate commit so (hopefully) we don't lose history information (plus this makes it hard to review what actually changed). Or just keep it in the same file for now.

pkg/sql/distsql_join.go, line 475 at r2 (raw file):
This definitely needs tests.

pkg/sql/distsql_join.go, line 481 at r2 (raw file):
This makes my head spin so I have to give it some more careful thought. But I'm confused why it's enough to just modify the EndKeys. Don't we need to make sure we don't scan the same rows in another partition? Seems like in this example, the next span that starts at

pkg/sql/distsql_join.go, line 489 at r2 (raw file):
Where are we invoking PrefixEnd?

pkg/sql/distsql_join.go, line 494 at r2 (raw file):
I don't get the 3. There can be multiple columns in each ancestor "section". The key is actually `<1st-tableid>/<1st-indexid>/<1st-index-column-1>/<1st-index-column-2>.../#/<2nd-tableid>/<2nd-indexid>/<2nd-index-column-1>.../#/...`

pkg/sql/distsql_join.go, line 526 at r2 (raw file):
[nit] don't overload

pkg/sql/join.go, line 861 at r2 (raw file):
Why is it helpful to compute this separately? We could check this stuff directly during distsql planning (we are already re-checking them..) I would change

pkg/sql/distsqlrun/flow_diagram.go, line 160 at r2 (raw file):
We need to include more details on the post-processing of each table.

Comments from Reviewable
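The key layout quoted in the comment on line 494 can be made concrete with a small sketch. This is only an illustration of the pretty-printed form described above, not the actual key encoder: the `section` type, the `interleavedKey` helper, and the table IDs 51 and 52 are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// section is one ancestor (or the final descendant) segment of an
// interleaved key: a table ID, an index ID, and that index's column values.
// The type is hypothetical and exists only for this illustration.
type section struct {
	tableID, indexID int
	cols             []string
}

// interleavedKey renders the sections in the pretty-printed form quoted in
// the review, separating consecutive sections with the "#" sentinel.
func interleavedKey(sections []section) string {
	parts := make([]string, 0, len(sections))
	for _, s := range sections {
		fields := []string{fmt.Sprint(s.tableID), fmt.Sprint(s.indexID)}
		fields = append(fields, s.cols...)
		parts = append(parts, strings.Join(fields, "/"))
	}
	return strings.Join(parts, "/#/")
}

func main() {
	key := interleavedKey([]section{
		{tableID: 51, indexID: 1, cols: []string{"1", "a"}}, // parent with two index columns
		{tableID: 52, indexID: 1, cols: []string{"7"}},      // child with one index column
	})
	fmt.Println(key)
}
```

Note that the number of `/`-separated components in a section depends on how many index columns that ancestor has, which is the point of the reviewer's objection to a fixed count.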
f51713a to 23588ec
Added some unit tests to some helper methods. Will be adding some query-level logic tests in the next patch.

Review status: 0 of 31 files reviewed at latest revision, 8 unresolved discussions.

pkg/sql/distsql_join.go, line 28 at r2 (raw file):
Previously, RaduBerinde wrote…
I moved this back to

pkg/sql/distsql_join.go, line 475 at r2 (raw file):
Previously, RaduBerinde wrote…
Added some tests, let me know what you think.

pkg/sql/distsql_join.go, line 481 at r2 (raw file):
Previously, RaduBerinde wrote…
You're right, I've updated the comment with a better example and added logic that "cascades" the "fixed" end key.

pkg/sql/distsql_join.go, line 489 at r2 (raw file):
Previously, RaduBerinde wrote…
Good catch, guess I missed this in the first iteration. Added some tests and made this part more correct.

pkg/sql/distsql_join.go, line 494 at r2 (raw file):
Previously, RaduBerinde wrote…
The 3 is for the The actual # of columns in each ancestor section is given by each The I updated the comment to be more clear.

pkg/sql/distsql_join.go, line 526 at r2 (raw file):
Previously, RaduBerinde wrote…
Good catch!

pkg/sql/join.go, line 861 at r2 (raw file):
Previously, RaduBerinde wrote…
The motive for computing this ahead of time was possibly re-using the hint in the non-distributed execution engine. Looking back, this might be a bit premature and it might just be fine to keep all this logic in DistSQL for now.

pkg/sql/distsqlrun/flow_diagram.go, line 160 at r2 (raw file):
Previously, RaduBerinde wrote…
Good point, updated this. Here's a preview of what it looks like for a simple
Review status: 0 of 31 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.

pkg/roachpb/combine_spans.go, line 105 at r4 (raw file):
Not sure how this semantic is useful. Intersecting two sets is not the same as intersecting all elements in the sets. We want to intersect two lists of spans (one for each table); I don't see why we would intersect spans from the same table. E.g. if the join has a

pkg/sql/distsql_join.go, line 494 at r2 (raw file):
Previously, richardwu (Richard Wu) wrote…
Makes sense, thanks! There is an asymmetry in that we only fix up

pkg/sql/distsql_join.go, line 372 at r4 (raw file):
It's easy to reason about this function if the space covered by the spans does not change (i.e. we just move pieces between partitions). But it feels wrong that we may be extending spans. What if we have: parent table key K1, child table key K2 and the join has WHERE K1=1 AND K2=2. We would have the span

I still believe that this entire thing would be cleaner and less error-prone if we populate the interleave reader with two sets of spans, one for each table. Then the reader goes through both at the same time (as if it's merging them) but it knows which rows to ignore from each side.

pkg/sql/distsqlrun/flow_diagram.go, line 160 at r2 (raw file):
Previously, richardwu (Richard Wu) wrote…
We should prepend a "Left/Right" prefix so things are more readable.

pkg/sql/distsqlrun/flow_diagram.go, line 29 at r4 (raw file):
Why this alias?
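The distinction raised for combine_spans.go (intersecting two lists of spans versus intersecting every span with every other) can be sketched as follows. This is a minimal illustration that assumes each list is sorted and non-overlapping and uses plain strings as keys; the `span` type and `intersectLists` helper are hypothetical and are not the code under review.

```go
package main

import "fmt"

// span is a half-open key interval [start, end). Keys are plain strings here
// purely for illustration.
type span struct{ start, end string }

// intersectLists returns the parts of the key space covered by both lists.
// Each input must be sorted by start key and free of internal overlaps.
// Spans within the same list are never intersected with each other.
func intersectLists(a, b []span) []span {
	var out []span
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		// The overlap of a[i] and b[j] is [max(starts), min(ends)).
		lo, hi := a[i].start, a[i].end
		if b[j].start > lo {
			lo = b[j].start
		}
		if b[j].end < hi {
			hi = b[j].end
		}
		if lo < hi {
			out = append(out, span{lo, hi})
		}
		// Advance whichever span ends first; the other may still overlap
		// the next span of the opposite list.
		if a[i].end < b[j].end {
			i++
		} else {
			j++
		}
	}
	return out
}

func main() {
	left := []span{{"a", "c"}, {"e", "g"}} // spans for one table
	right := []span{{"b", "f"}}            // spans for the other table
	fmt.Println(intersectLists(left, right))
}
```

Here the two left spans are disjoint, so intersecting all three spans together would yield nothing, while intersecting the two lists yields `[b, c)` and `[e, f)`.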
pkg/sql/distsql_join.go, line 372 at r4 (raw file):
Previously, RaduBerinde wrote…
I see your point, good catch. I'm beginning to lean towards the idea of pushing down 2 sets of spans for each table to the processor. It'll give us the flexibility to not have to push down

Now that I think about it, an intersection of spans from the parent/child tables is not quite correct for the spans generated from
this would incorrectly yield the intersection

As for aligning split points: I am unsure if there's a cleaner way to do this in order to allow each

Even with the two sets of spans, one per table, we'll need to align the split points of each table in a similar fashion (technically, we only need to align the split points of the child's set of spans since the parent). Based on the following assumptions:
The revised algorithm for adjusting the end key would be the following (for the child table):
Here's an example where we have a filter
Suppose the range split at
Since parent row
In some cases, fixing the span may "consume" the other partition i.e. in the case
When we try to fix the
and since there are no children rows being scanned on node 2, we can completely remove that partition (step 4).
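The idea of pushing an end key past a parent row's children, and of one partition "consuming" another, can be loosely sketched as below. The algorithm's actual steps are not reproduced in this thread, so this is an assumption-laden toy: keys are plain strings with single-character values, the first `/#/` sentinel marks the parent boundary, and the `partition`, `prefixEnd`, and `pushEndKeys` names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a half-open key interval [start, end).
type span struct{ start, end string }

// partition is the set of spans assigned to one node.
type partition struct {
	node  int
	spans []span
}

// prefixEnd returns the smallest key that sorts after every key with the
// given prefix. A prefix made only of 0xff bytes has no such key and is
// returned unchanged.
func prefixEnd(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] != 0xff {
			b[i]++
			return string(b[:i+1])
		}
	}
	return prefix
}

// pushEndKeys walks partitions in key order. Any span that ends in the
// middle of a parent row's children is extended to the end of that parent's
// block, later spans are trimmed so no key is scanned twice, and partitions
// left without spans are dropped.
func pushEndKeys(parts []partition) []partition {
	var out []partition
	covered := "" // every key below this already belongs to an earlier span
	for _, p := range parts {
		var kept []span
		for _, s := range p.spans {
			if s.start < covered {
				s.start = covered
			}
			// An end key containing the sentinel falls inside some parent
			// row's block of children; push it past that block.
			if i := strings.Index(s.end, "/#/"); i >= 0 {
				s.end = prefixEnd(s.end[:i])
			}
			if s.start < s.end {
				kept = append(kept, s)
				covered = s.end
			}
		}
		if len(kept) > 0 {
			out = append(out, partition{node: p.node, spans: kept})
		}
	}
	return out
}

func main() {
	parts := []partition{
		{node: 1, spans: []span{{"/1", "/1/#/5"}}},     // ends mid-way through parent 1's children
		{node: 2, spans: []span{{"/1/#/5", "/1/#/9"}}}, // only children of parent 1
		{node: 3, spans: []span{{"/2", "/4"}}},
	}
	for _, p := range pushEndKeys(parts) {
		fmt.Println(p.node, p.spans)
	}
}
```

In this toy input, node 1's span grows to cover all of parent 1's children, which leaves node 2 with nothing to scan, so its partition disappears. This mirrors the "consume" case described above under the stated assumptions only.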
pkg/sql/distsql_join.go, line 372 at r4 (raw file):
Previously, richardwu (Richard Wu) wrote…
I wonder if it wouldn't be easier to add this logic to
23588ec to 62481c2
Review status: 0 of 40 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.

pkg/roachpb/merge_spans.go, line 105 at r4 (raw file):
Previously, RaduBerinde wrote…
Removed since this is no longer necessary (nor correct).

pkg/sql/distsql_join.go, line 494 at r2 (raw file):
Previously, RaduBerinde wrote…
That's correct: child rows always come after parent rows, so even if we read child rows first they will not be joined to anything or output. Beyond inner joins, correctness is a factor so I've taken your suggestion to move the "fixing" (pushing)

pkg/sql/distsql_join.go, line 372 at r4 (raw file):
Previously, RaduBerinde wrote…
As discussed offline, end keys can now be pushed on a per-input span basis in

pkg/sql/distsqlrun/flow_diagram.go, line 160 at r2 (raw file):
Previously, RaduBerinde wrote…
I made it so that this only outputs info for the first two tables (left and right). Once we have more than 2 tables being joined, we can update this.

pkg/sql/distsqlrun/flow_diagram.go, line 29 at r4 (raw file):
Previously, RaduBerinde wrote…
Oops, fixed.
85ffb54 to 5a10a3a
So as discussed offline, I've introduced the concept of "join intervals" for ancestor spans in order to match each descendant span with its corresponding ancestor span. Take a look at

Review status: 0 of 40 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.
953cad7 to eba2a85
I've added a decent number of logic tests that should cover all the edge cases I can think of. Surprisingly, only 1 bug surfaced, which is both a relief and borderline unusual. Also updated docstrings of the new functions/concepts as per @RaduBerinde's input. One slight deviation from the RFC: parent-grandchild joins (or great-grandchild, etc.) are supported now. Also ran a few benchmarks which show that this performs much better than joins on interleaved tables did before and comparably to regular joins in a shallow interleave hierarchy. Ready for a full-on review!
Great work! Would be great to get another pair of eyes on this though.

Reviewed 1 of 17 files at r1, 3 of 21 files at r3, 17 of 34 files at r5, 18 of 18 files at r6.

pkg/sql/distsql_join.go, line 101 at r6 (raw file):
This should be an error (I know it shouldn't happen, but if we have a bug and it does, an error is preferable)

pkg/sql/distsql_join.go, line 531 at r6 (raw file):
I think this section should be moved to

pkg/sql/distsql_join.go, line 541 at r6 (raw file):
[nit] alignment

pkg/sql/distsql_join.go, line 562 at r6 (raw file):
extra "do not belong"

pkg/sql/distsql_join.go, line 580 at r6 (raw file):
with which the child row(s) are supposed to join

pkg/sql/distsql_join.go, line 584 at r6 (raw file):
repeated application (it's not really recursive)

pkg/sql/distsql_join.go, line 586 at r6 (raw file):
Please add a TODO here to investigate making this more efficient (we keep looking for overlaps from the beginning every time). Perhaps pre-sorting the join spans would help.

pkg/sql/distsql_join.go, line 596 at r6 (raw file):
parentJoinSpan (same below)

pkg/sql/distsql_join.go, line 597 at r6 (raw file):
these start in the same place, this probably needs to be indented a bit

pkg/sql/distsql_join.go, line 644 at r6 (raw file):
[nit] can't we use

pkg/sql/distsql_join_test.go, line 295 at r6 (raw file):
Nice!

pkg/sql/distsql_physical_planner.go, line 2045 at r6 (raw file):
Very nice cleanups in this file!

pkg/sql/distsqlrun/flow_diagram.go, line 441 at r6 (raw file):
It's nice to sort them, but how was this non-deterministic before? It seems that for the same spec, we would always be adding them in the same order.
682d31a to bd2fa78
Review status: 3 of 18 files reviewed at latest revision, 13 unresolved discussions.

pkg/sql/distsqlrun/flow_diagram.go, line 441 at r6 (raw file):
Previously, RaduBerinde wrote…
After much digging around I realized I was using a map to keep track of unique node IDs after aligning the partitioned spans when creating an interleaved join. Removed this sorting since distsql plans should still be deterministic.

pkg/sql/distsql_join.go, line 101 at r6 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsql_join.go, line 531 at r6 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsql_join.go, line 541 at r6 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsql_join.go, line 562 at r6 (raw file):
Previously, RaduBerinde wrote…
Good catch, thanks!

pkg/sql/distsql_join.go, line 580 at r6 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsql_join.go, line 584 at r6 (raw file):
Previously, RaduBerinde wrote…
Good point.

pkg/sql/distsql_join.go, line 586 at r6 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsql_join.go, line 596 at r6 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsql_join.go, line 597 at r6 (raw file):
Previously, RaduBerinde wrote…
This should be fixed now 🤞

pkg/sql/distsql_join.go, line 644 at r6 (raw file):
Previously, RaduBerinde wrote…
Good point, I guess it doesn't really matter which end we take from.
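The non-determinism discussed for flow_diagram.go comes from a general Go property: the iteration order of a map is not specified and varies between runs. One common way to collect unique IDs without depending on that order is sketched below. The thread does not show how the PR actually resolved this, so the `uniqueNodes` helper is a hypothetical illustration rather than the PR's code.

```go
package main

import "fmt"

// uniqueNodes returns the distinct node IDs in first-seen order. The map is
// used only for membership checks and is never ranged over, so the result
// does not depend on Go's unspecified map iteration order.
func uniqueNodes(ids []int) []int {
	seen := make(map[int]struct{}, len(ids))
	var out []int
	for _, id := range ids {
		if _, ok := seen[id]; ok {
			continue
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	return out
}

func main() {
	// Node IDs as they might appear, with repeats, across partitioned spans.
	fmt.Println(uniqueNodes([]int{3, 1, 3, 2, 1})) // prints [3 1 2]
}
```

Ranging over the map itself to build the list would give the same set of IDs but in an order that can change from run to run, which is why sorting appeared necessary in the first place.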
bd2fa78 to 162775d
Reviewed 2 of 21 files at r3, 3 of 34 files at r5, 2 of 18 files at r6, 2 of 22 files at r7, 13 of 14 files at r8.

pkg/roachpb/data.go, line 1237 at r7 (raw file):
Can remove this comment, it is pretty obvious.

pkg/roachpb/data.go, line 1239 at r7 (raw file):
This one as well. Pretty clear what's going on.

pkg/roachpb/data.go, line 1256 at r7 (raw file):
Remove.

pkg/roachpb/data.go, line 1295 at r7 (raw file):
I don't mean to pick on your code comments, so I will highlight that this comment is outstanding, and the kind of comment that should be there - something that adds value beyond the surface meaning of the function name.

pkg/sql/distsql_physical_planner.go, line 2045 at r6 (raw file):
Previously, RaduBerinde wrote…
Seconded!

pkg/sql/distsql_plan_join.go, line 35 at r8 (raw file):
I'm not sure about the default true setting. @RaduBerinde, @petermattis ?

pkg/sql/distsql_plan_join.go, line 69 at r8 (raw file):
super nitty nit (sorry): s/remapping/to remap

pkg/sql/distsql_plan_join.go, line 75 at r8 (raw file):
since the metadata...?

pkg/sql/distsql_plan_join.go, line 80 at r8 (raw file):
hmm, please unsplit the comment, or explicitly demarcate it with "first comment..." "...second comment".

pkg/sql/distsql_plan_join.go, line 90 at r8 (raw file):
Why? Unclear to me why this limit is being finicked like this.

pkg/sql/distsql_plan_join.go, line 432 at r8 (raw file):

pkg/sql/distsql_plan_join.go, line 480 at r8 (raw file):
@RaduBerinde is this correct? I think so, but my brain is melting.

pkg/sql/join.go, line 873 at r8 (raw file):
nit: put the

pkg/sql/distsqlrun/flow_diagram.go, line 183 at r8 (raw file):
can you amend this to display an optional prefix, instead of just the generic "Out: " prefix? It would be nice if you could say "left out" and "right out" as right now the

pkg/sql/logictest/testdata/logic_test/interleaved_join, line 3 at r8 (raw file):
This testing strategy is outstanding.

pkg/sql/distsql_join.go, line 44 at r6 (raw file):
I'd add a check above, and an explicit error return, to alleviate potential future panics.
Review status: 17 of 18 files reviewed at latest revision, 16 unresolved discussions, all commit checks successful.

pkg/roachpb/data.go, line 1237 at r7 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
linter says no! (it's exported)

pkg/sql/distsql_plan_join.go, line 35 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Hm, good question.. We don't have many tests exercising joins between interleaved tables (e.g. none of the "production" workloads do it).

pkg/sql/distsql_plan_join.go, line 480 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
I think so too..
162775d to def5409
Review status: 13 of 18 files reviewed at latest revision, 17 unresolved discussions.

pkg/roachpb/data.go, line 1237 at r7 (raw file):
Previously, RaduBerinde wrote…
I've updated it to be more precise and non-trivial.

pkg/roachpb/data.go, line 1239 at r7 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Good point.

pkg/roachpb/data.go, line 1295 at r7 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Gracias

pkg/sql/distsql_plan_join.go, line 35 at r8 (raw file):
Previously, RaduBerinde wrote…
So the new "interleaved joins" are always better (from benchmarking on a GCE cluster) than regular merge joins for interleaved tables. This follows from the fact that a regular merge join for two interleaved tables always reads the entire interleaved hierarchy (but concurrently in two processors: one for the left and one for the right table). An interleaved join also reads the entire interleaved hierarchy once, so at worst it does the same amount of scanning work.

The merging part always needs to co-locate rows that are to be joined, regardless of merge vs interleaved join. So even if interleaved joins end up scanning rows for either table on a different node, merge joins would have to do the same when streaming from the reader to the joiner. At best, interleaved joins avoid the pseudo-random hashing of the equality columns that merge joins need in order to co-locate rows for merging. At worst, interleaved joins will generate the same amount of RPC traffic. We could leave this

pkg/sql/distsql_plan_join.go, line 69 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Oops, how about "... very useful for computing ordering and remapping..."?

pkg/sql/distsql_plan_join.go, line 75 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Oops (see below).

pkg/sql/distsql_plan_join.go, line 80 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
I must have accidentally spliced the comment from after above. Fixed.

pkg/sql/distsql_plan_join.go, line 90 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Added a comment for clarity.

pkg/sql/distsql_plan_join.go, line 432 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Ah good point, done!

pkg/sql/join.go, line 873 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Much cleaner 👍

pkg/sql/distsqlrun/flow_diagram.go, line 183 at r8 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…

pkg/sql/distsql_join.go, line 44 at r6 (raw file):
Previously, arjunravinarayan (Arjun Narayan) wrote…
Done.
def5409 to 4b5bac3
Review status: 3 of 18 files reviewed at latest revision, 16 unresolved discussions, some commit checks failed.

pkg/sql/distsql_plan_join.go, line 35 at r8 (raw file):
Previously, richardwu (Richard Wu) wrote…
I think the concern is correctness / regressions. But having it off for a period doesn't really help. I'm ok with enabling it.

pkg/sql/distsqlrun/flow_diagram.go, line 183 at r8 (raw file):
Previously, richardwu (Richard Wu) wrote…
O_o this is nice! We probably don't need the "Left" and "Right" with the new "-----" delimiters, it's fine either way though.
4b5bac3 to 162775d
Still ! Thanks for addressing all the issues!

Review status: 17 of 18 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

pkg/sql/distsql_plan_join.go, line 35 at r8 (raw file):
Previously, RaduBerinde wrote…
I'm fine with leaving it default on. You're right that having it off doesn't really help, and we have plenty of time before release. This only makes getting the interleaved benchmark shipped and running nightly more important, though.

pkg/sql/distsql_plan_join.go, line 69 at r8 (raw file):
Previously, richardwu (Richard Wu) wrote…
Fine by me!

pkg/sql/distsqlrun/flow_diagram.go, line 183 at r8 (raw file):
Previously, RaduBerinde wrote…
Looks good to me, as does Radu's suggestion.
2b31323 to e4bfc7c
So as I was reducing the sizes of the logic test tables (since they were a bit too big for testlogicrace), I realized I forgot to add test cases for swapping the positions of parent and child, such that the child table is on the left and the parent table is on the right (previously it assumed the parent was always on the left, doh). I added the corrective code into

Review status: 2 of 19 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.
Release notes: none
descendant Release notes: Performance improvement: equality joins on the entire interleave prefix between parent and (not necessarily direct) child interleaved tables are faster now.
e4bfc7c to 69165ab
Reviewed 1 of 34 files at r5, 1 of 16 files at r9, 16 of 16 files at r10.

🎉
The first iteration (and goal of this RFC) of full interleaved prefix joins on a parent-child table was officially completed when #19853 was merged into master. The outstanding general cases outlined in the RFC can be incrementally introduced with or without additional RFCs. Namely:
1. Multi-table joins
2. Prefix and subset joins (#20661)
3. Sibling and common ancestor joins

Avoiding splits in between interleaved children rows (or rather, encouraging splits right before a root parent table) is still outstanding.
For two tables parent and child
A query on the full interleave prefix
uses the InterleaveReaderJoiner.
A query on just a subset of the interleave prefix will simply default back to a MergeJoiner.

Fixes #18948
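The rule stated in the description (full interleave prefix gives an InterleaveReaderJoiner, a subset falls back to a MergeJoiner) can be sketched as a simple check. This is a simplified assumption, not the planner's real logic, which also has to verify the ancestor/descendant relationship and more. The `chooseJoiner` function and the column names `pid1` and `pid2` are hypothetical.

```go
package main

import "fmt"

// chooseJoiner returns the processor name the PR description associates with
// a join, based only on whether the equality columns cover every column of
// the interleave prefix. Both tables are assumed to be interleaved ancestor
// and descendant; that check is not modelled here.
func chooseJoiner(interleavePrefix, eqCols []string) string {
	if len(interleavePrefix) == 0 {
		return "MergeJoiner"
	}
	eq := make(map[string]bool, len(eqCols))
	for _, c := range eqCols {
		eq[c] = true
	}
	for _, c := range interleavePrefix {
		if !eq[c] {
			// At least one prefix column is not an equality column, so the
			// join is only on a subset of the interleave prefix.
			return "MergeJoiner"
		}
	}
	return "InterleaveReaderJoiner"
}

func main() {
	prefix := []string{"pid1", "pid2"} // hypothetical interleave prefix columns
	fmt.Println(chooseJoiner(prefix, []string{"pid1", "pid2"}))
	fmt.Println(chooseJoiner(prefix, []string{"pid1"}))
}
```

The first call covers the whole prefix and the second covers only part of it, which corresponds to the two cases the description distinguishes.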