-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: implement index join #65905
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Comments
jordanlewis
added
the
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
label
May 31, 2021
jordanlewis
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
and removed
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
labels
May 31, 2021
This seems pretty valuable and we should add it in to our work opportunistically in this release if we have the time. |
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 10, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 10, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 12, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 12, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 14, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 15, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 15, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 15, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 15, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 15, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 16, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Jul 17, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Aug 9, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Aug 10, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Aug 10, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Aug 10, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Aug 11, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `spanAssembler` operators queue up spans until the memory allocated for the span keys reaches a preset limit (default 4MB). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
DrewKimball
added a commit
to DrewKimball/cockroach
that referenced
this issue
Aug 11, 2021
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `ColIndexJoin` operator queues up input rows until the memory footprint of the rows reaches a preset limit (default 4MB for parity with the row engine). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
See #65904 for more background.
Index join is the simplest of the
joinReader
algorithms. We should implement it in the vectorized engine. Besides the reasons that we already know for why the vectorized model is better (batch processing, noDatum
allocations, colocated memory, no expensive materializer/columnarizer pairs), span generation is one of the most expensive parts ofjoinReader
, and it can be sped significantly up with vectorized.Span generation requires one type switch and interface lookup per row per key column, and requires at minimum two array allocations (one start key, one end key, and any resizing) per row. Switching to vectorized would remove the type switches altogether, since we can statically plan the encoders and run them sequentially one at a time per batch. I think the allocation behavior could conceivably be improved with flat bytes, but I'm not sure.
For example, a primary key (int, date) that gets index joined would be planned by 2 span generation ops (
encodeSpanIntOpt
,encodeSpanDateOp
) followed by a modifiedcolBatchScan
whose spans are selected from a projected column.Perhaps the optimizer should be involved here? Should we plan index/lookup joins as a projection of the spans to lookup first, followed by some lookup op?
The text was updated successfully, but these errors were encountered: