Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
68430: sql,kv: introduce Streamer API and use it for index joins in some cases r=yuzefovich a=yuzefovich **execinfrapb: add a helper for index joins based on the JoinReaderSpec** Release note: None **rowexec: refactor the joinReader to not exceed the batch size** The joinReader operates by buffering the input rows until a certain size limit (which is dependent on the strategy). Previously, the buffering would stop right after the size limit is reached or exceeded, and this commit refactors the code to not exceed the limit except in a case of a single large row. This is what we already do for vectorized index joins. Release note: None **sql,kv: introduce Streamer API and use it for index joins in some cases** This commit introduces the Streamer API (see https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20210617_index_lookups_memory_limits.md) as well as its implementation for the simplest case - when requests are unique and can be served in any order. It additionally hooks up the implementation to be used by the index joins in both execution engines. There are three main pieces that this commit adds: 1. the `Streamer` struct itself. It is largely the same as described in the RFC. Some notable changes are: - `Cancel` is renamed to `Close` and is made blocking to ensure that all goroutines kicked off by the `Streamer` exit before `Close` returns. - `Shrink` has been removed from the `budget` struct (see below for more details). - furthermore, the `budget` interface has been unexported and the `streamer` has been tightly coupled with the `budget`'s implementation. - the TODO about collecting DistSQL metadata is removed because we are collecting the LeafTxnFinalState already when using the LeafTxn. 2. the limited `Streamer` implementation - only `OutOfOrder` mode is supported when the requests are unique. Notably, buffering and caching of the results is not implemented yet. 3. `TxnKVStreamer` component that sits below the SQL fetchers, uses the `Streamer`, and is an adapter from `BatchResponse`s to key/value pairs that fetchers understand. Although not used at the moment, `TxnKVStreamer` is written under the assumption that a single result can satisfy multiple requests. The memory budget of the `Streamer` is utilized lazily. The RFC was calling for making a large reservation upfront and then shrinking the budget if we see that we don't need that large reservation; however, while working on this I realized that lazy reservations are a better fit for this. The `Streamer` can reserve up to the specified limit (determined by `distsql_workmem` variable) against the root monitor (in the degenerate case of a single large row more memory will be reserved). The reservation then never shrinks under the assumption that if the reservation has gotten large, it means it was needed for higher concurrency (or large responses), and it is likely to be needed for the same reasons in the future. The layout of the main components of the `Streamer` implementation: - in `Enqueue` we have a logic similar to what DistSender does in order to split each request (that might span multiple ranges) into single-range requests. Those sub-requests are batched together to be evaluated by a single `BatchRequest`. - `workerCoordinator.mainLoop` is responsible for taking single-range batches, estimating the corresponding response size, and issuing requests to be evaluated in parallel while adhering to the provided memory budget. - `workerCoordinator.performRequestAsync` is responsible for populating the `BatchRequest` and then processing the results while updating the memory budget. Current known limitations that will be addressed in the follow-up work: - at the moment a single row can be split across multiple BatchResponses when TargetBytes limit is reached when the table has multiple column families; therefore, we use the streamer only for single column family cases. We will expand the KV API shortly to not split the rows in multiple column family cases. - manual refresh of spans when `ReadWithinUncertaintyIntervalError` is encountered by a single streamer in a single flow is not implemented. It is an optimization that is considered a must for the final implementation in order to not regress on simple cases in terms of retriable errors. This will be implemented shortly as a follow-up. - I'm thinking that eventually we probably want to disable the batch splitting done by the DistSender to eliminate unnecessary blocking when the streamer's splitting was incorrect. This would give us some performance improvements in face of range boundary changes, but it doesn't seem important enough for the initial implementation. Addresses: #54680. Release note: None 74179: sql: check equivalent constraint when creating hash index r=chengxiong-ruan a=chengxiong-ruan Fixes #68031 Previously we only try to create constraint for shard column if it's newly created. We check duplicate constraint for shard column when `Alter Primary Key` and `Create Index`, however the check is simply a name check. This pr adds logic to check equivalent constraint by checking if the formatted expression string is the same. With this logic we can try to create the constraint no matter if a shard column is newly created or not. With this fix, we also don't need to expose the constraint through `SHOW CREATE TABLE` result since we make sure the constraint is created or skipped if one already exists. Release note (sql change): Before this change, the check constraint on shard column used by hash sharded index was printed in the corresponding `SHOW CREATE TABLE`. The constraint had been shown because cockroach lacked logic to ensure that shard columns which are part of hash sharded indexs always have the check constraint which the optimizer relies on to achieve properly optimized plans on hash sharded indexes. We no longer display this constraint in `SHOW CREATE TABLE` as it is now implied by the `USING HASH` clause on the relevant index. 74710: colexec: clean up the usage of the binary overload helper r=yuzefovich a=yuzefovich Previously, for projection and selection operators we would always pass in `execgen.OverloadHelper`. Up until recently it served several purposes, but now it is only used in order to pass down the binary function (as well as the eval context) for the cases when we fallback to the row-by-row computation. This commit takes advantage of this observation and cleans up the situation: now the helper is only passed when it is needed which allows us to remove a lot of redundant code. Additionally, the helper itself has been renamed from `OverloadHelper` to `BinaryOverloadHelper`. Release note: None 74750: build: Add PATH to .bazelrc for dev builds r=benbardin a=benbardin Release note: none Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Chengxiong Ruan <[email protected]> Co-authored-by: Ben Bardin <[email protected]>
- Loading branch information