sql: figure out how to populate range cache in secondary tenants #108763

knz · 2023-08-15T11:04:45Z

In single-tenant world we propagate "misplanned ranges" metadata (which updates the DistSender's range cache of the gateway for the query) in ad-hoc fashion: namely, if we distribute the query but place the TableReaders not on leaseholders of the ranges, then those TableReaders produce the "misplanned ranges" metadata that is sent back to the gateway. Currently, this mechanism doesn't work in secondary tenants for two reasons:

we use random placement of TableReaders
when creating metadata, the TableReaders don't have access to NodeID which is a requirement for the "misplanned ranges" metadata generation.

We need to figure out how we want to improve the situation (perhaps shared-process and separate-process modes will behave differently). We also need to find a way of triggering the range cache population (ideally via SQL) that we often use in tests (one example where this is currently needed is TestSpanResolverUsesCaches).

xref #76378.

Jira issue: CRDB-30627
Epic: CRDB-26687

The text was updated successfully, but these errors were encountered:

108739: roachtest/costfuzz: create failure.log file on demand r=yuzefovich a=yuzefovich This commit makes it so that `failure.log` files are created lazily on demand in `costfuzz` and `unoptimized_query_oracle` roachtests. It seems nicer this way since for successful runs previously we'd see empty failure log files which were a bit confusing. Epic: None Release note: None 108744: build: update aws-cli r=pavelkalinnikov,srosenberg,herkolategan,rail a=renatolabs The version being previously used was quite old, and didn't support setting the `Throughput` parameter on `gp3` volumes. See: #108629 (comment). Epic: none Release note: None 108748: CODEOWNERS: split sql-queries-prs from sql-queries r=knz,rytaft a=michae2 This allows us to add or remove people from the PR rotation while keeping them on the team. Epic: None Release note: None 108752: ui: fix errors thrown by 0.toNumber() r=sjbarag a=sjbarag When generating JS protobuf bindings, protobufjs's `pbjs` CLI accepts a --strict-long (or the more modern --force-long) flag with the following description: > Enfores the use of 'Long' for s-/u-/int64 and s-/fixed64 fields. This uses the `long` package on NPM[^1], which it expects to be present when one of those clients is used at runtime. If `long` isn't available when `pbjs` is executed, it falls back to generating number-based fields where possible. Long is available in all consumers of `crdb-protobuf-client`, and `protobufjs` declares `long` as one of its few build-time and run-time dependencies, so nothing looked alarming. For each field with a type that might be represented by a Long (a heuristic influenced by --strict-long), that field is marked as "long" if and only if the protobufjs-internal util.Long is non-null[^2]. That util.Long value is set via the use of the ``@protobufjs/inquire`` package instead of a standard `require`[^3] though, since the former silently returns `null` instead of throwing an error[^4]. This, again, is not alarming on its own because `protobufjs` depends directly on `long`. The complication comes in with rules_js' use of the pnpm visibility rules, which are very strict when it comes to JS package dependencies. Put simply: packages only have access to the dependencies they directly declare, not transitive dependencies ("phantom dependencies")[^5]. Since ``@protobufjs/inquire`` is a separate package with no direct dependencies of its own[^6], Bazel lays out a node_modules tree where ``@protobufjs/inquire`` has no access to the `long` package. Since inaccessible packages are indistinguishable from nonexistent packages and `inquire` is intended to be silent, it returns `null`. The `pbjs` CLI then assumes Long isn't installed and falls back to number-based fields where a Long isn't strictly required. When combined with protobuf's zero-value, this caused several cases of '0.toNumber is not a function' JS errors at runtime[^7]. We requested a Long, we support Longs, but a client was silently generated without respecting --strict-long. Like with most other phantom dependencies, use pnpm.packageOverrides to declare that `@protobufjs/inquire` does in fact depend on Long. [^1]: https://www.npmjs.com/package/long [^2]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/src/field.js#L153-L157 [^3]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/src/util/minimal.js#L163-L167 [^4]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/lib/inquire/index.js#L4-L17 [^5]: https://docs.aspect.build/rules/aspect_rules_js/docs/pnpm/#hoisting [^6]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/lib/inquire/package.json [^7]: #106318 Fixes: #106318 Release note: Observability pages no longer crash when they encounter zeros (e.g. a session with no memory allocated). 108764: sql/physicalplan: make more tests compatible with secondary tenants r=maryliag,yuzefovich a=knz Informs #76378. Links to #108763. Epic: CRDB-18499 108765: sql: make TestScatterResponse work with secondary tenants r=yuzefovich a=knz First two commits from #108764 Informs #76378. Epic: CRDB-18499 Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Renato Costa <[email protected]> Co-authored-by: Michael Erickson <[email protected]> Co-authored-by: Sean Barag <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>

Two out of three tests tracked by cockroachdb#108763 required only a minor adjustment. However, one of the tests exposed (perhaps previously unknown) difference between single-tenant and multi-tenant configs in terms of populating the range cache. The issue has been repurposed to tracking investigating and addressing that difference. Release note: None

108825: physicalplan: make a couple of tests work with test tenant r=yuzefovich a=yuzefovich Two out of three tests tracked by #108763 required only a minor adjustment. However, one of the tests exposed (perhaps previously unknown) difference between single-tenant and multi-tenant configs in terms of populating the range cache. The issue has been repurposed to tracking investigating and addressing that difference. Addresses: #108763 Epic: None Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>

knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-testing Testing tools and infrastructure labels Aug 15, 2023

knz added this to SQL Queries Aug 15, 2023

github-project-automation bot moved this to Triage in SQL Queries Aug 15, 2023

knz mentioned this issue Aug 15, 2023

sql/physicalplan: make more tests compatible with secondary tenants #108764

Merged

yuzefovich changed the title ~~sql: update distsql phy planning tests to work with secondary tenants~~ sql: figure out how to populate range cache in secondary tenants Aug 16, 2023

exalate-issue-sync bot added the T-multitenant Issues owned by the multi-tenant virtual team label Aug 16, 2023

yuzefovich added T-sql-queries SQL Queries Team and removed T-multitenant Issues owned by the multi-tenant virtual team labels Aug 16, 2023

yuzefovich mentioned this issue Aug 16, 2023

physicalplan: make a couple of tests work with test tenant #108825

Merged

mgartner moved this from Triage to New Backlog in SQL Queries Sep 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: figure out how to populate range cache in secondary tenants #108763

sql: figure out how to populate range cache in secondary tenants #108763

knz commented Aug 15, 2023 •

edited by yuzefovich

Loading

sql: figure out how to populate range cache in secondary tenants #108763

sql: figure out how to populate range cache in secondary tenants #108763

Comments

knz commented Aug 15, 2023 • edited by yuzefovich Loading

knz commented Aug 15, 2023 •

edited by yuzefovich

Loading