Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: figure out how to populate range cache in secondary tenants #108763

Open
knz opened this issue Aug 15, 2023 · 0 comments
Open

sql: figure out how to populate range cache in secondary tenants #108763

knz opened this issue Aug 15, 2023 · 0 comments
Labels
A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@knz
Copy link
Contributor

knz commented Aug 15, 2023

In single-tenant world we propagate "misplanned ranges" metadata (which updates the DistSender's range cache of the gateway for the query) in ad-hoc fashion: namely, if we distribute the query but place the TableReaders not on leaseholders of the ranges, then those TableReaders produce the "misplanned ranges" metadata that is sent back to the gateway. Currently, this mechanism doesn't work in secondary tenants for two reasons:

  • we use random placement of TableReaders
  • when creating metadata, the TableReaders don't have access to NodeID which is a requirement for the "misplanned ranges" metadata generation.

We need to figure out how we want to improve the situation (perhaps shared-process and separate-process modes will behave differently). We also need to find a way of triggering the range cache population (ideally via SQL) that we often use in tests (one example where this is currently needed is TestSpanResolverUsesCaches).

xref #76378.

Jira issue: CRDB-30627
Epic: CRDB-26687

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-testing Testing tools and infrastructure labels Aug 15, 2023
@knz knz added this to SQL Queries Aug 15, 2023
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Aug 15, 2023
craig bot pushed a commit that referenced this issue Aug 15, 2023
108739: roachtest/costfuzz: create failure.log file on demand r=yuzefovich a=yuzefovich

This commit makes it so that `failure.log` files are created lazily on demand in `costfuzz` and `unoptimized_query_oracle` roachtests. It seems nicer this way since for successful runs previously we'd see empty failure log files which were a bit confusing.

Epic: None

Release note: None

108744: build: update aws-cli r=pavelkalinnikov,srosenberg,herkolategan,rail a=renatolabs

The version being previously used was quite old, and didn't support setting the `Throughput` parameter on `gp3` volumes.

See: #108629 (comment).

Epic: none

Release note: None

108748: CODEOWNERS: split sql-queries-prs from sql-queries r=knz,rytaft a=michae2

This allows us to add or remove people from the PR rotation while keeping them on the team.

Epic: None
Release note: None

108752: ui: fix errors thrown by 0.toNumber() r=sjbarag a=sjbarag

When generating JS protobuf bindings, protobufjs's `pbjs` CLI accepts a --strict-long (or the more modern --force-long) flag with the following description:

  > Enfores the use of 'Long' for s-/u-/int64 and s-/fixed64 fields.

This uses the `long` package on NPM[^1], which it expects to be present when one of those clients is used at runtime. If `long` isn't available when `pbjs` is executed, it falls back to generating number-based fields where possible. Long is available in all consumers of `crdb-protobuf-client`, and `protobufjs` declares `long` as one of its few build-time and run-time dependencies, so nothing looked alarming.

For each field with a type that might be represented by a Long (a heuristic influenced by --strict-long), that field is marked as "long" if and only if the protobufjs-internal util.Long is non-null[^2]. That util.Long value is set via the use of the ``@protobufjs/inquire`` package instead of a standard `require`[^3] though, since the former silently returns `null` instead of throwing an error[^4]. This, again, is not alarming on its own because `protobufjs` depends directly on `long`.

The complication comes in with rules_js' use of the pnpm visibility rules, which are very strict when it comes to JS package dependencies. Put simply: packages only have access to the dependencies they directly declare, not transitive dependencies ("phantom dependencies")[^5]. Since ``@protobufjs/inquire`` is a separate package with no direct dependencies of its own[^6], Bazel lays out a node_modules tree where ``@protobufjs/inquire`` has no access to the `long` package. Since inaccessible packages are indistinguishable from nonexistent packages and `inquire` is intended to be silent, it returns `null`. The `pbjs` CLI then assumes Long isn't installed and falls back to number-based fields where a Long isn't strictly required.

When combined with protobuf's zero-value, this caused several cases of '0.toNumber is not a function' JS errors at runtime[^7]. We requested a Long, we support Longs, but a client was silently generated without respecting --strict-long.

Like with most other phantom dependencies, use pnpm.packageOverrides to declare that `@protobufjs/inquire` does in fact depend on Long.

[^1]: https://www.npmjs.com/package/long
[^2]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/src/field.js#L153-L157
[^3]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/src/util/minimal.js#L163-L167
[^4]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/lib/inquire/index.js#L4-L17
[^5]: https://docs.aspect.build/rules/aspect_rules_js/docs/pnpm/#hoisting
[^6]: https://github.com/protobufjs/protobuf.js/blob/918ff014efe19f3eb43195ae3d71f7aeb3fcdd73/lib/inquire/package.json
[^7]: #106318

Fixes: #106318
Release note: Observability pages no longer crash when they encounter zeros (e.g. a session with no memory allocated).

108764: sql/physicalplan: make more tests compatible with secondary tenants  r=maryliag,yuzefovich a=knz

Informs #76378.
Links to #108763.
Epic: CRDB-18499

108765: sql: make TestScatterResponse work with secondary tenants r=yuzefovich a=knz

First two commits from #108764
Informs #76378.
Epic: CRDB-18499

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Sean Barag <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@yuzefovich yuzefovich changed the title sql: update distsql phy planning tests to work with secondary tenants sql: figure out how to populate range cache in secondary tenants Aug 16, 2023
@exalate-issue-sync exalate-issue-sync bot added the T-multitenant Issues owned by the multi-tenant virtual team label Aug 16, 2023
@yuzefovich yuzefovich added T-sql-queries SQL Queries Team and removed T-multitenant Issues owned by the multi-tenant virtual team labels Aug 16, 2023
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Aug 16, 2023
Two out of three tests tracked by cockroachdb#108763 required only a minor
adjustment. However, one of the tests exposed (perhaps previously
unknown) difference between single-tenant and multi-tenant configs in
terms of populating the range cache. The issue has been repurposed to
tracking investigating and addressing that difference.

Release note: None
craig bot pushed a commit that referenced this issue Aug 16, 2023
108825: physicalplan: make a couple of tests work with test tenant r=yuzefovich a=yuzefovich

Two out of three tests tracked by #108763 required only a minor adjustment. However, one of the tests exposed (perhaps previously unknown) difference between single-tenant and multi-tenant configs in terms of populating the range cache. The issue has been repurposed to tracking investigating and addressing that difference.

Addresses: #108763

Epic: None

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@mgartner mgartner moved this from Triage to New Backlog in SQL Queries Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

2 participants