-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: allow an arbitrary number of spans in a span stats request #105638
Comments
The point of using a stream interface for the low-level API is to not run afoul of the batch size limit. It's OK if SHOW RANGES blocks. That was by design. (That's also why the expensive computation is only performed under WITH DETAILS) |
Hi @zachlite, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error. This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting `server.span_stats.span_batch_limit` is used to control the size of the batches. The default value has been raised to 1000. Resolves cockroachdb#105638 Epic: CRDB-30635 Release note (ops change): Requests for database details or table details from the UI, or usages of `SHOW RANGES WITH DETAILS` are no longer subject to errors if the number of requested spans is too large.
Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error. This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting `server.span_stats.span_batch_limit` is used to control the size of the batches. The default value has been raised to 1000. Resolves cockroachdb#105638 Epic: CRDB-30635 Release note (ops change): Requests for database details or table details from the UI, or usages of `SHOW RANGES WITH DETAILS` are no longer subject to errors if the number of requested spans is too large.
Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error. This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting `server.span_stats.span_batch_limit` is used to control the size of the batches. The default value has been raised to 1000. Resolves cockroachdb#105638 Epic: CRDB-30635 Release note (ops change): Requests for database details or table details from the UI, or usages of `SHOW RANGES WITH DETAILS` are no longer subject to errors if the number of requested spans is too large.
109464: server: batch large span stats requests r=zachlite a=zachlite Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error. This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting `server.span_stats.span_batch_limit` is used to control the size of the batches. The default value has been raised to 1000. Resolves #105638 Epic: CRDB-30635 Release note (ops change): Requests for database details or table details from the UI, or usages of `SHOW RANGES WITH DETAILS` are no longer subject to errors if the number of requested spans is too large. 109612: serverutils: implement the testserver interfaces indirectly r=stevendanna a=knz First commit from #109591. Epic: CRDB-18499 Fixes #107058. TLDR: unit tests using TestServer/TestCluster now produce warning and notice messages upon certain suspicious API usage. For example: ``` $ dev test //pkg/server/storage_api -f TestStatusGetFiles === RUN TestStatusGetFiles ... pkg/server/storage_api/storage_api_test_test/pkg/server/storage_api/files_test.go:46: (TestStatusGetFiles) NOTICE: .GetStatusClient() called via implicit interface ApplicationLayerInterface; HINT: consider using .ApplicationLayer().GetStatusClient() instead. conditional_wrap.go:193: TIP: consider replacing the test server initialization from: ts, db, kvDB := serverutils.StartServer(t, ...) // or: tc := serverutils.StartCluster(t, ...) ts := tc.Server(0) To: srv, db, kvDB := serverutils.StartServer(t, ...) defer srv.Stop(...) ts := srv.ApplicationLayer() // or: tc := serverutils.StartCluster(t, ...) ts := tc.Server(0).ApplicationLayer() ``` The following warnings/notices are implemented. **ApplicationLayerInterface** The problem here is that the implicit implementation of `ApplicationLayerInterface` inside `TestServerInterface` is likely misused. If/when the test server automatically starts a secondary tenant, the *implicit* `ApplicationLayerInterface` refers to the system tenant, whereas only `.ApplicationLayer()` refers to the secondary tenant. It's likely that the test code should use the latter. For this, a warning is printed when a method from the *implicit* `ApplicationLayerInterface` is called. The warning is not printed if the test has indicated (via `DefaultTestTenant: base.TestIsForStorageLayerAndNeedsASystemTenant`) that it is ever only interested in the storage layer. **StorageLayerInterface** We want to promote the use of the explicit interface accessor `.StorageLayer()` to access the methods of `StorageLayerInterface`. For this, a notice is printed when a method from the *implicit* `StorageLayerInterface` is called. The notice is not printed if the test has indicated (via `DefaultTestTenant: base.TestIsForStorageLayerAndNeedsASystemTenant`) that it is ever only interested in the storage layer. **TenantControlInterface** We want to promote the use of the explicit interface accessor `.TenantController()` to access the methods of `TenantControlInterface`. For this, a notice is printed when a method from the *implicit* `TenantControlInterface` is called. 109863: roachtest: add npgsql tests to ignorelist; not blocklist r=rafiss a=rafiss The previous commit 1928be2 accidentally put them in the wrong list. The blocklist is for tests that are expected to always fail, but a flaky test might pass sometimes. fixes #109806 fixes #109738 Release note: None 109872: sql: coherent handling of hidden session vars in info_schema r=rafiss a=knz Fixes #109870. Epic: CRDB-28893 Release note (bug fix): Certain SQL session variables are meant to be hidden from introspection but were showing up in `information_schema.session_variables`, which was incoherent with the handling in `pg_catalog.pg_settings`. This bug has now been fixed. Co-authored-by: Zach Lite <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Rafi Shamim <[email protected]>
Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error. This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting `server.span_stats.span_batch_limit` is used to control the size of the batches. The default value has been raised to 1000. Resolves #105638 Epic: CRDB-30635 Release note (ops change): Requests for database details or table details from the UI, or usages of `SHOW RANGES WITH DETAILS` are no longer subject to errors if the number of requested spans is too large.
Is your feature request related to a problem? Please describe.
As of #98490, A single
SpanStats
request can request stats for multiple spans via itsspans []roachpb.Span
field. However, the maximum number of spans allowed per request is controlled by the cluster settingserver.span_stats.span_batch_limit
, which has a default value of 500. The original purpose ofserver.span_stats.span_batch_limit
was to ensure that the caller received a speedy response.#103128 made
SpanStats
a dependency ofSHOW RANGES WITH DETAILS
, which immediately caused roachtest failures where the cluster under test had more ranges than the default 500.The cluster setting was bumped as a workaround in #105500, but the following realities still exist:
Describe the solution you'd like
A
SpanStats
request should be able to service an arbitrary number of spans in a reasonable amount of time.In the wild, the largest clusters can have 1e6 ranges or more, and we must be able to service these cases.
A possible implementation is proposed here:
Challenges:
SHOW RANGES WITH DETAILS
do not stream results back to the caller, so a streaming solution forSpanStats
that lowered the "time to first result" would not get results back to the user any faster, becauseSHOW RANGES
would still block.Jira issue: CRDB-29134
Epic: CRDB-30635
The text was updated successfully, but these errors were encountered: