
stmtdiagnostics: implement range feed on system.statement_diagnostics_requests to reduce latency #47893

Open
tbg opened this issue Apr 22, 2020 · 19 comments · Fixed by #107555
Labels
A-multitenancy (Related to multi-tenancy) · T-multitenant (Issues owned by the multi-tenant virtual team)

Comments

@tbg
Member

tbg commented Apr 22, 2020

We should implement a range feed on the system.statement_diagnostics_requests table in order to remove the polling that currently happens every 10s. See this comment for some pointers.
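
For reference, the registry currently discovers new requests via a timer-driven poll of the table. A minimal sketch of that pattern is below; the 10s interval comes from this issue, while the pollRequests helper and the logging are hypothetical stand-ins rather than the actual registry code.

package stmtdiagsketch

import (
	"context"
	"log"
	"time"
)

// pollLoop sketches the polling a rangefeed would replace: every interval the
// registry re-reads system.statement_diagnostics_requests, so a freshly issued
// request can wait up to a full interval before any node notices it.
func pollLoop(ctx context.Context, interval time.Duration, pollRequests func(context.Context) error) {
	ticker := time.NewTicker(interval) // currently on the order of 10s
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := pollRequests(ctx); err != nil {
				log.Printf("polling statement diagnostics requests: %v", err)
			}
		}
	}
}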

Jira issue: CRDB-4382

Epic CRDB-18185

@tbg added the A-multitenancy (Related to multi-tenancy) label Apr 22, 2020
@tbg changed the title from "stmtdiagnostics: needs rework for multi-tenancy phase 2" to "stmtdiagnostics: needs rework for multi-tenancy" Apr 22, 2020
@RaduBerinde
Member

@andreimatei and @ajwerner discussed this quite a bit during the initial implementation. The gossip solution is seen as temporary indeed.

@ajwerner
Contributor

Yep, this is on my list.

@ajwerner
Contributor

Worst case, we disable the gossip without anything better initially, and clients will be exposed to a bit of extra latency when requesting statement diagnostics.

@tbg
Member Author

tbg commented Sep 4, 2020

@ajwerner were you planning on touching this (I assume not anytime soon) and is this good enough for now as-is? We are nominally considering this a blocker still but it doesn't seem to be.

@tbg
Member Author

tbg commented Sep 4, 2020

As you touch this issue, please also put it in the appropriate project. I would put it in SQL-Execution but I don't know if they own this.

@ajwerner
Contributor

ajwerner commented Sep 4, 2020

Do we need to ensure that the gossip isn't called or does it just no-op? If it no-ops then we're good and don't need to do anything. We're not exposing the statements page for tenants right now anyway, and even if we were, they poll periodically, so it's just higher latency.

@tbg
Member Author

tbg commented Sep 8, 2020

From my reading of the stmtdiag code, it will return an error (unsupported w/ multi-tenancy), which is surfaced to the requester of the stmt diag report. This is fine, so no need to do anything right now.
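
(Illustration only: a minimal sketch of that kind of guard. The notifyNodes helper, the boolean flag, and the error text are hypothetical stand-ins for the real stmtdiag/gossip code.)

package stmtdiagsketch

import "errors"

// errUnsupportedWithMultiTenancy stands in for the error described above; the
// real code constructs it differently, but the effect is the same: the caller
// requesting a statement diagnostics report sees the error.
var errUnsupportedWithMultiTenancy = errors.New(
	"unsupported with multi-tenancy; see issue #47893")

// notifyNodes is a hypothetical stand-in for the gossip-based notification of
// a new diagnostics request. Without gossip (as in a tenant pod) it fails, and
// other nodes only learn about the request via the periodic poll.
func notifyNodes(gossipAvailable bool, requestID int64) error {
	if !gossipAvailable {
		return errUnsupportedWithMultiTenancy
	}
	// ... gossip requestID so other nodes pick it up without waiting for a poll ...
	return nil
}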

@RaduBerinde self-assigned this Feb 10, 2021
@RaduBerinde
Member

To fix this, we can use a range feed on the diagnostic requests table.

@irfansharif
Contributor

To fix this, we can use a range feed on the diagnostic requests table.

We have libraries now to make this pretty plug-n-play:

// Watcher is used to implement a consistent cache over spans of KV data
// on top of a RangeFeed. Note that while rangefeeds offer events as they
// happen at low latency, a consistent snapshot cannot be constructed until
// resolved timestamp checkpoints have been received for the relevant spans.
// The watcher internally buffers events until the complete span has been
// resolved to some timestamp, at which point the events up to that timestamp
// are published as an update. This will take on the order of the
// kv.closed_timestamp.target_duration cluster setting (default 3s).
//
// If the buffer overflows (as dictated by the buffer limit the Watcher is
// instantiated with), the old rangefeed is wound down and a new one
// re-established. The client interacts with data from the RangeFeed in two
// ways, firstly, by translating raw KVs into kvbuffer.Events, and by handling
// a batch of such events when either the initial scan completes or the
// frontier changes. The OnUpdateCallback which is handed a batch of events,
// called an Update, is informed whether the batch of events corresponds to a
// complete or incremental update.
//
// It's expected to be Start-ed once. Start internally invokes Run in a retry
// loop.
//
// The expectation is that the caller will use a mutex to update an underlying
// data structure.
//
// NOTE (for emphasis): Update events after the initial scan are published at a
// delay corresponding to kv.closed_timestamp.target_duration (default 3s).
// Users seeking to leverage the Updates which arrive with that delay but also
// react to the row-level events as they arrive can hijack the translateEvent
// function to trigger some non-blocking action.
type Watcher struct {

This one, for example, maintains such a feed (in the tenant pod, over a tenant system table):

// Cache caches a set of KVs in a set of spans using a rangefeed. The
// cache provides a consistent snapshot when available, but the snapshot
// may be stale.
type Cache struct {
w *rangefeedcache.Watcher
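
To make this concrete, here is a rough sketch of how the stmtdiagnostics registry could consume such a watcher. The requestEvent and registry types and the onUpdate callback below are hypothetical stand-ins; the real wiring would go through rangefeedcache.Watcher's constructor and its OnUpdateCallback, whose exact signatures aren't reproduced here.

package stmtdiagsketch

import (
	"context"
	"sync"
)

// requestEvent is a hypothetical decoded row from
// system.statement_diagnostics_requests.
type requestEvent struct {
	ID                   int64
	StatementFingerprint string
	Completed            bool
}

// registry mirrors the relevant piece of the stmtdiagnostics registry: an
// in-memory view of outstanding requests, updated from rangefeed events
// instead of a periodic poll.
type registry struct {
	mu       sync.Mutex
	requests map[int64]requestEvent
}

// onUpdate is the kind of callback a rangefeedcache-style watcher invokes with
// a batch of events once the watched span has resolved to some timestamp, so
// incremental updates arrive within a few seconds instead of up to 10s later.
func (r *registry) onUpdate(ctx context.Context, events []requestEvent) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, ev := range events {
		if ev.Completed {
			delete(r.requests, ev.ID)
			continue
		}
		r.requests[ev.ID] = ev
	}
}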

@yuzefovich
Member

@rafiss could you take a look at this comment, please? I don't understand why this issue has the "skipped test" label.

@rafiss
Collaborator

rafiss commented Jul 25, 2023

I added the label because I see this in the code:

skip.WithIssue(t, 47893, "tenant clusters do not support SQL features used by this test")

@rafiss
Collaborator

rafiss commented Jul 25, 2023

I don't know if that code is referencing the correct issue. If not, feel free to create a separate issue for tracking the skipped test.

@yuzefovich
Member

I see, thanks. This issue was originally about supporting stmt diagnostics in secondary tenants, but #83547 added that support with some caveats, so this issue was repurposed to be about optimizing the stmt diagnostics feature. I'll remove that skip.

@rafiss linked a pull request Jul 26, 2023 that will close this issue
craig bot pushed a commit that referenced this issue Jul 26, 2023
107493: ui,build: push cluster-ui assets into external folder during watch mode r=nathanstilwell a=sjbarag

Previously, watch mode builds of cluster-ui (e.g. 'dev ui watch' or 'pnpm build:watch') would emit files only to pkg/ui/workspaces/cluster-ui/dist. Using that output in a watch task of a private repo required setting up symlinks via a 'make' task[1]. Unfortunately, that symlink made it far too easy for the node module resolution algorithm in that private repo to follow the symlink back to cockroach.git, which gave that project access to the modules in pkg/ui/node_modules/ and pkg/ui/workspaces/cluster-ui/node_modules. This resulted in webpack finding multiple copies of react-router (which expects to be a singleton), typescript finding multiple incompatible versions of react, etc.

Unfortunately, webpack doesn't support multiple output directories natively. Add a custom webpack plugin that copies emitted files to an arbitrary number of output directories.

[1] pnpm link doesn't work due to some package-name aliasing we've got
    going on there.

Release note: None
Epic: none

107555: sql: remove stale skip in TestTelemetry r=yuzefovich a=yuzefovich

This commit unskips multiple telemetry tests that were skipped for no good reason (they were referencing an unrelated issue). This uncovered some bugs in the new schema changer telemetry reporting, where `_index` was duplicated in the feature counter for inverted indexes. Also, the `index` telemetry test contained an invalid statement, which has now been removed.

The only file that is still skipped is `sql-stats` where the output doesn't match the expectations, and I'm not sure whether the test is stale or something is broken, so a separate issue was filed.

Addresses: #47893.
Epic: None.

Release note: None

107597: builtins: force production values in TestSerialNormalizationWithUniqueUnorderedID r=yuzefovich a=yuzefovich

We've observed that if the `batch-bytes-limit` value is set too low, then the "key counts" query in this test takes much longer (on my laptop it was 60s for a particular random seed vs 2.4s with production values), so this commit forces some production values.

Fixes: #106829.

Release note: None

Co-authored-by: Sean Barag <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot closed this as completed in #107555 Jul 26, 2023
@yuzefovich
Member

The issue about implementing the rangefeed to reduce latency is still present.

@yuzefovich reopened this Jul 26, 2023
@yuzefovich
Member

@maryliag looks like you removed this from the cluster observability project, but I think it'd be up to your team to address this issue (i.e. implementing the range feed as Irfan suggested here). Do you agree? I'll update the issue description accordingly.
