
Add ability to cancel statement diagnostics from the frontend #74226

Closed
lindseyjin opened this issue Dec 22, 2021 · 2 comments · Fixed by #75733 or #75732
Labels
A-sql-console-timeseries SQL Observability issues related to console timeseries metrics and views C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

lindseyjin (Contributor) commented Dec 22, 2021

Is your feature request related to a problem? Please describe.
Currently, there is no way to cancel an already-created statement diagnostics request from the frontend console.

Describe the solution you'd like
We would like to add the ability to cancel statement diagnostics requests that are still waiting on a result.
The backend API for this has already been created in #73863.

Additional context
Figma mocks: https://www.figma.com/file/xdmwvnFQd6KkO9RJ0XLDH0/22.1_SQL-obsrv_query-performance?node-id=5178%3A142577

Epic: CRDB-6619

Jira issue: CRDB-11994

lindseyjin added the C-enhancement and A-sql-console-timeseries labels on Dec 22, 2021
THardy98 commented Jan 12, 2022

Ran into a problem while working on this issue.

Currently, the backend cancellation API takes a statement fingerprint string provided by the frontend and uses it in a query to look up the corresponding diagnostics request ID that needs to be cancelled/expired.

The frontend can sometimes provide a statement fingerprint that contains placeholders, which appear as single-quoted underscores ('_'). The single quotes aren't escaped before the fingerprint is used in the query, which causes an error. I'm not certain, but this could also point to a potential SQL injection risk.

Alternatively, the frontend already has the diagnostics request ID and could pass it to the backend directly, removing the need to pass a statement fingerprint string, and with it the need to retrieve the ID via a query at all.

I'm not certain how much change this would require, but from a cursory look I think it would be relatively simple.
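For illustration, a rough Go sketch (not the actual CockroachDB code; the table layout, column names, and Postgres-style bind parameters are assumptions) contrasting a fingerprint-based lookup written safely with bind parameters against the proposed cancel-by-request-ID alternative, which needs no fingerprint at all:

```go
package diagsketch

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// cancelByFingerprint mirrors the current approach: the fingerprint string
// identifies the pending request. If the fingerprint were spliced into the
// query text directly, quoted placeholders like '_' would break the statement
// (and invite injection); a bind parameter avoids that entirely.
func cancelByFingerprint(ctx context.Context, db *sql.DB, fingerprint string) error {
	_, err := db.ExecContext(ctx,
		`UPDATE system.statement_diagnostics_requests
		    SET expires_at = $1
		  WHERE statement_fingerprint = $2 AND completed = false`,
		time.Now(), fingerprint)
	return err
}

// cancelByRequestID is the proposed alternative: the frontend already knows
// the request ID, so no fingerprint (and no lookup by fingerprint) is needed.
func cancelByRequestID(ctx context.Context, db *sql.DB, requestID int64) error {
	res, err := db.ExecContext(ctx,
		`UPDATE system.statement_diagnostics_requests
		    SET expires_at = $1
		  WHERE id = $2 AND completed = false`,
		time.Now(), requestID)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return fmt.Errorf("no pending diagnostics request with id %d", requestID)
	}
	return nil
}
```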

cc @yuzefovich for your thoughts on this

yuzefovich (Member) commented:

I didn't realize that we already had the request ID; using that to cancel on the backend would actually be preferable to the current implementation. I too think it'll be a pretty small change.

craig bot closed this as completed in 30ee32c on Jan 31, 2022
craig bot pushed a commit that referenced this issue Feb 4, 2022
74816: bulk,kv: create initial splits in imports and backfills r=dt a=dt

It has been observed that during a backfill or import, the split-as-we-go behavior used by the sst batcher, which only explicitly splits and scatters after sending some amount of data all to one range, does not produce any splits in the steady state when importing out-of-order data: since the data is out of order, after chunking it by destination range, no single chunk is large enough to be considered as filling that range and thus to trigger a split. Of course, the range may fill on its own as many processors send it small chunks that add up over time, but no single processor knows when that happens, so we simply leave it up to the range to split itself when it is full, as it would in any other heavy-write-rate, non-bulk ingestion case.

However, we've seen this lead to hotspots during bulk ingestion. Until the range becomes full and decides to split, it can be a bottleneck as many processors send it data at once. Existing mechanisms for kv-driven load-based splitting are not tuned for ingestion workloads. When the range does decide to split and rebalance, that is a more expensive operation, as we then need to read, move, and re-ingest half of that *full* range we just spent so much work sending data to and ingesting, all while it is still being sent more ingestion load.

Ideally, we'd prefer to split the span before we start ingesting, both to better spread ingestion load over our available capacity and to reduce how much ingested data would need to be shuffled if we waited to split and scatter only once a range filled. By scattering first, we'd just ingest directly to the right place from the start. However, a challenge in doing this is that in many cases we don't know what data we'll be ingesting until we start reading input and producing it: we could be working off of a key-ordered CSV, or we could have a uniformly random distribution. In some cases, such as an index backfill, we may be able to use external information like SQL statistics or a sampling scan of the table to derive a good partitioning, but in others, like IMPORTs or view materialization, we have no way to know what the data will be until we start producing it.

This change tries to do a bit better than not pre-splitting at all: if the first buffer's worth of data was read out of sorted order, it is treated as a representative sample of the data to come, split points are generated from it, and the target spans are split and scattered using those points before proceeding to flush. If this buffer turns out not to be a good sample after all, we are no worse off than before, when we did not pre-split at all. If the input data was sorted, this step is unnecessary: we'll just split as we go, as we already do today, scattering the empty remainder after each range we fill before moving on to fill it.

Release note (performance improvement): IMPORTs and Index Creation attempt to better balance their expected data between nodes before ingesting data.
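To make the sampling idea concrete, here is a toy Go sketch (not the actual sst batcher code; the function and its signature are made up for illustration) of deriving evenly spaced split points from the first buffered chunk of out-of-order data:

```go
package presplit

import "sort"

// splitPointsFromSample treats the keys from the first buffered chunk as a
// sample of the whole input and returns up to numSplits keys, evenly spaced
// through the sorted sample, to use as split-and-scatter boundaries before
// the first flush.
func splitPointsFromSample(sampleKeys []string, numSplits int) []string {
	if len(sampleKeys) == 0 || numSplits <= 0 {
		return nil
	}
	keys := append([]string(nil), sampleKeys...)
	sort.Strings(keys)
	step := len(keys) / (numSplits + 1)
	if step == 0 {
		step = 1
	}
	var splits []string
	for i := step; i < len(keys) && len(splits) < numSplits; i += step {
		splits = append(splits, keys[i])
	}
	return splits
}
```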

75709: cmd/dev: check if doctor needs to be run r=dt a=dt

This adds a check to all non-`doctor` dev commands that `dev doctor` has
previously succeeded, so that in those commands we can assume all of the
state checked or set up by `dev doctor` is correct.

If the definition of `doctor` is updated, this check can also be used to
force a re-run of doctor so that the new version and its new logic are applied.

This is implemented by persisting the "version" of doctor that most recently
succeeded in a file (bin/.dev-status) and then comparing that to the
current dev version.

This also adds "setup" as an alias for "doctor", as it is now expected to
be run after an initial clone to "set up" the dev environment (i.e. write
the status file, potentially set up githooks, etc.).

Release note: none.
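Roughly, the check described above amounts to something like the following sketch (the file name comes from the commit message, but its format and the surrounding structure are assumptions; the real `dev` tool is organized differently):

```go
package devstatus

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// doctorVersion is bumped whenever doctor's checks or setup steps change.
const doctorVersion = 3

// doctorIsUpToDate reports whether `dev doctor` has succeeded at a version at
// least as new as the current one, based on the persisted status file.
func doctorIsUpToDate(workspace string) bool {
	b, err := os.ReadFile(filepath.Join(workspace, "bin", ".dev-status"))
	if err != nil {
		return false // never run (or unreadable): require `dev doctor`
	}
	v, err := strconv.Atoi(strings.TrimSpace(string(b)))
	return err == nil && v >= doctorVersion
}

// recordDoctorSuccess persists the current doctor version; it would be called
// at the end of a successful `dev doctor` (or `dev setup`) run.
func recordDoctorSuccess(workspace string) error {
	path := filepath.Join(workspace, "bin", ".dev-status")
	return os.WriteFile(path, []byte(strconv.Itoa(doctorVersion)), 0644)
}
```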

75733: ui: add ability to cancel statement diagnostics to the frontend r=THardy98 a=THardy98

Previously, there was no option to cancel a running request for
statement diagnostics; this change provides a cancellation option.

The statements page now has an ellipsis dropdown button where the
download bundle links used to appear. The dropdown contains a
cancellation button (if there is a currently waiting diagnostic request)
and the download links to previously completed diagnostic requests.

The statement details page now has a "Cancel request" button while a
diagnostic request is not completed (i.e. in the "Waiting" status).

The statement diagnostics history page on db-console also has a "Cancel
Request" button while a diagnostic request is not completed (i.e. in the
"Waiting" status).

Also added is a small code change in the statement diagnostics "Activate"
modal: the expiry time picker allowed values less than 1, and this change
fixes that bug.

Original PR with discussions: #74818
Resolves: #74226

**CC Console Changes**:
https://www.loom.com/share/36da7fa1554b45b9adac288bc3a7c23e
^ slight lag on the button updates, kind of highlights how notifications/alerts would be nice

**DB Console Changes**:
https://www.loom.com/share/f76694148f6f4fcdb129665eb519eb4b

**Note**: this change is dependent on the merge of its [corresponding backend changes](#75732) first.

Release note (ui change): Added an option to cancel a running request
for statement diagnostics.

Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>