Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insights Details page V1 #83775

Closed
maryliag opened this issue Jul 4, 2022 · 0 comments · Fixed by #86325
Closed

Insights Details page V1 #83775

maryliag opened this issue Jul 4, 2022 · 0 comments · Fixed by #86325
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@maryliag
Copy link
Contributor

maryliag commented Jul 4, 2022

With the creation on the insights page on #83774,
create the details page for the insight. Still focused on transaction high time only, but preparing to allow several types on insights on the table

Figma: https://www.figma.com/file/xpnRkP9cLXERVceRC4cec4/22.2_SQL-obsrv_query-performance?node-id=1086%3A72736

https://www.loom.com/share/daee54ca31644ba4b9b64784c0228b8e?t=135

Jira issue: CRDB-17254

@maryliag maryliag added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 4, 2022
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 18, 2022
This commit adds the v1 Workload Insight Details page to the DB Console, via the
cluster-ui package. The v1 Workload Insight Details page only includes details
about a specific High Wait Time transaction insight event, populated with
information served a new "endpoint" built on top of the SQL-over-HTTP API.

Fixes cockroachdb#83775.

Release note (ui change): Added new Workload Insight Details page to DB Console

Release justification: low-risk updates to new functionality
craig bot pushed a commit that referenced this issue Aug 19, 2022
86325: ui: add workload insight details page v1 r=ericharmeling a=ericharmeling

This commit adds the v1 Workload Insight Details page to the DB Console, via the `cluster-ui` package. The v1 Workload Insight Details page only includes details about a specific High Wait Time transaction insight event, populated with information served a new "endpoint" built on top of the SQL-over-HTTP API.

Note that this v1 page only includes information available on contention for individual transaction executions recorded in the `crdb_internal.transaction_contention_events` table. As most transaction statistics are aggregated across multiple transaction executions, there are some data missing in this v1, most notably: end time, rows processed, priority, full scan, retry count, last retry reason, session id.

Fixes #83775.

https://www.loom.com/share/3f7dbd1326954c4ca7949ed8539ae4e9

Release note (ui change): Added new Workload Insight Details page to DB Console

Release justification: low-risk updates to new functionality

86412: kvserver: deflake TestMergeQueue{...} tests r=irfansharif a=irfansharif

Fixes #73107. Fixes #79109. Fixes #85372.

It's possible that under certain conditions the merge queue doesn't
immediately merge what these tests are expecting. One possible reasons is
span config asynchrony; the merge queue is actually disabled until each
store learns about at least some span config state: #78122 (if we don't
have any span configs available, enabling range merges would be
extremely dangerous -- we could collapse everything into a single
range).

Release justification: stability work
Release note: None

86416: bazel: remove `devdarwinx86_64` toolchain r=rail,dt,jlinder a=rickystewart

We haven't come up with a `clang` 10 toolchain for macOS/arm64 because
LLVM doesn't publish `clang` archives for that architecture. Instead of
letting the two platforms differ to this extent, just remove the `dev`
toolchain for x86_64.

Closes #72828.

Release justification: non-production code changes
Release note: None

86421: kvevent: implement chunked blocking buffer r=jayshrivastava a=jayshrivastava

### **kvevent: refactor blocking buffer benchmark**
This change updates the blocking buffer micro benchmark
in several ways:
- it uses different types of events
- it uses more producers than consumers to keep the
  buffer full
- it makes b.N correspond to the total number of events,
  so the benchmark can analyze allocs per event

Release note: None

Release justification: This change updates a benchmark only.

### **kvevent: implement chunked buffer event queue**
This change implements a simple chunked event queue.
The purpose of this queue is to be used by
kvevent.blockingBuffer in subsequent commits.

Release note: None

Release justification: This change does not affect
any production code. It adds files which are not
called by any packages.

### **kvevent: refactor memory buffer to chunked linked list**
This change refactors kvevent/blocking_buffer.go to use
a chunked linked list instead of a regular linked list to
reduce pointer usage. Note that the underlying sync.Pool,
which is also a linked list, will use less pointers due to
us pooling chunks instead of events.

Release note: None

Release justification: This change significantly
improves performance by significantly reducing
pressure on GC. Consequently, this significantly
improves foreground SQL p99 latency. GC has
been causing severe issues in production changefeeds.
Merging this change in this release is worth it
for its potential to reduce incidents.

### **Results (micro)**
These are the results of running the microbenchmark.
`./dev bench pkg/ccl/changefeedccl/kvevent --filter=BenchmarkMemBuffer --count=10 --bench-mem  --stream-output  --test-args="--test.benchtime=45s" -- --nocache_test_results --test_verbose_timeout_warnings |& tee bench.txt`
```
name          old time/op    new time/op    delta
MemBuffer-10    1.22µs ± 2%    0.85µs ± 3%  -30.04%  (p=0.000 n=8+10)

name          old alloc/op   new alloc/op   delta
MemBuffer-10     0.00B          0.00B          ~     (all equal)

name          old allocs/op  new allocs/op  delta
MemBuffer-10      0.00           0.00          ~     (all equal)
```
- Memory usage is 0 due to pooling in both implementations. 
- We can achieve a higher throughput with the chunked implementation - about 50-60M events in 45 seconds as opposed to ~40M with the old implementation. 


### **Results (Macro)**
Full results are published [here](https://docs.google.com/document/d/10-Pc_cnkCUhkRFpLyBAId39q1aqsjuqTb2k4ehTmQCY/edit?usp=sharing). In summary:

I analyzed performing by running TPC-C for 30 mins on a 15 node cluster with 10k warehouses. Before starting the workload, I started a changefeed on the `order_line` table (~200GB). I also set the following cluster settings to stress the buffer and pressure GC:
`changefeed.backfill.concurrent_scan_requests = 100;`
`changefeed.memory.per_changefeed_limit = '1073741824';` (~1GB)

Then, I analyzed SQL latency from admin UI and GC performance using the output of `GODEBUG=gctrace=1.` These are the outcomes:
- The p99 SQL latency during the workload was reduced from approx. 1.75s -> 0.150s (91%)
- CPU time spent doing GC was reduced from 37.86 mins -> 20.75 mins (45%)
- The p99 spike at the beginning of the workload was reduced from approx. 15s -> 12s (20%)

### Relevant Issues
Addresses: #84582
(for now...)

86426: sql: bump memory limits in a couple of tests r=yuzefovich a=yuzefovich

This commit is similar to dd82d13 where
we bumped --max-sql-memory limit. We have recently started hitting the
memory limit flakes again (because we do more accounting and have more
internal queries), so this commit bumps up the limits significantly
while still preserving the tests' spirit.

Fixes: #86372.

Release justification: test-only change.

Release note: None

86458: insights: surface suboptimal plans and failed executions r=matthewtodd a=matthewtodd

Fixes #85827.

Release justification: Category 2: Bug fixes and low-risk updates to new
functionality.

Release note: None

86471: vendor: bump Pebble to 406c1dce54c9 r=itsbilal,nicktrav a=jbowens

```
406c1dce *: add simpleLevelIterator, reduce merging levels in external iter
cd7f076e metamorphic: reposition iterator after SetOptions
5f6b4325 db: make RangeKeyMasking.Filter a constructor function
```

In addition to the vendor bump, this commit makes some small code
changes in response to interface changes within Pebbe. I kept these
changes minimal for the Pebble bump PR, and we can follow up with
fully adapting call sites to take advantage of the new ability to pass
'levels' of sstables into the sstable iterator.

Release note: None
Release justification: Non-production code changes, low-risk updates to
new functionality.

86481: Makefile: fix logpb generation r=nicktrav a=ajwerner

We needed to pass the flag. This will hit problems if any of the deps of the
file change.

Release justification: fixes the build

Release note: None

Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig craig bot closed this as completed in bb2052a Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants