sql/contention: introduce contention duration threshold cluster setting #13232

cockroach-teamcity · 2022-03-12T09:07:21Z

Exalate commented:

cockroachdb/cockroach#77623 --- Release note (sql change): introduce the sql.contention.event_store.duration_threshold cluster setting. This setting specifies the minimum contention duration required for a contention event to be collected into the crdb_internal.transaction_contention_events virtual table.
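
For illustration, a minimal sketch of how the setting described in the release note might be used. The `100ms` value is an assumed example, not a recommendation from this thread; the default is reported below as 0, which keeps every event.

```sql
-- Assumed example value: only collect contention events that lasted at least 100ms.
SET CLUSTER SETTING sql.contention.event_store.duration_threshold = '100ms';

-- Check the value currently in effect.
SHOW CLUSTER SETTING sql.contention.event_store.duration_threshold;

-- Inspect the events that were collected into the virtual table.
SELECT * FROM crdb_internal.transaction_contention_events LIMIT 10;

-- Return to the default value.
RESET CLUSTER SETTING sql.contention.event_store.duration_threshold;
```

Raising the threshold should leave only longer contention events in the virtual table; shorter ones would not be stored.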

Jira Issue: DOC-2893

Jira Issue: DOC-4352

exalate-issue-sync · 2022-03-18T05:44:31Z

Kevin Ngo (kevin-v-ngo) commented:
Archer Zhang, do you think we should have any guidance on this setting for now? Since it’s 0 by default, I’m wondering what the motivation for introducing it was. Are there any perf implications of always capturing events?

exalate-issue-sync · 2022-03-18T18:53:29Z

Archer Zhang (Azhng) commented:
TLDR: it’s a partial escape hatch / admission control knob in case contention events are overwhelming the CPU/memory. It allows the pressure to be partially alleviated without fully turning off the system.

Full story: during stability testing, we observed that a contention-heavy workload like YCSB-A can only saturate up to 50% of CPU/RAM resources. This gives the new contention event subsystem a lot of breathing room (in terms of CPU/RAM) to process all the contention events. As a result, there is no noticeable performance impact from turning the contention event system on when running the YCSB-A workload. For a workload such as kv95, where there’s very little contention, the contention subsystem is effectively a no-op, so we also observed no noticeable performance impact.

I’m worried that if users run multiple workloads [e.g. YCSB-A (low resource consumption, high contention) + KV95 (high resource consumption, low contention)], we will be operating in a less-than-ideal situation where we have very little CPU/RAM headroom and still need to process a large number of contention events.

In short, this cluster setting is useful in that case. We don’t want to completely turn off the contention event system, since it would still be useful for users to analyze contention patterns. However, leaving the system fully on might risk destabilizing the cluster. (This might just be my paranoia, since I haven’t been able to destabilize it myself.)

exalate-issue-sync · 2022-03-19T05:10:30Z

Kevin Ngo (kevin-v-ngo) commented:
Thanks for the explanation and context, Archer! It seems fine to have this in the cluster settings table in our docs then, since there are no issues with actually configuring the value. Hopefully it’s not needed.

exalate-issue-sync · 2022-06-10T04:15:32Z

Stephanie Bodoff (stbof) commented:
Kevin Ngo, should I be documenting this cluster setting in this PR: #13612?

cockroach-teamcity added C-product-change master labels Mar 12, 2022

exalate-issue-sync bot assigned ghost Mar 14, 2022

exalate-issue-sync bot closed this as completed Jun 10, 2022
