sql/contention: introduce contention duration threshold cluster setting #13232

Closed
cockroach-teamcity opened this issue Mar 12, 2022 · 5 comments

Comments

@cockroach-teamcity

cockroach-teamcity commented Mar 12, 2022

Exalate commented:

cockroachdb/cockroach#77623 --- Release note (sql change): introduce the sql.contention.event_store.duration_threshold cluster setting. This setting specifies the minimum contention duration required for a contention event to be collected into the crdb_internal.transaction_contention_events virtual table.

Jira Issue: DOC-2893

Jira Issue: DOC-4352

@exalate-issue-sync

Kevin Ngo (kevin-v-ngo) commented:
Archer Zhang, do you think we should have any guidance on this setting for now? Since it’s 0 by default, I’m wondering what the motivation for introducing it was. Are there any perf implications of always capturing events?

@exalate-issue-sync

Archer Zhang (Azhng) commented:
TLDR: it’s a partial escape hatch / admission control knob in case contention events overwhelm the CPU/memory. It allows the pressure to be partially alleviated without fully turning off the system.

Full story: during stability testing, we observed that a contention-heavy workload like YCSB-A can only saturate up to 50% of CPU/RAM resources. This gives the new contention event subsystem a lot of breathing room (in terms of CPU/RAM) to process all the contention events. As a result, there is no noticeable performance impact from turning the contention event system on when running the YCSB-A workload. For a workload such as kv95, where there’s very little contention, the contention subsystem is effectively a no-op, so we also observed no noticeable performance impact.

I’m worried that if a user runs multiple workloads [e.g. YCSB-A (low resource consumption, high contention) + KV95 (high resource consumption, low contention)], we will be operating in a less-than-ideal situation, where we have very little spare CPU/RAM and need to process a large number of contention events.

In short, this cluster setting is useful in this case. We don’t want to completely turn off the contention event system, since it would still be useful for users to analyze contention patterns. However, leaving the system on might also risk destabilizing the cluster. (This might just be my paranoia, since I haven’t been able to destabilize it myself.)
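
To make the "partial escape hatch" idea concrete, here is a minimal Python sketch of a duration threshold acting as a filter in front of an event store. It is only an illustration of the behavior described in this thread, not CockroachDB's implementation: `ContentionEvent`, `collect_events`, the transaction names, and the choice of `>=` at the boundary are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ContentionEvent:
    """Hypothetical stand-in for one contention event; not the real schema."""
    waiting_txn: str
    blocking_txn: str
    duration: timedelta


def collect_events(events, duration_threshold=timedelta(0)):
    """Keep only events that lasted at least `duration_threshold`.

    A threshold of 0 (the default mentioned in this thread) keeps everything.
    Using >= rather than > is an assumption about the boundary behavior.
    """
    return [e for e in events if e.duration >= duration_threshold]


# Made-up events purely for illustration.
events = [
    ContentionEvent("txn-a", "txn-b", timedelta(milliseconds=5)),
    ContentionEvent("txn-c", "txn-b", timedelta(milliseconds=40)),
    ContentionEvent("txn-d", "txn-e", timedelta(milliseconds=250)),
]

kept_default = collect_events(events)  # threshold 0: nothing is dropped
kept_raised = collect_events(events, timedelta(milliseconds=100))  # short waits are dropped
print(len(kept_default), len(kept_raised))
```

Raising the threshold sheds the short, high-volume events while still recording the long waits that matter most for analyzing contention patterns. On a real cluster the knob would presumably be raised with a statement along the lines of `SET CLUSTER SETTING sql.contention.event_store.duration_threshold = '100ms';`, where `100ms` is an illustrative value and not a recommendation from this thread.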

@exalate-issue-sync

Kevin Ngo (kevin-v-ngo) commented:
Thanks for the explanation and context, Archer! It seems fine to have this in the cluster settings table in our docs then, since there are no issues with actually configuring the value. Hopefully it’s not needed.

@exalate-issue-sync

Stephanie Bodoff (stbof) commented:
Kevin Ngo, should I document this cluster setting in this PR: #13612?
