-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: log rangelog events as structured events #104075
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
miraradeva
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv
Anything in KV that doesn't belong in a more specific category.
T-kv
KV Team
labels
May 30, 2023
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
May 30, 2023
Previously, writing to the system.rangelog table was done as part of the transaction that triggered the range change (e.g. split, merge, rebalance). If the logging failed for some reason (e.g. JSON being logged was too large), the entire transaction needed to be retried. This has caused at least one major incident. This change introduces the option to write to system.rangelog async, and not as part of the original transaction; this option is also now used by default for all writes to system.rangelog. When logging async, the actual write to system.rangelog is done in an async task executed as a commit trigger after the original transaction ends. Epic: CRDB-16517 Fixes: cockroachdb#82538 Informs: cockroachdb#104075 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Jun 5, 2023
Previously, writing to the system.rangelog table was done as part of the transaction that triggered the range change (e.g. split, merge, rebalance). If the logging failed for some reason (e.g. JSON being logged was too large), the entire transaction needed to be retried. This has caused at least one major incident. This change introduces the option to write to system.rangelog async, and not as part of the original transaction; this option is also now used by default for all writes to system.rangelog. When logging async, the actual write to system.rangelog is done in an async task executed as a commit trigger after the original transaction ends. Epic: CRDB-16517 Fixes: cockroachdb#82538 Informs: cockroachdb#104075 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Jun 5, 2023
Previously, writing to the system.rangelog table was done as part of the transaction that triggered the range change (e.g. split, merge, rebalance). If the logging failed for some reason (e.g. JSON being logged was too large), the entire transaction needed to be retried. This has caused at least one major incident. This change introduces the option to write to system.rangelog async, and not as part of the original transaction; this option is also now used by default for all writes to system.rangelog. When logging async, the actual write to system.rangelog is done in an async task executed as a commit trigger after the original transaction ends. Epic: CRDB-16517 Fixes: cockroachdb#82538 Informs: cockroachdb#104075 Release note: None
craig bot
pushed a commit
that referenced
this issue
Jun 7, 2023
98889: kvserver: increase rebalance action impact req r=andrewbaptist a=kvoli Previously, the store rebalancer would attempt to transfer leases or relocate replicas of a range so long as the load of the range (estimated using the local leaseholder replica) was greater than 0.1% of the store's load. This commit updates the value to be higher for both lease rebalancing and range rebalancing, specifically: lease rebalancing: 0.1% -> 0.5% range rebalancing: 0.1% -> 2.0% The documented rationale, as for why these values exist is also the reason for the increase: decrease unnecessary churn. Only the hottest ranges are considered, when the impact of a rebalance action for these ranges is estimated to be less than a small fraction of the store's total load, it is indicative that the distribution of replica load is not skewed. Letting the replicate queue balance replica and lease counts is a better option in such distributions (until we consolidate rebalancing responsibility). Making the minimum for range rebalancing higher is also added, as multiple replicas for a range is inherently more costly than moving a lease. Resolves: #98890 Release note: None 102813: kvserver: write to system.rangelog async r=nvanbenschoten a=miraradeva Previously, writing to the system.rangelog table was done as part of the transaction that triggered the range change (e.g. split, merge, rebalance). If the logging failed for some reason (e.g. JSON being logged was too large), the entire transaction needed to be retried. This has caused at least one major incident. This change introduces the option to write to system.rangelog async, and not as part of the original transaction; this option is also now used by default for all writes to system.rangelog. When logging async, the actual write to system.rangelog is done in an async task executed as a commit trigger after the original transaction ends. Epic: [CRDB-16517](https://cockroachlabs.atlassian.net/browse/CRDB-16517) Fixes: #82538 Informs: #104075 Release note: None 104370: storage: better error message for non-dev store with dev binary r=RaduBerinde a=RaduBerinde Release note: None Fixes: #103647 104454: bazel: mark targets as being `no-remote-exec` instead of `no-remote` r=healthy-pod a=rickystewart `no-remote` implies `no-remote-cache` as well. Properly these targets can be cached remotely, they just can't be executed remotely. Epic: CRDB-8308 Release note: None 104501: configprofiles: deflake TestDataDriven r=itsbilal a=knz Informs #104500. Fixes #102408. This test was experiencing a failure due to a race condition. The root cause is described in issue #104500 and will be addressed separately. This change ensures that the test does not encounter the code path at all. Release note: None Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Mira Radeva <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Logging to
system.rangelog
is async by default (after #102813 is merged), which means a range change might succeed but the logging might fail. We should log a structured event toKV_DISTRIBUTION
as an auxiliary logging mechanism.Currently, we can't use
kvserver.RangeLogEvent
directly as a structured event because that will introduce an unnecessary dependency fromeventpb
tokvserver
androachpb
. We should either refactor theRangeLogEvent
struct, or create a new struct for structured logging.The alternative to write an unstructured event to
KV_DISTRIBUTION
was dismissed because we don't have a good way to process unstructured events.Epic: CRDB-28526
Jira issue: CRDB-28344
The text was updated successfully, but these errors were encountered: