Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: duplicate log lines in store rebalancer #98867

Closed
kvoli opened this issue Mar 17, 2023 · 1 comment · Fixed by #98888
Closed

kvserver: duplicate log lines in store rebalancer #98867

kvoli opened this issue Mar 17, 2023 · 1 comment · Fixed by #98888
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 GA-blocker T-kv KV Team

Comments

@kvoli
Copy link
Collaborator

kvoli commented Mar 17, 2023

The store rebalancer logs load-based replica transfers successfully brought... even when the store was not below the max threshold.

I230313 23:39:08.737797 297 13@kv/kvserver/store_rebalancer.go:589 ⋮ [T1,n1,s1,store-rebalancer] 1014  load-based replica transfers successfully brought s1 down to ‹(queries-per-second=659.7 cpu-per-second=317ms)› load, mean=‹(queries-per-second=493.2 cpu-per-second=257ms)›, upperThreshold=‹(queries-per-second=493.2 cpu-per-second=307ms)›

The store rebalancer also logs when there are no available replica
rebalancing actions remaining but the store still above the desired load
threshold: ran out of replicas worth transferring and load....

This log line is duplicated in both the post range rebalancing phase and in
exiting the range rebalancing phase.

I230313 23:39:08.737691 297 13@kv/kvserver/store_rebalancer.go:566 ⋮ [T1,n1,s1,store-rebalancer] 1011  ran out of leases worth transferring and load ‹(queries-per-second=659.7 cpu-per-second=317ms)› is still above desired threshold ‹(queries-per-second=493.2 cpu-per-second=307ms)›; considering load-based replica rebalances
I230313 23:39:08.737741 297 13@kv/kvserver/store_rebalancer.go:616 ⋮ [T1,n1,s1,store-rebalancer] 1012  ran out of replicas worth transferring and load ‹(queries-per-second=659.7 cpu-per-second=317ms)› is still above desired threshold ‹(queries-per-second=493.2 cpu-per-second=307ms)›; will check again soon
I230313 23:39:08.737771 297 13@kv/kvserver/store_rebalancer.go:581 ⋮ [T1,n1,s1,store-rebalancer] 1013  ran out of replicas worth transferring and load ‹(queries-per-second=659.7 cpu-per-second=317ms)› is still above desired threshold ‹(queries-per-second=493.2 cpu-per-second=307ms)›; will check again soon

Jira issue: CRDB-25557

@kvoli kvoli added A-kv-distribution Relating to rebalancing and leasing. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 labels Mar 17, 2023
@kvoli kvoli self-assigned this Mar 17, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 17, 2023

Hi @kvoli, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 17, 2023
@kvoli kvoli changed the title kvserver: dupelicate log lines in store rebalancer kvserver: duplicate log lines in store rebalancer Mar 17, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Mar 17, 2023
Previously, the store rebalancer would log `load-based replica transfers
successfully brought...` even when the store was not below the max
threshold. The store rebalancer also logs when there are no available
replica rebalancing actions remaining but the store still above the
desired load threshold: `ran out of replicas worth transferring and
load`. This log line was duplicated in both the post range rebalancing
phase and in exiting the range rebalancing phase.

This commit stops duplication by removing the exit log line and ensures
that the `load-based replica transfers successfully brought...` log line
occurs only when actually successful.

Resolves: cockroachdb#98867

Release note: None
craig bot pushed a commit that referenced this issue Mar 20, 2023
98805: kvserver: unskip merge queue during snapshot test r=nvanbenschoten a=AlexTalks

This change unskips `TestMergeQueueDoesNotInterruptReplicationChange`, which was skipped as potentially flaky in #94951. Despite repeated stress testing, this test no longer seems flaky, and correctly gets an error if attempting to enqueue a range in the merge queue during a replication change.

Fixes: #94951

Release note: None

98809: sql/logictest: clarify retry comment r=mgartner a=mgartner

Epic: None

Release note: None

98844: testutils: move tenant or server logic into test server r=stevendanna a=herkolategan

The logic to decide between returning a default test tenant, if one was started, or the server has been added to `testcluster` previously, but `testserver` can also benefit from exposing this logic for tests that do not require a whole cluster. Hence, this logic has been refactored to be available in both test cluster and test server.

Epic: CRDB-18499

98888: kvserver: remove duplicate logging in s-rebalancer r=andrewbaptist a=kvoli

Previously, the store rebalancer would log `load-based replica transfers successfully brought...` even when the store was not below the max threshold. The store rebalancer also logs when there are no available replica rebalancing actions remaining but the store still above the desired load threshold: `ran out of replicas worth transferring and load`. This log line was duplicated in both the post range rebalancing phase and in exiting the range rebalancing phase.

This commit stops duplication by removing the exit log line and ensures that the `load-based replica transfers successfully brought...` log line occurs only when actually successful.

Resolves: #98867

Release note: None

98959: Revert "Revert "backupccl: disallow restore of backups older than the minimum supported binary version"" r=dt a=adityamaru

This reverts commit c510676.

Some roachtests rely on fixtures generated before 22.2. To restore
such fixtures that are outside our compatibility window we must run
the restore with `unsafe_restore_incompatible_version`.

Fixes: #98918
Fixes: #98930
Release note: None
Epic: none

98997: multitenant: Increase timeout for kvtenantccl r=knz a=ajstorm

The new TestTenantUpgradeInterlock test takes a long time to run (up to a minute locally) and has started timing out in nightly runs. On the suspicion that it's timing out strictly because it's long running, increase the timeout to see if that resolves the failures.

Release note: None

99021: ui: fix jobs column name r=maryliag a=maryliag

The Last Execution Time called was previously labelled as Last Modified Time by mistake.
This commit fixes to use the correct name.

Epic: None

Release note (ui change): Update the Jobs table column name to the correct Last Execution Time.

Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Herko Lategan <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Adam Storm <[email protected]>
Co-authored-by: maryliag <[email protected]>
@craig craig bot closed this as completed in 0b18c6e Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 GA-blocker T-kv KV Team
Projects
None yet
1 participant