kvserver: rebalancing between stores on the same node fails #60545

Closed
lunevalex opened this issue Feb 12, 2021 · 0 comments · Fixed by #60546

@lunevalex

Describe the problem

This problem was reported by @dankinder on the forum (https://forum.cockroachlabs.com/t/under-replicated-ranges-after-decommission/4239/3). A range has the descriptor (n1,s5):1, (n18,s51):2, (n7,s20):3, (n1,s2):4LEARNER, so node n1 holds replicas on two stores, s5 and s2. The allocator attempts to remove the replica on (n1,s2), but the removal fails. This is a valid operation and should be allowed.

To Reproduce

This has been reproduced in TestValidateReplicationChanges by reversing the order of the removal operations in test case 14.

Expected behavior

The removal of the replica on (n1, s2) should be allowed, as it returns the cluster to a healthy state.
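
The expected behavior can be illustrated with a minimal, self-contained Go sketch. It is not the CockroachDB implementation: the types Replica and Target and the functions firstPerNodeCanRemove and canRemove are hypothetical names invented for this example. The first function assumes a lookup that keeps only the first replica seen per node, which is one way a check could end up depending on descriptor order. The second indexes replicas by (node, store), so the result does not depend on the order in which replicas appear.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified stand-in for one entry of a
// range descriptor, e.g. (n1,s2):4LEARNER.
type Replica struct {
	NodeID, StoreID, ReplicaID int
	Learner                    bool
}

// Target identifies the store a removal is aimed at (hypothetical type).
type Target struct{ NodeID, StoreID int }

// firstPerNodeCanRemove keeps only the first replica seen for each node.
// A second replica on the same node is invisible to it, so the answer
// depends on the order of replicas in the descriptor.
func firstPerNodeCanRemove(desc []Replica, t Target) bool {
	byNode := make(map[int]Replica, len(desc))
	for _, r := range desc {
		if _, seen := byNode[r.NodeID]; !seen {
			byNode[r.NodeID] = r
		}
	}
	r, ok := byNode[t.NodeID]
	return ok && r.StoreID == t.StoreID
}

// canRemove indexes replicas by (node, store), so every replica in the
// descriptor is a valid removal target regardless of ordering.
func canRemove(desc []Replica, t Target) bool {
	present := make(map[Target]bool, len(desc))
	for _, r := range desc {
		present[Target{NodeID: r.NodeID, StoreID: r.StoreID}] = true
	}
	return present[t]
}

func main() {
	// The descriptor from the report: n1 has replicas on s5 and s2.
	desc := []Replica{
		{NodeID: 1, StoreID: 5, ReplicaID: 1},
		{NodeID: 18, StoreID: 51, ReplicaID: 2},
		{NodeID: 7, StoreID: 20, ReplicaID: 3},
		{NodeID: 1, StoreID: 2, ReplicaID: 4, Learner: true},
	}
	for _, t := range []Target{{NodeID: 1, StoreID: 5}, {NodeID: 1, StoreID: 2}} {
		fmt.Printf("remove (n%d,s%d): firstPerNode=%t keyed=%t\n",
			t.NodeID, t.StoreID, firstPerNodeCanRemove(desc, t), canRemove(desc, t))
	}
}
```

With the descriptor above, the order-sensitive lookup accepts the removal of (n1,s5) but rejects (n1,s2), while the (node, store) lookup accepts both.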

@lunevalex lunevalex added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Feb 12, 2021
@lunevalex lunevalex self-assigned this Feb 12, 2021
lunevalex added a commit to lunevalex/cockroach that referenced this issue Feb 12, 2021
…plicas already exist on the same node

Fixes cockroachdb#60545

The allocator in some cases allows for a range to have a replica
on multiple stores of the same node. If that happens, it should allow
itself to fix the situation by removing one of the offending replicas.
This was only half working due to an ordering problem in how the replicas
appeared in the descriptor. It could remove the first replica, but not the second one.

Release note: None
craig bot pushed a commit that referenced this issue Feb 16, 2021
59865: sql: add schema_name,table_id to crdb_internal.ranges r=rafiss a=jordanlewis

... and crdb_internal.ranges_no_leases

Closes #59601.

This commit adds schema_name to crdb_internal.ranges and
crdb_internal.ranges_no_leases to ensure that it's possible to
disambiguate between ranges that are contained by a table with the same
name in two different user-defined schemas.

In addition, it also adds the table_id column which allows unambiguous
lookups of ranges for a given table id. This will also enable making a
virtual index on the table_id column later, which should be a nice win
for some introspection commands.

Release note (sql change): add the schema_name and table_id columns to
the crdb_internal.ranges and crdb_internal.ranges_no_leases virtual
tables.

60546: kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node r=lunevalex a=lunevalex

Fixes #60545

The allocator in some cases allows for a range to have a replica
on multiple stores of the same node. If that happens, it should allow
itself to fix the situation by removing one of the offending replicas.
This was only half working due to an ordering problem in how the replicas
appeared in the descriptor. It could remove the first replica, but not the second one.

Release note: None

60561: geo/wkt: simplify parser grammar and improve error messages r=otan a=andyyang890

This patch simplifies the yacc grammar for the WKT parser
and also improves the error messages for mixed dimensionality
problems.

Refs: #53091

Release note: None

Co-authored-by: Jordan Lewis
Co-authored-by: Alex Lunev
Co-authored-by: Andy Yang
@craig craig bot closed this as completed in 306d2e9 Feb 16, 2021
lunevalex added a commit to lunevalex/cockroach that referenced this issue Feb 17, 2021
…plicas already exist on the same node

Fixes cockroachdb#60545

The allocator in some cases allows for a range to have a replica
on multiple stores of the same node. If that happens, it should allow
itself to fix the situation by removing one of the offending replicas.
This was only half working due to an ordering problem in how the replicas
appeared in the descriptor. It could remove the first replica, but not the second one.

Release note (bug fix): 20.2 introduced the ability to rebalance replicas
between multiple stores on the same node. This change fixes a problem
with that feature, where occasionally an intra-node rebalance would
fail and a range would get stuck permanently under-replicated.
craig bot pushed a commit that referenced this issue Feb 17, 2021
60633: release-20.2: kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node r=aayushshah15 a=lunevalex

Backport 1/1 commits from #60546.

/cc @cockroachdb/release

---

Fixes #60545

The allocator in some cases allows for a range to have a replica
on multiple stores of the same node. If that happens, it should allow
itself to fix the situation by removing one of the offending replicas.
This was only half working due to an ordering problem in how the replicas
appeared in the descriptor. It could remove the first replica, but not the second one.

Release note: None


Co-authored-by: Alex Lunev