
qa: Decommissioning nodes #21882

Closed
vivekmenezes opened this issue Jan 29, 2018 · 6 comments
@vivekmenezes (Contributor)

No description provided.

tbg (Member) commented Feb 26, 2018

@m-schneider one thing I'd like you to check is that when you decommission a node (whether it is down before you do that or not), the rest of the cluster stops connecting to it. In particular, it shouldn't show up in any of the debug pages, ui, ..., and there shouldn't be log entries for its IP.

tbg (Member) commented Mar 7, 2018

In a four-node cluster, decommissioning the first node in absentia (i.e. after terminating it cleanly), the graphs show that one replica remains on that node forever (this isn't actually true; `SELECT replicas FROM crdb_internal.ranges;` shows all replicas on the remaining three nodes):

[screenshot: replicas-per-node graph, one replica stuck on node 1]

@couchand is that a ui bug? The node is down, so it can't possibly write that time series.
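The manual check above (`SELECT replicas FROM crdb_internal.ranges;`) can be scripted. A minimal Go sketch, assuming the textual `{2,3,4}` array form for the `replicas` column; `parseReplicas` is a hypothetical helper, not part of CockroachDB:

```go
package main

import (
	"fmt"
	"strings"
)

// parseReplicas parses the textual array form of the replicas
// column, e.g. "{2,3,4}", into a set of IDs. Assumes this exact
// brace-and-comma format.
func parseReplicas(s string) map[string]bool {
	set := make(map[string]bool)
	for _, id := range strings.Split(strings.Trim(s, "{}"), ",") {
		if id = strings.TrimSpace(id); id != "" {
			set[id] = true
		}
	}
	return set
}

func main() {
	// Example rows as returned by:
	//   SELECT replicas FROM crdb_internal.ranges;
	rows := []string{"{2,3,4}", "{2,3,4}", "{2,4,3}"}

	// Verify the decommissioned node (node 1) holds no replicas.
	for _, r := range rows {
		if parseReplicas(r)["1"] {
			fmt.Println("node 1 still holds a replica:", r)
			return
		}
	}
	fmt.Println("node 1 is clean")
}
```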

tbg (Member) commented Mar 7, 2018

On the plus side, after waiting for the store dead timeout, the node disappeared from the graphs and wasn't shown as dead any more, which was expected and worked fine! 👍

tbg (Member) commented Mar 7, 2018

Some graphs from my WIP node decommissioning test (gracefully quit the first of three nodes, decommission it, wipe it, bring it back, wait a minute). The test runs a max-qps=500 `kv` workload on the side. This is all on my laptop.

TL;DR: it looks pretty good. We seem to be doing a decent enough job of a) quitting gracefully and b) not messing things up during the rebalances when the node comes back:

[screenshots: qps and replica rebalancing graphs]

There are occasional dips like these:

[screenshot: qps dip]

correlated with a spike in node heartbeat latency (~500ms).

That's something to investigate. Potentially this is an artifact of running on my laptop while I do other stuff, though.

couchand (Contributor) commented Mar 7, 2018

> show one replica remains on that node forever

This is a known charting issue (#23480, I think, though it's manifesting a bit strangely in your case). If you look at the big dot for node 1, it's not at the same point in time as the other ones. I think all that's happening is that the last recorded data point for that series was 1. It might be a good usability win to make sure decommissioned nodes always record one more data point before going offline, so that they are forever recorded as being clean of all replicas. I've created #23525 to keep track of that idea.
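The carry-forward behavior described here can be sketched in Go (a hypothetical helper, not CockroachDB's actual timeseries code): a naive renderer that paints the most recent sample keeps showing a node's last recorded value long after it stops reporting.

```go
package main

import "fmt"

// lastValueAt returns the most recent sample at or before t,
// carrying the last recorded value forward indefinitely. This
// mimics a naive chart renderer: a node whose final data point
// was 1 replica keeps "showing" 1 forever after going offline.
func lastValueAt(times []int64, values []float64, t int64) (float64, bool) {
	var (
		best  float64
		found bool
	)
	for i, ts := range times {
		if ts <= t {
			best = values[i]
			found = true
		}
	}
	return best, found
}

func main() {
	// Node 1's replica count: it drained to 1, then went offline.
	times := []int64{100, 200, 300}
	values := []float64{5, 3, 1}

	// Long after the node is gone, the chart still reads 1.
	v, ok := lastValueAt(times, values, 10000)
	fmt.Println(v, ok) // 1 true
}
```

Recording one final data point of 0 before going offline, as #23525 suggests, would make the carried-forward value harmless.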

tbg (Member) commented Mar 7, 2018

Thanks @couchand! I've commented on #23525 directly.

tbg added a commit to tbg/cockroach that referenced this issue Mar 8, 2018
A user observed that after decommissioning a node, connections would
still be attempted to the node, seemingly forever. This was easily
reproduced (gRPC warnings are logged when our onlyOnceDialer returns
`grpcutil.ErrCannotReuseClientConn`).

The root cause is that when the node disappears, many affected range
descriptors are not evicted by `DistSender`: when the decommissioned
node was not the lease holder, there will be no reason to evict.
Similarly, if it was, we'll opportunistically send an RPC to the other
(possibly stale, but often not) replicas, which frequently discovers the
new lease holder.

This doesn't seem like a problem; the replica is carried through and
usually placed last (behind the lease holder and any other healthy
replica) - the only issue was that the transport layer connects to all
of the replicas eagerly, which provokes the spew of log messages.

This commit changes the initialization process so that we don't actually
GRPCDial replicas until they are needed. The previous call site to
GRPCDial and the new one are very close to each other, so nothing is
lost. In the common case, the rpc context already has a connection open
and no actual work is done anyway.

I diagnosed this using an in-progress decommissioning roachtest (and
printf debugging); I hope to be able to prevent regression of this bug
in that test when it's done.

Touches cockroachdb#21882.

Release note (bug fix): Avoid connection attempts to former members of
the cluster and the associated spurious log messages.
tbg added a commit to tbg/cockroach that referenced this issue Mar 8, 2018
(same commit message as above)
@petermattis petermattis modified the milestones: 2.0, 2.0.x, 2.1 Apr 5, 2018