server: delete node status entries during decommissioning #56529
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. Before a member of our team reviews your PR, I have some potential action items for you:
I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
Thanks @erikgrinaker and sorry it's taking me so long to look at this. It turns out that there's actually a near-term desire to rely on these status keys internally a bit more. @nvanbenschoten and/or @aayushshah15, could you provide pointers on what's needed there?
That's not a bad idea, though in practice it's still going to be a little crufty (as will anything that requires sync between the update loop and the record). For example, we want a decommissioning node to keep writing these updates (as the UI uses them to display node info) and only stop on decommissioned, which is a very short time interval. Unless we introduce a waiting period, it's bound to accidentally violate the desired ordering. For an alternative solution, we could add an RPC endpoint that stops the node status loop.
No rush! I agree that there aren't really any obvious good ways to solve this. I think I'm leaning towards either check-on-read or a reconciliation loop (possibly piggybacking on some other cleanup loop), since those will always converge on a correct state, eventually. But happy to implement the RPC approach if that's what we'd prefer. A good solution would probably involve improving the capabilities of this data store somehow, not sure what they currently are and what plans (if any) we have for it - will read up on it a bit tomorrow.
Hi @erikgrinaker, this looks like a great start! Thanks for putting it together and sorry my review is coming a little late.
Performing a best-effort removal of the NodeStatusKey when decommissioning is the right first step here. I left a comment below about a potential change we could make to allow for manual retries in the presence of node failures or other interruptions. Given the limited set of places that we intend to use NodeStatusKeys for #51304 (comment), this may actually be sufficient for our use case.
Past that, it sounds like the biggest blocker towards clearer semantics is that there is no coordination with the MetricsRecorder, so any work we do to delete the node's status key may be undone due to a race. The most natural way to avoid these kinds of issues would be to replace the blind put in MetricsRecorder with a conditional put. If writes to the key from the startWriteNodeStatus loop (but not from initializeAdditionalStores) were to atomically check that some value already existed for the key, then I think we'd be in the clear. The immediate issue with this is that we don't have an Inline flag on ConditionalPutRequest. I don't think there's any strong reason for this though, other than that we have never needed one (@tbg does that sound right?). So we could consider:
- adding inline support to ConditionalPutRequest
- either querying or remembering the previous value for the node's NodeStatusKey in startWriteNodeStatus
- conditionally writing to the key and logging a warning on ConditionFailedErrors
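To make the third option concrete, here is a minimal sketch of what the recorder-side conditional write could look like. The `inlineKV` interface, `errConditionFailed`, and the function signature are stand-ins invented for illustration; they are not the actual MetricsRecorder code or the eventual roachpb types.

```go
package status

import (
	"context"
	"errors"
	"log"
)

// errConditionFailed stands in for roachpb.ConditionFailedError.
var errConditionFailed = errors.New("condition failed")

// inlineKV stands in for the KV client API proposed above: a conditional put
// that works on inline (non-versioned) values.
type inlineKV interface {
	// CPutInline writes value at key only if the current inline value equals
	// expected, and returns errConditionFailed on a mismatch (including when
	// the key no longer exists).
	CPutInline(ctx context.Context, key string, value, expected []byte) error
}

// writeNodeStatus mirrors the proposed startWriteNodeStatus behavior: it never
// blindly overwrites the status entry, and a lost race only logs a warning.
func writeNodeStatus(ctx context.Context, db inlineKV, key string, status, prev []byte) error {
	if err := db.CPutInline(ctx, key, status, prev); err != nil {
		if errors.Is(err, errConditionFailed) {
			// The entry changed or was deleted (decommissioning) concurrently;
			// warn and skip rather than resurrecting it.
			log.Printf("node status key %s changed concurrently; skipping update", key)
			return nil
		}
		return err
	}
	return nil
}
```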
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/server/server.go, line 2092 at r1 (raw file):
	// the status entries if we crash or fail. Note that the status
	// entry is an inline value, thus this cannot be transactional.
	if targetStatus.Decommissioned() {

While this can't be transactional, I wonder if we can leverage the fact that decommissioning is idempotent to permit crashes/failures that prevent the NodeStatus key from being deleted to be retried. If we pulled this operation out of the `if statusChanged` block, would that be enough to support this form of manual reconciliation?
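A minimal sketch of the ordering being suggested, using stand-in types rather than the real pkg/server ones: the deletion runs for any decommissioned target status, so re-running the (idempotent) decommission retries it after a crash or failure.

```go
package decommission

import "context"

// membershipStatus and statusStore are illustrative stand-ins for the liveness
// and node-status plumbing in pkg/server; they are not the real types.
type membershipStatus int

const (
	active membershipStatus = iota
	decommissioning
	decommissioned
)

func (s membershipStatus) Decommissioned() bool { return s == decommissioned }

type statusStore interface {
	// DeleteNodeStatus best-effort deletes the inline node status entry.
	DeleteNodeStatus(ctx context.Context, nodeID int32) error
}

// finalizeDecommission runs the status-key cleanup whenever the target status
// is Decommissioned, not only when the liveness record just changed. Because
// decommissioning is idempotent, re-running the operation retries the deletion
// and repairs a stale entry left behind by an earlier failure.
func finalizeDecommission(
	ctx context.Context,
	store statusStore,
	nodeID int32,
	targetStatus membershipStatus,
	statusChanged bool,
) error {
	if statusChanged {
		// Event logging etc. for the liveness change would go here (elided).
	}
	if targetStatus.Decommissioned() {
		if err := store.DeleteNodeStatus(ctx, nodeID); err != nil {
			return err
		}
	}
	return nil
}
```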
Force-pushed from c9dd70c to 900edb3.
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
I like this idea. Since there's no great way to ensure this automatically, having some way to manually repair this in the case of failure seems like a reasonable first step -- have updated the PR with this.
Awesome! I was thinking we could use some sort of optimistic concurrency control, but put the idea aside since I didn't think the KV store supported it (given that it also uses MVCC) -- great to learn that this is possible after all, thanks for the tip! I'll go ahead and implement this. One problem with this approach is that if the original write in initializeAdditionalStores fails (which would just be logged and ignored) then the status key will never be written. Maybe we should return an error for this?
Hm, I'm actually not sure if ConditionalPut can be implemented safely for inline values. Since inline values do not have a timestamp, and Pebble does not appear to support transactions, OCC, or locks, I don't think there's any way to atomically read and write the value. A concurrent writer (e.g. the decommissioner) could therefore modify it in the meanwhile. Or am I missing something?
One problem with this approach is that if the original write in initializeAdditionalStores fails (which would just be logged and ignored) then the status key will never be written. Maybe we should return an error for this?
Returning an error there seems reasonable to me. That way, we can guarantee that an initialized store will have a corresponding node status key.
Hm, I'm actually not sure if ConditionalPut can be implemented safely for inline values. Since inline values do not have a timestamp, and Pebble does not appear to support transactions, OCC, or locks, I don't think there's any way to atomically read and write the value. A concurrent writer (e.g. the decommissioner) could therefore modify it in the meanwhile. Or am I missing something?
Everything you said is true, but what you're missing is that CRDB has a lightweight latching mechanism that provides request-level isolation, even to inline and non-transaction operations. This is enough to synchronize with other inline ConditionalPut operations. To read up on this, see https://github.com/cockroachdb/cockroach/blob/13ce0e8db15efd201af782322669f0542c40b29a/pkg/kv/kvserver/concurrency/concurrency_control.go and https://github.com/cockroachdb/cockroach/blob/13ce0e8db15efd201af782322669f0542c40b29a/pkg/kv/kvserver/spanlatch/doc.go.
To use this, we'll need to configure the inline ConditionalPut to support latching just like we do with the inline Put operation. See cockroach/pkg/kv/kvserver/batcheval/cmd_put.go, lines 34 to 38 at 13ce0e8:

	if args.Inline {
		DefaultDeclareKeys(desc, header, req, latchSpans, lockSpans)
	} else {
		DefaultDeclareIsolatedKeys(desc, header, req, latchSpans, lockSpans)
	}
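For illustration, a sketch of how the ConditionalPut declare function could mirror that branch, as it would live in package batcheval alongside cmd_put.go. The surrounding signature, imports, and helper names are assumed to match the quoted cmd_put.go snippet rather than taken from the actual diff.

```go
// Sketch only: parameter types follow the cmd_put.go snippet above.
func declareKeysConditionalPut(
	desc *roachpb.RangeDescriptor,
	header roachpb.Header,
	req roachpb.Request,
	latchSpans, lockSpans *spanset.SpanSet,
) {
	args := req.(*roachpb.ConditionalPutRequest)
	if args.Inline {
		// Inline values are non-transactional; a plain latch still serializes
		// this CPut against other requests touching the same key.
		DefaultDeclareKeys(desc, header, req, latchSpans, lockSpans)
	} else {
		DefaultDeclareIsolatedKeys(desc, header, req, latchSpans, lockSpans)
	}
}
```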
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
Hi Erik! cockroach/pkg/server/serverpb/status.proto, line 994 at 9e4ec7d
Sure thing @andreimatei! Been busy with some other things lately, but will pick this up over the weekend.
Force-pushed from 900edb3 to 93cbbd7.
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
Force-pushed from 93cbbd7 to f694a7f.
@tbg @nvanbenschoten Thanks for the input, I think this should be ready for review now. We do a best-effort removal during decommissioning, on failure the operation must be rerun to remove the stale entry. The status recorder loop uses a conditional put to update the value only if it already exists, and we unconditionally write a status entry on node startup with a fatal error on failure. This may still leave a race condition when decommissioning a node that happens to be starting up (where it's decommissioned between checking the decommissioned status and writing the status entry), but I feel like that's going pretty far into edge case territory. In any case, it can be repaired using the same approach as for crashes, by rerunning the decommission operation. Let me know what you think.
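A rough sketch of the two write paths described here, using illustrative stand-in types (statusWriter and its methods are not the real KV client API): the recorder loop passes mustExist=true so it can never resurrect a deleted entry, while startup passes mustExist=false and treats failure as fatal.

```go
package status

import (
	"context"
	"fmt"
)

// statusWriter is a stand-in for the KV operations used by the node status
// recorder; names and signatures are illustrative only.
type statusWriter interface {
	// PutInline blindly writes the inline status entry.
	PutInline(ctx context.Context, key string, value []byte) error
	// CPutInlineIfExists writes only if some value already exists at key.
	CPutInlineIfExists(ctx context.Context, key string, value []byte) error
}

// writeNodeStatus models the two write paths: the periodic recorder loop
// passes mustExist=true so it cannot resurrect an entry that decommissioning
// deleted, while startup passes mustExist=false to create the entry.
func writeNodeStatus(
	ctx context.Context, w statusWriter, key string, value []byte, mustExist bool,
) error {
	if mustExist {
		return w.CPutInlineIfExists(ctx, key, value)
	}
	return w.PutInline(ctx, key, value)
}

// startNode shows the startup side: a failed initial write is a hard error,
// so an initialized node always has a status entry for later conditional puts.
func startNode(ctx context.Context, w statusWriter, key string, value []byte) error {
	if err := writeNodeStatus(ctx, w, key, value, false /* mustExist */); err != nil {
		return fmt.Errorf("failed to write initial node status entry: %w", err)
	}
	return nil
}
```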
Force-pushed from f694a7f to 481fb1b.
Force-pushed from 481fb1b to 2ec4751.
This looks great! Very nice job @erikgrinaker. There are just a few nits remaining before we can merge this, nothing serious.
Reviewed 10 of 10 files at r2, 10 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/kv/batch.go, line 427 at r2 (raw file):
	}

	// CPutInline conditionally sets the value for a key if the existing value is

nit: drop this below CPutAllowingIfNotExists to keep the transaction versions of CPut grouped together.
pkg/kv/client_test.go, line 379 at r2 (raw file):
	value := []byte("value")

	// Should fail on non-existent key with expected value
nit: in accordance with the Go style guide, we generally try to use proper punctuation in comments whenever possible.
pkg/server/decommission_test.go, line 61 at r3 (raw file):
	ctx, livenesspb.MembershipStatus_DECOMMISSIONED, []roachpb.NodeID{decomNodeID}))

	// The node status entries should shortly be cleaned up.

I might be thinking about this incorrectly, but shouldn't this have been performed synchronously? In other words, why do we need the SucceedsSoon?
pkg/server/node.go, line 585 at r3 (raw file):
	// Write a new status summary after all stores have been initialized; this
	// helps the UI remain responsive when new nodes are added.
	if err := n.writeNodeStatus(ctx, 0 /* alertTTL */, false); err != nil {

nit: add a /* mustExist */ comment after the false. See the "Avoid Naked Parameters" section of our extended style guide for the reason why: https://wiki.crdb.io/wiki/spaces/CRDB/pages/181371303/Go+coding+guidelines. Same thing down below.
pkg/server/node.go, line 725 at r3 (raw file):
	//
	// The status key must already exist, to avoid race conditions
	// during node decommissioning.
Consider expanding this comment a bit to mention that this process may not even be the one performing the decommissioning.
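For example, the expanded comment could read roughly as follows (wording is illustrative, not taken from the actual diff):

```go
// The status key must already exist, to avoid race conditions during node
// decommissioning: the entry is deleted when a node is decommissioned,
// possibly by a different node than this one, and a conditional write here
// avoids resurrecting it after that deletion.
```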
Force-pushed from 2ec4751 to 9d20cf7.
Thanks for the review and suggestions @nvanbenschoten, I've updated the PR with fixes. The CI failure appears unrelated.
Yeah that seems excessive. You are calling that method, right? Otherwise, unexport it. If you're calling it, you could still rename it.
Reviewed 10 of 21 files at r8, 11 of 11 files at r9.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/server/status/recorder.go, line 511 at r7 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This doesn't seem like the right error to return (since we're not in any code path where the caller knows to do anything with a structured KV error). Don't you want something like:

	return errors.New("this node was decommissioned")
This is a good point. What do you think about catching ConditionFailedErrors coming from CPutInline and wrapping them with similar context about a concurrent update (if ActualValue != nil) and a concurrent decommission (if ActualValue == nil)?
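A small sketch of that error translation, with a stand-in for roachpb.ConditionFailedError; only the ActualValue field comes from the discussion above, the rest is illustrative.

```go
package status

import (
	"errors"
	"fmt"
)

// conditionFailedError stands in for roachpb.ConditionFailedError.
type conditionFailedError struct {
	// ActualValue is the value currently stored at the key; nil means the
	// key no longer exists.
	ActualValue []byte
}

func (e *conditionFailedError) Error() string { return "unexpected value" }

// interpretCPutInlineError converts a condition failure from the status
// CPutInline into a message the recorder's caller can act on.
func interpretCPutInlineError(err error) error {
	var cErr *conditionFailedError
	if !errors.As(err, &cErr) {
		return err
	}
	if cErr.ActualValue == nil {
		// The entry was deleted: this node was (concurrently) decommissioned.
		return errors.New("node status entry not found; this node may have been decommissioned")
	}
	// The entry exists but changed underneath us: a concurrent update.
	return fmt.Errorf("node status entry changed concurrently: %w", err)
}
```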
The KV server supports regular put operations for inline (non-versioned) values, but not conditional put operations, as there hasn't been a need for them yet. These are now needed for cockroachdb#51304, to avoid a race condition where a decommissioned node can accidentally resurrect its status entry. Even though inline values are not versioned, the KV server provides request-level key-scoped isolation, which allows the value to be atomically compared and swapped during a conditional put operation.

This commit adds the APIs `DB.CPutInline()` and `Batch.cPutInline()`, as well as the Protobuf field `ConditionalPutRequest.inline` and corresponding constructor `roachpb.NewConditionalPutInline()`, to perform conditional puts for inline values. It also adds the version gate `clusterversion.CPutInline`, but the client APIs do not check this automatically as they don't have access to cluster settings. Instead, `DB.CPutInline()` requires the given context to be wrapped with `kv.CtxForCPutInline()` to enable the feature, and `Batch.cPutInline()` is temporarily unexported.

Release note: none
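A hedged usage sketch of the APIs named in this commit message; the exact CPutInline parameter types, the expected-value encoding, and the keys.NodeStatusKey helper used here are assumptions rather than details taken from the diff.

```go
package example

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/log"
)

// updateStatusEntry conditionally replaces a node's inline status entry.
// expValue is assumed to be the raw bytes of the previously read value; the
// exact CPutInline signature may differ from what is sketched here.
func updateStatusEntry(
	ctx context.Context, db *kv.DB, nodeID roachpb.NodeID, newStatus, expValue []byte,
) error {
	// Per the commit message, the feature is gated: the context must be
	// wrapped with kv.CtxForCPutInline to enable CPutInline.
	ctx = kv.CtxForCPutInline(ctx)
	key := keys.NodeStatusKey(nodeID)
	if err := db.CPutInline(ctx, key, newStatus, expValue); err != nil {
		log.Warningf(ctx, "failed to update node status entry %s: %v", key, err)
		return err
	}
	return nil
}
```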
Force-pushed from 98d6f49 to c4f6bf0.
Done. 👍
For sure -- should've done that when I changed the other error, but slipped my mind. Done.
thanks for pushing through the migration concerns.
I'll merge on green.
Reviewed 5 of 16 files at r10, 11 of 11 files at r11.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
Force-pushed from c4f6bf0 to 24d020d.
Bumped this again -- I misinterpreted what you meant earlier.
Decommissioning a node did not remove its node status entry in the `status-node-` keyspace. This caused the decommissioned node to remain visible via e.g. `Status/Nodes` gRPC calls, affecting other systems that did not check against the node's liveness entry.

This commit deletes a node's status entry when it is being decommissioned. Unfortunately, since the status entry is inline this cannot be transactional with the liveness entry, which can cause a stale status entry to be left behind if the operation should fail or the node should crash. In these cases, the decommission operation (which is idempotent) must be run again to remove it.

To avoid a race condition where the decommissioned node's status recorder loop resurrects the status entry after it has been removed, this also changes the recorder to only update the status entry if it already exists. The initial status entry is written during node startup, and the node will now fail to start if the entry cannot be written.

Release note (bug fix): remove a node's status entry when the node is decommissioned, to prevent it from appearing in API calls and UIs, and avoid it affecting node constraints such as localities and attributes for various operations.
Force-pushed from 24d020d to 090eef9.
bors r+
Build succeeded.
Previously, the decommissioned node lists considered node status entries to determine decommissioning and decommissioned status. This changed in cockroachdb#56529, resulting in empty lists. Now, the node's liveness entry is considered and these lists are correctly populated.

Release note (bug fix): The list of recently decommissioned nodes and the historical list of decommissioned nodes correctly display decommissioned nodes.
Decommissioning a node did not remove its node status entry in the
`status-node-` keyspace. This caused the decommissioned node to remain
visible via e.g. `Status/Nodes` gRPC calls, affecting other systems that
did not check against the node's liveness entry.
This commit deletes a node's status entry when it is being
decommissioned. Unfortunately, since the status entry is inline this
cannot be transactional with the liveness entry, which can cause a stale
status entry to be left behind if the operation should fail or the node
should crash. In these cases, the decommission operation (which is
idempotent) must be run again to remove it.
To avoid a race condition where the decommissioned node's status
recorder loop resurrects the status entry after it has been removed,
this also changes the recorder to only update the status entry if it
already exists. The initial status entry is written during node startup,
and the node will now fail to start if the entry cannot be written.
Release note (bug fix): remove a node's status entry when the node is
decommissioned, to prevent it from appearing in API calls and UIs, and
avoid it affecting node constraints such as localities and attributes
for various operations.
Resolves #51304.
This also adds support for conditional puts of inline values, as a separate commit -- see commit message for details.