keyring: update handle to state inside replication loop #15227
Conversation
nomad/encrypter.go
Outdated
@@ -457,6 +456,7 @@ START:
	goto ERR_WAIT // rate limit exceeded
}

store := krr.srv.fsm.State()
ws := store.NewWatchSet()
I'm going to admit I feel like I'm supposed to be able to have this store.NewWatchSet at the top of the loop and then select on it along with the rest of the context to replace the state handle... but I can't figure out which is the right thing to be polling on there, and this diff fixes the test case I've got.
(And we should probably not create a watchset here at all if we're not using it to poll for changes?)
So I think this can be restructured a bit, but I could be wrong as I'm new to this area of code: (*state.StateStore).AbandonCh() will get closed when a restore finishes, so I think just toss it in the select above, and then you only need to do store = krr.srv.fsm.State() in that case block.
The WatchSet can be dropped and nil passed instead. WatchSets are internal to iradix indexes, while Abandoning is a StateStore concept. I think this is an accurate mental model:
Raft
 |
 v
FSM        <readers>
 |            ^
 v            |
StateStore <- Abandon
 |
 v
MemDB
 |
 v
iradix indexes <- Watches
If replicators were waiting for their local root key meta to be updated via the FSM, then I think the watchset could be added to the select {} to be notified when it was updated. Since the root key meta is replicated by RPC outside of the FSM, I don't think a watchset is useful here.
But like I said: new code to me, so let's maybe zoom if this all sounds way off.
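To make that shape concrete, here is a minimal sketch in Go. Only two facts come from the comment above: krr.srv.fsm.State() returns the current store, and the store's AbandonCh() is closed when a restore replaces it. The interfaces, function name, and ticker pacing are invented for illustration and are not Nomad's actual code.

```go
// Hedged sketch only: the loop shape, interfaces, and pacing are assumptions.
package sketch

import (
	"context"
	"time"
)

// StateStore models the one fact we rely on from the discussion: the store's
// abandon channel is closed when a snapshot restore replaces it.
type StateStore interface {
	AbandonCh() <-chan struct{}
}

// FSM models krr.srv.fsm, whose State() always returns the current store.
type FSM interface {
	State() StateStore
}

func replicateLoop(ctx context.Context, fsm FSM) {
	store := fsm.State()
	ticker := time.NewTicker(time.Second) // illustrative pacing, not Nomad's rate limit
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-store.AbandonCh():
			// A restore swapped out the underlying store; the old handle
			// would never see newly written keys (and keeps the old store
			// alive), so re-fetch it here.
			store = fsm.State()
		case <-ticker.C:
			// ... query store for root key metadata and replicate any
			// keys missing from the local keyring ...
		}
	}
}
```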
Force-pushed from a9134dd to 7a41d2f
That's what I was looking for! Thank you!
But I think we could do this similar to how … That would let us lean on the efficiency of the watchset in the common case, while still handling new nodes and cases where the cluster needs to self-repair.
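As a rough illustration of "watchset in the common case, abandon channel for restores", here is a hedged sketch using go-memdb's WatchSet. The Store interface and the getKeyMetas callback are hypothetical stand-ins, not code from this PR:

```go
// Hedged sketch: getKeyMetas stands in for the real state store query that
// adds its result channels to the watch set.
package sketch

import (
	"context"

	memdb "github.com/hashicorp/go-memdb"
)

type Store interface {
	AbandonCh() <-chan struct{}
}

// waitForChange blocks until the watched key metadata changes, the store is
// abandoned by a restore, or the context is cancelled.
func waitForChange(ctx context.Context, store Store, getKeyMetas func(memdb.WatchSet) error) error {
	ws := memdb.NewWatchSet()
	if err := getKeyMetas(ws); err != nil {
		return err
	}

	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-store.AbandonCh():
		// Restore replaced the store; the caller should re-fetch the handle
		// before querying again.
		return nil
	case <-ws.WatchCh(ctx):
		// Key metadata changed locally; re-run the replication pass.
		return nil
	}
}
```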
Force-pushed from 7a41d2f to 9a1c450
When keyring replication starts, we take a handle to the state store. But whenever a snapshot is restored, this handle is invalidated and no longer points to a state store that is receiving new keys. This leaks a bunch of memory too!
In addition to operator-initiated restores, when fresh servers are added to existing clusters with large-enough state, the keyring replication can get started quickly enough that it's running before the snapshot from the existing cluster has been restored.
Fix this by updating the handle to the state store whenever the store's abandon channel is closed. Refactor the query for key metadata to use blocking queries for efficiency.
Force-pushed from 9a1c450 to e7703d1
@schmichael I've updated this to …
…ing ERR_WAIT block
Force-pushed from 01dc833 to 82fce0a
@schmichael I've backed out the blocking query change and just left refreshing the state store handle. This tight loop was destabilizing the whole test suite, as it was causing enough load that leadership was failing. In the interest of shipping a critical fix I'm using the rate-limited query, and I'll see about revisiting blocking queries in later work -- I have a sneaking suspicion there may be a bug in our blocking query implementation that's hard to hit in RPCs. You've got a request-changes on this so I'll need your 👍 here to ship.
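For reference, a hedged sketch of the rate-limited polling shape described here, using golang.org/x/time/rate; the limit values and the replicateOnce callback are illustrative, not Nomad's actual settings:

```go
// Hedged sketch: limits and callback are placeholders, not the real encrypter code.
package sketch

import (
	"context"

	"golang.org/x/time/rate"
)

func pollLoop(ctx context.Context, replicateOnce func(context.Context) error) {
	// Illustrative pacing: at most ~10 replication passes per second.
	limiter := rate.NewLimiter(rate.Limit(10), 1)

	for {
		if err := limiter.Wait(ctx); err != nil {
			return // context cancelled
		}
		if err := replicateOnce(ctx); err != nil {
			// The real loop jumps to an ERR_WAIT block to back off on errors;
			// here the limiter alone paces the retry.
			continue
		}
	}
}
```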
LGTM! goto -= 3
🎉
Fixes #14981
When keyring replication starts, we take a handle to the state store. But whenever a snapshot is restored, this handle is invalidated and no longer points to a state store that is receiving new keys. This leaks a bunch of memory too!
In addition to operator-initiated restores, when fresh servers are added to existing clusters with large-enough state, the keyring replication can get started quickly enough that it's running before the snapshot from the existing cluster has been restored.
Fix this by updating the handle to the state store when we query.
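A minimal sketch of the fix as summarized, assuming only that the FSM can hand back the current store on demand; stateFn and replicateOnce are hypothetical stand-ins for krr.srv.fsm.State and the per-pass replication work:

```go
// Hedged sketch: re-fetch the state store handle on every pass so a snapshot
// restore can never strand the replicator on a stale (memory-pinning) store.
package sketch

import "context"

func replicate[S any](ctx context.Context, stateFn func() S, replicateOnce func(context.Context, S) error) {
	for ctx.Err() == nil {
		store := stateFn() // fresh handle each iteration, never cached across restores
		_ = replicateOnce(ctx, store)
		// ... rate limiting and error backoff elided ...
	}
}
```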