agent: handle dependencies between cached leases during persistent storage restore #12765
Conversation
// RequestTokenIndexID is the ID of the RequestToken's entry in the cache
RequestTokenIndexID string
Will adding this field require a schema migration for the boltdb? Maybe not a schema migration per se, but I'm wondering how this would work when restoring from a previous version that didn't populate RequestTokenIndexID, if that's something we're going to support.
Good question that I didn't really address in any comments. As you point out, we could end up restoring from a previous version that didn't populate this field. In that case, the algorithm basically reverts to the same behaviour as today. The child won't be able to find its parent in the leasesMap, and so it won't restore in dependency order. It will just depend on luck whether the child is restored before or after the parent.
We could try harder to populate existing leases in existing caches with this information (rather than the PR's current approach of just populating new leases), but I doubt it's worth the complexity given the typical lifetimes we would expect for the persistent cache file and the leases inside it.
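To illustrate the fallback, here is a minimal, self-contained sketch (hypothetical names such as restoreOne and restoreState, not the PR's actual identifiers): an index written by an older agent has an empty RequestTokenIndexID, so no parent entry is found and the lease is restored immediately, exactly as happens today.

```go
package main

import "fmt"

type index struct {
	ID                  string
	RequestTokenIndexID string // empty for entries written by older agents
}

type restoreState struct {
	ch chan struct{} // closed once the corresponding index has been restored
}

func restoreOne(idx *index, leasesMap map[string]*restoreState, restore func(*index)) {
	if parent, ok := leasesMap[idx.RequestTokenIndexID]; ok {
		<-parent.ch // parent is known: block until its token has been restored
	}
	// If RequestTokenIndexID was empty (old cache entry) or the parent wasn't
	// found, we fall through without waiting, so restore order is down to
	// luck, the same behaviour as today.
	restore(idx)
}

func main() {
	// An entry from an older cache: no parent information recorded.
	old := &index{ID: "lease-1"}
	restoreOne(old, map[string]*restoreState{}, func(i *index) { fmt.Println("restored", i.ID) })
}
```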
To summarise the above, I think it's probably not worth adding any schema migration logic for this specific change, but it does deserve consideration.
Force-pushed from d53e739 to 72b29d4
@@ -381,6 +382,7 @@ func (c *LeaseCache) Send(ctx context.Context, req *SendRequest) (*SendResponse,
 	parentCtx = entry.RenewCtxInfo.Ctx

 	index.TokenParent = req.Token
+	index.RequestTokenIndexID = entry.ID
Isn't index.RequestTokenIndexID the same as index.ID?
index.ID is the ID of the request currently being sent. index.RequestTokenIndexID is the ID of the cache entry for the request's auth token, i.e. the entry returned by entry, err := c.db.Get(cachememdb.IndexNameToken, req.Token)
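A toy illustration of why the two fields differ (a stand-in Index type, not the real cachememdb package): the auth token and the request each get their own cache entry, and RequestTokenIndexID on the request's entry points at the token's entry.

```go
package main

import "fmt"

// Index is a stand-in for the cache's index type, with just the two fields
// under discussion.
type Index struct {
	ID                  string // ID of this cache entry
	RequestTokenIndexID string // ID of the cache entry for the request's auth token
}

func main() {
	// The auth token (req.Token) has its own entry in the cache...
	tokenEntry := &Index{ID: "token-entry"}

	// ...and the request currently being sent gets a separate entry that
	// records which token entry it depends on.
	requestEntry := &Index{
		ID:                  "request-entry",
		RequestTokenIndexID: tokenEntry.ID,
	}

	fmt.Println(requestEntry.ID == requestEntry.RequestTokenIndexID) // false
}
```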
Oh I see what you mean, this *Index.ID is against the req.Token. I'm not sure if this works if the cache is set to not use the auto-auth token (use_auto_auth_token = false), since the request token might not be cached in that scenario.
select {
case <-parent.ch:
}
c.logger.Trace("parent token restored", "id", index.RequestTokenIndexID)
Do we need to care here whether the parent token was or was not restored, based on expiry?
This PR transforms the restore of persistent cache leases into a concurrent operation. This is explicitly not for performance reasons, but to help coordinate the processing of leases that depend on other leases. When a lease depends on another, it can wait on the channel for that lease. With channels set up as the coordination primitive, we simply let the Go runtime resolve the dependency graph for us across goroutines.
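A minimal, self-contained sketch of that pattern (hypothetical names, not the agent's actual restore code): every cached lease gets a channel that is closed once it has been restored, a goroutine per lease waits on its parent's channel before restoring, and the runtime effectively walks the dependency graph for us.

```go
package main

import (
	"fmt"
	"sync"
)

type lease struct {
	id       string
	parentID string        // empty for roots (e.g. the auto-auth token)
	restored chan struct{} // closed when this lease has been restored
}

func restoreAll(leases map[string]*lease) {
	var wg sync.WaitGroup
	for _, l := range leases {
		wg.Add(1)
		go func(l *lease) {
			defer wg.Done()
			if parent, ok := leases[l.parentID]; ok {
				<-parent.restored // wait for the parent token first
			}
			fmt.Println("restoring", l.id)
			close(l.restored) // unblock any children waiting on us
		}(l)
	}
	wg.Wait()
}

func main() {
	leases := map[string]*lease{
		"token":  {id: "token", restored: make(chan struct{})},
		"secret": {id: "secret", parentID: "token", restored: make(chan struct{})},
	}
	restoreAll(leases)
}
```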
The algorithm requires that the graph of dependencies is a DAG (Directed Acyclic Graph). I think this is true by definition: for there to be a cycle in the graph, Vault would have to allow a child lease to be a parent of its own parent, which just doesn't make sense.
There is a new field in the schema, but no schema migration logic is required. See the comment here for more detail: #12765 (comment)