Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: handle dependencies between cached leases during persistent storage restore #12765

Closed
wants to merge 9 commits into from

Conversation

tomhjp
Copy link
Contributor

@tomhjp tomhjp commented Oct 7, 2021

This PR transforms the restoring of persistent cache leases into a concurrent operation. This is explicitly not for performance reasons, but instead to help coordinate the processing of leases that depend on other leases. When a lease depends on another, it can wait on the channel for that lease. With channels set up as a coordination primitive, we simply allow the Go runtime to resolve our dependency graph for us in Goroutines.

The algorithm requires that the graph of dependencies is a DAG (Directed Acyclic Graph). I think this is true by definition. For there to be a cycle in the graph, Vault would have to allow a child lease to be a parent of its own parent - which just doesn't make sense.

There is a new field in the schema, but no schema migration logic is required. See the comment here for more detail: #12765 (comment)

Comment on lines +63 to +65
// RequestTokenIndexID is the ID of the RequestToken's entry in the cache
RequestTokenIndexID string

Copy link
Member

@tvoran tvoran Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will adding this field require a schema migration for the boltdb? Maybe not a schema migration per se, but I'm wondering how this would work restoring from a previous version that didn't populate RequestTokenIndexID, if that's something we're going to support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question that I didn't really address in any comments. As you point out, we could end up restoring from a previous version that didn't populate this field. In that case, the algorithm basically reverts to the same behaviour as today. The child won't be able to find its parent in the leasesMap, and so it won't restore in dependency order. It will just depend on luck whether the child is restored before or after the parent.

We could try harder to populate existing leases in existing caches with this information (rather than the PR's current approach of just populating new leases), but I doubt it's worth the complexity given the typical lifetimes we would expect for the persistent cache file and the leases inside it.

Copy link
Contributor Author

@tomhjp tomhjp Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarise the above, I think it's probably not worth adding any schema migration logic for this specific change, but it does deserve consideration.

command/agent/cache/lease_cache.go Outdated Show resolved Hide resolved
Base automatically changed from agent-persistent-cache-restore to main October 8, 2021 10:30
@tomhjp tomhjp force-pushed the agent-persistent-cache-restore-dependencies branch from d53e739 to 72b29d4 Compare October 8, 2021 10:35
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 8, 2021 10:35 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 8, 2021 10:35 Inactive
command/agent/cache/lease_cache.go Outdated Show resolved Hide resolved
command/agent/cache/lease_cache.go Outdated Show resolved Hide resolved
command/agent/cache/lease_cache.go Outdated Show resolved Hide resolved
@@ -381,6 +382,7 @@ func (c *LeaseCache) Send(ctx context.Context, req *SendRequest) (*SendResponse,
parentCtx = entry.RenewCtxInfo.Ctx

index.TokenParent = req.Token
index.RequestTokenIndexID = entry.ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't index.RequestTokenIndexID the same as index.ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index.ID is the ID of the request currently being sent. index.RequestTokenIndexID is the ID of the request's auth token: entry, err := c.db.Get(cachememdb.IndexNameToken, req.Token)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what you mean, this *Index.ID is against the req.Token. I'm not sure if this works if the cache is set to not use the auto-auth token (use_auto_auth_token = false) since the request token might not be cached in that scenario.

@vercel vercel bot temporarily deployed to Preview – vault-storybook October 11, 2021 11:58 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 11, 2021 11:58 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 13, 2021 09:10 Inactive
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 13, 2021 09:10 Inactive
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 13, 2021 10:11 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 13, 2021 10:11 Inactive
@tomhjp tomhjp marked this pull request as ready for review October 13, 2021 10:11
@tomhjp tomhjp requested review from a team, tvoran, benashz and calvn October 13, 2021 10:11
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 13, 2021 12:11 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 13, 2021 12:11 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 13, 2021 12:15 Inactive
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 13, 2021 12:15 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 14, 2021 13:20 Inactive
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 14, 2021 13:20 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 18, 2021 15:30 Inactive
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 18, 2021 15:30 Inactive
@vercel vercel bot temporarily deployed to Preview – vault October 18, 2021 15:41 Inactive
@vercel vercel bot temporarily deployed to Preview – vault-storybook October 18, 2021 15:41 Inactive
select {
case <-parent.ch:
}
c.logger.Trace("parent token restored", "id", index.RequestTokenIndexID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to care here if the Parent token was or was not restored based on expiry?

@tomhjp tomhjp closed this Oct 25, 2021
@tomhjp tomhjp deleted the agent-persistent-cache-restore-dependencies branch October 25, 2021 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants