agent: handle dependencies between cached leases during persistent storage restore #12765
Conversation
// RequestTokenIndexID is the ID of the RequestToken's entry in the cache
RequestTokenIndexID string
Will adding this field require a schema migration for the boltdb? Maybe not a schema migration per se, but I'm wondering how this would work when restoring from a previous version that didn't populate RequestTokenIndexID, if that's something we're going to support.
Good question that I didn't really address in any comments. As you point out, we could end up restoring from a previous version that didn't populate this field. In that case, the algorithm basically reverts to the same behaviour as today. The child won't be able to find its parent in the leasesMap, and so it won't restore in dependency order. It will just depend on luck whether the child is restored before or after the parent.
We could try harder to populate existing leases in existing caches with this information (rather than the PR's current approach of just populating new leases), but I doubt it's worth the complexity given the typical lifetimes we would expect for the persistent cache file and the leases inside it.
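To illustrate the fallback, here is a minimal, self-contained sketch (hypothetical names such as restoreOne and restoreState, not the PR's actual identifiers): an index written by an older agent has an empty RequestTokenIndexID, so no parent entry is found and the lease is restored immediately, exactly as happens today.

```go
package main

import "fmt"

type index struct {
	ID                  string
	RequestTokenIndexID string // empty for entries written by older agents
}

type restoreState struct {
	ch chan struct{} // closed once the corresponding index has been restored
}

func restoreOne(idx *index, leasesMap map[string]*restoreState, restore func(*index)) {
	if parent, ok := leasesMap[idx.RequestTokenIndexID]; ok {
		<-parent.ch // parent is known: block until its token has been restored
	}
	// If RequestTokenIndexID was empty (old cache entry) or the parent wasn't
	// found, we fall through without waiting, so restore order is down to
	// luck, the same behaviour as today.
	restore(idx)
}

func main() {
	// An entry from an older cache: no parent information recorded.
	old := &index{ID: "lease-1"}
	restoreOne(old, map[string]*restoreState{}, func(i *index) { fmt.Println("restored", i.ID) })
}
```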
To summarise the above, I think it's probably not worth adding any schema migration logic for this specific change, but it does deserve consideration.
Force-pushed from d53e739 to 72b29d4
@@ -381,6 +382,7 @@ func (c *LeaseCache) Send(ctx context.Context, req *SendRequest) (*SendResponse,
 	parentCtx = entry.RenewCtxInfo.Ctx

 	index.TokenParent = req.Token
+	index.RequestTokenIndexID = entry.ID
Isn't index.RequestTokenIndexID the same as index.ID?
index.ID is the ID of the request currently being sent. index.RequestTokenIndexID is the ID of the cache entry for the request's auth token, i.e. the entry returned by entry, err := c.db.Get(cachememdb.IndexNameToken, req.Token)
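A toy illustration of why the two fields differ (a stand-in Index type, not the real cachememdb package): the auth token and the request each get their own cache entry, and RequestTokenIndexID on the request's entry points at the token's entry.

```go
package main

import "fmt"

// Index is a stand-in for the cache's index type, with just the two fields
// under discussion.
type Index struct {
	ID                  string // ID of this cache entry
	RequestTokenIndexID string // ID of the cache entry for the request's auth token
}

func main() {
	// The auth token (req.Token) has its own entry in the cache...
	tokenEntry := &Index{ID: "token-entry"}

	// ...and the request currently being sent gets a separate entry that
	// records which token entry it depends on.
	requestEntry := &Index{
		ID:                  "request-entry",
		RequestTokenIndexID: tokenEntry.ID,
	}

	fmt.Println(requestEntry.ID == requestEntry.RequestTokenIndexID) // false
}
```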
Oh I see what you mean, this *Index.ID is against the req.Token. I'm not sure if this works if the cache is set to not use the auto-auth token (use_auto_auth_token = false), since the request token might not be cached in that scenario.
select {
case <-parent.ch:
}
c.logger.Trace("parent token restored", "id", index.RequestTokenIndexID)
Do we need to care here whether the parent token was or was not restored, based on expiry?
This PR transforms the restore of persistent cache leases into a concurrent operation. This is explicitly not for performance reasons, but to help coordinate the processing of leases that depend on other leases. When a lease depends on another, it can wait on the channel for that lease. With channels set up as the coordination primitive, we simply let the Go runtime resolve the dependency graph for us across goroutines.
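A minimal, self-contained sketch of that pattern (hypothetical names, not the agent's actual restore code): every cached lease gets a channel that is closed once it has been restored, a goroutine per lease waits on its parent's channel before restoring, and the runtime effectively walks the dependency graph for us.

```go
package main

import (
	"fmt"
	"sync"
)

type lease struct {
	id       string
	parentID string        // empty for roots (e.g. the auto-auth token)
	restored chan struct{} // closed when this lease has been restored
}

func restoreAll(leases map[string]*lease) {
	var wg sync.WaitGroup
	for _, l := range leases {
		wg.Add(1)
		go func(l *lease) {
			defer wg.Done()
			if parent, ok := leases[l.parentID]; ok {
				<-parent.restored // wait for the parent token first
			}
			fmt.Println("restoring", l.id)
			close(l.restored) // unblock any children waiting on us
		}(l)
	}
	wg.Wait()
}

func main() {
	leases := map[string]*lease{
		"token":  {id: "token", restored: make(chan struct{})},
		"secret": {id: "secret", parentID: "token", restored: make(chan struct{})},
	}
	restoreAll(leases)
}
```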
The algorithm requires that the graph of dependencies is a DAG (Directed Acyclic Graph). I think this is true by definition: for there to be a cycle in the graph, Vault would have to allow a child lease to be a parent of its own parent, which just doesn't make sense.
There is a new field in the schema, but no schema migration logic is required. See the comment here for more detail: #12765 (comment)