-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
template: trigger change_mode for dynamic secrets on restore #9636
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I have one question mostly so I understand the low level mechanics of the consul-template library
// driver, which is useful for distinguishing restored tasks during | ||
// prestart hooks. Note: prestart should be idempotent whenever possible | ||
// to handle restored tasks safely; use this as an escape hatch. | ||
HasHandle() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true only if the task is running? If so, would IsRunning()
be a more appropriate name? I also suspect we need to watch out for check-then-act races (e.g. process killed immediately after check) - I suspect in this context, it's fine. Might be worth noting the issue in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out... it's probably worth me double-checking that if the driver handle goes away we can't NPE in the template runner. At that point we're in a non-working state for this task to be sure but it shouldn't blow up the client process.
I went back-and-forth on the name because we only know that there's a driver handle, not that the process is running and I didn't want future callers to be confused by that. But given the TOCTOU concern you've pointed out, I don't think the caller can assume all that much here except a best effort so I'll change the name and add some commentary about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name changed in 8c66608
|
||
events := tm.runner.RenderEvents() | ||
for id, event := range events { | ||
func (tm *TaskTemplateManager) onTemplateRendered(handledRenders map[string]time.Time, allRenderedTime time.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on extracting the function. This commit is best viewed with ?w=1
: 815f01c?w=1 .
select { | ||
case <-harness.mockHooks.RestartCh: | ||
t.Fatalf("should not have restarted: %+v", harness.mockHooks) | ||
case <-harness.mockHooks.SignalCh: | ||
t.Fatalf("should not have restarted: %+v", harness.mockHooks) | ||
case <-time.After(time.Duration(1*testutil.TestMultiplier()) * time.Second): | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to testing that process doesn't get restarted if the template doesn't change. However, I'm unclear how consul-template state management handles that? I assumed consul-template library kept template rendition in memory to compare with re-renditions. On nomad agent restart, how does CT library refresh the latest state - does it read the template file and refresh its state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On nomad agent restart, how does CT library refresh the latest state - does it read the template file and refresh its state?
Yeah, there are effectively 3 chunks of state that CT has:
- The
source
which gets reloaded on every client restart. - The Consul/Vault contents, which gets refreshed on every client restart. That's actually the source of our problem here because for dynamic secrets that's a write which creates a new lease and not a read. Every time this triggers, the
RenderEvent.WouldRender
is true. And that includes a client restart. - The
destination
: the CT library then compares the results with the current contents of the destination and theRenderEvent.DidRender
is set to false if it didn't actually change on-disk.
So what we're doing is differentiating between the RenderEvent.WouldRender
and RenderEvent.DidRender
cases to avoid doing extra restarts we don't need.
Also: this is making me realize we should clarify the behavior around changing the source
out of band; if you do that while Nomad is running, CT doesn't see it. But then if you restart it will see the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a CHANGELOG and upgrade guide docs PR to go with this (holding off until the giant docs replatform change lands). I'll update the docs on source
in that PR.
2da0b1f
to
8c66608
Compare
Move this back into draft as per #9491 (comment)
|
8c66608
to
b40ae67
Compare
Added changelog and upgrade docs. See #9491 (comment) for why this is now ready-to-go. |
Pull this function out so it can be used by the first-pass renderer when we restore a task from a restarted client.
In the task hook implementation, `HasHandle` returns true if the task runner has a handle to the task driver, so that we can distinguish restored tasks during prestart hooks without passing around additional state.
When a task is restored after a client restart, the template runner will create a new lease for any dynamic secret (ex. Consul or PKI secrets engines). But because this lease is being created in the prestart hook, we don't trigger the `change_mode`. This changeset uses the the existence of the task handle to detect a previously running task that's been restored, so that we can trigger the template `change_mode` if the template is changed, as it will be only with dynamic secrets.
b40ae67
to
f357837
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is a bit beyond my ability to review in-depth; there's too much that I'm unfamiliar with.
When a task is restored after a client restart, the template runner will create a new lease for any dynamic secret (ex. Consul or PKI secrets engines). But because this lease is being created in the prestart hook, we don't trigger the `change_mode`. This changeset uses the the existence of the task handle to detect a previously running task that's been restored, so that we can trigger the template `change_mode` if the template is changed, as it will be only with dynamic secrets.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #9491
When a task is restored after a client restart, the template runner will
create a new lease for any dynamic secret (ex. Consul or PKI secrets
engines). But because this lease is being created in the prestart hook, we
don't trigger the
change_mode
.This changeset uses the the existence of the task handle to detect a
previously running task that's been restored, so that we can trigger the
template
change_mode
if the template is changed, as it will be only withdynamic secrets.
Note to reviewers: because this requires a bit of a refactor of the
onTemplateRendered
so that it can be used for both first-pass and re-render, this PR is probably best reviewed commit-by-commit. #9491 (comment) and the diagram in the comment that follow it can walk you thru the problem.