Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul/connect: fixed a bug where restarting proxy tasks failed. #16815

Merged
merged 2 commits into from
Apr 11, 2023

Conversation

jrasell
Copy link
Member

@jrasell jrasell commented Apr 6, 2023

The first start of a Consul Connect proxy sidecar triggers a run of the envoy_version hook which modifies the task config image entry. The modification takes into account a number of factors to correctly populate this. Importantly, once the hook has run, it marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the allocation which the proxy sidecar is a member of, it will update and overwrite the task definition within the taskerunner. In doing so it overwrite the modification performed by the hook. If the allocation is restarted, the envoy_version hook will be skipped as it previously marked itself as done, and therefore the sidecar config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of the taskrunner, and instead tracks this internally. The hook also implements the TaskPreKillHook interface, so that it's state can be correctly modified when required by task events.

closes #9887

The linked issue has reproduction steps for testing.

@jrasell jrasell added backport/1.3.x backport to 1.3.x release line backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line labels Apr 6, 2023
@jrasell jrasell requested review from schmichael, shoenig and tgross April 6, 2023 13:20
@jrasell jrasell self-assigned this Apr 6, 2023
jrasell added a commit that referenced this pull request Apr 6, 2023
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of "what it says on the tin", this looks great @jrasell. I've left a comment about unrequested restarts though, which I think is another case where this will come up.

client/allocrunner/taskrunner/envoy_version_hook.go Outdated Show resolved Hide resolved
client/allocrunner/taskrunner/envoy_version_hook_test.go Outdated Show resolved Hide resolved
jrasell added 2 commits April 11, 2023 09:11
The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jrasell jrasell merged commit fc75e9d into main Apr 11, 2023
@jrasell jrasell deleted the b-gh-9887-envoy-proxy-restart-hook branch April 11, 2023 14:56
jrasell added a commit that referenced this pull request Apr 11, 2023
)

The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
jrasell added a commit that referenced this pull request Apr 11, 2023
)

The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
jrasell added a commit that referenced this pull request Apr 11, 2023
) (#16844)

The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
jrasell added a commit that referenced this pull request Apr 11, 2023
) (#16843)

The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.3.x backport to 1.3.x release line backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

envoyproxy/envoy:v${NOMAD_envoy_version} error="API error (400): invalid tag format"
3 participants