Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker: fix non-streaming exec attachment #24095

Merged
merged 1 commit into from
Oct 1, 2024
Merged

Conversation

tgross
Copy link
Member

@tgross tgross commented Oct 1, 2024

In ##23966 when we switched to using the official Docker SDK client, this included new API calls for attaching to the "exec objects" created for running processes inside a running Docker task. When we updated the API for the non-streaming cases (script health checks, and change_mode = "script"), we used the container ID and not the exec object ID. These IDs aren't identical because you can have multiple exec objects for a given container. This results in errors like "unable to upgrade to tcp, received 404" because the Docker API can't find the exec object with the container ID.

  • Ref: NET-11202 (comment)
  • This has shipped in Nomad 1.9.0-beta.1 but not production yet.
  • I found this via hands-on testing but I suspect it's what's causing test failures overnight in E2E as well, but haven't proven that yet.

In ##23966 when we switched to using the official Docker SDK client, this
included new API calls for attaching to the "exec objects" created for running
processes inside a running Docker task. When we updated the API for the
non-streaming cases (script health checks, and `change_mode = "script"`), we
used the container ID and not the exec object ID. These IDs aren't identical
because you can have multiple exec objects for a given container. This results
in errors like "unable to upgrade to tcp, received 404" because the Docker API
can't find the exec object with the container ID.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
@@ -110,12 +110,12 @@ func (h *taskHandle) Exec(ctx context.Context, cmd string, args []string) (*driv
}

// hijack exec output streams
hijacked, err := h.dockerClient.ContainerExecAttach(ctx, h.containerID, containerapi.ExecStartOptions{
hijacked, err := h.dockerClient.ContainerExecAttach(ctx, exec.ID, containerapi.ExecStartOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooof a nasty copy-paste :/

Copy link
Contributor

@pkazmierczak pkazmierczak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tgross tgross merged commit 7a88d5d into main Oct 1, 2024
32 checks passed
@tgross tgross deleted the docker-exec-attach-scripts branch October 1, 2024 15:27
tgross added a commit that referenced this pull request Oct 1, 2024
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
tgross added a commit that referenced this pull request Oct 1, 2024
In #24095 we made a fix for non-streaming exec into Docker tasks for script
checks and `change_mode = "script"`, but didn't complete E2E testing. We need to
use `ContainerExecAttach` in the new API in order to get stdout/stderr from
tasklets, but the previous `ContainerExecStart` call will prevent this from
running successfully with an error that the exec has already run.

* Ref: [NET-11202 (comment)](https://hashicorp.atlassian.net/browse/NET-11202?focusedCommentId=551618)
* This has shipped in Nomad 1.9.0-beta.1 but not production yet.
* This should fix the remaining issues in nightly E2E for Docker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants