remote: exec: do not leak session IDs on errors #20405

giuseppe · 2023-10-18T19:01:51Z

commit fa19e1b partially introduced the fix, but was merged too quickly and didn't work with remote.

Introduce a new binding to allow removing a session from the remote client.

[NO NEW TESTS NEEDED] since it fixes a CI failure

Does this PR introduce a user-facing change?

None

openshift-ci · 2023-10-18T19:01:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

edsantiago · 2023-10-18T19:26:48Z

You're going to need to rebase & repush, to incorporate #20404 and then remove that skip, but no need to do that yet. Let's see how this CI run goes. Thank you for working on this.

mheon · 2023-10-18T20:00:44Z

Idea LGTM. Haven't done a solid look over the code yet, though.

Luap99

Do we really want to expose a REST endpoint for this?
AFAIK via remote we use the 5 mins delay for the conmon cleanup process so the expectation of the test is just wrong IMO. (conf.Engine.ExitCommandDelay)

So if we want to cleanup right away this should be just set to 0 but that then would cause race conditions on the client as it has to inspect after exec finishes to get the exit code.

So I am not sure if that has to be fixed at all as the exec session will be cleanup up eventually. And in your test we only have a error condition so I see no reason why the server shouldn't handle the remove to make it work. Trusting lcients to do the right things sounds wrong to me.

As far as I can tell docker does also only cleanup exec session every 5 min:
https://github.com/moby/moby/blob/0253fedf03f4964c20906795e119404633cb9c1a/daemon/exec.go#L323

Luap99 · 2023-10-19T08:49:37Z

pkg/domain/infra/tunnel/containers.go

+	defer func() {
+		if retErr != nil {
+			_ = containers.ExecRemove(ic.ClientCtx, sessionID, nil)
+		}
+	}()


shouldn't this be handled on the server side? We should not expect all API clients to know about this.

commit fa19e1b partially introduced the fix, but was merged too quickly and didn't work with remote. Introduce a new binding to allow removing a session from the remote client. Signed-off-by: Giuseppe Scrivano <[email protected]>

This reverts commit 44ed415. Signed-off-by: Giuseppe Scrivano <[email protected]>

giuseppe · 2023-10-19T09:41:47Z

Having the cleanup server-side would simplify the implementation but IMO it is cleaner if this is handled by the client since the client is already responsible for creating the session through ExecCreate, and now we are just exposing a way to delete it whenever necessary without waiting for the timeout.

If we handle it server-side, then should the session be deleted on any kind of error (e.g. invalid JSON request) or only on internal errors? Not sure if it classifies as a breaking change but it changes the current behavior since the session ID used by the client won't be valid anymore after an error, which is not the case now.

Luap99 · 2023-10-19T10:03:38Z

Yes I guess that is a fair question but IMO dumping responsibility on clients makes it worse now, as they have the responsibility to cleanup exec session that they did not have before (ok it was just leaked for now). If we talk about podman-remote only I would agree that we can take care of it but this endpoint will be part of our public stable API.
Users like podman-py or really anyone using exec via REST API may need to use that as podman seems unable to correctly cleanup exec sessions which is a bug and the server should take care of leaked sessions regardless, we should never trust the client to do this.

Also if I read the moby code correctly deleting the exec session on a failed start seems to be what docker is doing so I doubt that this would cause problems for API users
https://github.com/moby/moby/blob/0253fedf03f4964c20906795e119404633cb9c1a/daemon/exec.go#L181-L193

giuseppe · 2023-10-19T12:11:15Z

Also if I read the moby code correctly deleting the exec session on a failed start seems to be what docker is doing so I doubt that this would cause problems for API users
https://github.com/moby/moby/blob/0253fedf03f4964c20906795e119404633cb9c1a/daemon/exec.go#L181-L193

thanks for checking that. If that is the expectation for Docker compatibility, let's go with it. Pushed a simpler implementation with the server-side cleanup

mheon · 2023-10-19T13:34:05Z

LGTM, tests are red though

giuseppe · 2023-10-19T15:14:59Z

if we delete the session ID on errors, we cannot retrieve the exit code later

giuseppe · 2023-10-19T15:41:54Z

we would break the current use case where we can query the sessionID after an error.

I've reverted to the previous version of doing the removal client-side, if you don't like this approach, we can just disable the test for remote and rely on the timeout mechanism

rhatdan · 2023-10-19T18:41:55Z

/lgtm

Can we also do it server side in a separate PR?

giuseppe · 2023-10-19T18:42:42Z

/hold

giuseppe · 2023-10-19T18:48:38Z

Can we also do it server side in a separate PR?

I've tried doing it server side before, but it breaks the current client code where we check the exit status after the failure. If we delete the session immediately, the client won't be able to retrieve the exit code. Server-side I think the best we can do is what we already have with the timeout mechanism, unless we do it in a new API, which seems overkill for such a minor issue. I've not worked much on the server/client part, so if anyone has better ideas, I am all ears.

@cevich CI is currently failing with:

Failed to start an instance: FAILED_PRECONDITION: Monthly free compute limit exceeded!

mheon · 2023-10-19T20:19:36Z

/hold cancel

mheon · 2023-10-19T20:19:51Z

CI has returned to normal

openshift-ci bot added the release-note-none label Oct 18, 2023

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2023

github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Oct 18, 2023

giuseppe mentioned this pull request Oct 18, 2023

Emergency workaround for CI breakage #20404

Merged

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from bc1fb91 to edf1654 Compare October 18, 2023 19:16

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from edf1654 to 7765d46 Compare October 18, 2023 19:40

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from d7dae7c to ddecb51 Compare October 19, 2023 07:42

Luap99 reviewed Oct 19, 2023

View reviewed changes

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from ddecb51 to 00d2303 Compare October 19, 2023 09:25

giuseppe added 2 commits October 19, 2023 11:34

remote: exec: do not leak session IDs on errors

f48a706

commit fa19e1b partially introduced the fix, but was merged too quickly and didn't work with remote. Introduce a new binding to allow removing a session from the remote client. Signed-off-by: Giuseppe Scrivano <[email protected]>

Revert "Emergency workaround for CI breakage"

1d2589c

This reverts commit 44ed415. Signed-off-by: Giuseppe Scrivano <[email protected]>

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from 00d2303 to 1d2589c Compare October 19, 2023 09:34

giuseppe force-pushed the do-not-leak-sessions-with-remote branch 2 times, most recently from aa7abd9 to 106b097 Compare October 19, 2023 12:09

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from 106b097 to 5a6252a Compare October 19, 2023 12:28

giuseppe force-pushed the do-not-leak-sessions-with-remote branch from 5a6252a to 1d2589c Compare October 19, 2023 15:40

openshift-ci bot assigned rhatdan Oct 19, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2023

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2023

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2023

openshift-ci bot merged commit 37292a1 into containers:main Oct 19, 2023
100 checks passed

Luap99 mentioned this pull request Nov 29, 2023

fix podman-remote exec regression with v4.8 #20831

Merged

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jan 18, 2024

github-actions bot locked as resolved and limited conversation to collaborators Jan 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remote: exec: do not leak session IDs on errors #20405

remote: exec: do not leak session IDs on errors #20405

giuseppe commented Oct 18, 2023

openshift-ci bot commented Oct 18, 2023

edsantiago commented Oct 18, 2023

mheon commented Oct 18, 2023

Luap99 left a comment

Luap99 Oct 19, 2023

giuseppe commented Oct 19, 2023

Luap99 commented Oct 19, 2023

giuseppe commented Oct 19, 2023

mheon commented Oct 19, 2023

giuseppe commented Oct 19, 2023

giuseppe commented Oct 19, 2023

rhatdan commented Oct 19, 2023

giuseppe commented Oct 19, 2023

giuseppe commented Oct 19, 2023

mheon commented Oct 19, 2023

mheon commented Oct 19, 2023

remote: exec: do not leak session IDs on errors #20405

remote: exec: do not leak session IDs on errors #20405

Conversation

giuseppe commented Oct 18, 2023

Does this PR introduce a user-facing change?

openshift-ci bot commented Oct 18, 2023

edsantiago commented Oct 18, 2023

mheon commented Oct 18, 2023

Luap99 left a comment

Choose a reason for hiding this comment

Luap99 Oct 19, 2023

Choose a reason for hiding this comment

giuseppe commented Oct 19, 2023

Luap99 commented Oct 19, 2023

giuseppe commented Oct 19, 2023

mheon commented Oct 19, 2023

giuseppe commented Oct 19, 2023

giuseppe commented Oct 19, 2023

rhatdan commented Oct 19, 2023

giuseppe commented Oct 19, 2023

giuseppe commented Oct 19, 2023

mheon commented Oct 19, 2023

mheon commented Oct 19, 2023