Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Docker APIv2 container wait endpoint #9048

Merged
merged 6 commits into from
Feb 5, 2021

Conversation

matejvasek
Copy link
Contributor

@matejvasek matejvasek commented Jan 21, 2021

resolves #8697
resolves #9157

  • container wait functions are now cancel-able via context.Context
  • container wait functions can wait for multiple states, not just single one
  • docker container wait endpoint now accepts docker conditions: not-running, removed, next-exit
  • docker container wait endpoint now sends headers in advance before actual waiting

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2021
@matejvasek matejvasek force-pushed the apiv2_wait branch 3 times, most recently from bbf73df to ce81023 Compare January 27, 2021 19:23
@matejvasek matejvasek requested a review from mheon January 27, 2021 19:41
@matejvasek
Copy link
Contributor Author

@mheon PTAL

@matejvasek matejvasek changed the title [WIP] APIv2 wait APIv2 wait endpoint Jan 27, 2021
@matejvasek matejvasek marked this pull request as ready for review January 27, 2021 19:42
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2021

t POST "containers/nonExistent/wait?condition=next-exit" '' 404

podman create --name "${CTR}" --entrypoint '["sleep", "0.5"]' "${IMAGE}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I could have just --entrypoint '["true"]' here, however such a container exits too fast. 😞

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meaning the container is in running state for such a short duration we are not capable of detecting that

}

func waitRemoved(ctx context.Context, con *libpod.Container, interval time.Duration) (int32, error) {
code, err := con.WaitForConditionWithInterval(ctx, interval, define.ContainerStateUnknown)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am not waiting for specific state but for an error.

@mheon
Copy link
Member

mheon commented Jan 27, 2021

@rhatdan I think this conflicts with some of the work you've been doing, and I think I prefer this version - it enables more functionality that we don't have yet.

@rhatdan
Copy link
Member

rhatdan commented Jan 27, 2021

I disagree, I am fine with most of this, but it should all be implemented in the standard API code. We should not have three different ways of doing the same function. Too much risk of them drifting, as I have already seen with every single interface I have looked at.

@matejvasek
Copy link
Contributor Author

matejvasek commented Jan 27, 2021

@rhatdan do you mean I should add the three new condition to the define.ContainerStatus "enum" and then handle it in WaitForConditionWithInterval() ? What you are working on?

@mheon
Copy link
Member

mheon commented Jan 27, 2021

@rhatdan I disagree. Our API should not support the Docker-specific constants or the specifics of their operation mode. We can try and unify these but it's going to be a massive pain of conditional code

@rhatdan
Copy link
Member

rhatdan commented Jan 29, 2021

My point being that podman-remote wait and podman wait should ultimately call into pkg/domain/infra/abi/ wait function.
If Docker API needs some fancy handling, I am fine with adding it.
I would like to get this merged along with the Stop/Remove/Wait patch that I have going into todays release.

@mheon
Copy link
Member

mheon commented Jan 29, 2021

@rhatdan I don't believe the way Docker's Wait and our Wait work are compatible at this point. The handling for the custom condition codes is fundamentally different from the way our Wait code functions.

@matejvasek
Copy link
Contributor Author

I think that really different is next-exit condition. Seems to me that in podman condition == state. This is not true for next-exit since it describes transition between two states instead of one particular state.

@matejvasek
Copy link
Contributor Author

Condition not-running is not that different what podman has. As for removed, there is no such a state since the container no longer exists 😄 , so I am basically waiting for define.ErrNoSuchCtr.

@matejvasek
Copy link
Contributor Author

Note that next-exit may fail for short living containers (even if polling interval is 1ns).

@matejvasek
Copy link
Contributor Author

I did experiment with condition variables it the shared memory as replacement for polling, but results weren't satisfactory.

@matejvasek
Copy link
Contributor Author

To make next-exit truly reliable we probably need to introduce some better inter-process notification mechanism. Any ideas what we could use?

@rhatdan
Copy link
Member

rhatdan commented Jan 30, 2021

Let's get #9051 in and then rebase this on it.
Please review ^^

@rhatdan
Copy link
Member

rhatdan commented Jan 30, 2021

/approve

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matejvasek, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2021
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 3, 2021
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 3, 2021
@matejvasek matejvasek changed the title APIv2 wait endpoint Fix Docker APIv2 container wait endpoint Feb 4, 2021
@@ -176,7 +176,7 @@ type UnpauseOptions struct{}
//go:generate go run ../generator/generator.go WaitOptions
// WaitOptions are optional options for waiting on containers
type WaitOptions struct {
Condition *define.ContainerStatus
Condition []define.ContainerStatus
Copy link
Contributor Author

@matejvasek matejvasek Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately choose not to change this to plural for preserving some degree of compatibility.
With this, older client can call new server safely. Similarly new client can call older server, but only as long as the condition contains at most one element.

Copy link
Contributor Author

@matejvasek matejvasek Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand generator correctly uri param is derived just from member name. Maybe we can introduce some custom tag. So it could look like:

type WaitOptions struct {
	Conditions []define.ContainerStatus `uriparam:"condition"`
	Interval   *string                  `uriparam:"interval"`
}

Meaning the name is changed, but on wire it's the same.

@@ -478,13 +479,15 @@ func (c *Container) RemoveArtifact(name string) error {
}

// Wait blocks until the container exits and returns its exit code.
func (c *Container) Wait() (int32, error) {
return c.WaitWithInterval(DefaultWaitInterval)
func (c *Container) Wait(ctx context.Context) (int32, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the wait functions cancel-able. Useful for not leaking gorutines when client cancels request (closes connection abruptly). For CLI podman it's not used since context passed here is not canceled on SIGTERM or SIGINT. CLI chooses IMO rather brutal os.Exit().

libpod/container_api.go Outdated Show resolved Hide resolved
@@ -495,41 +498,111 @@ func (c *Container) WaitWithInterval(waitTimeout time.Duration) (int32, error) {
}
chWait := make(chan error, 1)

defer close(chWait)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this could still be useful, given that ctx.Done() is a goroutine and not guaranteed to fire at the end of the function, but rather when the entire handler returns, AFAIK

Copy link
Contributor Author

@matejvasek matejvasek Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably shouldn't hurt, but also it's not necessary to close the channel here.


var wg sync.WaitGroup

if waitForExit {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like both this and the len(wantedStates) > 0 below can both be active at the same time - it looks like we don't need that, though, that goroutine seems to take care of the same cases this one does. Should we only do this waitForExit bit if the user is only waiting for Stopped and Exited, and return immediately after?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now in practice only one is active. I future they might be both active. Waiting for terminal state differs from non-terminal. For instance if user condition is [Stopped, Removing], both gorutines have to be active. One is waiting for Stopped and the other one is waiting for Removing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The len(wantedStates) > 0 gorutine is not taking care of non-terminal states.

@mheon
Copy link
Member

mheon commented Feb 4, 2021

Done reviewing. @rhatdan @jwhonce PTAL

Signed-off-by: Matej Vasek <[email protected]>
@mheon
Copy link
Member

mheon commented Feb 4, 2021

@containers/podman-maintainers PTAL

@rhatdan
Copy link
Member

rhatdan commented Feb 4, 2021

Great work @matejvasek Thanks.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2021
@rhatdan
Copy link
Member

rhatdan commented Feb 5, 2021

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2021
@openshift-merge-robot openshift-merge-robot merged commit 42d4652 into containers:master Feb 5, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
6 participants