
Begin exec rework #5088

Merged: 3 commits merged into containers:master from begin_exec_rework on Mar 19, 2020
Conversation

@mheon (Member) commented Feb 4, 2020

As part of the rework of exec sessions, we need to address them independently of containers. In the new API, we need to be able to fetch them by their ID, regardless of what container they are associated with. Unfortunately, our existing exec sessions are tied to individual containers; there's no way to tell what container a session belongs to and retrieve it without getting every exec session for every container.

I originally debated just adding a registry to the database to point individual exec sessions at each container, but if we're going so far as to completely restructure exec sessions, it makes sense to store them separately as well. Existing exec sessions will be retained as-is for legacy purposes, while the new-style exec sessions will be stored separately from containers and accessed via their own API endpoints.

There's a potential performance impact to discuss here: whenever an operation needs to check for running exec sessions, we'll now need one DB call to fetch each exec session, plus one call per session to update its state. I'm also concerned about how long sessions remain in the database; I need to verify when Docker cleans them up (only on reboot? on container restart as well? when the exec session finishes?).

This is very early; I'm only about halfway through DB changes, which will then require a refactor of existing exec code to use the new fields in the DB, and associated changes to the lifecycle of exec sessions.

TODO:

  • Finish updating database for new storage scheme of exec sessions
  • Update Exec() API in container_exec.go to use new exec session storage
  • Create function for checking if any exec sessions are running, and use this to replace all existing checks of that type
  • Update container removal to deal with new exec sessions
  • Investigate when exec sessions are removed from the database and update our code to follow that lifecycle
  • Investigate cleanup processes for exec sessions so that we can support detached exec
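As an illustration of the registry idea mentioned above, here is a minimal sketch assuming a bbolt-backed state similar to libpod's BoltDB state; the bucket and function names are hypothetical, not this PR's actual code:

package execstate

import (
    "fmt"

    bolt "go.etcd.io/bbolt"
)

// execRegistryBkt maps exec session IDs to the ID of the container that
// owns them, so a session can be fetched by its ID alone.
var execRegistryBkt = []byte("exec-registry")

// RegisterExecSession records which container owns a new exec session.
func RegisterExecSession(db *bolt.DB, sessionID, ctrID string) error {
    return db.Update(func(tx *bolt.Tx) error {
        bkt, err := tx.CreateBucketIfNotExists(execRegistryBkt)
        if err != nil {
            return err
        }
        if existing := bkt.Get([]byte(sessionID)); existing != nil {
            return fmt.Errorf("exec session %s already registered", sessionID)
        }
        return bkt.Put([]byte(sessionID), []byte(ctrID))
    })
}

// ContainerForSession resolves the owning container of an exec session
// with a single DB view, instead of scanning every container's sessions.
func ContainerForSession(db *bolt.DB, sessionID string) (string, error) {
    var ctrID string
    err := db.View(func(tx *bolt.Tx) error {
        bkt := tx.Bucket(execRegistryBkt)
        if bkt == nil {
            return fmt.Errorf("exec session registry does not exist")
        }
        id := bkt.Get([]byte(sessionID))
        if id == nil {
            return fmt.Errorf("no exec session with ID %s", sessionID)
        }
        ctrID = string(id)
        return nil
    })
    return ctrID, err
}

With a registry like this, looking up a session costs one DB view plus one state read, rather than iterating over every container's sessions.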

@openshift-ci-robot added the do-not-merge/work-in-progress and size/XL labels on Feb 4, 2020
@openshift-ci-robot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label on Feb 4, 2020
@mheon (Member, Author) commented Feb 5, 2020

Update: looking further into exec sessions in Docker, it appears they are removed immediately after termination; probably only the exit code is retrieved afterwards via Inspect. We will need the ability to autoremove exec sessions for detached exec. I still see some potential for leaking sessions, but we can probably counter that by clearing them on container restart.
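For reference, the flow being described — run an exec session, then retrieve only its exit code via Inspect — looks roughly like this with Docker's Go client (a sketch for illustration, not code from this PR):

package execdemo

import (
    "context"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

// ExecAndGetExitCode creates and starts a detached exec session, then
// fetches its exit code via inspect once it has terminated.
func ExecAndGetExitCode(ctx context.Context, cli *client.Client, ctrID string) (int, error) {
    resp, err := cli.ContainerExecCreate(ctx, ctrID, types.ExecConfig{
        Cmd:    []string{"true"},
        Detach: true,
    })
    if err != nil {
        return -1, err
    }
    if err := cli.ContainerExecStart(ctx, resp.ID, types.ExecStartCheck{Detach: true}); err != nil {
        return -1, err
    }
    // Real code would wait for the session to exit before inspecting;
    // inspect then reports the exit code of the terminated session.
    inspect, err := cli.ContainerExecInspect(ctx, resp.ID)
    if err != nil {
        return -1, err
    }
    return inspect.ExitCode, nil
}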

@mheon force-pushed the begin_exec_rework branch from 21e3a65 to 4204f35 on February 7, 2020
@mheon (Member, Author) commented Feb 7, 2020

DB changes should be complete. Still doesn't compile, but all functions are there. Going to convince everything to use the new exec sessions, then write unit tests for the DB.

@mheon force-pushed the begin_exec_rework branch from 4204f35 to 46aeb44 on February 12, 2020
@openshift-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed size/XL on Feb 12, 2020
@mheon (Member, Author) commented Feb 12, 2020

The Exec API has been rewritten to use the new structure.
Next steps: rewrite the remainder of Libpod and cmd/podman to use the new APIs, write unit tests for the DB work, and implement the missing APIs (Inspect, Resize).

@mheon force-pushed the begin_exec_rework branch from 46aeb44 to f304b6a on February 14, 2020
@openshift-ci-robot added the needs-rebase label on Feb 14, 2020
@mheon force-pushed the begin_exec_rework branch from f304b6a to 96a5bd4 on February 14, 2020
@openshift-ci-robot removed the needs-rebase label on Feb 14, 2020
@mheon (Member, Author) commented Feb 14, 2020

Libpod work mostly done. It compiles, even.

Next steps: need to rewrite resize code, write unit tests for DB code.

@mheon force-pushed the begin_exec_rework branch 6 times, most recently from 5e12f80 to 8cb75a8 on February 14, 2020
@mheon force-pushed the begin_exec_rework branch from 8cb75a8 to 3395e88 on February 25, 2020
@mheon (Member, Author) commented Feb 25, 2020

Added resizing. Going to chase down test failures, add some unit tests, then we can merge.

@mheon force-pushed the begin_exec_rework branch from 3395e88 to 1202f9d on February 25, 2020
@openshift-ci-robot added the needs-rebase label on Feb 29, 2020
@mheon force-pushed the begin_exec_rework branch from 1202f9d to e328c56 on March 2, 2020
@mheon (Member, Author) commented Mar 2, 2020

Rebased, healthcheck errors should be fixed. Stripping WIP, should be ready for merge.


sessionExists := execBucket.Get(sessionID)
if sessionExists == nil {
    return define.ErrNoSuchExecSession
}
Member: wrap with id?

Member: Up to you, I guess; if the caller wraps it, then that's OK too.
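The wrapped form being suggested would look something like this (a sketch; the committed code may differ):

return errors.Wrapf(define.ErrNoSuchExecSession, "no exec session with ID %s found", string(sessionID))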

        return errors.Wrapf(define.ErrCtrExists, "container %s has active exec sessions: %s", ctr.ID(), strings.Join(sessions, ", "))
    }
}

Member: this block and the previous block at 800 appear to be the same and also have the same error message. Any chance we could stub that out?

mheon (Author): Actually they're completely redundant. Oops.
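For context, the shared helper the suggestion points at might have looked like this (hypothetical, reusing the error from the snippet above; in the end the redundant copy was simply removed):

// checkNoActiveExecSessions consolidates the duplicated guard against
// removing a container that still has active exec sessions.
func checkNoActiveExecSessions(ctr *Container, sessions []string) error {
    if len(sessions) > 0 {
        return errors.Wrapf(define.ErrCtrExists, "container %s has active exec sessions: %s", ctr.ID(), strings.Join(sessions, ", "))
    }
    return nil
}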

}

if session.State != define.ExecStateRunning {
    return errors.Wrapf(define.ErrExecSessionStateInvalid, "can only stop running exec sessions, while container %s session %s state is %q", c.ID(), session.ID(), session.State.String())
}
Member: %s is not running, current state %s?

mheon (Author): Fixed
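The reworked message would then read roughly as follows (a sketch of the fix, not necessarily the exact committed line):

return errors.Wrapf(define.ErrExecSessionStateInvalid, "container %s exec session %s is not running, current state %s", c.ID(), session.ID(), session.State.String())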

@baude (Member) commented Mar 9, 2020

lgtm, couple of things to think about

@mheon force-pushed the begin_exec_rework branch 5 times, most recently from 328d53b to 8e180ab on March 16, 2020

mheon added 3 commits on March 18, 2020:
As part of the rework of exec sessions, we want to split Create
and Start - and, as a result, we need to keep everything needed
to start exec sessions in the struct, not just the bare minimum
for tracking running ones.

Signed-off-by: Matthew Heon <[email protected]>

As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.

This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.

Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.

Signed-off-by: Matthew Heon <[email protected]>

This produces detailed information about the configuration of an
exec session in a format suitable for the new HTTP API.

Signed-off-by: Matthew Heon <[email protected]>
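To illustrate the "split into a number of smaller calls" described in the second commit message (and the Inspect data from the third), here is a sketch with hypothetical names and signatures; the actual libpod methods may differ:

// Illustrative supporting types.
type ExecConfig struct {
    Command  []string
    Terminal bool
}

type ExecSessionInspect struct {
    ID          string
    ContainerID string
    Running     bool
    ExitCode    int
}

// ExecLifecycle is a hypothetical sketch of a lifecycle-granular exec API,
// replacing a single monolithic Exec() call.
type ExecLifecycle interface {
    // ExecCreate registers a new exec session in the database and returns its ID.
    ExecCreate(config *ExecConfig) (string, error)
    // ExecStart launches the created session's process.
    ExecStart(sessionID string) error
    // ExecStop stops a running session, killing it once the timeout expires.
    ExecStop(sessionID string, timeout uint) error
    // ExecInspect reports detailed session configuration for the new HTTP API.
    ExecInspect(sessionID string) (*ExecSessionInspect, error)
    // ExecRemove removes a terminated session from the database.
    ExecRemove(sessionID string, force bool) error
}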
@mheon force-pushed the begin_exec_rework branch from 8e180ab to e89c638 on March 18, 2020
@mheon (Member, Author) commented Mar 18, 2020

Rebased on top of fixed Conmon. Fingers are crossed.

@rhatdan (Member) commented Mar 18, 2020

@mheon is this the same issue you were seeing or a different one?

@mheon (Member, Author) commented Mar 18, 2020

Tests are (amazingly) passing now, save a network flake.

@TomSweeneyRedHat @rhatdan @vrothberg PTAL, let's see about getting this landed

@mheon (Member, Author) commented Mar 18, 2020

@rhatdan Different issue, looks like a flake in the APIv2 tests

@rhatdan (Member) commented Mar 18, 2020

@mheon Do we want to get this into podman 1.8.2 or wait until we ship it?

@rhatdan (Member) commented Mar 18, 2020

LGTM

@TomSweeneyRedHat (Member) commented

LGTM, and happy green test buttons.

@rhatdan (Member) commented Mar 18, 2020

/hold
/lgtm
I want @mheon to cancel the hold if he wants it in the 1.8.2 release or afterwards.

@openshift-ci-robot added the do-not-merge/hold label on Mar 18, 2020
@openshift-ci-robot added the lgtm label on Mar 18, 2020
@mheon (Member, Author) commented Mar 19, 2020

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label on Mar 19, 2020
@openshift-merge-robot merged commit aa6c8c2 into containers:master on Mar 19, 2020
@github-actions bot added the locked - please file new issue/PR label on Sep 25, 2023
@github-actions bot locked the conversation as resolved and limited it to collaborators on Sep 25, 2023