
Begin exec rework #5088

Merged: 3 commits merged into containers:master from begin_exec_rework on Mar 19, 2020
Conversation

@mheon (Member) commented Feb 4, 2020

As part of the rework of exec sessions, we need to address them independently of containers. In the new API, we need to be able to fetch them by their ID, regardless of what container they are associated with. Unfortunately, our existing exec sessions are tied to individual containers; there's no way to tell what container a session belongs to and retrieve it without getting every exec session for every container.

I originally debated just adding a registry to the database to point individual exec sessions at each container, but if we're going so far as to completely restructure exec sessions, it makes sense to store them separately as well. Existing exec sessions will be retained as-is for legacy purposes, while the new-style exec sessions will be stored separately from containers and accessed via their own API endpoints.

There's a potential performance impact to discuss here: whenever an operation needs to check for running exec sessions, we'll now need one DB call to fetch each exec session, plus one call per session to update its state. I'm also concerned about how long sessions remain in the database; I need to verify when Docker cleans them up (only on reboot? on container restart as well? when the exec session finishes?).

This is very early; I'm only about halfway through DB changes, which will then require a refactor of existing exec code to use the new fields in the DB, and associated changes to the lifecycle of exec sessions.

TODO:

  • Finish updating database for new storage scheme of exec sessions
  • Update Exec() API in container_exec.go to use new exec session storage
  • Create function for checking if any exec sessions are running, and use this to replace all existing checks of that type
  • Update container removal to deal with new exec sessions
  • Investigate when exec sessions are removed from the database and update our code to follow that lifecycle
  • Investigate cleanup processes for exec sessions so that we can support detached exec
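As an illustration of the registry idea mentioned above, here is a minimal sketch assuming a bbolt-backed state similar to libpod's BoltDB state; the bucket and function names are hypothetical, not this PR's actual code:

package execstate

import (
    "fmt"

    bolt "go.etcd.io/bbolt"
)

// execRegistryBkt maps exec session IDs to the ID of the container that
// owns them, so a session can be fetched by its ID alone.
var execRegistryBkt = []byte("exec-registry")

// RegisterExecSession records which container owns a new exec session.
func RegisterExecSession(db *bolt.DB, sessionID, ctrID string) error {
    return db.Update(func(tx *bolt.Tx) error {
        bkt, err := tx.CreateBucketIfNotExists(execRegistryBkt)
        if err != nil {
            return err
        }
        if existing := bkt.Get([]byte(sessionID)); existing != nil {
            return fmt.Errorf("exec session %s already registered", sessionID)
        }
        return bkt.Put([]byte(sessionID), []byte(ctrID))
    })
}

// ContainerForSession resolves the owning container of an exec session
// with a single DB view, instead of scanning every container's sessions.
func ContainerForSession(db *bolt.DB, sessionID string) (string, error) {
    var ctrID string
    err := db.View(func(tx *bolt.Tx) error {
        bkt := tx.Bucket(execRegistryBkt)
        if bkt == nil {
            return fmt.Errorf("exec session registry does not exist")
        }
        id := bkt.Get([]byte(sessionID))
        if id == nil {
            return fmt.Errorf("no exec session with ID %s", sessionID)
        }
        ctrID = string(id)
        return nil
    })
    return ctrID, err
}

With a registry like this, looking up a session costs one DB view plus one state read, rather than iterating over every container's sessions.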

@openshift-ci-robot added the do-not-merge/work-in-progress and size/XL labels on Feb 4, 2020
@openshift-ci-robot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label on Feb 4, 2020
@mheon (Member, Author) commented Feb 5, 2020

Update: looking further into exec sessions in Docker, it appears they are removed immediately after termination; probably only the exit code is retrieved afterwards via Inspect. We will need the ability to autoremove exec sessions for detached exec. I still see some potential for leaking sessions, but we can probably counter that by clearing them on container restart.
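For reference, the flow being described — run an exec session, then retrieve only its exit code via Inspect — looks roughly like this with Docker's Go client (a sketch for illustration, not code from this PR):

package execdemo

import (
    "context"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

// ExecAndGetExitCode creates and starts a detached exec session, then
// fetches its exit code via inspect once it has terminated.
func ExecAndGetExitCode(ctx context.Context, cli *client.Client, ctrID string) (int, error) {
    resp, err := cli.ContainerExecCreate(ctx, ctrID, types.ExecConfig{
        Cmd:    []string{"true"},
        Detach: true,
    })
    if err != nil {
        return -1, err
    }
    if err := cli.ContainerExecStart(ctx, resp.ID, types.ExecStartCheck{Detach: true}); err != nil {
        return -1, err
    }
    // Real code would wait for the session to exit before inspecting;
    // inspect then reports the exit code of the terminated session.
    inspect, err := cli.ContainerExecInspect(ctx, resp.ID)
    if err != nil {
        return -1, err
    }
    return inspect.ExitCode, nil
}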

@mheon force-pushed the begin_exec_rework branch from 21e3a65 to 4204f35 on February 7, 2020
@mheon (Member, Author) commented Feb 7, 2020

DB changes should be complete. Still doesn't compile, but all functions are there. Going to convince everything to use the new exec sessions, then write unit tests for the DB.

@mheon force-pushed the begin_exec_rework branch from 4204f35 to 46aeb44 on February 12, 2020
@openshift-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed size/XL on Feb 12, 2020
@mheon (Member, Author) commented Feb 12, 2020

The Exec API has been rewritten to use the new structure.
Next steps: rewrite the remainder of Libpod and cmd/podman to use the new APIs, write unit tests for the DB work, and implement the missing APIs (Inspect, Resize).

@mheon force-pushed the begin_exec_rework branch from 46aeb44 to f304b6a on February 14, 2020
@openshift-ci-robot added the needs-rebase label on Feb 14, 2020
@mheon force-pushed the begin_exec_rework branch from f304b6a to 96a5bd4 on February 14, 2020
@openshift-ci-robot removed the needs-rebase label on Feb 14, 2020
@mheon (Member, Author) commented Feb 14, 2020

Libpod work mostly done. It compiles, even.

Next steps: need to rewrite resize code, write unit tests for DB code.

@mheon force-pushed the begin_exec_rework branch 6 times, most recently from 5e12f80 to 8cb75a8 on February 14, 2020
@mheon force-pushed the begin_exec_rework branch from 8cb75a8 to 3395e88 on February 25, 2020
@mheon (Member, Author) commented Feb 25, 2020

Added resizing. Going to chase down test failures, add some unit tests, then we can merge.

@mheon force-pushed the begin_exec_rework branch from 3395e88 to 1202f9d on February 25, 2020
@openshift-ci-robot added the needs-rebase label on Feb 29, 2020
@mheon force-pushed the begin_exec_rework branch from 1202f9d to e328c56 on March 2, 2020
@mheon (Member, Author) commented Mar 2, 2020

Rebased, healthcheck errors should be fixed. Stripping WIP, should be ready for merge.


sessionExists := execBucket.Get(sessionID)
if sessionExists == nil {
    return define.ErrNoSuchExecSession
}
Member: wrap with id?

Member: Up to you, I guess; if the caller wraps it, then that's OK too.
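The wrapped form being suggested would look something like this (a sketch; the committed code may differ):

return errors.Wrapf(define.ErrNoSuchExecSession, "no exec session with ID %s found", string(sessionID))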

        return errors.Wrapf(define.ErrCtrExists, "container %s has active exec sessions: %s", ctr.ID(), strings.Join(sessions, ", "))
    }
}

Member: this block and the previous block at 800 appear to be the same and also have the same error message. Any chance we could stub that out?

mheon (Author): Actually they're completely redundant. Oops.
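For context, the shared helper the suggestion points at might have looked like this (hypothetical, reusing the error from the snippet above; in the end the redundant copy was simply removed):

// checkNoActiveExecSessions consolidates the duplicated guard against
// removing a container that still has active exec sessions.
func checkNoActiveExecSessions(ctr *Container, sessions []string) error {
    if len(sessions) > 0 {
        return errors.Wrapf(define.ErrCtrExists, "container %s has active exec sessions: %s", ctr.ID(), strings.Join(sessions, ", "))
    }
    return nil
}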

}

if session.State != define.ExecStateRunning {
    return errors.Wrapf(define.ErrExecSessionStateInvalid, "can only stop running exec sessions, while container %s session %s state is %q", c.ID(), session.ID(), session.State.String())
}
Member: %s is not running, current state %s?

mheon (Author): Fixed
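The reworked message would then read roughly as follows (a sketch of the fix, not necessarily the exact committed line):

return errors.Wrapf(define.ErrExecSessionStateInvalid, "container %s exec session %s is not running, current state %s", c.ID(), session.ID(), session.State.String())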

@baude (Member) commented Mar 9, 2020

lgtm, couple of things to think about

@mheon force-pushed the begin_exec_rework branch 5 times, most recently from 328d53b to 8e180ab on March 16, 2020

mheon added 3 commits on March 18, 2020:
As part of the rework of exec sessions, we want to split Create
and Start - and, as a result, we need to keep everything needed
to start exec sessions in the struct, not just the bare minimum
for tracking running ones.

Signed-off-by: Matthew Heon <[email protected]>

As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.

This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.

Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.

Signed-off-by: Matthew Heon <[email protected]>

This produces detailed information about the configuration of an
exec session in a format suitable for the new HTTP API.

Signed-off-by: Matthew Heon <[email protected]>
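To illustrate the "split into a number of smaller calls" described in the second commit message (and the Inspect data from the third), here is a sketch with hypothetical names and signatures; the actual libpod methods may differ:

// Illustrative supporting types.
type ExecConfig struct {
    Command  []string
    Terminal bool
}

type ExecSessionInspect struct {
    ID          string
    ContainerID string
    Running     bool
    ExitCode    int
}

// ExecLifecycle is a hypothetical sketch of a lifecycle-granular exec API,
// replacing a single monolithic Exec() call.
type ExecLifecycle interface {
    // ExecCreate registers a new exec session in the database and returns its ID.
    ExecCreate(config *ExecConfig) (string, error)
    // ExecStart launches the created session's process.
    ExecStart(sessionID string) error
    // ExecStop stops a running session, killing it once the timeout expires.
    ExecStop(sessionID string, timeout uint) error
    // ExecInspect reports detailed session configuration for the new HTTP API.
    ExecInspect(sessionID string) (*ExecSessionInspect, error)
    // ExecRemove removes a terminated session from the database.
    ExecRemove(sessionID string, force bool) error
}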
@mheon force-pushed the begin_exec_rework branch from 8e180ab to e89c638 on March 18, 2020
@mheon (Member, Author) commented Mar 18, 2020

Rebased on top of fixed Conmon. Fingers are crossed.

@rhatdan (Member) commented Mar 18, 2020

@mheon is this the same issue you were seeing or a different one?

@mheon (Member, Author) commented Mar 18, 2020

Tests are (amazingly) passing now, save a network flake.

@TomSweeneyRedHat @rhatdan @vrothberg PTAL, let's see about getting this landed

@mheon (Member, Author) commented Mar 18, 2020

@rhatdan Different issue, looks like a flake in the APIv2 tests

@rhatdan (Member) commented Mar 18, 2020

@mheon Do we want to get this into podman 1.8.2 or wait until we ship it?

@rhatdan (Member) commented Mar 18, 2020

LGTM

@TomSweeneyRedHat (Member) commented

LGTM, and happy green test buttons.

@rhatdan (Member) commented Mar 18, 2020

/hold
/lgtm
I want @mheon to cancel the hold if he wants it in the 1.8.2 release or afterwards.

@openshift-ci-robot added the do-not-merge/hold label on Mar 18, 2020
@openshift-ci-robot added the lgtm label on Mar 18, 2020
@mheon (Member, Author) commented Mar 19, 2020

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label on Mar 19, 2020
@openshift-merge-robot merged commit aa6c8c2 into containers:master on Mar 19, 2020
@github-actions bot added the locked - please file new issue/PR label on Sep 25, 2023
@github-actions bot locked the conversation as resolved and limited it to collaborators on Sep 25, 2023