Should plugin DEL ever error? #309

rosenhouse · 2016-10-17T00:57:57Z

Issue #306 raises an interesting question that applies generally to all plugins: how should runtimes react if an ADD fails?

One possibility is for DEL to be best-effort and idempotent: it tries to clean-up any & all left-over resources, and swallows any errors that arise because a resource was never created in the first place.

Then we could recommend that a runtime call DEL after a failed ADD.
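A minimal Go sketch of what that runtime behaviour could look like. The `Plugin` interface, `SetupNetwork`, and `fakePlugin` are hypothetical names for illustration only; they are not part of libcni or any real runtime.

```go
package main

import (
	"errors"
	"fmt"
)

// Plugin is a hypothetical stand-in for invoking a CNI plugin binary.
type Plugin interface {
	Add(containerID string) error
	Del(containerID string) error
}

// SetupNetwork runs ADD and, if it fails, calls DEL so the plugin can
// release anything it partially created. The ADD error is always
// reported; a DEL failure is appended to it rather than replacing it.
func SetupNetwork(p Plugin, containerID string) error {
	addErr := p.Add(containerID)
	if addErr == nil {
		return nil
	}
	if delErr := p.Del(containerID); delErr != nil {
		return fmt.Errorf("ADD failed: %w (cleanup DEL also failed: %v)", addErr, delErr)
	}
	return fmt.Errorf("ADD failed: %w", addErr)
}

// fakePlugin fails ADD after doing partial work, and counts DEL calls.
type fakePlugin struct {
	leftovers map[string]bool
	delCalls  int
}

func (f *fakePlugin) Add(id string) error {
	f.leftovers[id] = true // partial work done before the failure
	return errors.New("simulated ADD failure")
}

func (f *fakePlugin) Del(id string) error {
	f.delCalls++
	delete(f.leftovers, id) // deleting a missing key is a no-op
	return nil
}

func main() {
	p := &fakePlugin{leftovers: map[string]bool{}}
	err := SetupNetwork(p, "abc123")
	fmt.Println("error:", err)
	fmt.Println("DEL calls:", p.delCalls, "leftovers:", len(p.leftovers))
}
```

This only works cleanly if DEL tolerates resources that ADD never got around to creating, which is the idempotency point above.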


dcbw · 2016-10-17T16:22:28Z

@rosenhouse I agree. Two points in random order:

1. The openshift-sdn and Kubernetes 'kubenet' plugins both have ugly "if err != nil { run teardown code; return }" stuff in their ADD error paths to handle cleanup themselves when ADD fails. It would be really nice to get rid of that and rely on the runtime calling DEL when an ADD fails. But we should add this as a spec requirement, not just a recommendation. Plugins that conform to, say, v0.5.0 could drop their icky ADD error-path code and rely on the runtime, and it would be a runtime bug (doesn't conform to 0.5.0 of the spec) or a runtime error (fails because the plugin doesn't handle an older spec version) if teardown didn't happen.
2. IPAM release/garbage collection (see "add ipam garbageCollection" kubernetes/kubernetes#34373 and "kubenet/kubelet leaks ips on docker restart" kubernetes/kubernetes#34278) needs to happen if the runtime gets restarted. It can often be the case that the container is already gone, so no network namespace can be passed to the teardown. But we still need to release the IPAM allocation because the container is gone, and both skel.go and the plugins currently require the netns and will fail if it's not given or is invalid. Essentially:

kubernetes kubelet runtime does
docker dies (and kills container and network namespace)
docker restarts
kubelet restarts
kubelet figures out the container is no longer alive
kubelet calls teardown to clean things up, but no netns exists because the container is gone
IPAM teardown fails because there is no CNI_NETNS
the container's IPAM allocation leaks

bboreham · 2016-10-18T16:07:55Z

We could have CNI_NETNS=="" mean "clean up what you can, but there's no namespace so don't worry about that".
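As a rough sketch of that idea in Go: the DEL handler below releases the IPAM allocation first and skips namespace work when the netns is empty. `ipamStore`, `cmdDel`'s signature, and `removeInterface` are all hypothetical, and this is not how skel.go or any shipped plugin is written.

```go
package main

import "fmt"

// ipamStore is a hypothetical in-memory allocator keyed by container ID.
type ipamStore struct {
	byContainer map[string]string // container ID -> allocated IP
}

// Release frees the address held by containerID. Releasing an ID with
// no allocation is a no-op, so repeated DELs are harmless.
func (s *ipamStore) Release(containerID string) {
	delete(s.byContainer, containerID)
}

// removeInterface is a placeholder for deleting ifName inside netns.
// A real plugin would enter the namespace here; this stub does nothing.
func removeInterface(netns, ifName string) error {
	return nil
}

// cmdDel releases IPAM state unconditionally, then touches the network
// namespace only if one was supplied (netns stands in for CNI_NETNS).
func cmdDel(s *ipamStore, containerID, netns, ifName string) error {
	s.Release(containerID)
	if netns == "" {
		// Container and namespace are already gone: nothing more to do.
		return nil
	}
	return removeInterface(netns, ifName)
}

func main() {
	s := &ipamStore{byContainer: map[string]string{"c1": "10.0.0.5"}}
	err := cmdDel(s, "c1", "", "eth0") // empty netns, as after a docker restart
	fmt.Println("error:", err, "remaining allocations:", len(s.byContainer))
}
```

Doing the IPAM release before any namespace work means a missing or invalid netns can no longer cause the allocation to leak.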

dcbw · 2016-10-18T16:08:41Z

Though we could just do this now by making the internal plugins do best-effort cleanup, without changing the spec... not sure if we'd consider that a material spec change or not.

bboreham · 2016-10-18T16:25:56Z

The phrasing "tries to clean-up any & all left-over resources" - is that meant to mean "... relating to the specific container ID supplied", or more broadly, clean up anything that seems to be left-over? The latter sounds dangerous.

I think I'm ok with best-efforts meaning "I looked for resources assigned to that container ID; didn't find anything; returning success"; I would still like to error on a completely invalid parameter.
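For illustration, a small Go sketch of that distinction, using a hypothetical `resources` map as the plugin's record of what it created per container ID. An unknown ID is treated as success, an empty ID is rejected, and other containers' entries are never touched.

```go
package main

import (
	"errors"
	"fmt"
)

// resources maps a container ID to the names of things a hypothetical
// plugin created for it (interfaces, addresses, and so on).
type resources map[string][]string

var errInvalidID = errors.New("invalid container ID")

// Del removes only what is recorded for containerID and reports how
// many items were removed. Finding nothing is success, not an error.
func (r resources) Del(containerID string) (int, error) {
	if containerID == "" {
		return 0, errInvalidID // completely invalid parameter: still an error
	}
	owned := r[containerID] // nil slice when nothing is recorded
	delete(r, containerID)
	return len(owned), nil
}

func main() {
	r := resources{"a": {"veth-a", "10.0.0.2"}, "b": {"veth-b"}}

	n, err := r.Del("a")
	fmt.Println("first DEL of a:", n, err)

	n, err = r.Del("a")
	fmt.Println("second DEL of a:", n, err)

	_, err = r.Del("")
	fmt.Println("DEL with empty ID:", err)

	fmt.Println("entries left for b:", len(r["b"]))
}
```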

dcbw · 2016-10-18T16:28:42Z

@bboreham yes, I would expect that the operation would still be scoped to the specific container ID.

chenchun · 2016-10-21T10:01:56Z

"add this as a spec requirement, not just a recommendation"

+1 to adding this as a spec requirement, and we could still keep the error return of DEL.
Making DEL return nothing would mean it can't report any mistakes made during cleanup.

jieyu · 2016-12-29T23:46:08Z

+1 on making this a spec requirement.

It's very hard for a container runtime to react if cleanup fails. It has no idea whether it should proceed with the rest of the cleanup or not.

bboreham · 2017-03-15T12:17:39Z

The spec was changed in #346 to say "Plugins should generally complete a DEL action without error even if some resources are missing [...] even if the container network namespace no longer exists"

So I think this should be closed.

rosenhouse · 2017-03-15T15:17:02Z

Agreed. Though there is still some outstanding work to make the plugins actually behave this way.

bboreham · 2017-03-15T16:58:33Z

FYI Kubernetes 1.6 beta seems to rely on plugins not erroring on DEL - see weaveworks/weave#2801 (comment)

squeed mentioned this issue Oct 26, 2016

rkt leaks netns if CNI plugin fails rkt/rkt#3114

Closed

bboreham mentioned this issue Nov 14, 2016

Don't release CNI-allocated IP address when container dies weaveworks/weave#2643

Merged

freehan mentioned this issue Dec 14, 2016

Verify the network interface before deleting it kubernetes/kubernetes#36747

Closed

freehan mentioned this issue Dec 27, 2016

api/plugins: implement ordered plugin chaining #346

Merged

freehan mentioned this issue Jan 19, 2017

Should CNI plugins checkpoint information for DEL? #353

Closed

rosenhouse mentioned this issue Jan 24, 2017

State sync/Garbage collection for plugins #338

Open

rosenhouse closed this as completed Mar 15, 2017

tomdee mentioned this issue Mar 16, 2017

Kubelet starts pods before CNI is configured kubernetes/kubernetes#43014

Closed
