You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Kurtosis should always fully clean up after itself.
The original clean logic was written like so:
Use the presence/absence of -a to decide which enclaves ought to remain (if -a is specified, no resources should remain)
After finding the enclaves-to-remain, find and kill all Kurtosis resources that aren't in the enclave-to-remain set in the proper order (containers -> volumes -> networks). We DON'T say "look for enclave networks, and then remove resources associated with them" because if the enclave network is removed without all the resources being removed for any reason (e.g. maybe a failure happened), then we permanently leak resources.
In other words, we first grab ALL Kurtosis resources, default them all to "destroy this resource", and then remove the ones that should be preserved. This guarantees that we process every single resource Kurtosis can create.
This algorithm is very important, because it guarantees that even if an enclave is in an inconsistent state (e.g. an error happened, or maybe the user messed with Docker), all the resources for the enclave can still get fully cleaned up.
I suspect that the logs-collector cleanup has not been written this way, and is instead written with, "first try to remove the enclave network and all the containers in it, then try to remove the logs-collector". This means that if a failure happens (e.g. because of Ctrl-C), then the enclave network is lost but the logs-collector hasn't been cleaned up.
What is the severity of this bug?
Critical; I am blocked and Kurtosis is unusable for me because of this bug.
What area of the product does this pertain to?
CLI: the Command Line Interface
The text was updated successfully, but these errors were encountered:
Confirming that the clean logic is as you described it should be (at least on the surface). I confirmed by ensuring that the log collector was one of the containers removed in the destroyContainersInEnclaves step that happens before destroying volumes then destroying the enclave networks.
Going to continue debugging what may be causing the leaked logs collectors.
## Description:
Fixes: #1833
The resource leak was being caused by a state where if a failure
occurred while creating an enclave (eg. ctrl+C while running `kurtosis
enclave add`) at a point where the log collector container was created
BEFORE it had been connected to the network AND before the necessary
`defer undo`s to clean up the log collector had been queued THEN the
network would get cleaned up by the `defer undo` from `CreateNetwork`,
but the log collector container would remain.
Any attempt to do a `kurtosis clean -a` would fail to clean log
collector container because the network(and thus enclave) the container
was created for had already been cleaned up.
After digging - I realized the log collector was being created in
`container-engine-lib` as opposed to in the `engine` Moving the logic to
the engine AFTER the enclave is created fixes the issue. If there is an
error at ANY point in the creation of the log collector container (even
if the log collectors `defer undo` hasn't been queued), the `defer undo`
from the `CreateEnclave` call will clean the log collector because it
has a label with the `enclaveUUID`.
## Is this change user facing?
NO
## References:
#1833
What's your CLI version?
0.85.29
Description & steps to reproduce
My local Docker-backed Kurtosis is empty:
kurtosis clean -a
doesn't clean anything, as expected.However, there are two zombie logs-collector containers hanging around that don't go away:
These do not correspond to any enclave networks, as there are none (as expected):
More details in #1832
Desired behavior
Kurtosis should always fully clean up after itself.
The original
clean
logic was written like so:-a
to decide which enclaves ought to remain (if-a
is specified, no resources should remain)In other words, we first grab ALL Kurtosis resources, default them all to "destroy this resource", and then remove the ones that should be preserved. This guarantees that we process every single resource Kurtosis can create.
This algorithm is very important, because it guarantees that even if an enclave is in an inconsistent state (e.g. an error happened, or maybe the user messed with Docker), all the resources for the enclave can still get fully cleaned up.
I suspect that the
logs-collector
cleanup has not been written this way, and is instead written with, "first try to remove the enclave network and all the containers in it, then try to remove the logs-collector". This means that if a failure happens (e.g. because of Ctrl-C), then the enclave network is lost but the logs-collector hasn't been cleaned up.What is the severity of this bug?
Critical; I am blocked and Kurtosis is unusable for me because of this bug.
What area of the product does this pertain to?
CLI: the Command Line Interface
The text was updated successfully, but these errors were encountered: