volumes wont be deleted since update to docker 23 #4028

Closed
Emporea opened this issue Feb 6, 2023 · 55 comments

@Emporea

Emporea commented Feb 6, 2023

Since the update to Docker 23, unused volumes are no longer deleted by docker volume prune or docker system prune --volumes.

The output is always Total reclaimed space: 0B.
When I delete the volumes manually, I even get this error when running my docker-compose.yml:
Error response from daemon: open /var/lib/docker/volumes/docker_volume1/_data: no such file or directory

What's happening?

@jcmcken

jcmcken commented Feb 7, 2023

Also seeing this issue. We had to revert to 20.10 because it was filling up disks with no way to recover and causing outages.

It seems to be caused by this change in 23 (mentioned in the 23 changelog). When I pass the all=true filter to docker volume prune, it works. But docker system prune does not accept this filter, so it now seems broken.

It's not clear why this default needed to change.

@cpuguy83
Collaborator

cpuguy83 commented Feb 7, 2023

Yes this sounds like it is related to the mentioned change.
The default change allows us to:

  1. Differentiate an anonymous volume from a named one
  2. Make docker volume prune safer to execute. Given that named volumes typically are named so they can be easily referenced

It does mean that upgrading causes volumes which were created prior to 23.0.0 to not be considered for pruning except for when specifying --all.
However "anonymous" volumes created after 23.0.0 will be considered for pruning... and of course again --all gives the old behavior.

Also if your client uses the older API version it will also get the old behavior.

@Emporea
Author

Emporea commented Feb 7, 2023

Do you suggest to delete everything and recreate every volume (obviously after backing up, to restore once they have been recreated) to get rid of old obsolete configs?

@cpuguy83
Collaborator

cpuguy83 commented Feb 7, 2023

@Emporea docker volume prune --all should give the old behavior. I understand docker system prune doesn't support this yet (not intentionally).

> Do you suggest to delete everything and recreate every volume

No, I don't think that should be necessary, it would likely be simpler to use docker volume prune --filter all=1 in addition to docker system prune until your older volumes are no longer in use.

You can also use DOCKER_API_VERSION=1.40 docker system prune --volumes to get the old behavior.
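
For scripted cleanups, the choice between the plain command and the filtered one could be wrapped in a small helper. This is only a sketch: the function name volume_prune_cmd is hypothetical, and it assumes (as stated in this thread) that the new default applies from API version 1.42 onward and that older API versions keep the old behavior without the filter. It prints the command instead of running it, so nothing is deleted until the output has been reviewed.

```shell
# Hypothetical helper (not part of Docker): print the volume prune command
# that gives the pre-23 behavior for a given API version string.
# Assumption from this thread: API >= 1.42 needs the explicit "all" filter.
volume_prune_cmd() {
  api_version="$1"              # e.g. "1.42"
  major="${api_version%%.*}"    # text before the first dot
  minor="${api_version#*.}"     # text after the first dot
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 42 ]; }; then
    echo "docker volume prune --force --filter all=1"
  else
    echo "docker volume prune --force"
  fi
}
```

Calling volume_prune_cmd 1.42 prints the filtered command, while volume_prune_cmd 1.40 prints the plain one. The API version in use is shown in the output of docker version.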

@Emporea
Author

Emporea commented Feb 7, 2023

root@server ~ [125]# docker volume prune --all
unknown flag: --all
See 'docker volume prune --help'.

root@server ~ [125]# docker --version
Docker version 23.0.0, build e92dd87

@cpuguy83
Collaborator

cpuguy83 commented Feb 7, 2023

Sorry, I gave the wrong command in the first line there. It should be docker volume prune --filter all=1

@Emporea
Author

Emporea commented Feb 7, 2023

Thank you. This works and helps me for now.

@jcmcken

jcmcken commented Feb 8, 2023

> The default change allows us to:
>
> 1. Differentiate an anonymous volume from a named one
> 2. Make `docker volume prune` safer to execute. Given that named volumes typically are named so they can be easily referenced

I guess that's fine reasoning, but the execution of this change was very strange for multiple reasons:

  • The CLI help output doesn't explain anything about these distinctions. (Still says "Remove all unused local volumes", nothing about anonymous vs named)
  • The documentation does not explain anything about these distinctions. (Same as above). We had to look in a specific change log to find the information.
  • As mentioned (and acknowledged by you), this change was not actually propagated to other parts of the CLI.
  • The previous default behavior is not a flag or something obvious to re-enable, but something non-obvious (set env vars? set a particular filter?). Seems like when you change something like this, "principle of least surprise" should apply. It seems like it would've been more conservative to at least use deprecation warnings over multiple releases, or not change it at all and add the 'anonymous' distinction as an opt-in flag.
  • The behavior simply isn't obvious. I don't care about anonymous volumes vs named volumes, for one. A novice Docker user is certainly not going to care or understand why the prune command isn't deleting their volume. I think trying to save the user from themselves using non-explicit behavior is awfully confusing. I think at the very least, the previous behavior either needs to be explicitly mentioned in the help output or provided as a standalone flag. That way, if a Docker user sees a volume isn't getting cleaned up, they can run -h, and hopefully notice mention of named vs anonymous volumes.

@neersighted
Member

neersighted commented Feb 8, 2023

> The CLI help output doesn't explain anything about these distinctions. (Still says "Remove all unused local volumes", nothing about anonymous vs named)

Yeah, the help needs to change to say 'anonymous.'

> The documentation does not explain anything about these distinctions. (Same as above). We had to look in a specific change log to find the information.

Same as the help output, PRs welcome.

> Seems like when you change something like this, "principle of least surprise" should apply.
> I don't care about anonymous volumes vs named volumes, for one.

I'd like to point out you're a tiny minority there -- for the vast majority of users, "Docker deleted my data after I ran system prune -a" has been a sharp edge for years. Most users expect prune to 'clean up garbage,' not 'clean up the things I wanted Docker to keep.'

> As mentioned (and acknowledged by you), this change was not actually propagated to other parts of the CLI.

The only part where this possibly needs to propagate is docker system prune -a and we're still not sure what the better behavior is.

> That way, if a Docker user sees a volume isn't getting cleaned up, they can run -h, and hopefully notice mention of named vs anonymous volumes.

Agreed, there should be an example of using the all filter in the help text.


Please keep in mind that this has been a persistent pain for educators, commercial support, and undercaffeinated experts for years. People are here in this thread because they find the behavior change surprising, and yeah, it looks like review missed the docs updates needed (and this is because of the historical incorrect split of client/server logic we are still cleaning up) -- however, please keep in mind that this thread represents the minority of users who find this behavior unexpected or problematic.

We certainly can improve here, and there are a lot of valid points, but the happy path for the majority of users is to stop pruning named volumes by default.

@neersighted
Member

Also, as an aside: the behavior here changed in the daemon (and is dependent on API version) -- an older CLI against a current daemon will see the old behavior, and the new CLI against an older daemon will also see the old behavior.

So as we look at improving the docs & help output, we need to figure out how to explain that difference coherently.

@thaJeztah
Member

This issue is not directly related to Docker Desktop for Linux; probably would've been best in the https://github.com/moby/moby issue tracker, but I can't move it there because that's in a different org.

Let me move this to the docker/cli repository, where another thread was started in #4015

@Re4zOon

Re4zOon commented Mar 10, 2023

Hi,

For some reason anonymous volumes are not deleted by prune on our systems since the change:

[root@stuff~]# docker volume ls
DRIVER    VOLUME NAME
[root@stuff~]# cat asd.yml
version: '3'
services:

  redis_test:
    image: redis:alpine

  mysql_test:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: test
[root@stuff~]# docker-compose -f asd.yml up -d
Creating network "root_default" with the default driver
Creating root_redis_test_1 ... done
Creating root_mysql_test_1 ... done
[root@stuff~]# docker-compose -f asd.yml down
Stopping root_mysql_test_1 ... done
Stopping root_redis_test_1 ... done
Removing root_mysql_test_1 ... done
Removing root_redis_test_1 ... done
Removing network root_default
[root@stuff~]# docker volume ls
DRIVER    VOLUME NAME
local     6cb48bf0d12f5f9ec6ed0fe4a881a88690d17990ebc43acf16b5266b2a2cc7c3
local     6209ed575c411062992e3ea3e66ba496735346945602ff2a02a31566b2d381ed
[root@stuff~]# docker system prune -af --volumes
Deleted Images:
untagged: mysql:8
untagged: mysql@sha256:c7788fdc4c04a64bf02de3541656669b05884146cb3995aa64fa4111932bec0f
deleted: sha256:db2b37ec6181ee1f367363432f841bf3819d4a9f61d26e42ac16e5bd7ff2ec18
[...]
untagged: redis:alpine
untagged: redis@sha256:b7cb70118c9729f8dc019187a4411980418a87e6a837f4846e87130df379e2c8
deleted: sha256:1690b63e207f6651429bebd716ace700be29d0110a0cfefff5038bb2a7fb6fc7
[...]

Total reclaimed space: 577.6MB
[root@stuff~]# docker volume ls
DRIVER    VOLUME NAME
local     6cb48bf0d12f5f9ec6ed0fe4a881a88690d17990ebc43acf16b5266b2a2cc7c3
local     6209ed575c411062992e3ea3e66ba496735346945602ff2a02a31566b2d381ed
[root@stuff~]# docker volume prune
WARNING! This will remove all local volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Total reclaimed space: 0B
[root@stuff~]# docker volume ls
DRIVER    VOLUME NAME
local     6cb48bf0d12f5f9ec6ed0fe4a881a88690d17990ebc43acf16b5266b2a2cc7c3
local     6209ed575c411062992e3ea3e66ba496735346945602ff2a02a31566b2d381ed

It doesn't matter what kind of prune I run; the only one that works is the filter mentioned above (the one that also removes named volumes).

@cpuguy83
Collaborator

What happens if you manually try to remove it?
What is the result of docker inspect <volume>?

Also keep in mind, prune will not prune volumes created before the upgrade (unless you set --filter all=1) since there was no way to know if a volume is an "anonymous" volume or not.

@Re4zOon

Re4zOon commented Mar 11, 2023

Hi,

I just created these volumes with docker-compose (as you can see at the first line, no volume there).
The system is up-to-date, latest docker-ce.
Here is inspect output:

[root@stuff~]# docker inspect 1b957379dddae8abc21c3f469c966d48d83ecd54cd389c89bda0324739b18653
[
    {
        "CreatedAt": "2023-03-11T13:49:09+01:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/1b957379dddae8abc21c3f469c966d48d83ecd54cd389c89bda0324739b18653/_data",
        "Name": "1b957379dddae8abc21c3f469c966d48d83ecd54cd389c89bda0324739b18653",
        "Options": null,
        "Scope": "local"
    }
]

@sudo-bmitch
Contributor

docker-compose is likely using an older API. Try creating the volumes from docker compose (space rather than dash) instead.

@mathieu-lemay

mathieu-lemay commented Mar 11, 2023

@sudo-bmitch I'm getting the same behavior as described by @Re4zOon with docker compose.

Edit: With plain docker too. I can reproduce with docker run -e MARIADB_ROOT_PASSWORD=root mariadb for example. If I stop and remove the container, the volume stays and prune won't delete it, unless I specify --filter all=1.

$ docker volume inspect b8ef83d3ed925b7d92cfac80a095b047f20e8487749d3a30cd6700c17318ff95
[
    {
        "CreatedAt": "2023-03-11T13:54:58-08:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/b8ef83d3ed925b7d92cfac80a095b047f20e8487749d3a30cd6700c17318ff95/_data",
        "Name": "b8ef83d3ed925b7d92cfac80a095b047f20e8487749d3a30cd6700c17318ff95",
        "Options": null,
        "Scope": "local"
    }
]

$ docker volume prune -f
Total reclaimed space: 0B

$ docker volume prune -f --filter all=1
Deleted Volumes:
b8ef83d3ed925b7d92cfac80a095b047f20e8487749d3a30cd6700c17318ff95

Total reclaimed space: 156.5MB

Output of docker system info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.10.3
    Path:     /usr/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.16.0
    Path:     /usr/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 49
 Server Version: 23.0.1
 Storage Driver: btrfs
  Btrfs:
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f.m
 runc version:
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.2.2-zen1-1-zen
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 31.03GiB
 Name: xenomorph
 ID: M6ZE:N2VB:3P6Z:7V55:H6W7:KQJA:QESD:MUPC:T762:4ENW:KUUX:2WU3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

@cpuguy83
Collaborator

Thanks for the reports.
I found the bug and will post a patch momentarily.

@cpuguy83
Collaborator

cpuguy83 commented Mar 11, 2023

moby/moby#45147 should fix these cases.

--- edit --

To clarify, it should fix the case where a volume is created from the image config.

edmorley added a commit to edmorley/docker-cli that referenced this issue Aug 12, 2023
In previous versions of the Docker API, `system prune --volumes` and `volume prune`
would remove all dangling volumes. With API v1.42, this was changed so that only
anonymous volumes would be removed unless the all filter was specified.

Some of the docs were updated in docker#4218, however, there were a couple of places
left that didn't make the anonymous vs named volumes distinction clear.

This replaces docker#4079, which was bitrotted by docker#4218. See also docker#4028.

Closes docker#4079.

Signed-off-by: Ed Morley <[email protected]>
thaJeztah pushed a commit to thaJeztah/cli that referenced this issue Aug 25, 2023
(cherry picked from commit 6e2e92d)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
@b-enoit-be

It would be nice if the change were cascaded into system prune and system df, as raised by others.
prune still claims that it removes all volumes not linked to any containers when passed --volumes.
df, similarly, lists those volumes as reclaimable space, which should now be taken with a grain of salt.

@neersighted
Member

#4497 was accepted and cherry-picked, which addresses the docs/--help issue. It is intentional that system prune -a no longer affects named volumes; I think a --really-prune-everything flag is out of scope for this issue, but feel free to open a feature request if you think that it is useful in the 90% case. My earlier comments are still quite relevant, I think:

> system prune -a is a command often fired indiscriminately, and has led to much data loss and gnashing of teeth. While having to run two commands is a mild pain, it helps with preventing frustration and loss of data for new users copying commands out of tutorials.
> We can certainly explore a system prune --all=with-named-volumes or something in the future for users who understand exactly what they are doing, but currently the need to run a separate command is by design.

I'm going to close this for now, as the last set of sharp edges identified here are solved (and will be in the next patch release), but please feel free to continue the discussion or open that follow-up feature request.

@b-enoit-be

b-enoit-be commented Aug 31, 2023

@neersighted just to be precise though, from what I know the -a, --all in system prune affects only images. For volumes, there is another option/flag, which is --volumes.
Thinking about it, I could definitely see a --volumes that would default to --volumes anonymous, and a --volumes all that would cascade the request to delete named volumes down to the docker volume prune command.
But I also see where this "better safe than sorry" change is coming from.

@wodyjowski

@neersighted this is the worst update in docker history.
We've just spent 3h debugging duplicate keys in DB because for some reason docker system prune --volumes suddenly stopped pruning the volumes.
I guess some people copy rm -rf / from Stack Overflow and get an unpleasant surprise, but if the command is basically named docker remove everything please and asks you to type y if you're sure, you deserve to type your 5 SQL statements again.
It is also a pretty memorable lesson not to host production persistent stores in a container.

@itslooklike

I spent half a day today on this, I expected that my volumes were being deleted... thank you for changing the behavior and not sending a message to the console that it no longer works as before (without negativity)

@wodyjowski

Every time I have to type two commands to test my deployment I think about this thread 🔥

@drluckyspin

Most annoying bug ever! At least mark it deprecated or something.

@boneitis

boneitis commented Oct 31, 2023

I'm trying to wrap my head around what's going on. Did this behavior get rolled back?

In the environments I manage, there is a machine where our old script seems to be working as originally desired, and I noticed it is running version 24.0.2 / API version 1.43. Of course, the nodes with 23 / API 1.42 still exhibit the need to provide --filter all=true.

I'm not seeing anything that specifically mentions this issue in the Engine 24 or API 1.43 release notes.

@boneitis

boneitis commented Oct 31, 2023

Seems like things may have got really whacked out, or I am whacked out trying to interpret the docs.

The 1.42 API (as reported upthread here) only considers anonymous volumes for deletion on a docker volume prune. To my understanding, this applies to volumes where the first param is omitted from -v, so it is no surprise that we are getting a lot of dangling volumes left behind. It also seems that the warning message was left unchanged:

root@fooHostnameDocker23:~# docker version | grep 'API version'
 API version:       1.42
  API version:      1.42 (minimum version 1.12)
root@fooHostnameDocker23:~# docker volume prune
WARNING! This will remove all local volumes not used by at least one container.
Are you sure you want to continue? [y/N] 

On the node with Docker 24/API 1.43, the warning message has been updated to match the behavior of the change rolled out in Docker 23/API 1.42; however, the actual behavior seems to have been rolled back and actually prunes our named volumes as we've desired all along:

root@fooHostnameDocker24:~# docker version | grep API
 API version:       1.43
  API version:      1.43 (minimum version 1.12)
root@fooHostnameDocker24:~# docker volume prune
WARNING! This will remove anonymous local volumes not used by at least one container.
Are you sure you want to continue? [y/N]

I'm just illustrating the warning message, not the pruning output. I am fully certain that our Docker 24 node still works as we'd like when typing docker volume prune to purge our named volumes, despite the purported changes.

I did also manage to find another mention of --filter all=true not being needed after 23 over at: #4218 (comment)

So, is this a bug? Is the "backport" referring to the -a option? Rolling back the behavior? Something else?

😵‍💫

I'm really sorry if I'm missing something here.

@cpuguy83
Collaborator

@boneitis It definitely has not changed.

Note: The daemon cannot protect volumes that were created before version 23.

@boneitis

Thank you @cpuguy83 for the response; the note is helpful to know.

However, our deployments, including our machine with Docker 24, fire up around a dozen containers over the course of a day.

All of the what-I-understand-to-be-a-variant-of-"named" volumes get purged (on the version 24 node) without the filter-all parameter, as they did pre-23, whereas the machines with Docker 23 still require it.

That is, the mounts of which "Type" is volume, "Name" is the 64-character hex string, and the "Source" is /var/lib/docker/volumes/<hex string>/_data are the ones that are left dangling, counter to our intended behavior (on version 23). I understand (maybe incorrectly?) that these are named volumes, which are not expected to be purged on 24?

@cpuguy83
Collaborator

If it's a hex string then it is most likely not a named volume (unless you have something generating strings when you create the volume).
You should be able to inspect the volume and check for the label: com.docker.volume.anonymous.
If this key exists (value is unused) then it may be collected by docker volume prune.
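
The label check described above could be scripted roughly as follows. This is a sketch, not an official tool: the function name volume_kind is hypothetical, and it does a plain text match on the inspect output rather than parsing the JSON, so a volume whose name is literally the label key would be misreported.

```shell
# Hypothetical helper: reads `docker volume inspect <volume>` JSON on stdin
# and reports whether the anonymous label key appears in it.
# Simplification: fixed-string text match, not a real JSON parse.
volume_kind() {
  if grep -qF '"com.docker.volume.anonymous"'; then
    echo "anonymous"
  else
    echo "not-labelled"
  fi
}
```

It would be used as docker volume inspect <volume> | volume_kind. For the inspect outputs posted earlier in this thread, where "Labels" is null, it prints not-labelled, which matches those volumes being skipped by a plain docker volume prune.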

@havardox

havardox commented Nov 23, 2023

Please, I just need a bash script to delete everything. No dangling images, "volumes not used by at least one container", or whatever. Everything. What sequence of commands do I need to run to reset Docker to a completely clean slate?

archlinux-github pushed a commit to archlinux/infrastructure that referenced this issue Nov 25, 2023
"docker system prune --volumes" does no longer prune named volumes in
Docker 23.0[1][2], so use "docker volume prune --all"[3] for pruning
named volumes.

[1] docker/cli#4028
[2] moby/moby#44259
[3] docker/cli#4229
@robcresswell

@havardox

docker system prune --all --volumes --force && docker volume prune --all --force

I have that aliased as

alias dprune='docker system prune --all --volumes --force && docker volume prune --all --force'

@kuchaguangjie

kuchaguangjie commented Mar 17, 2024

With Docker version 25.0.4, build 1a576c5.
In my tests, docker volume prune won't even remove unused anonymous volumes.
I have to add the -a option to remove them.

@boneitis

boneitis commented Mar 18, 2024

It's been a while since I had looked at this. My best interpretation to the best of my memory was that there were a few pieces that weren't all rolled out at the same time, which resulted in dangling volumes for me. Something like, if a volume was created pre-23, then they would resist pruning attempts on 23 due to missing some metadata label that you'd expect to see from docker inspect. Or maybe it was volume creation during 23, and pruning attempts post-23. Something along those lines.

All I remember was that once 23 was far enough back in the rear-view mirror and my production environments were eventually able to create everything (images, containers, artifacts, whathaveyou) afresh from Docker versions no earlier than 24, everything worked with the right single pruning command scripted, without having to accommodate edge cases in versioning or manually intervene with getting the incantation right with the correct flags.

Probably the easiest way to roll with it if you're still stuck in the mire of pre-23 post-23 mixed deployments is to peg your Docker Engine API version on your machines to an older value.
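
One way to peg the API version for a single command, rather than machine-wide, is a small wrapper around the DOCKER_API_VERSION environment variable mentioned earlier in this thread. This is a sketch under assumptions: the wrapper name with_legacy_api is hypothetical, and 1.40 is simply the value suggested upthread, not a recommendation for every setup.

```shell
# Hypothetical wrapper: run one command with the Docker API version pinned.
# 1.40 is the value suggested earlier in this thread; adjust as needed.
with_legacy_api() {
  env DOCKER_API_VERSION=1.40 "$@"
}
```

For example, with_legacy_api docker system prune --volumes runs that one command against the older API, leaving every other docker invocation on the default version.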

@cpuguy83
Collaborator

@boneitis You are absolutely correct.

@mattwelke

The info from @boneitis was helpful. We had a bunch of volumes that weren't being removed with docker volume prune. We had just upgraded Docker a few days earlier, so the upgrade may have been related. These volumes were only used by build steps, so we didn't care about them. We listed all of them and deleted them one by one, resolving our disk space issue.
