docker ZFS driver creates hundreds of datasets and doesn’t clean them #41055
Comments
No, this breaks |
Yeah, we can't remove containers that are stopped, as the container isn't gone. If your use-case is to start "one-off" containers, but you don't care about them after they've exited, you could either start the containers with the |
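For one-off containers that you don't want to keep, the usual Docker CLI options look roughly like this (a generic sketch, not specific to the ZFS driver):

```bash
# Remove the container automatically when it exits.
docker run --rm alpine echo "one-off job"

# Or periodically clean up containers that have already exited.
docker container prune --force
```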
I understand, this is a complex issue then. Note that this is not my use case, but rather what our users on Ubuntu report. The main issue is that a lot of tools list the ZFS datasets, and docker creates a bunch of them when users never --rm their containers (do you have any kind of garbage collection for stopped containers that didn't run for X period of time?); since docker ps doesn't show them by default, users think those containers are gone. Most users reporting this issue of too many datasets were not even aware it was docker creating them (nothing in the name is explicit) and thought it was another built-in backup functionality. Do you have any idea how we can mitigate this? Maybe the ZFS driver should be explicitly enabled by the user so that they know what the gains and tradeoffs are?
zsys should just ignore Docker (and containerd, cri-o, and LXD) datasets
Correct, hence my question: before reading (listing all the properties to get the correct one), how do I know that a given dataset will be a docker one?
The first step is to hardcode /var/lib/docker. Eventually zsys should define a convention to ignore filesystems that contain a specific file, e.g. "/.zsysignore".
Look at the example I gave above: there is no /var/lib/docker in rpool. Those are set as mountpoints by your driver. However, to get the mountpoint, you have to read the properties, and hence get the timeout. ZFS dataset names don't necessarily match mountpoint paths. All that happens even before mounting them (so nothing related to a .zsysignore in a dataset).
No, docker does not automatically garbage collect those. Creating containers (without starting them), or keeping stopped containers around is used in various use-cases, so we cannot assume those containers can be removed.
The default storage driver for most situations is |
Yeah, unfortunately (or fortunately ;)), we now offer a very simple way for people to install a ZFS-based Ubuntu system, and (obviously, judging from the bugs we received) they don't know that docker is going to use it, or even how it uses it. They all experience the same effects: after a while (I think after having started some hundreds of containers without removing them), their whole system is slow to boot (mounting all datasets at boot), zfs and zpool commands are slow, and so on. So, this is completely independent of them using ZSys or not. We patched the docker package to migrate them to rpool/var/lib/docker to avoid snapshotting them automatically and thus creating even more datasets. There is obviously something to fix to avoid this behavior, but as you said, this isn't obvious. I wonder what the gain is in the docker case from creating one dataset for each container (I think the idea is to have the diff against the base image and mount another dataset on top of that). I don't know either if overlayfs works well on top of a pure ZFS system.
I'm definitely not a zfs expert, so if anyone knows;
Then that information would be welcome 😅
ovl on zfs is here: openzfs/zfs#9414
Thanks for linking that! I guess it wouldn't "solve" the full issue if users don't clean up containers, but we could consider defaulting to overlayfs and making zfs an opt-in option.
I think this is the best course of action! 👍
The ovl-on-zfs PR isn't merged yet and is unlikely to be available to 20.04 LTS users, so we should probably have some workaround on the zsys side
@AkihiroSuda: we have a workaround on the docker package side for now, but it's not enough. As explained a couple of posts ago, this has nothing to do with ZSys, but rather with the whole boot experience and various zpool and zfs commands, due to this huge number of datasets created. We can work with ZFS upstream to have ovl-on-zfs merged (and this will likely be backported to our LTS if we deem it important enough)
OverlayFS over ZFS would probably work (eventually) but it isn't a great idea. Most likely what is 'killing your system' is the snapshots created by zsys whenever an update is applied (via apt-get). zfs improves performance (even when zsys is around) as long as it is properly used. Using OverlayFS might improve boot time, but ZFS (used correctly) will improve everything. Hope this helps.
@didrocks if you guys have some time to pitch in with some help writing some ZFS tests, that's actually all that's blocking that work from being merged. One major TODO would be the OverlayFS tests themselves, which I am not sure I'm competent enough to devise the methodology for. Any and all help in that direction is greatly appreciated.
I think the issue I had with zsys and Docker is related to this: if zsys automatically creates a snapshot of a running container (or related image/layer), then when that container is stopped & removed, docker attempts to remove all the relevant datasets. The datasets can't be removed since they still have snapshots (effectively,
At first I tried manually removing all of the zsys snapshots and removing the containers again, but docker didn't try to remove the ZFS datasets again (they should have already been removed; the command to remove them had already been issued, and docker was just waiting for the datasets to disappear). I didn't have time to mess with it and instead just destroyed everything under
Personally, I don't think the above issue is with Docker as much as it is with zsys not having an option to ignore particular datasets. An option in zsys similar to |
Sorry, what work is pending merging? Happy to see if we can spend some time helping you with this.
@Rain: this isn't exactly the case; there is no way for ZSys to know whether a particular dataset is a container or not, as no property is set on it. Note that anyone can create backups, not only ZSys (there are a bunch of sysadmin snapshot tools that people install), and the result would be exactly the same.
@didrocks openzfs/zfs#9600 (comment) Thank you for your offer! ;) Edit: also, the OverlayFS tests themselves in openzfs/zfs#9414 (currently @openzfs/zfs@5ce120c) are pretty weak. Those also need a bit of attention & love.
I ran into this issue when trying to clean up my docker containers. For removal of a docker container, wouldn't one of the solutions be to have the docker zfs storage driver use zfs destroy -R rpool/ROOT/ubuntu_/var/lib/ and take the snapshots out along with the dataset that is being removed, as part of the docker rm command? I'll see if I can make that change locally in my environment [most likely to take me a couple of years - I'm old and slow]. Likewise, couldn't there be a script to clean up snapshots on docker datasets? List all the docker datasets, find all the zfs snapshots on those datasets, remove those snapshots? Will see if I can figure that out. That would give a workaround to slow performance due to all the snapshots.
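A rough sketch of the snapshot-cleanup script described above (the dataset root rpool/var/lib/docker and the 64-hex-character naming pattern are assumptions based on examples in this thread; review the list before destroying anything):

```bash
#!/bin/bash
# Find the datasets the Docker zfs driver created and destroy their snapshots.
DOCKER_ROOT="rpool/var/lib/docker"   # adjust to wherever your docker datasets live

# Docker's layer datasets are named with a 64-character hex ID (optionally "-init").
for ds in $(zfs list -H -o name -r "$DOCKER_ROOT" | grep -E '/[0-9a-f]{64}(-init)?$'); do
  for snap in $(zfs list -H -o name -t snapshot -r "$ds"); do
    echo "destroying $snap"
    zfs destroy "$snap"
  done
done
```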
The way I get round this issue, so that I can use overlay on top of zfs, is to create a zvol on the relevant pool, format it as ext4, then mount it on /var/lib/docker. Then you get the best of all worlds, e.g.:
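A minimal sketch of that zvol-plus-ext4 approach (the pool name tank and the 100G size are placeholders):

```bash
# Carve a zvol out of the pool, format it ext4, and mount it as Docker's data root.
zfs create -V 100G tank/docker
mkfs.ext4 /dev/zvol/tank/docker
systemctl stop docker
mount /dev/zvol/tank/docker /var/lib/docker   # add an fstab entry to make it persistent
systemctl start docker                        # Docker then picks overlay2 on the ext4 volume
```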
The solution posted by @kraduk worked better for me like this with sudo, might help someone else :)
Since the discussion in #40132 recently continued, and we cannot make
Like @Rain suggested above, and as we have shown in ubuntu/zsys#200 (comment), the Docker ZFS driver suggests itself to delete the datasets with
Maybe this flag could become the default behaviour of the ZFS driver, allowing Moby/Docker autonomy over its very own datasets, so it is not distracted by the side effects of eventual third-party snapshots?
Just ran into this on some hosts after upgrading to Ubuntu 22.04. Would love for @almereyda's comment to be implemented, even just as a docker zfs driver option, to force removal of dependent snapshots when removing a docker container. That at least would be sufficient for this to stop breaking systems.
This isn't a proper workaround, but for anyone who'd just like to be able to check `zfs list` easily without having to sift through hundreds of docker datasets, I've been using `sudo zfs list | grep -v -E '[0-9a-f]{64}'` to filter them out.
That will help, but unfortunately it won't deal with the slowdown from so many datasets.
Not an ideal solution, but it does the trick for now. I've been running it every 24h for a few weeks and it's working well.

1. This script prunes old docker images
2. It scans for the docker datasets. Once it finds them, it disables snapshotting on them (which solves this problem into the future) and destroys any existing snapshots.

This script is dangerous. It will not work on your system unless you modify it first.

1. Install this first: https://github.com/bahamas10/zfs-prune-snapshots
2. Design the regular expression match to target your setup. I have a dedicated filesystem called "var-lib-docker" on my pool "mainpool" for all my docker datasets. Mine look like:
   "mainpool/var-lib-docker/cf275df392ce2bb98d963de6274e231b589caa26563edbf93a9b7fef302dddf1"
   "mainpool/var-lib-docker/cf275df392ce2bb98d963de6274e231b589caa26563edbf93a9b7fef302dddf1-init"

For my system/my example, the following regex works:

```bash
zfsDatasets=$(zfs list -o name | grep --extended-regexp 'mainpool\/var-lib-docker\/([a-z]|[0-9]){64}$|mainpool\/var-lib-docker\/([a-z]|[0-9]){64}-init$')
```

Verify the match by running this. It will give you a list of the datasets it intends to disable and destroy snapshots on.

```bash
for zfsDataset in $zfsDatasets
do
  echo $zfsDataset
done
```

3. Update the "zfsDatasets=" line in the script below, and give it a rip. If you've thousands of snapshots, it'll operate for a while. It takes a few seconds per snapshot while not being IO intensive, so the script is designed to operate on each dataset in parallel.

```bash
#!/bin/bash
docker image prune --all --force --filter "until=168h"
zfsDatasets=$(zfs list -o name | grep --extended-regexp 'mainpool\/var-lib-docker\/([a-z]|[0-9]){64}$|mainpool\/var-lib-docker\/([a-z]|[0-9]){64}-init$')
for zfsDataset in $zfsDatasets
do
  zfs set com.sun:auto-snapshot=false $zfsDataset &
done
for zfsDataset in $zfsDatasets
do
  zfs-prune-snapshots 0s $zfsDataset &
done
```
Does a docker system prune -fa not do a lot of the work? I usually have a global service defined that runs something like `sh -c 'while true; do echo starting Run; sleep 86400; docker image prune -af; done'` in a docker image, with a volume mount of the docker socket.
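A sketch of what such a global prune service might look like in swarm mode (the service name, image, and interval are illustrative, not from the original comment):

```bash
# Runs one prune container on every node; the mounted socket lets it talk to the local daemon.
docker service create \
  --name image-prune \
  --mode global \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  docker:cli \
  sh -c 'while true; do docker image prune -af; sleep 86400; done'
```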
It's been about 3 weeks since I went through all the R&D, but if I recall correctly, prune only worked on unused images and the problem of accumulating datasets was impacting my running containers. Prune is a good maintenance item for docker generally, but the script I wrote specifically addresses the problem of accumulating snapshots whether containers are running or not.
System prune and image prune aren't quite the same. IIRC system prune will remove all non-running containers, images, and volumes, vs image prune which just does unused images etc.
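For reference, a quick sketch of the two commands being compared (note that current Docker only removes volumes when explicitly asked):

```bash
docker image prune -af               # remove unused (not just dangling) images
docker system prune -af              # also remove stopped containers, unused networks and build cache
docker system prune -af --volumes    # additionally remove unused volumes
```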
@kraduk

> That will help but unfortunately not deal with the slow down from so many datasets

So don't use the ZFS storage driver. Use the normal one and point it to /tank/docker or whatever. No need to create datasets for every container if you feel it's slow. I have a ~20 TB array consisting of 8 * 4 TB drives, 1 * 2 TB SSD cache drive, and 2 * 250 GB SSD mirrored log drives. I have thousands of datasets and several thousand snapshots on each.
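A sketch of what pointing Docker at a plain directory looks like via /etc/docker/daemon.json (the path and driver choice are illustrative; note overlay2 directly on a ZFS dataset only works with OpenZFS 2.2+, as mentioned later in this thread):

```bash
# Write a daemon.json that moves Docker's data root and picks overlay2 explicitly.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "data-root": "/tank/docker",
  "storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker
```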
@darkpixel it may be fast for you, but clearly it is not fast for others. 1281 docker datasets here and it takes >3s to list. It's not just about waiting a few seconds, as the overhead delays any scripts pulling dataset stats, etc.
There are actually two parts to this bug. One is the performance issue, the other part is "not cleaning up datasets". One is fixed by changing your ZFS setup, adding RAM, a bigger CPU, faster disks, etc.; the other can be worked around temporarily (until that part of the bug is fixed) by pointing docker at a storage location (ZFS-backed or not) that doesn't use the ZFS driver. That's what I do on all my boxes because they don't have non-ZFS storage. I create |
ext4 on a zvol works great for me. It uses the |
I generally do that, but specifically because I want to use the overlayfs driver and don't mind losing some of the zfs features. This may not be desirable in many cases, e.g. people who have a requirement for snapshotting. Although I'm not 100% sure why they would have so much state in their containers.
Overlayfs works great with zfs as long as you use a version from the openzfs/master branch, which will become OpenZFS version 2.2...
Same problem here. I don't like using a ZVOL with an additional ext4 layer and can't use openzfs/master, so a workaround might be to opt in to auto-snapshotting only some datasets. In my case those are fewer than what Docker creates.
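A sketch of that opt-in approach using the com.sun:auto-snapshot property (the same property the script earlier in the thread flips off; the dataset names are placeholders):

```bash
# Disable auto-snapshots pool-wide, then re-enable them only where wanted.
# The user property is inherited, so children of rpool/home stay enabled too.
zfs set com.sun:auto-snapshot=false rpool
zfs set com.sun:auto-snapshot=true rpool/home
zfs set com.sun:auto-snapshot=true rpool/srv
```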
I recommend checking out https://github.com/jimsalterjrs/sanoid. You can set how/when snapshots are taken fairly easily and you can exclude certain datasets. Anyways, I only have ZFS filesystems available, and for operational reasons I have to use the zfs storage driver, so I add this to my sanoid.conf file:
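A sketch of the kind of sanoid.conf stanza that excludes Docker's datasets (the dataset path is a placeholder; the template options follow sanoid's documented template_ignore example):

```
[rpool/var/lib/docker]
        use_template = ignore
        recursive = yes

[template_ignore]
        autosnap = no
        autoprune = no
        monitor = no
```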
According to the docs,
Another example from the Sanoid issue tracker:
finally, zfs 2.2 will support overlay |
What does this mean? The datasets created are no longer listed?
No need to use the zfs driver
Not strictly true; it gives you more options for your solution. Your design goals dictate whether you use overlay backed by zfs or the pure zfs driver. It does mean you don't have to create a zvol formatted as ext4 as a workaround, though.
FYI OpenZFS 2.2.0 is out now, which supports overlay2!
Description
We started to receive a lot of bug reports against ZSys (like ubuntu/zsys#102 and ubuntu/zsys#112) because the number of datasets created by docker.io quickly goes out of control as people don't remove stopped containers (via `docker rm`).

Steps to reproduce the issue:
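A minimal reproduction sketch (assuming Docker is using the zfs storage driver on a root-on-ZFS Ubuntu install; the container name is a placeholder):

```bash
docker run --name test busybox true    # container exits immediately but is not removed
docker ps -a                           # the stopped container is still listed
zfs list | grep -E -c '[0-9a-f]{64}'   # its datasets (and those of its image layers) remain
```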
Describe the results you received:

`zfs list` -> the datasets associated with this stopped container are still there. After a few days, the list grows out of control. This creates timeouts and very slow ZFS-related commands on the system.
Describe the results you expected:

I think docker should clean up the ZFS datasets that it creates for stopped containers.
Output of `docker version`:

Output of `docker info`: