Comparison with EroFS #28
Replies: 5 comments 10 replies
-
I'm actually surprised you got EROFS to produce an image on such a large corpus. I gave up after 15 hours, concluding that there might be some sort of accidentally quadratic complexity built into mkfs.erofs.
That should at least improve with the switch to jemalloc.
-
This will likely reduce the achievable compression ratio and make compression slower.
These really only make sense if the source files do in fact belong to different uids/gids. In case of a git checkout, I would assume that it's only one user/group, so exactly zero bits will be dedicated to the storage of uid/gid per file system entry.
Setting the time resolution doesn't make any difference here, as the time is already set to zero. The same is true, btw, if the time is set to any other constant. That time base constant will be stored exactly once, and all file system entry times will be stored relative to the time base. If they're all the same, again, zero bits will be used per file system entry.
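The zero-bits point can be illustrated with a toy model. This is not DwarFS's actual on-disk encoding, just a minimal sketch of the delta-from-base idea: store one base value once, then fixed-width deltas per entry; if every entry equals the base, the per-entry width is zero.

```python
# Toy model of delta-from-base timestamp packing (NOT the real DwarFS format):
# one base is stored once, each entry stores (time - base) at a fixed bit width.

def bits_needed(value: int) -> int:
    # Bits to represent a non-negative delta; 0 needs 0 bits.
    return value.bit_length()

def packed_bits(times: list[int]) -> int:
    # Total per-entry bits once the base is factored out.
    base = min(times)
    deltas = [t - base for t in times]
    width = max(bits_needed(d) for d in deltas)  # fixed width per entry
    return width * len(times)

print(packed_bits([0, 0, 0, 0]))     # all zero -> 0 bits total
print(packed_bits([100, 100, 100]))  # any constant -> still 0 bits
print(packed_bits([100, 104, 107]))  # varying -> 3 bits/entry = 9
```

The same argument applies to uid/gid: a single distinct value is stored once in a lookup table, and the per-entry index then needs zero bits.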
-
FWIW, from cdnjs @ 5b47a977
So it's not actually that much smaller.
-
I noticed that there is ongoing development to add multi-threading support to mkfs.erofs.
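For reference, a hedged sketch of what a multi-threaded invocation might look like with a dev-branch build of erofs-utils; the worker-count flag and its semantics may change while the feature is still in development, and the image/source paths are placeholders:

```shell
# Assumption: mkfs.erofs built from the erofs-utils dev branch with
# multi-threading support; --workers selects the number of compression threads.
mkfs.erofs --workers=16 -zlz4hc,12 cdnjs.erofs /path/to/cdnjs
```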
-
I've updated the comparison using the latest version of erofs-utils from the dev branch and using different options for building the file system.
-
So I've tried to compare DwarFS with EroFS in the use case of CDNJS (i.e. creating an image from the ~260 GiB that is the entire CDNJS repository checkout).
In what follows I'll paste the output of the `time` command in the following format (I use the external `time` command as it supports more reporting fields).

The following are the times it took Git to do various tasks:
Basically, from these commands, `git-optimize:repack` would be the equivalent of `mkdwarfs`. The same for `git-optimize:fsck`, equivalent to `dwarfsck`.

The following is the time that it took to create the DwarFS image:
The following is the time that it took to create the EroFS image:
Unfortunately I wasn't able to actually use the EroFS image (it mounted, but accessing files gave an error, which most likely means the tools are incompatible with the kernel).
However I was able to test DwarFS:
For comparison the following are equivalent runs for the files on Ext4:
The hashing was done with my own `md5-tools` (https://github.com/volution/volution-md5-tools), which run 64 I/O threads (the `w64`), either in parallel (the `a` after `w64`) or alternating (the `s` after the `w64`) with the walking thread.

The following are the sizes for the various folders, images, and archives:
My short conclusion based on this data is:
`-o cachesize` (issue #9);
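The `-o cachesize` mentioned here is an option of the `dwarfs` FUSE driver controlling the block cache; a sketch of mounting with a larger cache (the size and paths are placeholders):

```shell
# Mount a DwarFS image with a 4 GiB block cache; paths are placeholders.
dwarfs cdnjs.dwarfs /mnt/cdnjs -o cachesize=4g
```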