[ci] Remove apt cache from the docker images #11470
@gigiblender thanks! I added some thoughts here.
```
@@ -38,10 +39,12 @@ sudo apt-key add kitware-archive-latest.asc

echo deb https://apt.kitware.com/ubuntu/ bionic main\
>> /etc/apt/sources.list.d/kitware.list
sudo apt-get update
sudo apt-get update --fix-missing
```
Also, it'd be great if we could somehow not `apt update` if nothing in `sources.list.d` changed. Maybe we could make our script run e.g. `find /etc/apt/sources.list.d /etc/apt/sources.list | xargs md5sum | md5sum`, store the output in `/var/apt-update-md5`, and only rerun when there is a mismatch?
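A minimal sketch of the checksum idea above, assuming bash with GNU findutils/coreutils as on the Ubuntu base images. The stamp path `/var/apt-update-md5` comes from the comment; the function names and the `APT_SOURCES`, `APT_LISTS_DIR`, `APT_UPDATE_STAMP`, and `APT_UPDATE_CMD` variables are hypothetical. It adds `-type f` (md5sum cannot hash a directory) and a sort for a stable order, and it also reruns the update when `/var/lib/apt/lists/` holds no package indexes, since skipping the update after those lists have been wiped leaves apt with nothing to resolve packages against.

```shell
#!/usr/bin/env bash
# Sketch: only run `apt-get update` when the apt sources changed (or the
# package lists are missing). Variable and function names are hypothetical.
set -euo pipefail

APT_SOURCES="${APT_SOURCES:-/etc/apt/sources.list /etc/apt/sources.list.d}"
APT_LISTS_DIR="${APT_LISTS_DIR:-/var/lib/apt/lists}"
APT_UPDATE_STAMP="${APT_UPDATE_STAMP:-/var/apt-update-md5}"  # path from the review comment
APT_UPDATE_CMD="${APT_UPDATE_CMD:-apt-get update}"

# One md5 over every regular file in the source paths. -type f skips directories
# (md5sum cannot hash them) and sort -z gives a stable order; file names are part
# of md5sum's output, so adding or renaming a .list file also changes the digest.
apt_sources_hash() {
    # APT_SOURCES is deliberately unquoted so it splits into several paths.
    { find $APT_SOURCES -type f -print0 2>/dev/null || true; } \
        | LC_ALL=C sort -z \
        | xargs -0 -r md5sum \
        | md5sum \
        | cut -d' ' -f1
}

# True if apt has downloaded package indexes (files named *_Packages*).
apt_lists_present() {
    [ -n "$(find "$APT_LISTS_DIR" -maxdepth 1 -name '*_Packages*' -print -quit 2>/dev/null)" ]
}

# Run the update unless the lists exist and the sources digest matches the stamp.
maybe_apt_update() {
    local current
    current="$(apt_sources_hash)"
    if apt_lists_present && [ -f "$APT_UPDATE_STAMP" ] \
        && [ "$(cat "$APT_UPDATE_STAMP")" = "$current" ]; then
        echo "apt sources unchanged and lists present; skipping apt-get update"
        return 0
    fi
    $APT_UPDATE_CMD
    printf '%s\n' "$current" > "$APT_UPDATE_STAMP"
}
```

Install scripts would call `maybe_apt_update` where they currently call `apt-get update`; this only pays off if `/var/lib/apt/lists/` is kept between layers.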
It'd be simpler if we just didn't `apt update` at all and left it to happen at the Dockerfile level. Would that also work?
only thing is that various install scripts add apt sources
I am not sure omitting the `apt-get update` calls is possible. As far as I understand, `apt-get install` looks up the desired package (and its dependencies) in `/var/lib/apt/lists/`. However, after each `apt-get install` we clear that directory to reduce the Docker layer size (resulting in a smaller image). Therefore, I think running `apt-get update` before each install is necessary in this case.
I tried to implement the md5 sum check, and it resulted in packages not being located.
Also, the best-practices guide for Dockerfiles seems to suggest something similar with regard to running `apt-get update` and `apt-get install` in the same layer.
Please let me know if you have other thoughts.
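The single-layer pattern described here (update, install, then delete the lists before the layer is committed) can be sketched as a small shell helper that an install script would call from one `RUN` step. `apt_install_clean` and the `APT_GET`/`APT_LISTS_DIR` overrides, which exist only so the function can be exercised without root, are hypothetical and not taken from the PR:

```shell
#!/usr/bin/env bash
# Sketch: update, install, and clean up inside one step (so one Docker layer).
# APT_GET and APT_LISTS_DIR are hypothetical overrides for testing without root.
set -euo pipefail

APT_GET="${APT_GET:-apt-get}"
APT_LISTS_DIR="${APT_LISTS_DIR:-/var/lib/apt/lists}"

apt_install_clean() {
    # Refresh the package index; a previous step may have deleted it.
    $APT_GET update
    # Install the requested packages without interactive prompts.
    DEBIAN_FRONTEND=noninteractive $APT_GET install -y "$@"
    # Remove downloaded .deb files, then the package index, before the layer ends.
    $APT_GET clean
    rm -rf "${APT_LISTS_DIR:?}"/*
}
```

Because every call starts with `apt-get update`, this is exactly the per-layer cost the thread is trying to avoid; it trades build time for image size.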
So if I understand correctly, we cannot both reduce the image size (by removing the files under `/var/lib/apt/lists/`) and decrease image build times (and therefore CI times) by not running `apt-get update` each time. Also, the recommendation is to run `apt-get update` and clear the cache in every Docker layer where apt is used.
Okay, looking over all of this, I agree there are some challenges then. I don't think we should take an approach that requires us to run `apt-get update` for each layer. We do need to run it:
- one time before the first `apt-get install` for each Docker image
- any time a script adds to `/etc/apt/sources.list.d`
I think we could work around this either by:
- not removing `/var/lib/apt/lists` (I just built `ci_cpu` and it's 40M, which seems small relative to any accumulated packages)
- attempting to mount that dir in a Docker volume (this doesn't look very feasible from `docker build`, but perhaps a workaround exists with e.g. buildx)
- doing something a bit crazy like making a tar of `/var/lib/apt/lists` and uploading that to the build host each time. I actually think we should not do this one, but if a simpler similar approach exists and the lists are larger than 40M, maybe it's worth it.
Feel free to investigate a bit here. The main thing I'm interested in is removing any cached Debian packages.
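Since the main goal is dropping cached packages, note that `apt-get clean` already removes the retrieved package files under `/var/cache/apt/archives/` (and its `partial/` subdirectory) while leaving `/var/lib/apt/lists/` alone, which pairs with the first option above. For illustration outside a container, a minimal find-based equivalent restricted to `.deb` files might look like this; `remove_cached_debs` and `APT_ARCHIVES_DIR` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: delete downloaded .deb archives but keep apt's package lists.
# APT_ARCHIVES_DIR is a hypothetical override; the default is apt's download cache.
set -euo pipefail

APT_ARCHIVES_DIR="${APT_ARCHIVES_DIR:-/var/cache/apt/archives}"

remove_cached_debs() {
    # Nothing to do if the cache directory does not exist.
    [ -d "$APT_ARCHIVES_DIR" ] || return 0
    # Recurses into partial/ too; the lock file and the directories stay in place.
    find "$APT_ARCHIVES_DIR" -type f -name '*.deb' -delete
}
```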
blocked on #11644
Thanks @gigiblender, this is basically ready. It'd be great to add a test; we could do that in this PR or in a follow-on. Let me know your preference here.
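One hypothetical shape for such a test is a check run inside each built image (for example as the last build step or a CI job) that fails if any downloaded package archive is left behind. The function name and the `APT_ARCHIVES_DIR` override are assumptions, not part of the PR:

```shell
#!/usr/bin/env bash
# Hypothetical image check: fail if apt left downloaded .deb archives behind.
set -euo pipefail

APT_ARCHIVES_DIR="${APT_ARCHIVES_DIR:-/var/cache/apt/archives}"  # apt's default download cache

# Print any leftover archives to stderr and return non-zero if there are some.
check_no_cached_debs() {
    local leftovers
    leftovers="$(find "$APT_ARCHIVES_DIR" -type f -name '*.deb' 2>/dev/null || true)"
    if [ -n "$leftovers" ]; then
        echo "cached .deb files found in $APT_ARCHIVES_DIR:" >&2
        echo "$leftovers" >&2
        return 1
    fi
    echo "no cached .deb files in $APT_ARCHIVES_DIR"
}
```

Calling `check_no_cached_debs` at the end of the image build would turn a regression (an install script that forgets to clean up) into a build failure.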
Thanks @gigiblender!
This PR removes the apt cache from our Docker images wherever possible.
cc @Mousius @areusch @driazati