[ci] Remove apt cache from the docker images #11470

Merged (1 commit) on Jun 16, 2022

Conversation

gigiblender
Contributor

@gigiblender gigiblender commented May 26, 2022

This PR removes the apt cache from our docker images wherever possible.

cc @Mousius @areusch @driazati

@gigiblender gigiblender marked this pull request as ready for review May 26, 2022 17:23
@gigiblender
Contributor Author

@areusch @driazati

Contributor

@areusch areusch left a comment

@gigiblender thanks! i added some thoughts here.

docker/Dockerfile.ci_i386 (outdated, resolved)
@@ -38,10 +39,12 @@ sudo apt-key add kitware-archive-latest.asc

echo deb https://apt.kitware.com/ubuntu/ bionic main\
>> /etc/apt/sources.list.d/kitware.list
sudo apt-get update
sudo apt-get update --fix-missing
Contributor

also it'd be great if we could somehow skip apt-get update when nothing under sources.list.d has changed. maybe our script could run e.g. find /etc/apt/sources.list.d /etc/apt/sources.list | xargs md5sum | md5sum, store the output in /var/apt-update-md5, and only rerun apt-get update when there is a mismatch?
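
A minimal sketch of the check suggested here, not the code from this PR. The function names and the APT_UPDATE_STAMP variable are hypothetical; only the stamp path /var/apt-update-md5 and the find | md5sum idea come from the comment. It assumes GNU find, sort, and xargs, and it only helps while the package lists under /var/lib/apt/lists are still present in the image.

```shell
# Print one md5 over the contents (and paths) of every file under the given
# apt source locations. Hypothetical helper name.
apt_sources_fingerprint() {
    find "$@" -type f -print0 2>/dev/null \
        | sort -z \
        | xargs -0 -r md5sum \
        | md5sum \
        | cut -d' ' -f1
}

# Run `apt-get update` only when the fingerprint differs from the stored one.
# APT_UPDATE_STAMP is an assumed override; the default is the suggested path.
apt_update_if_sources_changed() {
    local stamp="${APT_UPDATE_STAMP:-/var/apt-update-md5}"
    local current
    current="$(apt_sources_fingerprint /etc/apt/sources.list /etc/apt/sources.list.d)"
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
        echo "apt sources unchanged, skipping apt-get update"
        return 0
    fi
    # Record the fingerprint only after a successful update.
    apt-get update && echo "$current" > "$stamp"
}
```

An install script would call apt_update_if_sources_changed right before its first apt-get install, and again after adding a file under /etc/apt/sources.list.d.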

Member

It'd be simpler if we didn't run apt update in the scripts at all and left it to happen at the Dockerfile level. Would that also work?

Contributor

only thing is that various install scripts add apt sources

Contributor Author

I am not sure that omitting apt-get update is possible. As far as I understand, apt-get install looks up the desired package (and its dependencies) in /var/lib/apt/lists/. However, after each apt-get install we clear that directory to reduce the docker layer size (resulting in a smaller image). Therefore, I think running apt-get update before each install is necessary in this case.

I tried to implement the md5 sum check and it resulted in packages not being located.

Also, the best practices guide for Dockerfiles seems to suggest something similar with regard to running apt-get update and apt-get install in the same layer.

Please let me know if you have other thoughts.
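
For illustration, the update, install, and clear sequence described above can be wrapped in one helper so that all three steps run in the same layer. This is a sketch only: the function name and the APT_GET and APT_LISTS_DIR overrides are hypothetical and exist here so the helper can be dry-run outside an image.

```shell
# Refresh the package lists, install the requested packages, then remove the
# lists again so they never end up in the resulting docker layer.
apt_install_and_clear() {
    local apt_get="${APT_GET:-apt-get}"                      # assumed override for dry runs
    local lists_dir="${APT_LISTS_DIR:-/var/lib/apt/lists}"   # assumed override for dry runs
    "$apt_get" update --fix-missing \
        && "$apt_get" install "$@" \
        && rm -rf "${lists_dir:?}"/*
}
```

A script would call it as apt_install_and_clear -y cmake. Because the lists are removed at the end, every call has to run apt-get update again, which is the build-time cost discussed in this thread.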

Contributor Author

So if I understand correctly, we cannot both reduce the image size (by removing the files under /var/lib/apt/lists/) and decrease image build times (and therefore CI times) by skipping apt-get update. Also, the recommendation is to run apt-get update and clear the cache in every docker layer where apt is used.

Contributor

okay looking over all of this i agree there are some challenges then. I don't think we should take an approach that requires us to run apt-get update for each layer. we do need to run it:

  • one time before the first apt-get install for each docker image
  • any time a script adds to /etc/apt/sources.list.d

i think we could work around this either by:

  • not removing /var/lib/apt/lists (i just built ci_cpu and it's 40M. that seems small relative to any accumulated packages)
  • attempting to mount that dir in a docker volume (this doesn't actually look super-feasible from docker build, but perhaps a workaround exists with e.g. buildx or something)
  • doing something a bit crazy like making a tar of /var/lib/apt/lists and uploading that to the build host each time. i actually think we should not do this one, but if a simpler similar thing exists and the lists are larger than 40M, maybe it's worth it.

feel free to investigate a bit here. the main thing i'm interested in is removing any cached debian packages
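
As a rough sketch of the first option (keep the package lists, drop only the cached Debian packages), the two helpers below report the size of each directory and delete the cached .deb files. Both function names are hypothetical; the default paths are the standard apt locations. For the default location, apt-get clean has the same effect as remove_cached_debs.

```shell
# Report the disk usage, in KiB, of the apt package lists and of the cached
# .deb archives, to compare them as in the ci_cpu measurement above.
apt_cache_report() {
    local lists_dir="${1:-/var/lib/apt/lists}"
    local archives_dir="${2:-/var/cache/apt/archives}"
    echo "lists:    $(du -sk "$lists_dir" | cut -f1) KiB"
    echo "archives: $(du -sk "$archives_dir" | cut -f1) KiB"
}

# Remove cached .deb files but leave the package lists in place, so a later
# apt-get install can still resolve packages without another apt-get update.
remove_cached_debs() {
    local archives_dir="${1:-/var/cache/apt/archives}"
    find "$archives_dir" -type f -name '*.deb' -delete
}
```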

@gigiblender gigiblender force-pushed the remove-apt-cache branch 10 times, most recently from 21b6186 to 08af83f June 8, 2022 13:19
@areusch
Contributor

areusch commented Jun 10, 2022

blocked on #11644

Contributor

@areusch areusch left a comment

thanks @gigiblender this is basically ready. it'd be great to add a test, we could do that in this PR or in a follow-on. lmk your preference here.

tests/lint/docker-format.sh (outdated, resolved)
@areusch areusch merged commit ada4c46 into apache:main Jun 16, 2022
@areusch
Contributor

areusch commented Jun 16, 2022

thanks @gigiblender !

areusch added a commit to areusch/tvm that referenced this pull request Jun 16, 2022
areusch added a commit to areusch/tvm that referenced this pull request Jun 16, 2022