The Docker image is 5+ GB #46

Open
bzinberg opened this issue Mar 24, 2020 · 2 comments

Comments

@bzinberg
Contributor

This is partly due to having both TensorFlow and (once #45 is merged) PyTorch installed, which makes the image pretty cumbersome to pull without a strong internet connection. Not sure what we can do about this.

@bzinberg changed the title from "The image is 5+ GB" to "The Docker image is 5+ GB" on Mar 24, 2020
@fplk
Contributor

fplk commented Mar 24, 2020

AFAICT, there is not much you can do about it. The current image does not delete the apt package lists after package installation; you can add `rm -rf /var/lib/apt/lists/*` to do so. You can also delete the Julia archive once it has been extracted, i.e. `rm julia-1.3.1-linux-x86_64.tar.gz`. In addition, each RUN line creates another layer in the layered file system, so you might want to combine a few lines, e.g. for the Julia installation in https://github.com/probcomp/gen-quickstart/blob/master/Dockerfile#L20. Finally, it seems unnecessary to use virtualenv inside a container, which is already well compartmentalized.
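A rough sketch of how the cleanup steps above could be combined. Note that a file deleted in a later RUN still occupies space in the earlier layer, so the cleanup has to happen in the same RUN that created the files. The base image, the package list, the download URL, and the `/opt` install path are assumptions for illustration and are not copied from the gen-quickstart Dockerfile:

```dockerfile
# Sketch only: base image, packages, URL, and paths are assumptions.
FROM ubuntu:18.04

# Install packages and remove the apt package lists in the same layer,
# so the lists are never committed to the image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates wget \
 && rm -rf /var/lib/apt/lists/*

# Download, unpack, and delete the Julia archive within a single RUN,
# so the tarball is not preserved in an intermediate layer.
RUN wget -q https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.1-linux-x86_64.tar.gz \
 && tar -xzf julia-1.3.1-linux-x86_64.tar.gz -C /opt \
 && rm julia-1.3.1-linux-x86_64.tar.gz \
 && ln -s /opt/julia-1.3.1/bin/julia /usr/local/bin/julia
```

The savings from this are roughly the size of the apt lists plus the tarball, which is small next to the TensorFlow and PyTorch installs.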

But the short answer is that none of these methods will have a tremendous impact. I use a few of these tricks in https://github.com/probcomp/gen-quickstart/blob/master/Dockerfile.ubuntu-2004 (which is not heavily optimized either), and they do not make the image much smaller. It would get even worse if you added GPU acceleration with nvidia/cuda-based images (expect 6-10 GB). The methods that would really shrink this down further, such as building in one image and copying only the binaries into the production image, or using a smaller base image like Alpine, are not ideal for a developer image. So I can push some optimizations if you want, but if you want to shrink this to 1 GB or so, I'm not too optimistic: the low-hanging fruit probably won't suffice to reduce the size significantly.
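For reference, the "build in one image, ship only the binaries" approach is usually done with a multi-stage build. A minimal sketch, assuming a hypothetical `hello.c` in the build context; nothing here is taken from gen-quickstart:

```dockerfile
# Sketch only: hello.c and the image tags are hypothetical.

# Stage 1: full toolchain, used only for compiling.
FROM alpine:3.11 AS builder
RUN apk add --no-cache build-base
COPY hello.c /src/hello.c
RUN gcc -O2 -o /usr/local/bin/hello /src/hello.c

# Stage 2: small runtime image that receives only the compiled binary.
FROM alpine:3.11
COPY --from=builder /usr/local/bin/hello /usr/local/bin/hello
CMD ["hello"]
```

Both stages use Alpine here because binaries built against glibc (e.g. on Ubuntu) generally do not run on Alpine's musl libc. The compiler and build tools stay in the first stage and never reach the final image, which is exactly why this fits a production image better than a developer image.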

PS:

  • You might not need the git dependency, since you copy the sources in
  • Not sure what python-tk is needed for. Isn't Tcl/Tk just relevant for GUIs?
    => Might be able to reduce dependencies a bit.

@bzinberg
Contributor Author

Yeah, I figured as much. Thanks for shedding some light on this, @fplk. Given that there is no known low-hanging fruit and the current situation is tolerable, I think we should leave it as-is and keep the issue open.
