Why containers use hundreds of MBs for Vim/Perl/OpenGL? #225

Open
eero-t opened this issue May 30, 2024 · 11 comments · May be fixed by #1031

@eero-t
Contributor

eero-t commented May 30, 2024

Many of the Dockerfiles install Vim and/or Mesa OpenGL/X packages:

$ git grep -l -B1 -e mesa-glx -e '\bvim\b'
AudioQnA/langchain/docker/Dockerfile
ChatQnA/deprecated/langchain/docker/Dockerfile
ChatQnA/docker/Dockerfile
CodeGen/deprecated/codegen/Dockerfile
CodeGen/docker/Dockerfile
CodeTrans/deprecated/langchain/docker/Dockerfile
CodeTrans/docker/Dockerfile
DocSum/deprecated/langchain/docker/Dockerfile
DocSum/docker/Dockerfile
Translation/langchain/docker/Dockerfile

Why?

They take a lot of space in the containers: Mesa's LLVM dependency alone adds over 100MB, Vim adds 40MB, and I suspect they are the reason why the full Perl gets installed:

$ docker images | grep chatqna
<MY_REGISTRY>/dgpu-enabling/opea-chatqna       latest      4b71cbea8ab6   36 minutes ago   727MB

$ docker run -it --rm --entrypoint /bin/sh opea/chatqna -c "du -ks /usr/*/*/* | sort -nr"
214284	/usr/local/lib/python3.11
114564	/usr/lib/x86_64-linux-gnu/libLLVM-15.so.1
40520	/usr/share/vim/vim90
30532	/usr/lib/x86_64-linux-gnu/libicudata.so.72.1
25936	/usr/lib/x86_64-linux-gnu/perl
25164	/usr/lib/x86_64-linux-gnu/dri
22736	/usr/lib/x86_64-linux-gnu/libz3.so.4
20732	/usr/share/perl/5.36.0
...

If the containers really need a text editor, nano, for example, would be more user-friendly and much smaller (1MB) than Vim.

@eero-t
Contributor Author

eero-t commented May 30, 2024

"GenAIComps" repo Dockerfiles have the same issue:

$ git grep -l -B1 -e mesa-glx -e '\bvim\b'
.github/workflows/docker/ut.dockerfile
comps/dataprep/qdrant/docker/Dockerfile
comps/dataprep/redis/docker/Dockerfile
comps/embeddings/langchain/docker/Dockerfile
comps/guardrails/langchain/docker/Dockerfile
comps/llms/summarization/tgi/Dockerfile
comps/llms/text-generation/tgi/Dockerfile
comps/reranks/langchain/docker/Dockerfile
comps/retrievers/langchain/docker/Dockerfile

@eero-t
Contributor Author

eero-t commented May 30, 2024

The full Perl gets added as a dependency of the git package (the minimal Python image already includes a few MB of minimal Perl, as Debian marks its perl-base package as essential).

There are so many dependencies between the "GenAIComps" and "GenAIExamples" repos that I think it would make sense to merge them. Then git, and therefore also the full Perl, could be dropped from (almost) all images.

Another alternative would be having a separate "fetch" stage in the Dockerfile, which would install Git, do a shallow clone (using the --depth 1 option to speed it up), and remove the .git directory afterwards, so that it's not left there when the final Dockerfile stage copies the GenAIComps directory content from the "fetch" stage.

@eero-t
Contributor Author

eero-t commented May 30, 2024

Dropping libgl1-mesa-glx and replacing vim with nano in the Dockerfile reduces the chatqna container size by 253MB, i.e. 35%:

$ docker images|grep chatqna
opea/chatqna             latest       4b71cbea8ab6   About an hour ago   727MB
opea/chatqna-test        latest       9aadb869edaf   11 minutes ago      474MB

@eero-t eero-t changed the title Why containers waste space for Vim and OpenGL/X installs? Why containers waste hundreds of MBs for Vim/Perl/OpenGL? May 30, 2024
@eero-t eero-t changed the title Why containers waste hundreds of MBs for Vim/Perl/OpenGL? Why containers use hundreds of MBs for Vim/Perl/OpenGL? May 30, 2024
@eero-t
Contributor Author

eero-t commented May 30, 2024

I tried doing the Git cloning in a separate stage and copying just the repo content to the final image:

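FROM python:3.11-slim AS base
RUN useradd -m -s /bin/bash user && mkdir -p /home/user && chown -R user /home/user/

# Temporary stage: Git (and the full Perl it depends on) stays out of the final image
FROM base AS fetch
# Slim images ship without apt package lists, so update them before installing
RUN apt-get update && apt-get install -y --no-install-recommends git
RUN cd /home/user/ && git clone --depth 1 https://github.com/opea-project/GenAIComps.git
RUN rm -r /home/user/GenAIComps/.git

# Final stage: copy only the repo content from the fetch stage
FROM base AS final
COPY --from=fetch /home/user/GenAIComps /home/user/GenAIComps
...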

=> It reduced the final image size by an additional 108MB, to 366MB, which is half of the original 727MB size.
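For reference, the percentages quoted in this thread can be recomputed from the `docker images` sizes with a tiny helper. This is only an arithmetic check; `reduction` is an illustrative name, and the sizes are the MB values reported above.

```shell
#!/bin/sh
# Print the saving between two image sizes (in MB, as shown by `docker images`)
# as an absolute value and as a percentage of the original size.
reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%dMB (%.1f%%)\n", before - after, 100 * (before - after) / before }'
}

reduction 727 474   # Mesa dropped, Vim replaced by nano -> 253MB (34.8%)
reduction 474 366   # Git clone moved to a separate stage -> 108MB (22.8%)
reduction 727 366   # both changes combined -> 361MB (49.7%)
```

So the first change is roughly 35%, and both changes together remove just under half of the original image.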

@srinarayan-srikanthan srinarayan-srikanthan self-assigned this Jun 3, 2024
@yinghu5 yinghu5 added help wanted Extra attention is needed aitce labels Jun 11, 2024
@srinarayan-srikanthan
Collaborator

Will validate for all the examples and then incorporate this.

@yinghu5 yinghu5 added feature New feature or request Escalated and removed help wanted Extra attention is needed labels Jun 12, 2024
@eero-t
Contributor Author

eero-t commented Jun 28, 2024

All common dependencies should be on a shared base layer, see: opea-project/GenAIComps#265

That way these optimizations need to be done only once.

@kevinintel
Collaborator

will improve it in the future

@eero-t
Copy link
Contributor Author

eero-t commented Aug 28, 2024

Once the base images have been cleaned of extra content, it's easy to generate additional, separate "devel" images where those tools (Vim, Perl, Git, etc.) are added back.

All it needs is:

  • A Dockerfile that takes the base image as a variable and adds those tools on top of it
  • A script that loops over the desired base images, uses that Dockerfile to build tool versions of them, and pushes the generated images to the repository

Both are pretty trivial.
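The two pieces listed above could be combined into one script, roughly like the sketch below. Everything named here is an assumption for illustration, not what any OPEA PR does: the `REGISTRY` default, the image list, the `-devel` name suffix, the tool list, the `DRY_RUN` switch and the `run`/`build_devel_images` helpers are all hypothetical, and the base images are assumed to be Debian-based (apt), like the python:3.11-slim ones above. The Dockerfile is passed to `docker build` on stdin, so no build context is needed.

```shell
#!/bin/sh
# Sketch only: REGISTRY, the image list, the "-devel" suffix and the tools are hypothetical.
set -eu

REGISTRY="${REGISTRY:-localhost:5000}"                 # hypothetical push target
BASE_IMAGES="opea/chatqna:latest opea/codegen:latest"  # hypothetical name:tag list

# Dockerfile taking the base image as a build argument and adding tools on top.
# Assumes apt-based images; the result keeps running as root (add USER to restore).
DEVEL_DOCKERFILE=$(cat <<'EOF'
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends vim git \
    && rm -rf /var/lib/apt/lists/*
EOF
)

# Dry run by default (only print the commands); set DRY_RUN=0 to build and push.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and push a "-devel" variant of each argument.
# Each argument must have an explicit ":tag", as the split is on the last ':'.
build_devel_images() {
    for img in "$@"; do
        name=${img%:*}     # e.g. opea/chatqna
        tag=${img##*:}     # e.g. latest
        devel="$REGISTRY/$name-devel:$tag"
        printf '%s\n' "$DEVEL_DOCKERFILE" |
            run docker build -t "$devel" --build-arg "BASE_IMAGE=$img" -
        run docker push "$devel"
    done
}

# Word splitting of $BASE_IMAGES is intentional here
# shellcheck disable=SC2086
build_devel_images $BASE_IMAGES
```

Run as-is, it only prints the `docker build` / `docker push` commands it would execute; with `DRY_RUN=0` it actually builds and pushes. Keeping the tool Dockerfile inline means the script is the only file that needs maintaining.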

@eero-t
Contributor Author

eero-t commented Oct 21, 2024

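Instead of removing the .git directory in the fetch stage, the final image could copy just the needed pieces, for example:

ENV HOME=/home/user
# Copy only the comps package and the top-level files of the repo
COPY --from=fetch $HOME/GenAIComps/comps $HOME/GenAIComps/comps
# NOTE: COPY wildcards may also match dot-entries such as .git, so check what ends up in the image
COPY --from=fetch $HOME/GenAIComps/*.* $HOME/GenAIComps/

(git could be a better name than fetch for the intermediate stage/image.)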

@kevinintel
Collaborator

please submit pr

@eero-t eero-t linked a pull request Oct 25, 2024 that will close this issue
@eero-t
Contributor Author

eero-t commented Oct 25, 2024

OK, here's an example of doing that for the GenAIExamples repo containers: #1031

@kevinintel Do you want me to write an example PR for the GenAIComps repo containers too?


4 participants