trim the sandbox image and install plugin dependencies in agnostic image #2792

Merged
merged 7 commits into from
Jul 6, 2024

Shimada666
Contributor

@Shimada666 Shimada666 commented Jul 4, 2024

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?

https://opendevin.slack.com/archives/C06QKSD9UBA/p1719019334249829

As mentioned in the Slack discussion, our current sandbox image is a bit large and has some unnecessary dependencies. I want to remove these unnecessary dependencies to make the sandbox image smaller.

Also, we will install plugin dependencies in the Dockerfile of the agnostic image (just like we do in the current sandbox image). This will significantly improve the startup speed for users using a custom sandbox container image.

Give a brief summary of what the PR does, explaining any non-trivial design decisions

remove these:

  • nano: we can use vim instead
  • bash: already built into the system
  • libgl1-mesa-glx: opencv-python needs it, and opencv is only used for parse_video in agentskills. It's usually not needed, but it takes up a lot of space. When it's really needed, we can have the LLM install it temporarily.

Other references

@tobitege
Collaborator

tobitege commented Jul 4, 2024

  • Are nano and bash that large?
  • Is there a way to actually get a list of packages sorted by "used" amount of MB?
  • By how much do we want to decrease the overall size?
  • Would it be an option to have a dev-ready build with npm, node and the like, and get away from the "pull always" to a "pull daily" maybe? (out of scope of this PR)

@Shimada666
Contributor Author

Shimada666 commented Jul 4, 2024

Are nano and bash that large?

No. They are unnecessary dependencies. We don't need to install vim and nano at the same time, and bash is a system built-in dependency.

Is there a way to actually get a list of packages sorted by "used" amount of MB?

Yes, but it's not necessary. Removing these packages is mostly based on my experience. I will balance image size against usability. For example, build-essential is large, but I still choose to keep it because it might be useful in some cases. This is the first round of PRs, and it only removes dependencies that are truly unnecessary.
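
For reference, a minimal sketch of how such a list could be produced, assuming the sandbox image is Debian/Ubuntu-based so that dpkg is available. The function names `list_package_sizes` and `largest_first` are illustrative and not part of this PR. dpkg reports Installed-Size as an estimate in KiB.

```shell
#!/bin/sh
# Sort "SIZE<TAB>NAME" lines read from stdin by size, largest first,
# and keep the first N lines (default 20).
largest_first() {
    sort -rn | head -n "${1:-20}"
}

# Print every installed package with its estimated installed size in KiB.
# Assumption: dpkg is present (Debian/Ubuntu-based image).
list_package_sizes() {
    dpkg-query -W -f='${Installed-Size}\t${Package}\n'
}

# Usage (inside the sandbox container):
#   list_package_sizes | largest_first 15
```

The sorting step is kept separate from the dpkg query so it can be reused on any size/name listing.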

By how much do we want to decrease the overall size?

I want to reduce the size as much as possible while ensuring usability. A size under 1GB is ideal.

Would it be an option to have a dev-ready build with npm, node and the like, and get away from the "pull always" to a "pull daily" maybe? (out of scope of this PR)

I don't want any always-pull strategy. I want to prompt users to repull their image only when there's a breaking change.

@iFurySt
Collaborator

iFurySt commented Jul 4, 2024

@Shimada666 Thanks for your effort. Here are some rough estimates for your reference:

build-essential: ~263 MB
g++: ~209 MB
gcc: ~138 MB

miniforge3: ~1.1 GB
agentskills dependencies(installed by pip): ~300 MB

I'm curious about why we need g++ and gcc.

If you add this to the apt stage, we can save about 100 MB of space:

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    ... \
    && apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

miniforge3 is the root cause of the large image size. The official Docker image seems smaller than an installation done directly through the shell script. You can look at it here: https://hub.docker.com/r/condaforge/miniforge3/tags

# docker run -it condaforge/miniforge3:24.3.0-0 bash
(base) root@35901b54d139:/# uname -a
Linux 35901b54d139 5.15.0-1061-aws #67~20.04.1-Ubuntu SMP Wed Apr 17 15:09:54 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
(base) root@35901b54d139:/# cat /etc/issue
Ubuntu 20.04.6 LTS \n \l

But it is based on Ubuntu 20.04.6.

Could we consider using the official condaforge/miniforge3 image as the base image to build our sandbox?
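
One way to check the size difference before switching base images is to compare what Docker reports for each image. This is a rough sketch: the helper names `bytes_to_mib` and `image_size_mib` are made up for illustration, and the sandbox image name is left as a placeholder because it is not stated here.

```shell
#!/bin/sh
# Convert a byte count to whole MiB (integer division, rounds down).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Print the size of a locally available image in MiB.
# `docker image inspect --format '{{.Size}}'` reports the size in bytes.
image_size_mib() {
    bytes_to_mib "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Usage:
#   docker pull condaforge/miniforge3:24.3.0-0
#   image_size_mib condaforge/miniforge3:24.3.0-0
#   image_size_mib <your-sandbox-image>
```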

@Shimada666
Contributor Author

@iFurySt
Thank you very much for providing the data! It is very useful for the next step of trimming the image.

In fact, I am not sure if we still need gcc and g++ now, as their usage seems limited. The only scenario I encountered was during the Jupyter installation. Previously, psutil was installed as part of the Jupyter installation, and the old version of psutil required gcc to install. The new version of psutil no longer needs gcc, so removing gcc should theoretically be feasible.

Reference: #2536

I need to think about whether to use condaforge/miniforge3 as the base image. It's a great suggestion! My next task is to change the default Python interpreter back to /usr/bin/python to completely isolate the runtime client from the system Python. I need to complete this task first, then reconsider how to trim the image size.

Thanks again for your contribution!

@Shimada666 Shimada666 changed the title from "trim the sandbox image" to "trim the sandbox image and install plugin dependencies in agnostic image" Jul 6, 2024
@Shimada666
Contributor Author

@tobitege
Hi, can you take a look at this PR? I've added some new stuff:
We will install plugin dependencies in the Dockerfile of the agnostic image (just like we do in the current sandbox image). This will significantly improve the startup speed for users using a custom image. 😄

@tobitege
Collaborator

tobitege commented Jul 6, 2024

I'll miss nano a little, it's a nice alternative for Windows users to not break their fingers with wild linux key combos 😂

@Shimada666
Contributor Author

I'll miss nano a little, it's a nice alternative for Windows users to not break their fingers with wild linux key combos 😂

😂 ok, nano isn't large, I'll add it back.

@tobitege
Collaborator

tobitege commented Jul 6, 2024

I'll miss nano a little, it's a nice alternative for Windows users to not break their fingers with wild linux key combos 😂

😂 ok, nano isn't large, I'll add it back.

Yay! 🤗

@Shimada666
Contributor Author

@tobitege Done. Please take a look again!

Collaborator

@tobitege tobitege left a comment

LGTM. If issues arise, we know where to look. 😀

@tobitege tobitege merged commit 82f256b into All-Hands-AI:main Jul 6, 2024
2 checks passed
3 participants