Pull/Push cached environment using storage #6763
Conversation
Before starting to clone the repository, we check for a cached environment in the storage. If there is one, we pull it and extract it under `project.doc_path` (using the same structure we have been using) and use those cached files. Once the build has finished successfully, we compress all the environment files (pip cache, checkout, conda and virtualenv environments) and push them to the storage. The cache is cleaned up when the version is wiped. This feature is behind a feature flag (`CACHED_ENVIRONMENT`), so we can select specific projects to start testing out this cached environment. It can help projects with big/huge repositories or that install a lot of dependencies.
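A minimal sketch of that cycle, assuming hypothetical helper names and a django-storages-style `storage` object (the directory names under `doc_path` are also assumptions); the actual task code may look different:

```python
import os
import tarfile


def pull_cached_environment(storage, version, tmp_filename):
    """Download and extract the cached environment before cloning, if one exists."""
    remote_filename = f"{version.project.slug}/{version.slug}.tar"  # hypothetical cache key
    if not storage.exists(remote_filename):
        return
    with storage.open(remote_filename, mode="rb") as remote_fd:
        with open(tmp_filename, mode="wb") as local_fd:
            local_fd.write(remote_fd.read())
    # Extract using the same directory structure used for regular builds.
    with tarfile.open(tmp_filename) as tar:
        tar.extractall(version.project.doc_path)


def push_cached_environment(storage, version, tmp_filename):
    """Compress the environment files after a successful build and upload them."""
    # Assumed layout under doc_path: checkout, pip cache, conda and virtualenv envs.
    directories = ("checkouts", ".cache", "conda", "envs")
    with tarfile.open(tmp_filename, mode="w") as tar:
        for directory in directories:
            path = os.path.join(version.project.doc_path, directory)
            if os.path.exists(path):
                tar.add(path, arcname=directory)
    remote_filename = f"{version.project.slug}/{version.slug}.tar"
    with open(tmp_filename, mode="rb") as local_fd:
        storage.save(remote_filename, local_fd)
```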
Some notes to keep in mind while reviewing this PR:
This approach is looking great! I noted some changes to be a bit more defensive, but it would be good to hear from someone who has touched the task operations more recently about the order of operations there. It looks okay to me, but I'm a little rusty on potential issues that could stem from tar file operations or storage calls failing.
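One possible shape for that defensiveness (a sketch reusing the hypothetical `pull_cached_environment` helper above) is to treat the cache as best-effort, so a corrupt tarball or a failed storage call falls back to a normal build instead of failing it:

```python
import logging
import tarfile

log = logging.getLogger(__name__)


def restore_cached_environment(storage, version, tmp_filename):
    """Best-effort restore: a broken cache should never fail the build."""
    try:
        pull_cached_environment(storage, version, tmp_filename)
    except (tarfile.TarError, OSError):
        # Fall back to a full clone/install if the cache can't be used.
        log.exception("Unable to restore cached environment for %s", version.slug)
```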
This looks like the approach I was thinking would work and not be a ton of effort. This feels like enough to at least start testing, and then we can iterate and improve on it as we go. Hopefully this will at least get us at parity with our scaling builders.
```python
local_fd.write(remote_fd.read())

with tarfile.open(tmp_filename) as tar:
    tar.extractall(self.version.project.doc_path)
```
I wonder if it's worth abstracting the archive format here. It looks like `shutil` has an abstraction over it. I imagine we'll want to play with `tar` and `gzip` at least, because bandwidth and CPU will be a tradeoff here that we want to be able to tweak.
https://docs.python.org/3/library/shutil.html#archiving-operations
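For reference, the `shutil` abstraction would look roughly like this (paths are placeholders):

```python
import shutil

# Creates /tmp/cached-env.tar.gz from everything under doc_path;
# "gztar" can be swapped for "tar", "bztar", "xztar" or "zip".
archive = shutil.make_archive(
    base_name="/tmp/cached-env",
    format="gztar",
    root_dir="/path/to/project/doc_path",
)

# The inverse operation auto-detects the format from the filename.
shutil.unpack_archive(archive, extract_dir="/path/to/project/doc_path")
```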
Hrm... Interesting! I didn't know about `shutil.make_archive`. I will take a deeper look.
Having an abstraction is a good idea to be able to play here. Although, it seems that `shutil.make_archive` does not give us much flexibility for this, since it's not very customizable: we can't change lower-level attributes.
Besides, we would need Python 3.8, because since that version it uses the modern PAX format (`--xattrs` equivalent):

> Changed in version 3.8: The modern pax (POSIX.1-2001) format is now used instead of the legacy GNU format for archives created with format="tar".

On the other hand, `tarfile` allows us to play with `gzip`, `bz2` and `lzma`, and with the compression level as well.
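For comparison, a sketch of how `tarfile` exposes those knobs directly (the path and level are just examples to tune):

```python
import tarfile

# gzip with an explicit compression level (1 = fastest, 9 = smallest).
with tarfile.open("/tmp/cached-env.tar.gz", mode="w:gz", compresslevel=3) as tar:
    tar.add("/path/to/project/doc_path", arcname=".")

# Other modes: "w:bz2" (also accepts compresslevel) and "w:xz" (lzma).
```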
+ refactor how cached directories are managed.
Looks great! Do we need a comparable corporate change?
@agjohnson there is a small PR that just defines the required settings on the docker compose environment, and another in -ops. I don't think there are any other differences compared with community.