Cache dependencies in CI runner #108
Comments
The slowdown is mostly Poetry, which takes a long time to resolve the dependencies. That seems to be a common issue: python-poetry/poetry#2094.
Relevant pre-commit stuff: https://tobiasmcnulty.com/posts/caching-pre-commit/
The setup-python action for GitHub Actions supports caching poetry dependencies out of the box. Minimal example to make things run:
I don't know why but |
Btw, the code (structure) of this project is really nice. Once every few months I have a peek at how it's evolving! If you want some more inspiration for good coding practices (in Python), I suggest having a look at the work of Tiangolo. He's the author of the excellent FastAPI and Typer packages. Besides the code, his packages also have very clear and extensive documentation from which you could get some inspiration.
@markkvdb thanks! I tried this a few days ago in a commit in #107, but the issue is that we do not ship a lockfile (ALNS is a library, not an application, so we do not require fixed versions of our dependencies). I have been thinking of testing with fixed dependency versions, but haven't gotten around to anything yet. |
You don't actually need a lock file ( |
The caching will be done based on the hash of the python version and the |
If I'm wrong about that, you can use some version of the CI file here: Let me know if none of my suggestions work because then I'll have a look as well to understand what I'm not understanding 😃 |
Are you sure? Based on my understanding of the action, it'll also hash in the lock file contents. And that will be resolved anew by poetry every time (which is very slow), and likely not match an existing cached set of dependencies (since our dependencies have their own dependencies, and if any one of those has recently been updated, our cache is invalidated). But I could be misunderstanding this. I'll have a look around at how other projects solve this (soon-ish). |
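To make the invalidation concern concrete, here is a minimal sketch. It assumes a cache key built from the Python version plus a hash of the resolved lock contents. The key format, the `cache_key` helper, and the package versions are made up for illustration; this is not setup-python's actual key scheme.

```python
import hashlib


def cache_key(python_version: str, lock_text: str) -> str:
    """Build a hypothetical cache key from the Python version and lock contents."""
    digest = hashlib.sha256(lock_text.encode("utf-8")).hexdigest()[:16]
    return f"poetry-py{python_version}-{digest}"


# Two fresh resolutions of the same unpinned dependencies (illustrative versions).
# Only a transitive dependency changed between the two runs.
lock_first_run = "numpy==1.23.4\nmatplotlib==3.6.1\npillow==9.2.0\n"
lock_later_run = "numpy==1.23.4\nmatplotlib==3.6.1\npillow==9.3.0\n"

key_first = cache_key("3.10", lock_first_run)
key_later = cache_key("3.10", lock_later_run)

# The keys differ, so the later run misses the cache saved by the first run.
print(key_first == key_later)  # False
```

Under these assumptions, any upstream release that changes the resolution produces a new key, so the cache only helps while nothing in the dependency tree moves.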
This looks promising, thanks! |
I'm quite surprised that setup-python does not save the environment (with temporary lock file) across runs. I never tested it but if it's really like that, then adding a manual caching step should definitely work. In my opinion, this is cleaner than expecting developers to consistently run the If you do want to use the |
@markkvdb I don't think I understand the link between the dependencies in the |
Never mind, I misunderstood how you are using pre-commit. I was thinking that It seems that manually handling the caching might be the best option if setup-python doesn't handle this. It does require you to wait for the dependency resolver once, but that's what it is in that case. One note regarding long resolving times: it might make sense to look at which package poetry has difficulty resolving. You can get an idea by running poetry lock in verbose mode with |
@markkvdb thanks! I'm not entirely sure how we should approach dependency pinning yet. We have some minimum versions we could pin, to at least make sure those work. At the same time, we also want to make sure recent(-ish) versions of our dependencies work well too. I'm not sure if we should pin those, or leave it to PyPI to get us whatever's most recent. I'll have a look at how scipy, statsmodels, and the like handle this soon!
Just brainstorming ideas in this thread here to make things faster in the CI. Without the lockfile this seems to be a little more challenging.
I'm not sure if the cache from the previous cron run in step 2 would need to be invalidated or if the generated Per the first item I'm happy to open a PR for this if it's a useful addition. |
@markkvdb @fahaddd-git thanks for your input! I have added some caching in #119 that reduces CI runtimes by roughly 10x, to one minute or less. That is more than sufficient for now, so I closed this issue.
The CI runner takes an increasing amount of time to resolve package dependencies. This has resulted in build times of up to 10 minutes for the Python 3.10 build. We should cache the dependencies for a while on the CI runner, so as to avoid these very long runtimes.
EDIT: I'm pretty sure scipy/statsmodels solved this in some way, so their CI scripts would be a good place to start.
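One way to read "cache the dependencies for a while" is a key that stays stable for a fixed period and then rolls over, so that unpinned dependencies are re-resolved periodically. The sketch below assumes a weekly period and hashes the dependency specification instead of a lock file. The `weekly_cache_key` helper, the key format, and the sample `pyproject.toml` contents are hypothetical and not taken from this repository.

```python
import datetime
import hashlib


def weekly_cache_key(python_version: str, pyproject_text: str, today: datetime.date) -> str:
    """Hypothetical key: stable within an ISO week, changes when the spec changes."""
    digest = hashlib.sha256(pyproject_text.encode("utf-8")).hexdigest()[:16]
    iso_year, iso_week, _ = today.isocalendar()
    return f"deps-py{python_version}-{iso_year}w{iso_week:02d}-{digest}"


# Illustrative dependency specification (not the project's real file).
pyproject = '[tool.poetry.dependencies]\npython = "^3.8"\nnumpy = ">=1.18.0"\n'

monday = weekly_cache_key("3.10", pyproject, datetime.date(2022, 11, 7))
friday = weekly_cache_key("3.10", pyproject, datetime.date(2022, 11, 11))
next_monday = weekly_cache_key("3.10", pyproject, datetime.date(2022, 11, 14))

print(monday == friday)       # True: same week, cache is reused
print(monday == next_monday)  # False: new week, dependencies are resolved again
```

The trade-off in this sketch is one slow resolve per week per Python version, in exchange for testing against reasonably recent dependency versions without shipping a lockfile.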