Rebuild occurring every time in dockerized build, with cache #4493
Comments
Hi @pwaller It would be helpful to see the full log; would you kindly paste that? It looks, though, like you aren't caching enough. The subproject directories have their own '.stack-work' directories which contain the build output. Incidentally, instead of the Stack has docker support. Did you consider this?
Thanks for the quick and very helpful reply!
This relates to my inability to make a quick reproducer: the log contains lots of information I would rather not disclose. The lack of caches mounted in the subproject directories might well be the reason! Is there a way to redirect the caches to just one path fragment? It is awkward to need to redirect many of them!
Thanks for the tip about
I'm reluctant to use the docker abstraction built into stack - naively, it seems to increase the complexity, since a developer would need to understand how stack interacts with docker in addition to the complexity docker brings. I also don't expect it will help with the problem of caching things at a fine-grained level? So I haven't looked into it yet.
Basically, no. At least you only have to do it once. I guess you could cache the entire directory, then copy in your source, and rely on stack not needing to rebuild unnecessary stuff? Something like
I'd be interested to know whether that works, by the way; I've been needing to do something like this for a while. Also it would make a good addition to the
No, this won't work, or at least it would be very non-idiomatic, since you'd have to copy it in within the run command:
Any chance to add environment configuration which controls the caching? It's nice to not pollute the source directories in any case!
Surely you'd be copying the source directly into the cached directory outside of docker?
I don't quite understand.
The simplest method is just to extend the list of caches for your final RUN. Everything up to and including the
The way the cache works is that it applies to a
Thanks for that clarification. Makes sense.
Sorry, I was not specific! I was wondering if you could allow the cache directory root to be specified (for example through configuration or the environment), so that it's possible to avoid polluting the source directory tree.
I can report success with your suggested approach, thanks! I had to put a lot of cache lines on the final build, but I guess it isn't /that/ bad.
Happy at this point to close the issue unless there is anything else in here you'd like to track.
Might be worth adding to the documentation — I don't think it explains that there are multiple
The build cache (the ones in the subproject folders) has to be in directories below the project cabal files; it's all explained in #1178.
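Since each package directory needs its own '.stack-work' cache, writing the cache lines for the final RUN by hand gets tedious. Below is a minimal sketch in Python that generates them, assuming BuildKit's `RUN --mount=type=cache,target=...` syntax is enabled for the Dockerfile. The helper names (`stack_work_mounts`, `render_run`) and the container source root `/src` are hypothetical choices for illustration, not anything stack or docker provides.

```python
from pathlib import Path, PurePosixPath


def stack_work_mounts(project_root, container_root="/src"):
    """Return one cache-mount flag per directory that contains a .cabal file.

    container_root is an assumed location of the source tree inside the image.
    """
    package_dirs = sorted(
        {
            cabal.parent
            for cabal in Path(project_root).rglob("*.cabal")
            # Ignore anything that already lives inside a build-output directory.
            if ".stack-work" not in cabal.parts
        }
    )
    flags = []
    for package_dir in package_dirs:
        rel = package_dir.relative_to(project_root).as_posix()
        target = PurePosixPath(container_root, rel, ".stack-work")
        flags.append(f"--mount=type=cache,target={target}")
    return flags


def render_run(mounts, command):
    """Join the mount flags and the build command into one Dockerfile RUN line."""
    return "RUN " + " \\\n    ".join(list(mounts) + [command])


if __name__ == "__main__":
    # Run from the project root; paste the printed line into the Dockerfile.
    print(render_run(stack_work_mounts(Path.cwd()), "stack build"))
```

The output is only as good as the assumption that every directory with a .cabal file is a package stack will build; a cache for the global stack directory still has to be added separately.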
Looks like this is resolved, closing
General summary/comments (optional)
For me, stack often seems to rebuild things when it seems as though it shouldn't. See for example the kind of behavior described by someone else I don't know in #4490. In this issue I'm describing something a bit different - in this case the project (and subprojects) are being rebuilt, not the dependencies.
I can only apologise that I don't have the time to make a straightforward reproducer. I have tried my best to rule out things which might interfere; the most obvious candidate would be, say, an editor/IDE which modifies files and causes a rebuild.
This could well be user error on my part, but it is difficult to understand what's going wrong. I'm inexperienced in the Haskell ecosystem. Only for a short time I'm maintaining a medium-sized project which currently takes 30 minutes to build from cold on a beefy machine. Full rebuilds are expensive and worth avoiding.
I thought to work around the issue in part by dockerizing the build, thus providing isolation from anything else happening on the system.
Now I need you to suspend disbelief for a moment if this is new to you - docker has recently introduced a new caching mechanism which enables you to retain files across docker builds. They're revamping the way docker build works. The feature is in recent docker versions (18.09 I believe) and can be turned on from the docker client if you have DOCKER_BUILDKIT=1 in your environment. This means for example that we can retain the .stack and .stack-work directories across docker rebuilds, whilst still rebuilding layers.
However, we can still avoid rebuilding layers too, so let's do that. My execution plan in the Dockerfile is something like this (dockerfile):
(With $HOME/.stack and $PWD/.stack-work cached):
stack update
..cabal files.
stack build --dependencies-only
stack build --test --bench --no-run-benchmarks --no-run-tests
This way the external dependencies only get invalidated in principle if the cabal files are invalidated.
My hope was that given that we have the caches enabled, if only a few Haskell files changed, step 6 would be 'fast', i.e. only rebuild what changed. However, if there is a trivial change anywhere, all of the subprojects get rebuilt. (The external dependencies don't, which is good, because those are huge too).
The issue seems to be that stack always thinks the files have changed (seen in stack build --verbose):
All of the other projects are then unregistered and rebuilt because one of the dependencies has changed.
One possible cause is that the Changed Time field shown by stat indicates that the 'inode-changed time' reflects the moment that the docker COPY occurred, not the moment the file was modified. Docker copies across atime and mtime, but the ctime cannot be changed without unmounting the filesystem and poking the bits in the filesystem image. I took a brief look at stack's source code to see if the 'inode-changed time' was leaking in, but I could not quickly find it.
One possible resolution to this issue would be to stop using the 'inode-changed time' as a factor for build cache invalidation. If possible, keying the validation on file contents would be a better alternative in this regard.
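To make the difference between the three signals concrete, here is a small Python sketch. It uses `shutil.copy2` as a stand-in for a copy that preserves mtime (an assumption for illustration; it is not what docker runs internally), and the function names are hypothetical. It does not reproduce stack's actual invalidation logic; it only shows that after such a copy the mtime and the content hash are unchanged while the ctime reflects the moment of the copy.

```python
import hashlib
import os
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path):
    """Hash the file contents; this value is independent of any timestamp."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_after_copy():
    """Copy an 'old' file and report which signals survive the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Main.hs"
        src.write_text("main :: IO ()\nmain = pure ()\n")

        # Pretend the file was last modified an hour ago.
        an_hour_ago = time.time() - 3600
        os.utime(src, (an_hour_ago, an_hour_ago))

        dst = Path(tmp) / "copied" / "Main.hs"
        dst.parent.mkdir()
        shutil.copy2(src, dst)  # copies contents and preserves mtime

        s, d = src.stat(), dst.stat()
        return {
            # mtime travels with the copy.
            "mtime_preserved": abs(s.st_mtime - d.st_mtime) < 1,
            # ctime of the copy is set by the filesystem at copy time.
            "ctime_is_recent": d.st_ctime > d.st_mtime + 1800,
            # Contents, and therefore the hash, are identical.
            "same_content": sha256_of(src) == sha256_of(dst),
        }


if __name__ == "__main__":
    for signal, value in compare_after_copy().items():
        print(f"{signal}: {value}")
```

A cache keyed on ctime would treat the copied file as changed on every build, while one keyed on mtime or on the content hash would not.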
Steps to reproduce
Here is a Dockerfile which reproduces the problem. Unfortunately I haven't been able to make a minimal example project yet - I will update the issue if I get time to do so.
If I fail to update the issue with a reproducer, and no one else encounters the problem, I won't be offended if the issue is closed.
Stack yaml:
Expected
When I rebuild the Dockerfile described above, only projects which are changed (or whose dependencies are changed) get rebuilt.
So for example, below, I would expect 'project' to be rebuilt if a file within 'project' is changed, but not 'subproject1' to be rebuilt in the same case. It should only be updated if something in subproject1 changes.
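The expected behaviour can be stated as a small rule: rebuild a package if it changed or if anything it depends on is being rebuilt. The Python sketch below illustrates that rule only; it is not stack's implementation. The function name is hypothetical, and the dependency direction ('project' depending on 'subproject1') is an assumption made for the example.

```python
def packages_to_rebuild(depends_on, changed):
    """Return the changed packages plus everything that transitively depends on them.

    depends_on maps each package name to the set of local packages it depends on.
    """
    dirty = set(changed)
    grew = True
    while grew:
        grew = False
        for package, deps in depends_on.items():
            # A package becomes dirty as soon as one of its dependencies is dirty.
            if package not in dirty and dirty.intersection(deps):
                dirty.add(package)
                grew = True
    return dirty


if __name__ == "__main__":
    # Assumed layout: 'project' depends on 'subproject1', not the other way round.
    graph = {"project": {"subproject1"}, "subproject1": set()}
    print(sorted(packages_to_rebuild(graph, {"project"})))
    print(sorted(packages_to_rebuild(graph, {"subproject1"})))
```

Under this rule a change inside 'project' leaves 'subproject1' alone, whereas a change inside 'subproject1' forces both to rebuild.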
Actual
Stack version
Method of installation