-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eval cache: fix cache regressions #11086
Conversation
I too have noticed this issue, thank you so much for opening this PR! |
src/libflake/flake/flake.cc
Outdated
auto fingerprint = flake.lockedRef.input.getFingerprint(store) | ||
// Fingerprints are null for relative path references so we use | ||
// the flake path as a fallback to avoid losing caching. | ||
.value_or(flake.path.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is either safe by accident or unsafe, because this undocumented 💢 to_string()
is calling methods that are for display and not guaranteed to be a functional key for the sources it represents.
However, I can imagine flake.path
here to be a store path, in which case we could use it after making sure that it is one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's intentional because computing a fingerprint for path flakes requires reading the entire flake.
That seems to apply here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this to_string()
is exactly how fingerprints used to be calculated 😅
I will remove this part of the PR but it definitely feels like a footgun that relative path expression caching silently fails. I would like to make a separate PR that either:
- Prints a warning if
nix develop
, etc. is called with a relative path expression. - Tries to resolves the path from CWD in the
getFingerprint
function.
Intuitively #2 sounds like less of a regression but I would appreciate guidance on which of the two options (or perhaps a different, third one) is the best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just about relative paths. We can't refer to anything mutable in the fingerprint, because this is the "always correct" kind of cache; not the kind whose entries expire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some time spent learning about Nix's current eval caching behavior and looking at lazy trees later, I understand why my original suggestion doesn't make sense. Thanks for your patience. I also deleted these changes from the PR and it's ready for another look :)
I believe a much more advanced evaluation cache could use exclusively the things that have been read as keys, but that would be more like a research project. This would make it not just capable of caching path: inputs, but also for it to be effective at caching dev shells, which tends to suffer from small changes in unrelated parts of the source.
IMO a good caching system for local flakes is very important. Relatively small flakes can take 10+ seconds to evaluate on modern hardware and this is a big barrier to relying on flakes for dev shells. I would love to learn more about what existing solutions to this are. In the meantime I can use a Bash wrapper that naively caches nix print-dev-env
outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the eval cache is not being used at all for commands such as nix develop path:.
That's intentional because computing a fingerprint for path flakes requires reading the entire flake. As part of (slowly) merging lazy trees, we're getting rid of that kind of behavior. The fact that we're currently still copying them to the Nix store should not be relied on, since that will go away.
The other parts of this PR look good.
@@ -697,6 +697,10 @@ struct CmdDevelop : Common, MixEnvironment | |||
} | |||
} | |||
|
|||
// Release our references to eval caches to ensure they are persisted to disk, because | |||
// we are about to exec out of this process without running C++ destructors. | |||
getEvalState()->evalCaches.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should also do this in nix shell
, nix run
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to all commands.
i tried to come up with a good abstraction but couldn't come up with anything that didn't add a dependency from libcmd
to nix
, so i just went with the copypasta route. i also renamed runProgramInStore
to execProgramInStore
to better highlight the nature of the function.
I believe a much more advanced evaluation cache could use exclusively the things that have been read as keys, but that would be more like a research project. This would make it not just capable of caching |
- Fix eval cache not being persisted in `nix develop` (since NixOS#10570) - Don't attempt to commit cache transaction if there is no active transaction, which will spew errors in edge cases - Drive-by: trivial typo fix
5f36230
to
e764ed3
Compare
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-07-17-nix-team-meeting-minutes-162/49255/1 |
Successfully created backport PR for |
Motivation
Few lines of bugfixes for the eval cache.
Context
Regression in Share evaluation caches across installables #10570: the eval cache is not being persisted in
nix develop
becauseevalCaches
retains references to the caches and is never freed.No longer including in this PR:
Regression in LockedFlake::getFingerprint(): Use Input::getFingerprint() #10149 (see also: PathInputScheme::getFingerprint(): Don't barf on relative paths #10176): the eval cache is not being used at all for commands such asnix develop path:.
becausegetFingerprint
returns null for relative paths. Before the regression, the/nix/...
path was being included in the hash — I brought this back as a fallback.Nix can sometimes try to commit the eval cache sqlite transaction without there being an active transaction, which fails and prints errors that should be non-errors.
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.