Persistent evaluation cache primop #6228
Comments
It's only safe to persistently memoize functions whose closure does not have free variables, so I believe this would only be possible in a Flakes (if hermetic and without --impure) world
Here is a counter example:
If I changed /data/alejandra/default.nix after caching the result, the cached result would become invalid, so we could not cache this persistently, only for the duration of this evaluation |
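The staleness problem described here can be illustrated with a small Python analogy. This is not the commenter's Nix example; `persistent_cache`, `evaluate`, `cached_evaluate` and `demo` are hypothetical names, and the file merely stands in for a mutable path outside the store.

```python
import os
import tempfile

# Hypothetical cache that outlives a single "evaluation".
persistent_cache = {}


def evaluate(path):
    """Stand-in for an expression whose result depends on a mutable file."""
    with open(path) as fh:
        return fh.read().strip()


def cached_evaluate(path):
    # The key is the path only. The file contents act as a free variable
    # that the key does not capture, which is the flaw being illustrated.
    if path not in persistent_cache:
        persistent_cache[path] = evaluate(path)
    return persistent_cache[path]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "default.nix")
        with open(path, "w") as fh:
            fh.write("version-1")
        first = cached_evaluate(path)
        # Change the file after the result was cached.
        with open(path, "w") as fh:
            fh.write("version-2")
        stale = cached_evaluate(path)   # still served from the cache
        fresh = evaluate(path)          # what an uncached evaluation sees
        return first, stale, fresh
```

In this sketch the second cached call keeps returning the old contents, which is why such a result could only be memoized within one evaluation.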
@kamadorueda That would indeed have been a loophole. Thanks for pointing that out. |
Yes and no. The bottleneck isn’t the use of flakes, but the fact that the cache is tied to the CLI. If the cache were really per-flake, then that would mean that each call to Now this isn’t trivial, because it means hooking the cache system into the evaluator (and doing so without too large a negative impact on uncached evaluations). I’ve given it a try in #4511, but had to give up because the implementation was unbearably slow. |
@thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.
Are you referring to the ~10% cold cache overhead? Doesn't seem so bad and I think it could be improved with the thunk approach. I would expect the |
For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations or improvements to the structure of flakes, especially the introduction of a function that takes |
I did, yes. I don’t remember precisely why I didn’t go that way eventually, but I’m fairly confident that it wasn’t making things any faster (The extra word in
Well, maybe, but I’ve yet to see a semantics that would allow for a guaranteed-correct caching and be significantly more flexible than using flakes as the boundaries |
A new type of thunk only adds a branch to
I think I'm pretty close, but I can't make a confident claim without more research. To summarize: switch temporarily to a mode that enforces referential transparency, throwing an exception on addToStore(mutablePath), etc. The exception then triggers regular evaluation.
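A rough Python model of that idea, under stated assumptions: `Evaluator`, `ImpureOperation`, `read_mutable_path` and `eval_cached` are hypothetical names, and `read_mutable_path` only stands in for an operation such as addToStore(mutablePath). It is a sketch of the control flow, not of the Nix evaluator.

```python
class ImpureOperation(Exception):
    """Raised when a cached evaluation attempts a non-transparent operation."""


class Evaluator:
    def __init__(self):
        self.pure_mode = False
        self.cache = {}

    def read_mutable_path(self, path):
        # Stand-in for an impure operation on a path outside the store.
        if self.pure_mode:
            raise ImpureOperation(path)
        return f"contents of {path}"

    def eval_cached(self, key, fn):
        """Evaluate fn in pure mode and cache it; fall back if it is impure."""
        if key in self.cache:
            return self.cache[key], "cache hit"
        self.pure_mode = True
        try:
            result = fn(self)
        except ImpureOperation:
            # The exception triggers regular evaluation; nothing is cached.
            self.pure_mode = False
            return fn(self), "uncached fallback"
        finally:
            self.pure_mode = False
        self.cache[key] = result
        return result, "cached"
```

The sketch ignores nested cached evaluations; a real implementation would presumably have to propagate the exception outward when the caller is itself being evaluated in the strict mode.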
It must be similar in terms of what is allowed inside the cached function without resorting to normal evaluation. It can't really work without referential transparency. Users may have their own use cases for caching, perhaps nixos configurations in a multi-node deployment, and flakes can use this for niche architecture support as mentioned before. It wouldn't be nice to have to add custom caching code for that. I'll stop responding for a while because of priorities; sorry. If anything, the thunk approach seems worth a shot even with flake boundary caching. |
random thoughts: This is already pretty close via?
The key proposal here is to make the capability flake-oblivious. If you squint a bit the "inputs" in a flake is the
It's not the getFlake call, but the evaluation of its attributes that need to be cached. Does this imply something like?
that isn't too far off from the proposed
where it is specifically the attrPath traversal that is cached, in the same way the current eval caching works: lazy determination of what attrs are available and their values. I think |
This code seems to be about in-memory caching of ast and root
💯
That is correct. Specifically, we want most referentially transparent code to be cached and non-referentially transparent code not to be cached.
This is in the context of @thufschmitt's PR where he implemented a cache per getFlake call that returned special attrsets that query the cache transparently. So your example
... is actually called transparently on attribute access.
This is a good thing. It is more powerful than what the current flake format needs, but the current flake format isn't good, as it creates problems for niche If this caching primop is implemented, changing the flake format amounts to changing the
I believe this concern is addressed by the special attrsets. Those make the |
I was not aware of the special attrsets. Is caching then limited to only that behavior? My intent was to limit caching only to attribute path walking. This is still more powerful than flakes due to decoupling the |
In his implementation, yes. I think we'll want to cache lists too, for (*) value as in actual value, not thunk. |
Somewhat related: https://github.com/DavHau/nix-eval-cache |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/get-flake-input-store-path/20202/10 |
Data Point: In a scenario where a CLI interacts with a dirty flake for fetching metadata out of the repo, flake-level caching is not enough * and a caching primop seems to make the experience bearable. * iirc, UX research gives us 200ms max. |
cc @Gabriella439 maybe she has some good input on memoization in a lazy config language. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/what-would-you-like-to-see-improved-in-nix-cli-experience/24012/9 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: |
Just noticed that I haven't linked this before, but I did some brainstorming about this here: https://github.com/tweag/epcb It's very drafty right now, but I'd like to work on this more, maybe with a PoC
Is your feature request related to a problem? Please describe.
The flake-based evaluation cache is too coarse-grained to be useful for caching during development, updates, etc.
Having a more powerful solution will also benefit flakes, as it becomes feasible to use the cache between flakes, not just at the CLI boundary.
A language-integrated cache may also allow easier experimentation than a cache that is tied to CLI concepts.
Describe the solution you'd like
A new primop that performs persistent caching of any Nix function.
Fully general or transparent caching (memoization) of functions is too hard. We need to strike a balance by restricting the cache so that we avoid the hard problems.
My suggestion is to memoize roughly* this function:
ie import from store path, then apply. The type is something like
By using a primop, we avoid the hard problem of having to determine what to cache.
By restricting the arguments to serializable data, such as a store path and json-serializable data, we avoid the serialization of functions.
By restricting cached-eval-thunk to serializable data, we can again avoid serialization of functions.
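One way to see why serializable arguments help: a store path and JSON data can be turned into a stable cache key, while a function cannot. The following Python sketch assumes a hypothetical `cache_key` helper and a made-up store path; the hashing scheme is an illustration, not something the proposal specifies.

```python
import hashlib
import json


def cache_key(store_path, json_arg):
    """Derive a stable key from a store path and JSON-serializable data."""
    # Canonical JSON (sorted keys, fixed separators) so that equal data
    # always produces the same key. json.dumps raises TypeError for
    # values that are not serializable, such as functions.
    payload = json.dumps(
        {"f": store_path, "j": json_arg},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

For example, `cache_key("/nix/store/example-src", {"system": "x86_64-linux"})` yields a hex digest that does not depend on key order in the argument, whereas passing a function inside the argument fails instead of silently producing an unstable key.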
A cached-eval-thunk references the `f` and `j` arguments, and contains a path. The path is a list of attribute names and/or list indexes. It also contains a regular thunk representing the real thunk `import f j`.
When the thunk is forced, it looks up the value in the cache db. If no entry is available, it looks up the result in the real thunk and stores the result in the cache db. If a result exists, the thunk turns into a primitive, or into an attrset or list of cached-eval-thunks.
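The forcing behaviour of a cached-eval-thunk can be modelled in Python as follows. This is a toy sketch under assumptions: `CachedEvalThunk`, `open_cache`, the SQLite schema and the JSON entry format are all hypothetical, Python dicts and lists stand in for attrsets and lists, and a zero-argument callable stands in for the real thunk.

```python
import json
import sqlite3


def open_cache(db_path=":memory:"):
    """Open the cache db; pass a file path to make it persistent."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, entry TEXT NOT NULL)"
    )
    return db


class CachedEvalThunk:
    """Toy model: references f and j, holds a path and the real thunk."""

    def __init__(self, db, f, j, path, real_thunk):
        self.db, self.f, self.j = db, f, j
        self.path = list(path)        # attribute names and/or list indexes
        self.real_thunk = real_thunk  # only called on a cache miss

    def _key(self):
        return json.dumps([self.f, self.j, self.path], sort_keys=True)

    def force(self):
        row = self.db.execute(
            "SELECT entry FROM cache WHERE key = ?", (self._key(),)
        ).fetchone()
        if row is None:
            # Miss: consult the real thunk and store a shallow summary.
            entry = self._summarize(self.real_thunk())
            self.db.execute(
                "INSERT INTO cache (key, entry) VALUES (?, ?)",
                (self._key(), json.dumps(entry)),
            )
            self.db.commit()
        else:
            entry = json.loads(row[0])
        return self._materialize(entry)

    @staticmethod
    def _summarize(value):
        if isinstance(value, dict):
            return {"type": "attrs", "names": sorted(value)}
        if isinstance(value, list):
            return {"type": "list", "length": len(value)}
        return {"type": "prim", "value": value}

    def _child(self, step):
        # The child's real thunk selects one step further into the real value.
        return CachedEvalThunk(
            self.db, self.f, self.j, self.path + [step],
            lambda: self.real_thunk()[step],
        )

    def _materialize(self, entry):
        if entry["type"] == "attrs":
            return {name: self._child(name) for name in entry["names"]}
        if entry["type"] == "list":
            return [self._child(i) for i in range(entry["length"])]
        return entry["value"]
```

Only the shape of each attrset or list is stored at its own path, so attribute names are discovered lazily and a later run that walks the same path never touches the real thunk.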
*: for all of this to work, the expression in `f` must not be allowed to perform any action that wouldn't be referentially transparent. This can be done by performing the evaluation of the real thunk in an extra pure mode that forbids `readFile`, `addToStore` (except on paths already in the store), etc.
Describe alternatives you've considered
`import f j` seems powerful enough, but maybe another function is more efficient?
We'll generate a store path for each cached function, but I don't think that's a problem, because the primop should be used with care anyway. Db lookups aren't free. A non-storepath representation of functions without free variables may also be suitable, but would be more work.
Additional context
I swear that I wrote about this before, but I couldn't find it in the issue tracker 😕