Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent evaluation cache primop #6228

Open
roberth opened this issue Mar 10, 2022 · 18 comments
Open

Persistent evaluation cache primop #6228

roberth opened this issue Mar 10, 2022 · 18 comments

Comments

@roberth
Copy link
Member

roberth commented Mar 10, 2022

Is your feature request related to a problem? Please describe.

The flake-based evaluation cache has too coarse granularity to be useful for caching during development, updates, etc.

Having a more powerful solution will also benefit flakes, as it becomes feasible to use between flakes, not just at the cli boundary.
A language-integration cache may also allow easier experimentation than a cache that is tied to cli concepts.

Describe the solution you'd like

A new primop that performs persistent caching of any Nix function.
Fully general or transparent caching (memoization) of functions is too hard. We need to strike a balance by restricting the cache so that we avoid the hard problems.
My suggestion is to memoize roughly* this function:

f: j: import f j

ie import from store path, then apply. The type is something like

path -> json -> cached-eval-thunk

By using a primop, we avoid the hard problem of having to determine what to cache.
By restricting the arguments to serializable data, such as a store path and json-serializable data, we avoid the serialization of functions.
By restricting cached-eval-thunk to serializable data, we can again avoid serialization of functions.

A cached-eval-thunk references the f and j arguments, and contains a path. The path is a list of attribute names and/or list indexes.
It also contains a regular thunk representing the real thunk import f j.
When the thunk is forced, it looks up the value in the cache db. If no entry is available, it performs looks up the result in the real thunk and stores the result in the cache db. If a result exists, it the thunk turns into a primitive or into an attrset or list of cached-eval-thunks.

*: for all of this to work, the expression in f must not be allowed to perform any action that wouldn't be referentially transparent. This can be done by performing the evaluation of the real thunk in an extra pure mode that forbids readFile, addToStore (except on paths already in the store), etc.

Describe alternatives you've considered

import f j seems powerful enough, but maybe another function is more efficient?

We'll generate a store path for each cached function, but don't think that's a problem, because the primop should be used with care anyway. Db lookups aren't free. A non-storepath representation of functions without free variables may also be suitable, but more work.

Additional context

I swear that I wrote about this before, but I couldn't find it in the issue tracker 😕

@kamadorueda
Copy link
Member

It's only safe to persistently memoize functions whose closure does not have free variables, so I believe this would only be possible in a Flakes (if hermetic and without --impure) world

import f j

Here is a counter example:

nix-repl> json = { }

nix-repl> storePath = builtins.toFile "file" "args: \"\${/data/alejandra/default.nix}\""

nix-repl> builtins.readFile storePath
"args: \"${/data/alejandra/default.nix}\""

nix-repl> import storePath json
"/nix/store/qy5kmb8xf7brinv2sznik7ng3dr8nk4q-default.nix"

if I changed /data/alejandra/default.nix after caching the result, the cached result would become invalid, so we would not be able to cache this persistently, only during this evaluation

@roberth
Copy link
Member Author

roberth commented Mar 10, 2022

@kamadorueda That would indeed have been a loophole. Thanks for pointing that out.
I still think it's feasible to counteract this by evaluating the cached function in an even stricter pure mode; perhaps "referentially transparent mode". This mode only needs to be active while computing cached values. When the cached-eval-thunk's forceValue is done, the EvalState could switch back to its normal state (pure/restricted/impure). I've updated the text a little.

@thufschmitt
Copy link
Member

The flake-based evaluation cache has too coarse granularity to be useful for caching during development, updates, etc.

Yes and no. The bottleneck isn’t the use of flakes, but the fact that the cache is tied to the CLI. If the flake was really per-flake, then that would mean that each call to builtins.geFlake (or each flake you depend on) would be cached independently, which would solve 90% of the problem.
To take the most common use-case, I’m fairly confident that the vast majority of the eval time for a standard flake is spent evaluating the packages imported from the nixpkgs flake. So if this had its own cache independent from the toplevel flake, then re-evaluating the flake would be nearly free.

Now this isn’t trivial because it means hooking the cache system into the evaluator (and do so without having a too negative impact on uncached evaluations). I’ve given it a try in #4511, but had to give up because the implementation was unbearably slow.

@roberth
Copy link
Member Author

roberth commented Mar 11, 2022

@thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.

unbearably slow.

Are you referring to the ~10% cold cache overhead? Doesn't seem so bad and I think it could be improved with the thunk approach. I would expect the noCache case to reach 0% as the thunk based approach doesn't seem to affect existing Value subclasses. Optimistically, this could halve the cold cache overhead, assuming old cold cache = old no cache + new cold cache.

@roberth
Copy link
Member Author

roberth commented Mar 11, 2022

The bottleneck isn’t the use of flakes,

For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations or improvements to the structure of flakes, especially the introduction of a function that takes system as in input, which we ought to do for niche architecture support.

@thufschmitt
Copy link
Member

@thufschmitt Did you consider adding a new type of thunk instead of modifying regular attr access? That way the performance overhead only applies to cacheable values and not to intermediate values or the values produced by functions.

I did, yes. I don’t remember precisely why I didn’t go that way eventually, but I’m fairly confident that it wasn’t making things any faster (The extra word in Attrs is totally acceptable given that an Attrs will generally be big-ish, and the cost of a no-op lookup is also very small compared to any operation you’ll want to perform on the attr). Tbh I think that the slowdown wasn’t really due to anything fundamental, it’s more that I had to refactor things a bit to make the implementation decent, and that this refactoring made things slower.

For flakes it may not be, but restricting the cache to flake boundaries is unnecessary and makes the caching of other functions harder, such as custom nixpkgs invocations or improvements to the structure of flakes, especially the introduction of a function that takes system as in input, which we ought to do for niche architecture support.

Well, maybe, but I’ve yet to see a semantics that would allow for a guaranteed-correct caching and be significantly more flexible than using flakes as the boundaries

@roberth
Copy link
Member Author

roberth commented Mar 11, 2022

it’s more that I had to refactor things a bit to make the implementation decent, and that this refactoring made things slower.

A new type of thunk only adds a branch to forceValue and a few switch cases. Seems more benign to me.


a semantics that would allow for a guaranteed-correct caching

I think I'm pretty close, but I can't make a confident claim without more research. To summarize: switch temporarily to a mode that enforces referential transparency, throwing an exception on addToStore(mutablePath), etc. The exception then triggers regular evaluation.

and be significantly more flexible than using flakes as the boundaries

It must be similar in terms of what is allowed inside the cached function without resorting to normal evaluation. It can't really work without referential transparency.

Users may have their own use cases for caching, perhaps nixos configurations in a multi-node deployment, and flakes can use this for niche architecture support as mentioned before. It wouldn't be nice to have to add custom caching code for that.
A more general caching feature supports these things naturally.


I'll stop responding for a while because of priorities; sorry.

If anything, the thunk approach seems worth a shot even with flake boundary caching.

@tomberek
Copy link
Contributor

random thoughts:

This is already pretty close via?

The key proposal here is to make the capability flake-oblivious. If you squint a bit the "inputs" in a flake is the j parameter and "outputs" is f. Many of the restrictions and considerations are the same.

If the flake [sic? cache] was really per-flake, then that would mean that each call to builtins.geFlake (or each flake you depend on) would be cached independently, which would solve 90% of the problem.

It's not the getFlake call, but the evaluation of its attributes that need to be cached. Does this imply something like?

builtins.getInstallable("path:./.",["legacyPackages" system "hello"])

that isn't too far off from the proposed

builtins.cache ./some-dir ./trivial-inputs
# or
bulitins.cachedEvalAttr ./some-func.nix ./trivial-inputs attrPath

where it is specifically the attrPath traversal that is cached, in the same way the current eval caching works: lazy determination of what attrs are available and their values.
where ./some-func.nix and ./some-inputs can imply the cache key.

I think import f j is a bit too powerful because f can keep changing and the surrounding context can also access arbitrary parts of the structure. Instead I'm thinking about something like: lib.getAttrByPath (import f j) attrPath where specifically the getAttrByPath recursion specifies what is cached.

@roberth
Copy link
Member Author

roberth commented Mar 15, 2022

This is already pretty close via? [...]

This code seems to be about in-memory caching of ast and root Value. In-memory caching aka memoization is a much easier problem to solve and iirc a solution is already sitting on a branch somewhere. I'd like to have that, because it could replace trie-based solutions I've been using, but that's not what I want to discuss here, as it doesn't help for the caching of dependency expressions (inputs etc).

The key proposal here is to make the capability flake-oblivious. If you squint a bit the "inputs" in a flake is the j parameter and "outputs" is f.

💯

Many of the restrictions and considerations are the same.

That is correct. Specifically, we want most referentially transparent code to be cached and non-referentially transparent code not to be cached.

If the flake [sic? cache] was really per-flake, then that would mean that each call to builtins.geFlake (or each flake you depend on) would be cached independently, which would solve 90% of the problem.

It's not the getFlake call, but the evaluation of its attributes that need to be cached. Does this imply something like?

This is in the context of @thufschmitt's PR where he implemented a cache per getFlake call that returned special attrsets that query the cache transparently. So your example

builtins.getInstallable("path:./.",["legacyPackages" system "hello"])

... is actually called transparently on attribute access.

that isn't too far off from the proposed

builtins.cache ./some-dir ./trivial-inputs
# or
bulitins.cachedEvalAttr ./some-func.nix ./trivial-inputs attrPath

where it is specifically the attrPath traversal that is cached, in the same way the current eval caching works: lazy determination of what attrs are available and their values. where ./some-func.nix and ./some-inputs can imply the cache key.

I think import f j is a bit too powerful because f can keep changing

This is a good thing. It is more powerful than what the current flake format needs, but the current flake format isn't good, as it creates problems for niche systems (see #3843 (comment)).

If this caching primop is implemented, changing the flake format amounts to changing the call-flake.nix expression rather than overhauling a flake-coupled cache in C++.

and the surrounding context can also access arbitrary parts of the structure. Instead I'm thinking about something like: lib.getAttrByPath (import f j) attrPath where specifically the getAttrByPath recursion specifies what is cached.

I believe this concern is addressed by the special attrsets. Those make the getAttrByPath part transparent.

@tomberek
Copy link
Contributor

I believe this concern is addressed by the special attrsets. Those make the getAttrByPath part transparent.

I was not aware of the special attrsets. Is caching then limited to only that behavior? My intent was to limit caching only to attribute path walking. This is still more powerful than flakes due to decoupling the f and j.

@roberth
Copy link
Member Author

roberth commented Mar 15, 2022

Is caching then limited to only that behavior?

In his implementation, yes. I think we'll want to cache lists too, for drv.outputs, but those can follow a similar pattern, although I don't know if their representation permits his approach as easily. A thunk-based implementation would support both data structures in the same way without having to change the actual list or attrset value* representations.

(*) value as in actual value, not thunk. Value should have been called Object, because a thunk is arguably not a value, but still a heap object.

@tomberek
Copy link
Contributor

tomberek commented Apr 6, 2022

Somewhat related: https://github.com/DavHau/nix-eval-cache

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/get-flake-input-store-path/20202/10

@blaggacao
Copy link
Contributor

blaggacao commented Sep 11, 2022

Data Point:

In a scenario where a CLI interacts with a dirty flake for fetching metadata out of the repo, flake-level caching is not enough * and a caching primop seems to make the experience bearable.

* iirc, UX research gives us 200ms max.

@Profpatsch
Copy link
Member

cc @Gabriella439 maybe she has some good input on memoization in a lazy config language.

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/what-would-you-like-to-see-improved-in-nix-cli-experience/24012/9

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/can-i-use-flakes-within-a-git-repo-without-committing-flake-nix/18196/33

@infinisil
Copy link
Member

Just noticed that I haven't linked this before, but I did a brainstorming about this here: https://github.com/tweag/epcb

It's very drafty right now, but I'd like to work on this more, maybe with a PoC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants