Prototype: accumulate attrset updates, perform k-way merge #11290

roberth · 2024-08-13T19:41:14Z

So far, this seems to be slower. The main point was to show an implementation strategy for an optimization that also applies to other semigroups, like string concatenation and lists.
I didn't originally plan to pursue this, as mentioned, so if you're interested in algorithms and performance optimization, you are more than welcome to work on this.
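
To illustrate how the same idea carries over to another semigroup, here is a minimal sketch for string concatenation: operands are queued instead of concatenated pairwise, and the result is built once with a single allocation. `ConcatQueue`, `push` and `finish` are hypothetical names for illustration, not part of this PR or of Nix.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical queue of deferred operands for a chain like a + b + c.
// The views must outlive the queue; ownership is out of scope for this sketch.
struct ConcatQueue {
    std::vector<std::string_view> parts;
    std::size_t totalSize = 0;

    // Pass 1: record the operand and keep a running size statistic.
    void push(std::string_view s)
    {
        parts.push_back(s);
        totalSize += s.size();
    }

    // Final pass: allocate once, then copy every operand in order.
    std::string finish() const
    {
        std::string out;
        out.reserve(totalSize);
        for (const auto & p : parts)
            out.append(p.data(), p.size());
        return out;
    }
};
```

Pushing "foo", "bar" and "baz" and then calling `finish()` yields one string without building the intermediate "foobar".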

TODO

- [ ] remove pass 2 by merging it into pass 1:
  - pass 1: fill UpdateQueue
  - pass 2: stats (like max output size)
  - pass 3: k-way merge
- [ ] maybe track sortedness, if useful and cheap? in pass 1, while building the UpdateQueue
- [ ] make it actually fast; different algorithm depending on size distribution and number of update ops?
  - the current solution is the min-heap or priority queue approach
  - divide and conquer (see also Nixpkgs, which manually implements this for pkgs/by-name): binary tree-shaped sorted merge operations
    - can we take the size of attrsets into account to balance the work? 1000 + (1 + (1 + 1)) is better than ((1000+1)+1)+1 (where each number refers to an attrset size)
  - iterative pairwise merging is more or less what we had; the tree of updates is flattened. Probably worse.
  - perhaps some combination of solutions, depending on a heuristic
- [ ] split callFunction into two parts, effectively
  - call2Thunk (create the new scope's Env etc)
  - eval that thunk (evaluate the body)
- [ ] implement ExprApply::evalForUpdate - easy after the split
- [ ] make foldl' (//) work?
- [ ] optimize listToAttrs and other primops?
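
For readers unfamiliar with the min-heap approach mentioned in the list above, the following is a rough, self-contained sketch of a k-way merge with the right-biased semantics of `//`. It is not the code from this PR: `Attr`, `AttrSet`, `Cursor` and `kWayUpdate` are hypothetical names, and attribute names are plain strings with `int` values rather than Nix symbols and values. It assumes each operand is already sorted by name with unique names.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Nix's attribute and attribute-set types.
using Attr = std::pair<std::string, int>;
using AttrSet = std::vector<Attr>; // sorted by name, names unique

// Compute operands[0] // operands[1] // ... in a single merge.
// On a name collision the rightmost operand wins, as with `//`.
AttrSet kWayUpdate(const std::vector<AttrSet> & operands)
{
    struct Cursor { std::size_t operand; std::size_t index; };

    // Min-heap on name. Among equal names the highest operand index is
    // popped first, so the winning attribute is seen before shadowed ones.
    auto later = [&](const Cursor & a, const Cursor & b) {
        const auto & na = operands[a.operand][a.index].first;
        const auto & nb = operands[b.operand][b.index].first;
        if (na != nb) return na > nb;
        return a.operand < b.operand;
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);

    // Stats: the sum of the sizes is an upper bound on the output size.
    std::size_t maxSize = 0;
    for (std::size_t i = 0; i < operands.size(); ++i) {
        maxSize += operands[i].size();
        if (!operands[i].empty()) heap.push({i, 0});
    }

    AttrSet out;
    out.reserve(maxSize);
    while (!heap.empty()) {
        Cursor c = heap.top();
        heap.pop();
        const Attr & attr = operands[c.operand][c.index];
        // Skip attributes shadowed by one already emitted under this name.
        if (out.empty() || out.back().first != attr.first)
            out.push_back(attr);
        if (c.index + 1 < operands[c.operand].size())
            heap.push({c.operand, c.index + 1});
    }
    return out;
}
```

Each attribute costs O(log k) heap work for k operands, so this variant is insensitive to how the operand sizes are distributed; the divide-and-conquer and size-balanced variants in the list trade that uniformity for cheaper handling of skewed inputs.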

Context

Yesterday's meeting (will post notes in a minute)
@tomberek

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.


roberth · 2024-08-13T19:41:45Z

src/libexpr/eval.cc

+    e1->evalForUpdate(state, env, q);
+    e2->evalForUpdate(state, env, q);


Suggested change

-    e1->evalForUpdate(state, env, q);
-    e2->evalForUpdate(state, env, q);
+    evalForUpdate(state, env, q);
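
To make the suggestion concrete, here is a simplified sketch of how an `evalForUpdate` hook can flatten a tree of `//` operations into a queue, and why calling it on the node itself is equivalent to calling it on both operands. The types are hypothetical stand-ins: the real signature takes `EvalState &` and `Env &`, which are omitted here, and `UpdateQueue` is modelled as a plain vector. The final loop is a placeholder for the k-way merge.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using AttrSet = std::map<std::string, int>; // hypothetical stand-in for Bindings
using UpdateQueue = std::vector<AttrSet>;   // hypothetical: operands, left to right

struct Expr {
    virtual ~Expr() = default;
    virtual AttrSet eval() const = 0;
    // Default: an ordinary expression contributes itself as one operand.
    virtual void evalForUpdate(UpdateQueue & q) const { q.push_back(eval()); }
};

struct ExprAttrs : Expr {
    AttrSet attrs;
    explicit ExprAttrs(AttrSet a) : attrs(std::move(a)) {}
    AttrSet eval() const override { return attrs; }
};

struct ExprOpUpdate : Expr {
    const Expr * e1;
    const Expr * e2;
    ExprOpUpdate(const Expr * e1, const Expr * e2) : e1(e1), e2(e2) {}

    // Nested updates recurse, so (a // b) // c queues a, b, c.
    void evalForUpdate(UpdateQueue & q) const override
    {
        e1->evalForUpdate(q);
        e2->evalForUpdate(q);
    }

    AttrSet eval() const override
    {
        UpdateQueue q;
        evalForUpdate(q); // one call replaces the two per-operand calls
        AttrSet out;
        for (const auto & operand : q)   // placeholder for the k-way merge
            for (const auto & kv : operand)
                out[kv.first] = kv.second; // later operands win
        return out;
    }
};
```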

roberth · 2024-08-13T19:42:06Z

src/libexpr/eval.cc

+    eval(state, env, vTmp);
+    // TODO add pos and errorCtx params
+    state.forceAttrs(vTmp, noPos, "while evaluating an attribute set merge operand");


evalAttrs?

roberth · 2024-08-13T19:42:46Z

src/libexpr/nixexpr.hh

+        return pos;
+    }
+
+    virtual void evalForUpdate(EvalState & state, Env & env, UpdateQueue & q) override;


This is the only new code in this declaration. The rest is just an expansion of the MakeBinOp macro.

Should we make a MakeBinOpMembers macro that both MakeBinOp and this declaration can use?
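
A rough sketch of what that could look like, using a heavily simplified `Expr` rather than the real declarations in nixexpr.hh. The member list here (two operands, a constructor, `show`) is an assumption for illustration only; the real `MakeBinOp` also declares positions, `bindVars` and `eval`, and the `evalForUpdate` shown is a toy that records operand text instead of evaluating.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical Expr; not the real interface.
struct Expr {
    virtual ~Expr() = default;
    virtual std::string show() const = 0;
};

struct ExprVar : Expr {
    std::string name;
    explicit ExprVar(std::string n) : name(std::move(n)) {}
    std::string show() const override { return name; }
};

// Members shared by every binary operator node.
#define MakeBinOpMembers(name, s) \
    Expr * e1, * e2; \
    name(Expr * e1, Expr * e2) : e1(e1), e2(e2) {} \
    std::string show() const override \
    { \
        return "(" + e1->show() + " " s " " + e2->show() + ")"; \
    }

// Ordinary operators: just the shared members.
#define MakeBinOp(name, s) \
    struct name : Expr \
    { \
        MakeBinOpMembers(name, s) \
    };

MakeBinOp(ExprOpEq, "==")

// The update operator reuses the members and adds its one extra method.
struct ExprOpUpdate : Expr
{
    MakeBinOpMembers(ExprOpUpdate, "//")

    // Toy stand-in for evalForUpdate: queue both operands.
    void evalForUpdate(std::vector<std::string> & q) const
    {
        q.push_back(e1->show());
        q.push_back(e2->show());
    }
};
```

With this split, the hand-expanded copy of the macro body in `ExprOpUpdate` would no longer be needed.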

nixos-discourse · 2024-08-13T19:49:16Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-12-nix-team-meeting-minutes-168/50561/1


roberth changed the title ~~WIP: accumulate attrset updates, use k-way merge~~ WIP: accumulate attrset updates, perform k-way merge Aug 13, 2024

roberth changed the title ~~WIP: accumulate attrset updates, perform k-way merge~~ Prototype: accumulate attrset updates, perform k-way merge Aug 14, 2024

roberth mentioned this pull request Nov 15, 2024

builtins.mergeMapAttrs (or builtins.concatMapAttrs) #11887

Open

roberth added the language and performance labels Nov 17, 2024
