Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path library design #200718

Closed
wants to merge 4 commits into from
Closed

Path library design #200718

wants to merge 4 commits into from

Conversation

infinisil
Copy link
Member

@infinisil infinisil commented Nov 11, 2022

Description of changes

There is a need for path operations in nixpkgs and third-party tools. This PR drafts a design document for what such a path library should look like.

Also ping @aakropotkin

This work is sponsored by Antithesis

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Nov 11, 2022
@infinisil infinisil requested a review from roberth November 11, 2022 15:57
@roberth
Copy link
Member

roberth commented Nov 11, 2022

For path values, having a single representation was a great design choice. It reduces the number of possibilities that users need to consider. It reduces the intrinsic complexity of the path operations. It reduces the extrinsic complexity by avoiding the possibility of doing something slightly surprising when a path happens to be relative instead of absolute. It simplifies code that deals with path values.

Of course a relative path pops up every now and then, but not to the point where I've felt the need more than single small operations.
By designing a shadow path library that aims to "make relative paths easy", we run the risk of increasing the prevalence of relative paths. This effect makes the whole effort counterproductive when considering that the goal of a library is to simplify its callers' code.

What are the use cases? Do they really need to work with relative paths?


Small braindump:

  • a unified representation of (thing, path) objects. Path-like objects can be relative to /., a store path, a source object (with source combinators: extend, trace, cutAt, focusAt and more #112083). A lazy tree is already such a pair. Relative paths are not that, because the "thing" part is unknown, and because the "path" part can be negative (../) if I may use that term liberally.

@infinisil
Copy link
Member Author

infinisil commented Nov 11, 2022

@roberth Relative paths are needed on a very basic level of just appending path components, e.g.

join [ "/foo/bar" "baz/qux" ]
== "/foo/bar/baz/qux"

should work, the second element is a relative path. And the opposite:

split "/foo/bar"
== [ "/" "foo" "bar" ]

outputs relative paths. Another case is for your source combinators, where you want to pass a subpath via e.g. ./foo/bar, which then gets converted to an absolute path by Nix. To get back the relative path you need to

relativeTo ./. ./foo/bar
== "foo/bar"

That's like your absolutePathComponentsBetween function.

Regarding .., I'm not yet sure whether that should be supported at all (should it throw an error or be ignored?). See the document for arguments for/against it. The safer default would be to throw an error at least for now.

@aakropotkin
Copy link
Contributor

Boy do I have some path routines for you 😁
I'll post my libs in a bit. I expect we can cherry pick and tweak what I have written for compatibility.

@infinisil
Copy link
Member Author

infinisil commented Nov 11, 2022

@aakropotkin I'd rather discuss the design of such a library beforehand, that's the hard part to get right before implementing it :) Please take a look at the document and what you think of it. I'm also very interested in use cases for such a library, let me know about yours! We should have justification for each part of this library

@aakropotkin
Copy link
Contributor

Gotcha. One I'm a fan of for relative paths in cases where I'm loading several files is:

{ lib }: let
  pathrel = {
    basedir = throw "you must set basedir before reading files" ;
    __functor = self: x: let
      p   = toString x;
      abs = if lib.isAbspath p then p else "${self.basedir}/${p}";
  in ( self.__innerFunction or ( y: y ) ) abs;

  readRel = pathrel // {
    __InnerFunction = builtins.readFile;
  };

  importRel = pathRel // {
    __InnerFunction = p:
      if ( builtins.match ".*\\.json" p )
      then builtins.fromJSON ( builtins.readFile p )
      else import p;
  };
in ...

Mostly this cuts down boilerplate, but I also like being able to override basedir later with a simple //.

@roberth
Copy link
Member

roberth commented Nov 12, 2022

@infinisil

@roberth Relative paths are needed on a very basic level of just appending path components, e.g.

join [ "/foo/bar" "baz/qux" ]
== "/foo/bar/baz/qux"

This is not a use case.

should work, the second element is a relative path. And the opposite:

split "/foo/bar"
== [ "/" "foo" "bar" ]

You're putting an absolute path in a list of path components?

Again, this is not a use case. Why would I need this?

outputs relative paths. Another case is for your source combinators, where you want to pass a subpath via e.g. ./foo/bar, which then gets converted to an absolute path by Nix. To get back the relative path you need to

relativeTo ./. ./foo/bar
== "foo/bar"

That's like your absolutePathComponentsBetween function.

Exactly. I didn't need a relative path library for this. Why would you make it stringly typed? That's not something we should encourage.

Regarding .., I'm not yet sure whether that should be supported at all (should it throw an error or be ignored?). See the document for arguments for/against it. The safer default would be to throw an error at least for now.

If we throw it out, path components are all we need, and those are just lists of strings.

Please, please, show me a use case where users need a relative path library. Relative paths are a mess, with tons of corner cases and silly representations. You've shown that very clearly in the design document. By engaging with bad data types or representations, you'll only make code more confusing and error prone.

If everyone jumps off a cliff, that's not a reason for us to jump off a cliff.

@roberth
Copy link
Member

roberth commented Nov 12, 2022

@aakropotkin

Mostly this cuts down boilerplate,

What was the boilerplate?

@roberth
Copy link
Member

roberth commented Nov 12, 2022

Alright, alternative proposition:

Use Nix-provided paths first and foremost. Avoid the concept of relative paths, because it comes with a lot of baggage that we'd rather avoid. Instead, work with path differences represented as lists of path component strings. Prefer to keep this as an internal representation only, returning Nix paths instead. This way we don't create a split between functions that take paths as usual and functions that take relative paths, and we aren't tempted to do add a ton of dynamic type matching and half-baked implicit conversion code to each and every filesystem related function. (We already have half a ton of this in lib.sources, and the interactions between every path-like representation will make this quadratically worse)

@aakropotkin
Copy link
Contributor

aakropotkin commented Nov 12, 2022

@aakropotkin

Mostly this cuts down boilerplate,

What was the boilerplate?

In practice I usually pack a number of processing routines in the base functor. For example simplifying ././../foo/../bar -> ../bar. I also have markers in the base functor for "is this subtree locked?" and "make sure nobody uses .. above basedir". I pack an instance of filter = to be used to pre-process dirs in some cases.

Having that stuff centered around a common structure was helpful to me.

Having said that those things might not be useful in Nixpkgs.

@roberth
Copy link
Member

roberth commented Nov 12, 2022

@aakropotkin You're talking about your functions, but you didn't show me any boilerplate. Should I assume that you just use the term very liberally?

For example simplifying ././../foo/../bar -> ../bar.

These are already equivalent path values. Why would they be strings?

markers in the base functor for "is this subtree locked?"

What does this mean?

"make sure nobody uses .. above basedir"

I have a well tested pathHasPrefix function for this in #112083. It does not require relative paths.

pre-process dirs

What kind of preprocessing?

@infinisil
Copy link
Member Author

join [ "/foo/bar" "baz/qux" ]
== "/foo/bar/baz/qux"

This is not a use case.

While "joining a path" is not a use-case on its own, such an operation is needed for a lot of things. These are ones I can think of right now:

  • Selecting a subdirectory to build, sourceRoot
  • Getting NixOS modules, modulesPath + "/profiles/minimal.nix"
  • Getting a specific output directory, someDrv + "/etc/systemd/system"

Note that such a join operation is essentially just + but with normalisation.

split "/foo/bar"
== [ "/" "foo" "bar" ]

You're putting an absolute path in a list of path components?

This is mainly for split to be inversible and the split of a normalised path to be unique. We could have an anchor and components function that return / and [ "foo" "bar" ] respectively.

Again, this is not a use case. Why would I need this?

It's not a use case on its own no, but it's mainly there to allow people to implement their own path functions without having to use strings for that. Ideally the path library provides everything that's needed though. Can't think of a use case right now though. The design document talks about perhaps making this function internal because of that.

That's like your absolutePathComponentsBetween function.

Exactly. I didn't need a relative path library for this. Why would you make it stringly typed? That's not something we should encourage.

Even if it's represented as a list, it's still a relative path. This is definitely a use case. Note that in your source combinators PR there's a "manual" lib.concatStringsSep "/" to undo the list representation of the relative path into a string.

@infinisil
Copy link
Member Author

infinisil commented Nov 14, 2022

Edit for clarity: This comment is about whether Nix paths or strings should be used for a path library. It's not related to the above arguments about support for relative paths.

Use Nix-provided paths first and foremost.

Here's some arguments against Nix paths (should be added to the document for sure):

  • They get imported into the store when interpolated
    • Notably unfiltered, which is often not what is wanted
    • Only makes sense for eval-time paths, so the library would be problematic for buildtime/runtime paths
      • Poor error messages when accidentally using interpolation with buildtime/runtime paths (because they might not exist)
      • And if the path does exist there's no error, which is never intended for buildtime/runtime paths
        • Could even be a security problem with secrets being accidentally imported
  • They can't be used with paths containing string context, so any path referring to derivations
  • They normalise /foo/bar/../baz to /foo/baz, which may be wrong when bar is a symlink
    • Additionally note that builtins accepting paths do resolve .. correctly when passed strings
  • They normalise away trailing /, which is not a problem in of itself, but when doing /some/path + "/" + "foo", it turns to /some/pathfoo, which is not right

Here's the arguments I can think of for Nix paths:

  • Implement normalisation, so we don't have to (though it messes with ..)
  • It's faster than string processing, because it's builtin

@roberth
Copy link
Member

roberth commented Nov 14, 2022

This is not a use case.

While "joining a path" is not a use-case on its own, such an operation is needed for a lot of things. These are ones I can think of right now:

  • Selecting a subdirectory to build, sourceRoot

Already supported without a relative path library.

  • Getting NixOS modules, modulesPath + "/profiles/minimal.nix"

  • Getting a specific output directory, someDrv + "/etc/systemd/system"

+ works fine for these.

Note that such a join operation is essentially just + but with normalisation.

No, because that would let you add an absolute path after a relative path, which is not a valid operation, but with the air of a legitimate operation because we have a function for it, obscuring the nastiness that's going on. You could forbid paths starting with / on the right hand side, but that'd probably cause more pain than it solves, especially as we're already used to + "/foo".

Use Nix-provided paths first and foremost.

Here's some arguments against Nix paths (should be added to the document for sure):

It's the devil we know.

  • They get imported into the store when interpolated

Useful. Where it's not desirable, you're not going to fix that with a relative path library.

  • Notably unfiltered, which is often not what is wanted

Often not a big deal, especially since flakes filter out at least the minimum, which is the gitignored stuff. A relative path library does not fix this user error.

  • Only makes sense for eval-time paths, so the library would be problematic for buildtime/runtime paths

Ok, show me a real world example.

* Poor error messages when accidentally using interpolation with buildtime/runtime paths (because they might not exist)

I don't think I've encountered these. Could you explain?

  * Could even be a security problem with secrets being accidentally imported

Not a problem that a relative path library solves.

  • They can't be used with paths containing string context, so any path referring to derivations

Where would you use these? In IFD? Does it get complicated enough to warrant a library?

  • They normalise /foo/bar/../baz to /foo/baz, which may be wrong when bar is a symlink

Or it may be correct. The behavior you call correct is still very surprising, even if POSIX(?) suggests it. Most people expect to operate on file system trees, not graphs. In a tree (no symlinks), the normalisation always holds up.

  • Additionally note that builtins accepting paths do resolve .. correctly when passed strings

Relative paths would be strings, according to your proposal, so this is a reason not to implement a relative path library.

  • They normalise away trailing /, which is not a problem in of itself, but when doing /some/path + "/" + "foo", it turns to /some/pathfoo, which is not right

This is unfortunate, but with a relative path library, you'll have the opposite problem that users may append a redundant slash.

Created NixOS/nix#7301

Here's the arguments I can think of for Nix paths:

  • Implement normalisation, so we don't have to (though it messes with ..)
  • It's faster than string processing, because it's builtin
  • Fewer types means fewer incompatible functions

@infinisil
Copy link
Member Author

infinisil commented Nov 14, 2022

@roberth Please note that my arguments in this comment are not related to the discussion of supporting relative paths. This is about a path library in general, irrespective of whether relative paths are supported in any way. I'm not replying to all your arguments where that's the counter.

Note that such a join operation is essentially just + but with normalisation.

No, because that would let you add an absolute path after a relative path, which is not a valid operation, but with the air of a legitimate operation because we have a function for it, obscuring the nastiness that's going on. You could forbid paths starting with / on the right hand side, but that'd probably cause more pain than it solves, especially as we're already used to + "/foo".

Yes I simplified it.. The join described in the design document always adds a / between and only allows the first argument to be an absolute path. We should err on the side of caution.

Often not a big deal, especially since flakes filter out at least the minimum, which is the gitignored stuff.

Nix can't assume Flakes is used, but yes they do make this a bit better.

  • Poor error messages when accidentally using interpolation with buildtime/runtime paths (because they might not exist)

I don't think I've encountered these. Could you explain?

nix-repl> "${/var/lib/someService}"
error: getting status of '/var/lib/someService': No such file or directory

This would happen if a path is accidentally interpolated when not meant to

  • They can't be used with paths containing string context, so any path referring to derivations

Where would you use these? In IFD? Does it get complicated enough to warrant a library?

No just a simple join [ pkgs.hello "bin/hello" ], wouldn't work if this was converted to a Nix path. This is a generic path library, it shouldn't be limited to just "eval-time paths you're intending to import"

  • They normalise /foo/bar/../baz to /foo/baz, which may be wrong when bar is a symlink

Or it may be correct. The behavior you call correct is still very surprising, even if POSIX(?) suggests it. Most people expect to operate on file system trees, not graphs. In a tree (no symlinks), the normalisation always holds up.

Or it may be correct, but the Nix path type has no business assuming it doesn't matter. Yes, realpath supports two modes, --logical (that's what Nix does), or --physical (the default). Strong disagree that --logical is what people expect. If anything it should be --physical because it's the default. But in any case, it's really not up for the Nix path type to determine that. The code reading from that path should determine that.

  • Fewer types means fewer incompatible functions

Such as?

@roberth
Copy link
Member

roberth commented Nov 14, 2022

@roberth Please note that my arguments in this comment are not related to the discussion of supporting relative paths. This is about a path library in general, irrespective of whether relative paths are supported in any way. I'm not replying to all your arguments where that's the counter.

Yeah, we're discussing multiple things at once. Maybe I was too focused on relative paths.

Yes I simplified it.. The join described in the design document always adds a / between and only allows the first argument to be an absolute path. We should err on the side of caution.

That seems like a sensible function then.

Often not a big deal, especially since flakes filter out at least the minimum, which is the gitignored stuff.

Nix can't assume Flakes is used, but yes they do make this a bit better.

True. I don't think there's much we can do on the library side to improve this unless we reject Nix paths altogether, turn them into a library implementation detail and forbid path expressions in user code. This seems excessive.

  • Poor error messages when accidentally using interpolation with buildtime/runtime paths (because they might not exist)

I don't think I've encountered these. Could you explain?

nix-repl> "${/var/lib/someService}"
error: getting status of '/var/lib/someService': No such file or directory

Seems fixable: NixOS/nix#7303

This would happen if a path is accidentally interpolated when not meant to

  • They can't be used with paths containing string context, so any path referring to derivations

Where would you use these? In IFD? Does it get complicated enough to warrant a library?

No just a simple join [ pkgs.hello "bin/hello" ], wouldn't work if this was converted to a Nix path.

Build outputs can not be converted to path values. This would tend to cause IFD for no good reason.

This is a generic path library, it shouldn't be limited to just "eval-time paths you're intending to import"

I'm ok with a path join function, but I am not convinced that we need an extensive path library.

  • They normalise /foo/bar/../baz to /foo/baz, which may be wrong when bar is a symlink

Or it may be correct. The behavior you call correct is still very surprising, even if POSIX(?) suggests it. Most people expect to operate on file system trees, not graphs. In a tree (no symlinks), the normalisation always holds up.

Or it may be correct, but the Nix path type has no business assuming it doesn't matter. Yes, realpath supports two modes, --logical (that's what Nix does), or --physical (the default). Strong disagree that --logical is what people expect. If anything it should be --physical because it's the default. But in any case, it's really not up for the Nix path type to determine that. The code reading from that path should determine that.

So symlinks are a leaky abstraction. This is more reason to avoid over-abstraction in the form of a path library.

  • Fewer types means fewer incompatible functions

Such as?

This is a general observation. If your (total) functions support one type, all functions can be combined arbitrarily. This is useful. Often this is not quite achievable, but an approximation of this ideal is already pleasant to work with. This is the "algebraic" or "combinator" flavor of libraries. On the other end of the spectrum, you have libraries with many types, and few functions that operate on each particular type. For a library that is about a "single" type, paths, the latter would be a disaster.

A concrete example would be src in stdenv.mkDerivation. It supports strings with context, or paths. If we add a third type of path, support would have to be added for this new type of path. If it's a representation of relative paths, it is not even possible to reasonably implement it, because it doesn't contain enough information. In other cases, support for a new type of path may simple not have been added yet, and the extra code complexity imposed by matching the expectations around the new path type is far from ideal. It's also possible for such functions to consume the new path type in a wrong or inconsistent way that does not match other interpretations of the new path type.

@roberth
Copy link
Member

roberth commented Nov 14, 2022

  • only one new function liability (lib.path.join or something) which seems manageable
  • two suggested Nix improvements

I like the progress. If we keep this up, we can solve our problems without creating a mess for ourselves.

@infinisil
Copy link
Member Author

infinisil commented Nov 14, 2022

Build outputs can not be converted to path values. This would tend to cause IFD for no good reason.

There's no problem with join [ hello "bin/hello" ] if a string is returned. And there's no reason why that would cause IFD, path operations don't cause IFD. Only calling builtins that read from the filesystem can cause IFD, irrespective of whether you pass them Nix paths or strings.

I'm ok with a path join function, but I am not convinced that we need an extensive path library.

This proposal is actually kept fairly minimal imo, which functions do you think are unnecessary?

So symlinks are a leaky abstraction. This is more reason to avoid over-abstraction in the form of a path library.

Perhaps symlinks are leaky, but people are still using them and expect them to work in a certain way. There's no abstraction introduced by this current proposal, it's really just the same absolute and relative paths strings used all throughout Linux, so not sure what you mean by over-abstracted.

A concrete example would be src in stdenv.mkDerivation. It supports strings with context, or paths. If we add a third type of path, support would have to be added for this new type of path. If it's a representation of relative paths, it is not even possible to reasonably implement it, because it doesn't contain enough information. In other cases, support for a new type of path may simple not have been added yet, and the extra code complexity imposed by matching the expectations around the new path type is far from ideal. It's also possible for such functions to consume the new path type in a wrong or inconsistent way that does not match other interpretations of the new path type.

That is an argument. Specifically something like mkDerivation { src = join [ ./. "src" ]; } wouldn't work, which is unexpected if you don't know that join returns a string that isn't imported into the store. A mild counter-argument is that in most cases you don't want to import a directory unfiltered and without a constant store path name. If you use lib.sources.* this all isn't a problem since it explicitly imports the path into the store using builtins.path.

This does give me an idea though: How about making it such that this path library returns the same data type as it gets as an input. So

  • join [ ./. "src" ] == ./src
  • join [ "/var/lib" "someService" ] == "/var/lib/someService"

Then src = join [ ./. "src" ] would work again without extra effort and as expected. Similarly src = join [ "/var/lib" "someService" ] wouldn't work as expected (you're just passing a string, just like src = "/var/lib/someService").

@infinisil
Copy link
Member Author

I almost forgot about this, since recently, Nix supports paths with interpolation:

nix-repl> /foo/${"bar"}                  
/foo/bar

Only works when starting with a / though, so if you want to append to an absolute path in a variable:

nix-repl> foo = /base

nix-repl> /${foo}/bar 
/base/bar

Not sure about this, it's not very pretty, but a / path operator also has problems (I left a comment). An explicit function for this like join might be better after all, and it wouldn't have the problem you described with the proposal from my previous comment):

join [ /foo "bar" ] -> /foo/bar
# We could also do
join /foo "bar"

@aameen-tulip
Copy link

One example of boilerplate is this example. Here I have wrappers around any import and read* routines to allow emulating pure and no IFD modes even if Nix is running with --impure.

  coerceJSON' = { pure ? lib.inPureEvalMode, ifd ? true }: x: let
    isPjs = ( builtins.isAttrs x ) &&
            ( ! ( ( x ? outPath ) || ( x ? __toString ) ) );
    p'   = lib.coercePath x;
    pjs  = lib.importJSON p;
    msgD = "coercePjs: Cannot read path '${p'}' when `ifd' is disable.";
    forD = if ifd then pjs else throw msgD;
    msgP = "coercePjs: Cannot read unlocked path '${p'}' in pure mode.";
    forP = if ( ! pure ) || ( lib.isStorePath p' ) then pjs else throw msgP;
  in if isPjs then x else
     if ( lib.isDerivation x ) then forD else forP;

Nothing too special, but this would be tedious to write in multiple routines, so often times I was wrapping things in a path construct.

Again though, probably not useful in Nixpkgs, just wanted to provide an example of what I was talking about.

@infinisil
Copy link
Member Author

infinisil commented Nov 23, 2022

The current path problems related to commitIdFromGitRepo in #199772 (and its follow-ups, #201779, #202370 and some related issues) are a really good example of why a string-based path library (not doing any store imports) would be very useful.

Looking at commitIdFromGitRepo, these functions from the current proposal could be used:

  • join
  • normalise
  • isAbsolute
  • relativeTo

But I'm also just realizing that lazy trees in its current state essentially breaks such a string-based path library, since it makes it such that toString doesn't just return the string version of a path anymore:

nix run github:edolstra/nix/lazy-trees -- repl
Welcome to Nix 2.12.0pre20221116_561440b. Type :? for help.

nix-repl> normalPath = ./.

nix-repl> normalPath
/home/tweagysil/src/nix

nix-repl> toString normalPath
"/home/tweagysil/src/nix"

nix-repl> lazyPath = (builtins.fetchTree { type = "git"; url = ./.; }).outPath

nix-repl> lazyPath
/home/tweagysil/src/nix/

nix-repl> toString lazyPath
"/nix/store/virtual000000000000000000000001c-source"

This notably also breaks the normalization function you're using in the source combinators PR:

nix-repl> toString (/. + lazyPath)
"/nix/store/virtual0000000000000000000000004-source"

Why is it broken? Because the lazy trees PR deprecates anything related to the /nix/store/virtual* paths. You can't read from it directly:

nix-repl> builtins.readDir (toString lazyPath)
error: cannot decode virtual path '/nix/store/virtual0000000000000000000000006-source'

And while you seem to be able to access a subdirectory, it's deprecated:

nix-repl> builtins.readDir (toString lazyPath + "/doc")
warning: applying 'toString' to path '/home/tweagysil/src/nix/doc' and then accessing it is deprecated, at «string»:1:1
{ manual = "directory"; }

(Oh also the error is wrong, I'm applying toString to /home/tweagysil/src/nix, not the doc subdirectory)

I can't see why this should be deprecated though, I'll mention this in the lazy trees PR: NixOS/nix#6530 (comment)

@aakropotkin
Copy link
Contributor

aakropotkin commented Nov 23, 2022

Earlier in the thread I was asked why I always immediately toString ./. I had no idea lazy trees were going to do this to "virtual paths", but the rationale was similar.

I simply didn't trust Nix to keep my path as a regular FS path outside of the store, but if I toString before anything else I could "${mypath}/foo" to my heart's content without being concerned.

I'm pretty glad I went that way now that I see how lazy paths are going to blow up for any "late" attempts to stringize.

EDIT: I read the other PR thread. If I'm understanding it correctly it looks like they are going to flood my log with depreciation warnings for my approach... This is all well and good as long as the brand new replacement they just took out of the oven is well documented, backwards compatible, and bug free 😅. It's been a while since I locked into an old release but I might let the dust settle on this one for a few months. Hopefully I misunderstood the PR.

@roberth
Copy link
Member

roberth commented Nov 23, 2022

The current path problems related to commitIdFromGitRepo in #199772 (and its follow-ups, #201779, #202370 and some related issues) are a really good example of why a string-based path library (not doing any store imports) would be very useful.

No, they're examples of why Nix needs to be backwards compatible.

Looking at commitIdFromGitRepo, these functions from the current proposal could be used:

  • join
  • normalise
  • isAbsolute
  • relativeTo

I do see a couple of uses for join. Could you elaborate on the need for the other functions?

  • isAbsolute

The idea of "absoluteness" is only relevant to file system paths, not lazy trees. isRoot could apply to both traditional and lazy tree paths, but changes between instantiated and uninstantiated lazy trees. Perhaps store paths should also be considered "root"?

lazy trees

This introduces a lot of uncertainty.
If you ask me, it must prioritize compatibility over immediate performance improvements, but even if this was the case we have to take it into consideration; not for correctness but for performance.
Without more information about lazy tree usage, performance and compatibility, this design effort is a fool's errand.
I'll talk with Eelco about this.

@infinisil
Copy link
Member Author

The current path problems related to commitIdFromGitRepo in #199772 (and its follow-ups, #201779, #202370 and some related issues) are a really good example of why a string-based path library (not doing any store imports) would be very useful.

No, they're examples of why Nix needs to be backwards compatible.

That too. Something can be an example of multiple issues. The issue of being unsure about which paths get imported into the store being hard to reason about is something that can be seen from these PRs. E.g. this diff causes the result to be a path instead of a string which can cause it to be imported into the store when it didn't previously, which is entirely non-obvious from this code:

-      gitRepo      = "${toString ./..}/.git";
+      gitRepo      = ./.. + "/.git";

I do see a couple of uses for join. Could you elaborate on the need for the other functions?

diff --git a/lib/sources.nix b/lib/sources.nix
index 3ad7dc63355..d8d15a9333b 100644
--- a/lib/sources.nix
+++ b/lib/sources.nix
@@ -185,12 +185,12 @@ let
   # not exported, used for commitIdFromGitRepo
   _commitIdFromGitRepoOrError =
     let readCommitFromFile = file: path:
-        let fileName       = path + "/${file}";
-            packedRefsName = path + "/packed-refs";
+        let fileName       = join [ path file ];
+            packedRefsName = join [ path "packed-refs" ];
             absolutePath   = base: path:
-              if lib.hasPrefix "/" path
+              if isAbsolute path
               then path
-              else toString (/. + "${base}/${path}");
+              else join [ base path ];
         in if pathIsRegularFile path
            # Resolve git worktrees. See gitrepository-layout(5)
            then
@@ -199,12 +199,11 @@ let
                 then { error = "File contains no gitdir reference: " + path; }
                 else
                   let gitDir      = absolutePath (dirOf path) (lib.head m);
-                      commonDir'' = if pathIsRegularFile "${gitDir}/commondir"
-                                    then lib.fileContents "${gitDir}/commondir"
+                      commonDir' = if pathIsRegularFile "${gitDir}/commondir"
+                                    then normalise (lib.fileContents "${gitDir}/commondir")
                                     else gitDir;
-                      commonDir'  = lib.removeSuffix "/" commonDir'';
                       commonDir   = absolutePath gitDir commonDir';
-                      refFile     = lib.removePrefix "${commonDir}/" "${gitDir}/${file}";
+                      refFile     = relativeTo commonDir (join [ gitDir file ]);
                   in readCommitFromFile refFile commonDir
 
            else if pathIsRegularFile fileName

@roberth
Copy link
Member

roberth commented Nov 23, 2022

E.g. this diff causes the result to be a path instead of a string which can cause it to be imported into the store when it didn't previously, which is entirely non-obvious from this code:

Adding more code isn't going to make this problem go away.

E.g. this diff causes the result to be a path instead of a string which can cause it to be imported into the store when it didn't previously, which is entirely non-obvious from this code:

Conversion to a string actually is a problem in this function, but the fix wasn't complete, causing trouble.
It could be fixed and simplified by working with path values exclusively; perhaps converting to paths at the boundary.

- Changes the agreed-upon design slightly to make types of functions
  clearer:
  - Previously `path.join` worked on a list of paths, but required all but
    the first component to relative. This is now split into two functions:
    - `path.append <path> <string>` takes care of appending a relative
      path to an absolute path.
    - `path.relative.join [ <string> ]` takes care of joining relative
      paths together
  - `path.normalise` -> `path.relative.normalise`, because we don't need
    normalisation on absolute paths, Nix already takes care of that, and
    we use the `path.relative` namespace for anything only relating to
    relative paths

- Some more bikeshedding for the `relativeTo` name. I think `relativeTo`
  is pretty good, but @fricklerhandwerk likes other suggestions more
- Adds some suggestions for partial ordering checks on paths
- Adds a `difference` function, which can take care of common prefix and
  subpath calculations between any number of paths.
@infinisil
Copy link
Member Author

I paired with @fricklerhandwerk today and made some changes:

  • Changes the agreed-upon design slightly to make types of functions
    clearer:

    • Previously path.join worked on a list of paths, but required all but
      the first component to relative. This is now split into two functions:
      • path.append <path> <string> takes care of appending a relative
        path to an absolute path.
      • path.relative.join [ <string> ] takes care of joining relative
        paths together
    • path.normalise -> path.relative.normalise, because we don't need
      normalisation on absolute paths, Nix already takes care of that, and
      we use the path.relative namespace for anything only relating to
      relative paths
  • Some more bikeshedding for the relativeTo name. I think relativeTo
    is pretty good, but @fricklerhandwerk likes other suggestions more

  • Adds some suggestions for partial ordering checks on paths

  • Adds a difference function, which can take care of common prefix and
    subpath calculations between any number of paths.

@fricklerhandwerk
Copy link
Contributor

Yeah, stripPrefix or removePrefix says what it does and matches with hasPrefix.

@infinisil argued this opens potential for confusion with string functions, but that's what namespaces are for.

Getting the local scope consistent and unsurprising is more important for me.

I also like his idea of structuring the namespace by purpose in a fine-grained manner, such that the symbols are self-explanatory. That is really helpful reading such code, given we don't have a type system to assist us.

difference { left = /foo; right = /foo; } = { commonPrefix = /foo; suffix = { left = "."; right = "."; }; }

# Requires at least one path
difference {} = <error>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function reminds me more of the greatest common divisor than difference.

The word difference applied to paths makes me think of set difference.

tangent

..., which something I've considered implementing but purposefully haven't

(noting that trees are quite different from sets of paths)

Set difference is not very different from sources.filter and might suggest a usage that may not perform well.

```nix
difference { path = /foo/bar } = { commonPrefix = /foo/bar; suffix = { path = "."; }; }
difference { left = /foo/bar; right = /foo/baz; } = { commonPrefix = /foo; suffix = { left = "bar"; right = "baz"; }; }
difference { left = /foo; right = /foo/bar; } = { commonPrefix = /foo; suffix = { left = "."; right = "bar"; }; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
difference { left = /foo; right = /foo/bar; } = { commonPrefix = /foo; suffix = { left = "."; right = "bar"; }; }
difference { a = /foo; b = /foo/bar; c = /foo/baz } = { commonPrefix = /foo; suffix = { a = "."; b = "bar"; c = "baz"; }; }

Illustrate that the names are up to the caller, for those who don't normally read types.

### `difference`

```haskell
difference :: AttrsOf Path -> { commonPrefix :: Path; suffix :: AttrsOf String; }
Copy link
Member

@roberth roberth Dec 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideas:

commonAncestry :: f Path -> { commonAncestor :: Path; relativePaths :: f String; }
  where f is one of
    attrsOf
    listOf

Copy link
Contributor

@fricklerhandwerk fricklerhandwerk Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a path is a sequence of nodes in a tree, the correct term for the object of interest is "prefix", not "ancestor". The latter applies to the relations of nodes, not elements in a path.

commonPrefix :: AttrsOf Path -> { prefix :: Path; suffix :: AttrsOf String }

Using suffix does not only save us characters, but also reads more natural when accessing it:

(commonPrefix { a = /abc/def; b = /abc/xzy; }).suffix.b

I'm not the biggest fan of taking a list. With @infinisil we crystallised a few key ideas about the library, also based on your feedback (specifically: keeping the type surface small), and one of them was that it should be as general as possible such that one can implement convenience functions on top of it in a straighforward manner.

That way we can keep the library itself slim, since consumers can just make their own thing easily, and we can adopt the convenience functions into the library once they start to proliferate.

A list has strictly less information than an attrset, and can always be projected down from the attrset, but not the other way around.

{ 
commonPrefixList = xs:
  let
    intermediate = commonPrefix (listToAttrs (imap0 (i: x: { ${toString i} = x; })));
  in { prefix = intermediate.prefix; suffix = attrValues intermediate.suffix; };
}

Sure that's slow, but right now I don't even know if we have a use case for that interface.

Also lays down the assumptions we're making about paths, assumptions
which notably also make the library work with the lazy trees Nix PR
(without relying or interfering with any of its bugs)
Comment on lines +21 to +30
This library makes only these assumptions about paths and no others:
- `dirOf path` returns the path to the parent directory of `path`, unless `path` is the filesystem root, in which case `path` is returned
- There can be multiple filesystem roots: `p == dirOf p` and `q == dirOf p` does not imply `p == q`
- While there's only a single filesystem root in stable Nix, the [lazy trees PR](https://github.com/NixOS/nix/pull/6530) introduces [additional filesystem roots](https://github.com/NixOS/nix/pull/6530#discussion_r1041442173)
- `path + ("/" + string)` returns the path to the `string` subdirectory in `path`
- If `string` contains no `/` characters, then `dirOf (path + ("/" + string)) == path`
- If `string` contains no `/` characters, then `baseNameOf (path + ("/" + string)) == string`
- `path1 == path2` returns true only if `path1` points to the same filesystem path as `path2`

Notably we do not make the assumption that we can turn paths into strings using `toString path`.
Copy link
Member Author

@infinisil infinisil Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR with such a commonAncestry function now (just picked @roberth's recent suggestion for now, I don't have a strong opinion on the name yet).

I also updated the PR to lay down the assumptions about the path type that the library is based on. Notably also including one small assumption which makes it harmonize with the eventual lazy trees PR.

As a next step I'll split this PR into smaller parts

@infinisil
Copy link
Member Author

I created a smaller PR for just the relativeNormalise function along with the design document now: #205190

infinisil added a commit that referenced this pull request Dec 8, 2022
Adds initial work towards a `lib.path` library

Originally proposed in #200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <[email protected]>
infinisil added a commit that referenced this pull request Dec 23, 2022
Adds initial work towards a `lib.path` library

Originally proposed in #200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <[email protected]>
Co-Authored-By: Robert Hensing <[email protected]>
infinisil added a commit that referenced this pull request Jan 3, 2023
Adds initial work towards a `lib.path` library

Originally proposed in #200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <[email protected]>
Co-Authored-By: Robert Hensing <[email protected]>
@infinisil
Copy link
Member Author

#205190 is now merged with a much better design document and initial lib.path.subpath.{isValid,normalise} functions. Based on this I now opened #208887 for lib.path.append and #209099 for lib.path.subpath.join.

I might use some API descriptions and implementations from this draft PR, but this one can be closed.

@infinisil infinisil closed this Jan 4, 2023
@infinisil infinisil deleted the lib.path branch January 4, 2023 22:29
@infinisil infinisil mentioned this pull request Jan 6, 2023
2 tasks
github-actions bot pushed a commit to nix-community/nixpkgs.lib that referenced this pull request Jan 8, 2023
Adds initial work towards a `lib.path` library

Originally proposed in NixOS/nixpkgs#200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <[email protected]>
Co-Authored-By: Robert Hensing <[email protected]>
@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
github-actions bot pushed a commit to nix-community/nixpkgs.lib that referenced this pull request May 31, 2023
Adds initial work towards a `lib.path` library

Originally proposed in NixOS/nixpkgs#200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <[email protected]>
Co-Authored-By: Robert Hensing <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants