Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tilde (~) in path literal is illegal #7742

Open
aherrmann opened this issue Feb 3, 2023 · 23 comments
Open

Tilde (~) in path literal is illegal #7742

aherrmann opened this issue Feb 3, 2023 · 23 comments

Comments

@aherrmann
Copy link
Member

aherrmann commented Feb 3, 2023

Describe the bug

Path literals in the Nix language do not support ~ characters within the path (apart from the first character). However, such characters may occur in valid paths.

Steps To Reproduce

  1. Open nix repl
  2. Enter ./foo~
  3. Observe error: syntax error, unexpected invalid token, expecting end of file.

Expected behavior

Nix accepts path literals containing ~ characters in other positions than the beginning as legal paths.

nix-env --version output

nix (Nix) 2.11.1

Additional context

This is about ~ in locations other than the very beginning of the literal, where in stands for $HOME, ~/foo is accepted by Nix 2.11.1 and resolves to the same path as $HOME/foo.

Possible workarounds thanks to @thufschmitt and @shlevy:

./${"foo~"}
./. + "/foo~"

Possibly relevant part of the code thanks to @layus.

The issue occurs in the Bazel extension rules_nixpkgs in the context of nixopts location expansion when expanding labels to external workspaces in the new bzlmod dependency system. In that case the workspace directory is defined by the mangled name of the external workspace which includes ~ characters, e.g. external/nixpkgs_location_expansion_test_file~override/test_file for @nixpkgs_location_expansion_test_file//:test_file. The above workaround would have to be performed by users of rules_nixpkgs, meaning the issue is user-facing for rules_nixpkgs.

Priorities

Add 👍 to issues you find important.

@aherrmann aherrmann added the bug label Feb 3, 2023
@cole-h
Copy link
Member

cole-h commented Feb 3, 2023

This diff appears to allow this to work just fine without expanding the ~ to $HOME:

diff --git a/src/libexpr/lexer.l b/src/libexpr/lexer.l
index 462b3b602..1d2421f8a 100644
--- a/src/libexpr/lexer.l
+++ b/src/libexpr/lexer.l
@@ -114,7 +114,7 @@ ANY         .|\n
 ID          [a-zA-Z\_][a-zA-Z0-9\_\'\-]*
 INT         [0-9]+
 FLOAT       (([1-9][0-9]*\.[0-9]*)|(0?\.[0-9]+))([Ee][+-]?[0-9]+)?
-PATH_CHAR   [a-zA-Z0-9\.\_\-\+]
+PATH_CHAR   [a-zA-Z0-9\.\_\-\+~]
 PATH        {PATH_CHAR}*(\/{PATH_CHAR}+)+\/?
 PATH_SEG    {PATH_CHAR}*\/
 HPATH       \~(\/{PATH_CHAR}+)+\/?
vin@scadrial ~/w/v/nix (master)> nix repl
Welcome to Nix 2.14.0pre20230126_37b4a9e. Type :? for help.

nix-repl> ./foo~
error: syntax error, unexpected invalid token, expecting end of file

       at «string»:1:6:

            1| ./foo~
             |      ^

# vs

vin@scadrial ~/w/v/nix (master)> ./outputs/out/bin/nix repl
Welcome to Nix 2.14.0. Type :? for help.

nix-repl> ./foo~
/home/vin/workspace/vcs/nix/foo~

But that list still looks fairly restrictive (e.g. disallows @, %, etc). I wonder if this is intended / desired, that special characters require that stringification workaround?

@Radvendii
Copy link
Contributor

One thing we have to be careful of when adding ~ to PATH_CHAR is that ~/foo doesn't change meanings. I think we have to reverse the order of each of these pairs of rules:

@sivizius
Copy link

sivizius commented Feb 4, 2023

There are many characters, that are valid in actual paths but not in nix’ path-literals because it might collide with other syntax. We might want to allow the escape character \ in literal paths?

@infinisil
Copy link
Member

@aherrmann How are these path values passed to rules_nixpkgs used in the end?

@aherrmann
Copy link
Member Author

@infinisil On the command-line, e.g. as an --arg, but how exactly is left to the user. These paths are exposed through a WORKSPACE scope implementation of Bazel's location expansion, you can see an example here, the corresponding Nix expression is here. In that particular case the workaround is the following patch ($$ escapes $):

       "external_file",
-      "$(location @nixpkgs_location_expansion_test_file//:test_file)"
+      './$${"$(location @nixpkgs_location_expansion_test_file//:test_file)"}',
     ],

What's unfortunate is that this is not hidden in some implementation detail of rules_nixpkgs but exposed to its users.

@infinisil
Copy link
Member

@aherrmann I see, then I think an --argstr should be used/recommended instead, which handles escaping correctly no matter the argument you pass, it always gets turned into a string. You can complain to users of --arg by confirming that the checked value is a string using builtins.isString.

There are two differences that might need to be fixed then:

  • Relative path expressions automatically get transformed to absolute path values at parse time. This won't happen anymore with --argstr. E.g. whereas previously --arg foo ./. would've set foo = /some/path, the new --argstr foo ./. only sets foo = "./.". If you need an absolute path again you'll have to prepend the base you need to it, e.g. "/some/path" + ("/" + foo). You won't need this if you only handle absolute paths though.
  • Paths get implicitly imported into the store, which won't happen anymore with a string. But you can still do it explicitly using builtins.path, assuming you have an absolute path as a string:
    builtins.path {
      path = "/some/path";
    }
    This is needed here

@aherrmann
Copy link
Member Author

@infinisil Thanks for looking into this!

  • But you can still do it explicitly using builtins.path, assuming you have an absolute path as a string:

Bazel usually deals in relative paths and we try to avoid absolute paths to Bazel managed files because they are generally not reproducible: They'll typically have a prefix of the form /home/$USER/.cache/bazel/_bazel_$USER/$HASH/.... The advantage of relative path literals is that they'll get turned into absolute paths into the Nix store, which are reproducible. If we expose non-reproducible absolute paths to user defined code, there's a high risk they'll find their way into generated files and will ultimately cause cache misses or other issues. Is there a way to avoid that risk with --argstr?

@infinisil
Copy link
Member

Oh then that's great, because the current code using --arg expands relative paths to absolute ones during Nix evaluation, potentially causing such problems. But with --argstr no such expansion takes place (because you pass a string directly), so there's no risk using --argstr

@aherrmann
Copy link
Member Author

@infinisil I'm not sure I understand. When I try to use the stringly relative path then I get errors of the form

error: string './tests/location_expansion/test_file' doesn't represent an absolute path

So, IIUC I have to convert it to an absolute path first somehow. With a path literal that happens automatically and turns it into an absolute path in the Nix store, which is reproducible. But, if I want to do the same with builtins.path then I have to manually prepend the absolute prefix to the source file outside the Nix store, and that would not be reproducible. So, wouldn't the use of a path literal be safer?

@infinisil
Copy link
Member

Ohh I think you just need this actually (using --argstr):

cp ${./. + ("/" + local_file)} $out/out/local_file
cp ${./. + ("/" + external_file)} $out/out/external_file

That's also what Nix would do underneath if local_file/external_file were relative path expressions, they get evaluated relative to the current Nix file's directory (./.).

Note however that + is a bit unsafe, I'd recommend the new lib.path.append if that's already available for you (was merged very recently though):

cp ${lib.path.append ./. local_file} $out/out/local_file
cp ${lib.path.append ./. external_file} $out/out/external_file

This new function prevents appending absolute paths.

@aherrmann
Copy link
Member Author

@infinisil Thank you for the pointer! lib.path.append looks like a good addition.

Sorry to be going back and forth on this so much. I gave this a shot, --argstr here and corresponding path_append here.

One issue I encountered when passing --argstr and then performing lib.path.append in the .nix file is that the path is now interpreted relative to the .nix file instead of the workspace root. That means users now need to remember to prefix the right amount of ../ to climb back to the workspace root (e.g. path_append ../. argstr_local_file). Using --arg instead doesn't have that problem, since it's evaluated at the nix-build invocation, which happens in the workspace root.

I keep having the impression that --arg exposes the user to fewer foot-guns... (The ~ issue being the only one for the rules_nixpkgs use-case.)

@infinisil
Copy link
Member

@aherrmann No problem! This is interesting detail about --arg I wasn't aware of, and imo the best way to fix it would be this:

# --arg arg_base ./.
cp ${path_append arg_base argstr_local_file} $out/out/argstr_local_file

Note that --arg arg_base ./. doesn't run into the same problem because it's always a correctly escaped expression. Alternatively (but then not using lib.path.append since you don't have a path expression anymore):

# --argstr argstr_base "$PWD"
cp ${builtins.path { path = "${argstr_base}/${argstr_local_file}"; }} $out/out/argstr_local_file

But since these arguments are essentially user-facing, this does make it more annoying to call. So here's an alternative that doesn't need passing an extra base directory argument, at the expense of being an impure expression (which doesn't matter much since the base directory is impure in any case and you're not using pure evaluation which would forbid it):

cp ${builtins.path { path = "${builtins.getEnv "PWD"}/${argstr_local_file}"; }} $out/out/argstr_local_file

I keep having the impression that --arg exposes the user to fewer foot-guns... (The ~ issue being the only one for the rules_nixpkgs use-case.)

Yeah I agree that it seems that way, but extending the Nix grammar just to allow ~ in paths so this one use-case works is just a hack. I agree that ~ is a somewhat common character in paths, but in the end you're still passing a Nix expression using --arg, which can still cause problems for a whole range of other paths, since only a small subset of characters are allowed. Here's some examples that break:

# Spaces
$ nix-instantiate --eval -E '{ x }: x' --arg x './foo bar'
error: undefined variable 'bar'

       at «string»:1:7:

            1| ./foo bar
             |       ^

# Dollar curly
$ nix-instantiate --eval -E '{ x }: x' --arg x './foo${}'
error: syntax error, unexpected '}'

       at «string»:1:8:

            1| ./foo${}
             |        ^

Passing a dynamic Nix expression to --arg would only be safe if you make sure to escape it properly beforehand, but this is rather cumbersome to do:

$ escapeNixString() {
  nix-instantiate --eval --json -E '{ str }: (import <nixpkgs/lib>).strings.escapeNixString str' \
    --argstr str "$1" | jq . -r
}
$ evalPath() {
  nix-instantiate --eval -E '{ x }: x' --arg x "./. + (\"/\" + $(escapeNixString "$1"))"
}
$ evalPath 'foo bar'
/home/tweagysil/foo bar
$ evalPath 'foo${}'
/home/tweagysil/foo${}

@Ericson2314 Ericson2314 added feature Feature request or proposal and removed bug labels Feb 20, 2023
@Ericson2314
Copy link
Member

This is not a bug, because Nix intentionally restricts what characters are allowed in store paths. However, it is a valid feature request --- we could relax those rules.

@aherrmann
Copy link
Member Author

Yeah I agree that it seems that way, but extending the Nix grammar just to allow ~ in paths so this one use-case works is just a hack. I agree that ~ is a somewhat common character in paths, but in the end you're still passing a Nix expression using --arg, which can still cause problems for a whole range of other paths, since only a small subset of characters are allowed.

Fair enough. Thank you for the discussion and suggestions on workarounds!
I've updated the rules_nixpkgs docs to include a recommendation on using --argstr in conjunction with --arg workspace_root ./. and path_append. The $${"..."} approach is still listed as well. In the common case it is the easiest option. Too exotic characters in paths are not common in Bazel projects, e.g. spaces in paths are generally avoided as they tend to cause problems with existing Bazel extensions. But, in principle, symbols are allowed in target names.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Feb 24, 2023

Triaged in the Nix team meeting:

  • @fricklerhandwerk: this looks like a bug
    • @tomberek: there was a related bug where ~ was expanded to $HOME within paths, but this here is something else
  • @thufschmitt: it would be fine to accept it
    • @edolstra: that would be ad hoc, and we forbid many other characters
    • @fricklerhandwerk: that's what all practical shells do, too, it could be enough to be consistent with convention here
  • to discuss

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-02-24-nix-team-meeting-minutes-35/25757/1

@fricklerhandwerk
Copy link
Contributor

Discussed in the Nix team meeting 2022-02-27:

  • @roberth: we have to take into account that [~/a~/b] evaluates to [ /home/user/a /home/user/b ]
  • agreement: this is lower priority and has some potential for debate, but we're open for a pull request

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-02-27-nix-team-meeting-minutes-36/25890/1

@thomasfire
Copy link

Is there any workaround for prefetching URLs that contain tilde? Like https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Apr 17, 2024

@thomasfire this seems to be a different issue, because URLs are not file system paths, and should always be passed as a string. The tilde is merely not accepted as the symbolic part of the store path, which can be worked around by setting name:

nix-repl> builtins.fetchurl "https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst"     
error: store path '0nk02idx70nv1dqi5dliibddznyx8wd6-msys2-keyring-1~20240410-1-any.pkg.tar.zst' contains illegal character '~'

nix-repl> builtins.fetchurl { name = "foo"; url="https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst";}
"/nix/store/q2jzkl6xvya8znhg0rqvf8vrz5xlgm1m-foo"

@roberth
Copy link
Member

roberth commented Apr 17, 2024

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/illegal-character-error-trying-to-use-fetchurl/47287/2

@roberth
Copy link
Member

roberth commented Jun 25, 2024

#10941 has improved the error message to become:

$ nix run nix -- repl
Nix 2.24.0pre20240625_ccb679e
Type :? for help.
nix-repl> builtins.fetchurl "https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst"
error:
       … while calling the 'fetchurl' builtin
         at «string»:1:1:
            1| builtins.fetchurl "https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst"
             | ^

       error: invalid store path name when fetching URL 'https://mirror.msys2.org/msys/x86_64/msys2-keyring-1~20240410-1-any.pkg.tar.zst': name 'msys2-keyring-1~20240410-1-any.pkg.tar.zst' contains illegal character '~'. Please pass an attribute set with 'url' and 'name' attributes to 'fetchurl',  so that it can create a valid store path.

However, the documentation still needs to be fixed.

@roberth roberth removed the feature Feature request or proposal label Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants