Regex behavior differs across platforms #1537

Open
copumpkin opened this issue Aug 29, 2017 · 16 comments
Labels
bug language The Nix expression language; parser, interpreter, primops, evaluation, etc

Comments

@copumpkin
Member

I seem to remember seeing some regex-related changes in 1.12 so this might no longer be an issue, but I want to capture it in case it is. If I fire up nix-repl on 1.11.x on Linux and Darwin and run

builtins.match "^((|\..*)\.sw[a-z]|.*~)$" "foo"

Darwin fails with error: compiling pattern ‘^((|..*).sw[a-z]|.*~)$’: empty (sub)expression, whereas Linux returns null.

@copumpkin
Member Author

This is slightly painful because that happens to be the regex we use to filter junk from nixpkgs in the standard NixOS image builder, and I need to be able to evaluate the image build process on Darwin 😦

cc @edolstra

@copumpkin
Member Author

Yeah, so it looks like we moved from 1.11's calls to regcomp, which presumably varies from platform to platform, to std::regex in 1.12, which I assume is consistent across platforms.

@copumpkin
Member Author

@edolstra do you have some sort of plan for the cutover to 1.12 with differing behavior in builtins.match? Do we need to bump the minimum Nix version on nixpkgs to make it work? Otherwise, silently changing behavior could cause a lot of woe.

@edolstra
Member

I don't have a plan. I would suggest rewriting that regexp into something that doesn't rely on Linux-specific behavior.

Also, builtins.match was undocumented in 1.11 so using it was at your own risk anyway ;-)

@edolstra
Member

Anyway, the behavior no longer differs across platforms, so I'll close this.

@clacke

clacke commented Mar 31, 2018

Still an issue on 2.0pre:

Darwin:

$ nix-instantiate --version
nix-instantiate (Nix) 2.0pre5968_a6c0b773
$ nix-instantiate --eval -E 'builtins.match "(|.*/)([^/]*)" "path/to/blah"'
error: invalid regular expression '(|.*/)([^/]*)', at (string):1:1

Linux:

$ nix-instantiate --version
nix-instantiate (Nix) 2.0pre5914_48c192ca
$ nix-instantiate --eval -E 'builtins.match "(|.*/)([^/]*)" "path/to/blah"'
[ "path/to/" "blah" ]

@clacke

clacke commented Mar 31, 2018

Verified that Darwin nix 2.0 also has this issue.

@copumpkin
Member Author

That's surprising! The full source of match is this:

nix/src/libexpr/primops.cc

Lines 1771 to 1806 in cfdbfa6

static void prim_match(EvalState & state, const Pos & pos, Value * * args, Value & v)
{
    auto re = state.forceStringNoCtx(*args[0], pos);
    try {
        std::regex regex(re, std::regex::extended);
        PathSet context;
        const std::string str = state.forceString(*args[1], context, pos);
        std::smatch match;
        if (!std::regex_match(str, match, regex)) {
            mkNull(v);
            return;
        }
        // the first match is the whole string
        const size_t len = match.size() - 1;
        state.mkList(v, len);
        for (size_t i = 0; i < len; ++i) {
            if (!match[i+1].matched)
                mkNull(*(v.listElems()[i] = state.allocValue()));
            else
                mkString(*(v.listElems()[i] = state.allocValue()), match[i + 1].str().c_str());
        }
    } catch (std::regex_error &e) {
        if (e.code() == std::regex_constants::error_space) {
            // limit is _GLIBCXX_REGEX_STATE_LIMIT for libstdc++
            throw EvalError("memory limit exceeded by regular expression '%s', at %s", re, pos);
        } else {
            throw EvalError("invalid regular expression '%s', at %s", re, pos);
        }
    }
}

which suggests that std::regex doesn't behave consistently across platforms? Seems odd!

@copumpkin copumpkin reopened this Mar 31, 2018
@copumpkin
Member Author

Probably worth including the regex_error in the "invalid regular expression" error message to help track this down further?

@shlevy
Member

shlevy commented Mar 31, 2018

Probably the example case here is enough to reproduce independent of Nix?

@clacke

clacke commented Mar 31, 2018

I'm assuming std::regex on Linux comes from libstdc++ and the one on Darwin comes from LLVM?

@edolstra
Member

edolstra commented Apr 3, 2018

This appears to be a bug in libstdc++. Empty regular expressions are not allowed according to the POSIX extended regex grammar (http://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html#tag_09_05_03).

This also affects grep:

linux$ echo foo | grep -E "(|x)" 
foo

macos$ echo foo | grep -E "(|x)"
grep: empty (sub)expression

@LnL7
Member

LnL7 commented Apr 3, 2018

We could use libcxxStdenv for all platforms 😄

@tomberek
Contributor

Another variation of the same problem:
builtins.split "" "abcd"

@peti peti added bug and removed backlog labels Apr 27, 2018
adisbladis added a commit to nix-community/pnpm2nix that referenced this issue Aug 21, 2018
adisbladis added a commit to nix-community/poetry2nix that referenced this issue Jan 1, 2020
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-regex-match/7946/9

@stale

stale bot commented Feb 13, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Feb 13, 2021
vcunat pushed a commit to NixOS/nixpkgs that referenced this issue Mar 28, 2021
The regex syntax supported by `builtins.match` differs between Linux and Darwin
(see NixOS/nix#1537), and the empty match group raises an error on Darwin.
But simply removing it does not change the logic in the module in any
way.
vcunat pushed a commit to NixOS/nixpkgs that referenced this issue Apr 1, 2021
The regex syntax supported by `builtins.match` differs between Linux and Darwin
(see NixOS/nix#1537), and the empty match group raises an error on Darwin.
But simply removing it does not change the logic in the module in any
way.

(cherry picked from commit ab94ea6, PR #100592)
montchr added a commit to montchr/dotfield that referenced this issue Apr 19, 2022
on darwin, setting a custom gpg home directory causes `nix flake
check` to fail when using my customized gpg-agent module. this does not
necessarily appear to be an issue with my implementation (though there
are certainly issues with my implementation), but appears to be a result
of a known nix language issue where regular expression matching behaves
differently between linux and darwin. the issue only presents itself
during *checks* for linux systems on darwin, which *should* be okay
since i'm not trying to *build* or run nixos modules or systemd services.

see NixOS/nix#1537

while that github issue and other issues/PRs referencing it provide
numerous examples of how one might work around the regex issue,
unfortunately the custom hashing functions in home-manager's gpg-agent
module are pretty opaque and i haven't a clue about where to begin
applying a possible fix.

fortunately, the error is easy to avoid by using the default gnupg home
directory. considering that the only real benefit to a custom home
directory is decluttering $HOME, the solution for my immediate needs is
clear. however, the underlying hash function logic will need to change
before my custom gpg-agent module can ever be merged upstream...
@fricklerhandwerk fricklerhandwerk added the language The Nix expression language; parser, interpreter, primops, evaluation, etc label Sep 13, 2022
@stale stale bot removed the stale label Sep 13, 2022
ntninja added a commit to ntninja/nixpkgs that referenced this issue Jul 6, 2024
…ixOS/nix#1537)

Also avoid trimming single-line string values unnecessarily.
2xsaiko pushed a commit to 2xsaiko/nixpkgs that referenced this issue Dec 2, 2024
…ixOS/nix#1537)

Also avoid trimming single-line string values unnecessarily.
2xsaiko pushed a commit to 2xsaiko/nixpkgs that referenced this issue Dec 6, 2024
…ixOS/nix#1537)

Also avoid trimming single-line string values unnecessarily.
10 participants