(^|x) causes error "empty expression" #427

Closed
xrat opened this issue Sep 24, 2024 · 5 comments
Labels: enhancement (New feature or request), problem (Something isn't working due to a (minor) problem)

Comments

xrat commented Sep 24, 2024

I tried to replace my GNU grep with ugrep and found that the pattern (^|x), which I use at times, causes the error "empty expression":

$ ugrep --version | head -n 1
ugrep 6.5.0 x86_64-pc-linux-gnu +sse2; -P:pcre2jit
$ cat /etc/debian_version
11.11
$ grep --version | head -n 1
grep (GNU grep) 3.6
$ grep -E '(^|x)' <<< foobar
foobar
$ ugrep -E '(^|x)' <<< foobar
ugrep: error: error at position 6
(?m)(^|x)
      \___empty expression

@genivia-inc
Member

Will take a look at this. It works fine with the $ anchor, but ^ is handled differently internally.

@genivia-inc
Member

Just a quick follow-up note: I'm only theorizing here, but it appears that grep -E '(^|x)' simply outputs all lines, so the ^ acts like any other empty pattern that matches every input line. It doesn't do anything special: grep -E -o '(^|x)' outputs only the matches of x and nothing else, which suggests that its internal machinery isn't using ^ at all in this case. Perhaps this is some GNU/BSD grep peculiarity? Will check it out.
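
The observation above can be reproduced with a small script. This is only a sketch: the two sample lines are made up for illustration, and it exercises whatever grep is on the PATH rather than ugrep.

```shell
#!/bin/sh
# Sketch: compare plain line selection with -o output for the pattern (^|x).
# The sample lines are hypothetical and not taken from the thread.
sample='foobar
xylophone'

# Without -o, every line is selected, because ^ matches (emptily) at the
# start of each line.
selected=$(printf '%s\n' "$sample" | grep -E '(^|x)')

# With -o, only non-empty matches are printed, so only the literal x remains.
only_matching=$(printf '%s\n' "$sample" | grep -E -o '(^|x)')

printf 'selected lines:\n%s\n' "$selected"
printf 'only-matching output:\n%s\n' "$only_matching"
```

With GNU grep this should select both lines, while -o prints a single x, because -o reports only non-empty matches. That is consistent with the empty ^ alternative selecting lines without ever showing up in the -o output.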

@xrat
Author

xrat commented Oct 1, 2024

Please note that I was just providing a minimal example. My use case is (^|x)y.

@stephentalley

I am also affected by this. IIRC, I was trying to match '(^|\s)x', which GNU grep handles without issue.

There may be more efficient ways to write that expression, but if ugrep is intended to be a drop-in replacement for GNU grep, it should support these types of expressions.
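
One possible rewrite, as a sketch only: distribute the ^ over the alternation and pass each alternative as its own -e pattern. The script below checks with grep that both forms select the same lines on a made-up sample. It uses the POSIX class [[:space:]] in place of \s, and it does not confirm whether ugrep 6.5.0 accepts the split form, since that was not tested in this thread.

```shell
#!/bin/sh
# Hypothetical sample input, for illustration only.
sample='x-ray
an x here
box
taxi'

# Grouped form from the thread, with \s spelled as the POSIX class.
grouped=$(printf '%s\n' "$sample" | grep -E '(^|[[:space:]])x')

# Split form: one -e pattern per alternative, with ^ distributed.
split=$(printf '%s\n' "$sample" | grep -E -e '^x' -e '[[:space:]]x')

if [ "$grouped" = "$split" ]; then
    echo "both forms select the same lines"
fi
printf '%s\n' "$split"
```

Only line selection is compared here. The same idea applies to (^|x)y, which would become the two patterns ^y and xy.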

@genivia-inc
Member

This limitation of the ^ anchor is no longer present in the upcoming ugrep update:

$ ugrep -c '(^|\s)y' enwik8 --stats
23930
Searched 1 file in 0.087 seconds: 1 matching (100%)

GNU grep is about 10x slower in this case:

$ /usr/bin/time ggrep -c -E '(^|\s)y' enwik8 
23930
        0.84 real         0.82 user         0.01 sys

See also #426

genivia-inc added the enhancement (New feature or request) and problem (Something isn't working due to a (minor) problem) labels on Oct 19, 2024