Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ripgrep in semantic-symref-tool-grep #39

Closed
wants to merge 1 commit into from
Closed

Conversation

dickmao
Copy link
Collaborator

@dickmao dickmao commented Sep 6, 2021

Creating a separate bug report from bug#49731 because this is a real problem.

Now grep.el completely supports ripgrep when 'grep-find-template'
is customized to a command line that uses 'rg' such as e.g.

"find -type f -print0 | sort -z | xargs -0 -e
rg -nH --no-heading -j8 --sort path -M 200 --max-columns-preview -e "

But such grep setting breaks the command 'xref-find-references':

  1. while xref-find-references works fine in emacs -Q,
    I don't know why with my customization typing e.g.
    'M-? isearch-lazy-highlight RET' reports
    "No references found for: isearch-lazy-highlight".

Try and see which of the "tools" semantic-symref-perform-search ends
up using.

Thanks for the pointers to semantic-symref-perform-search.
It prepends "-n " to my customized pattern "rg -nH",
so the arg "-n" is duplicated on the command line:
rg -n -nH
and signals the error:
error: The argument '--line-number' was provided more than once, but
cannot be used multiple times
This error is caused by the bug in the command line parser used by
ripgrep:
clap-rs/clap#2171
that was fixed only 6 months ago, so it will take much time
before this fix will reach ripgrep, and this bug will be closed:
BurntSushi/ripgrep#1701

The above might be worked around with creating a symref-grep specific user
option for grep-find-template which would default to the "global" value of
that variable.

Maybe like the existing option 'semantic-symref-grep-shell', e.g.:

(defcustom semantic-symref-grep-program 'grep
"The program to use for regexp search inside files."
:type `(choice
(const :tag "Use Grep" grep)
(const :tag "Use ripgrep" ripgrep)
(symbol :tag "User defined"))
:version "28.1")

But the problem is that for users it's hard to see the connection
between the broken 'xref-find-references' and the need to customize an option
with unrelated name 'semantic-symref-grep'.

But even without duplicated "-n" semantic-symref-perform-search
doesn't work with ripgrep because it doesn't find such pattern:
\\\(\^\\\|\\W\\\)isearch-lazy-highlight\\\(\\W\\\|\$\\\)
Maybe semantic-symref-perform-search could be improved to
support ripgrep?
Because without these two problems it works fine with ripgrep.

...but the above tells us (I think) that semantic-symref-perform-search is
trying to use the basic regexp syntax, and ripgrep doesn't support that
(only Extended, or PCRE).

For your personal consumption, perhaps the best approach is to create
a separate "tool", like Grep (by copying symref/grep.el and tweaking some
of its definitions), and then register it in semantic-symref-tool-alist.

I don't know if ripgrep is that much faster for this particular purpose. So
maybe it's too much work for little benefit.

A more general solution would be to add to grep.el the same options
that you added to xref:

xref-search-program grep/ripgrep
xref-search-program-alist
'((grep . "xargs -0 grep -snHE -e ")
(ripgrep . "xargs -0 rg -nH --no-messages -g '!*/' -e | sort -t: -k1,1 -k2n,2"))

This means to turn the existing variable 'grep-program' into the user option
as the following patch does.

Also later grep.el could use the value "rg" of 'grep-program'
to create the corresponding grep-find-template in grep-compute-defaults.

But I don't know if it's ok to mention rigrep in grep.el?

Anyway, here is the patch that fixes 'xref-find-references':

@dickmao
Copy link
Collaborator Author

dickmao commented Sep 6, 2021

tags 49836 + patch
quit

This means to turn the existing variable 'grep-program' into the user option
as the following patch does.

Also later grep.el could use the value "rg" of 'grep-program'
to create the corresponding grep-find-template in grep-compute-defaults.

But I don't know if it's ok to mention rigrep in grep.el?

So here is the complete support for rigrep in grep.el:

@dickmao
Copy link
Collaborator Author

dickmao commented Sep 6, 2021

I think we can improve this part:

On 03.08.2021 11:10, Juri Linkov wrote:

diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
index 180d779a78..034f797076 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -150,15 +150,14 @@ semantic-symref-perform-search
"-l ")
((eq (oref tool searchtype) 'regexp)
"-nE ")

  •                      (t "-n ")))
    
  •                      (t (if (equal grep-program "rg") "" "-n "))))
    

It might be cleaner to see whether grep-find-template already includes
that flag, and if so, omit it. Though the search might be non-trivial if
it's in the form like "-abcn", still, that's searchable by regexp.

       (greppat (cond ((eq (oref tool searchtype) 'regexp)
                       (oref tool searchfor))
                      (t
                       ;; Can't use the word boundaries: Grep
                       ;; doesn't always agree with the language
                       ;; syntax on those.
  •                     (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
    
  •                             (oref tool searchfor)))))
    
  •                     (format "\\b%s\\b" (oref tool searchfor)))))
     ;; Misc
     (b (get-buffer-create "*Semantic SymRef*"))
     (ans nil)
    

I think the original idea (surrounding with \W) is sound: after all, not
every symbol boundary in Emacs sense is a word boundary in Grep or RG.
If a method, say, ends with ?, then it won't be.

The problem with the above regexp is that it uses the basic syntax,
instead of Extended. But we can flip it.

As long as we're able to ask Grep to search with Extended syntax, we can
use (format "(^|\W)%s(\W|$)" (oref tool searchfor)). And that can be
achieved with the same method as is used in xref-matches-in-directory:

Something like (replace-regexp-in-string "grep " "grep -E"
grep-find-template t t),
to be sure it's not ripgrep in there.

The new user option can be used too, but I'd probably prefer a more
independent solution here.

@dickmao
Copy link
Collaborator Author

dickmao commented Sep 6, 2021

@@ -150,15 +150,14 @@ semantic-symref-perform-search
"-l ")
((eq (oref tool searchtype) 'regexp)
"-nE ")

  •                      (t "-n ")))
    
  •                      (t (if (equal grep-program "rg") "" "-n "))))
    

It might be cleaner to see whether grep-find-template already includes that
flag, and if so, omit it. Though the search might be non-trivial if it's in
the form like "-abcn", still, that's searchable by regexp.

Indeed, but such a hack is a temporary measure and can be removed later
after ripgrep will be fixed.

                       ;; Can't use the word boundaries: Grep
                       ;; doesn't always agree with the language
                       ;; syntax on those.
  •                     (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
    
  •                             (oref tool searchfor)))))
    
  •                     (format "\\b%s\\b" (oref tool searchfor)))))
    

I think the original idea (surrounding with \W) is sound: after all, not
every symbol boundary in Emacs sense is a word boundary in Grep or RG. If
a method, say, ends with ?, then it won't be.

I tried to search for 'soap-type-is-array?' in the Emacs tree,
and ripgrep can find it with "\b%s\b", but Grep can't.

The problem with the above regexp is that it uses the basic syntax, instead
of Extended. But we can flip it.

As long as we're able to ask Grep to search with Extended syntax, we can
use (format "(^|\W)%s(\W|$)" (oref tool searchfor)). And that can be
achieved with the same method as is used in xref-matches-in-directory:

Something like (replace-regexp-in-string "grep " "grep -E"
grep-find-template t t), to be sure it's not ripgrep in there.

The new user option can be used too, but I'd probably prefer a more
independent solution here.

It would be more preferable not to change the existing default logic
to avoid possible troubles. Since Grep with Basic syntax works fine,
then better not to switch to Extended syntax.

The new user option is already used in many places in grep.el
in the previous patch, so it should be ok to use it in semantic-symref
as well:

@dickmao
Copy link
Collaborator Author

dickmao commented Sep 6, 2021

On 05.08.2021 00:23, Juri Linkov wrote:

I think the original idea (surrounding with \W) is sound: after all, not
every symbol boundary in Emacs sense is a word boundary in Grep or RG. If
a method, say, ends with ?, then it won't be.
I tried to search for 'soap-type-is-array?' in the Emacs tree,
and ripgrep can find it with "\b%s\b", but Grep can't.

Did you search through symref, or in console? If the former, it seems
some regexp-quoting is missing somewhere (the question mark was no
escaped). Because see what the console says:

$ rg "\bsoap-type-is-array?\b"
lisp/net/soap-client.el
950:(defun soap-type-is-array? (type)
990: (if (soap-type-is-array? type)

ChangeLog.2
19080: * lisp/net/soap-client.el (soap-type-is-array?): new defun

$ rg "\bsoap-type-is-array?\b"

^^ no matches

And

$ rg "\bsoap-type-is-array?"

has matches, of course.

It would be more preferable not to change the existing default logic
to avoid possible troubles. Since Grep with Basic syntax works fine,
then better not to switch to Extended syntax.

See above. But also consider what happens if a user sees that
grep-program is now customizable and ripgrep is an officially supported
value. They change it to "rg", and then suddenly their 'M-x rgrep' input
has to use the extended regexp format?

Worse than that, any third-party package that uses grep-find-template
will suddenly have a high chance of failing if they pass any nontrivial
regexps to it, especially if those have groupings or alternations.

It's a hard problem: grep.el is not prepared for abstracting like that.
If we at least standardized it internally on Extended format, that would
at least remove one source of uncertainty and bugs. The user still can
input basic regexps interactively, we can translate them easily.

Further: seeing xref-search-program-alist, people asked for support for
other similar programs, such as ag and pt. Any solution we end up with,
we should try to ensure they are valid values of grep-program as well.

The new user option is already used in many places in grep.el
in the previous patch, so it should be ok to use it in semantic-symref
as well:

diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
index 180d779a78..b7d08409aa 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -150,15 +150,22 @@ semantic-symref-perform-search
"-l ")
((eq (oref tool searchtype) 'regexp)
"-nE ")

  •                      (t "-n ")))
    
  •                      (t (if (equal grep-program "rg")
    
  •                             ;; TODO: remove this after ripgrep is fixed (bug#49836)
    
  •                             (unless (string-search "rg <C> -nH" grep-find-template)
    
  •                               "-n ")
    
  •                           "-n "))))
    

I'm actually fine with this part.

       (greppat (cond ((eq (oref tool searchtype) 'regexp)
                       (oref tool searchfor))
                      (t
                       ;; Can't use the word boundaries: Grep
                       ;; doesn't always agree with the language
                       ;; syntax on those.
  •                     (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
    
  •                             (oref tool searchfor)))))
    
  •                     (if (equal grep-program "rg")
    
  •                         (format "(^|\\W)%s(\\W|$)"
    
  •                                 (oref tool searchfor))
    
  •                       (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
    
  •                               (oref tool searchfor))))))
    

This can work. Except the comparison should be with "grep", I think: all
other alternatives only work with the Extended format.

@dickmao
Copy link
Collaborator Author

dickmao commented Sep 6, 2021

I think the original idea (surrounding with \W) is sound: after all, not
every symbol boundary in Emacs sense is a word boundary in Grep or RG. If
a method, say, ends with ?, then it won't be.
I tried to search for 'soap-type-is-array?' in the Emacs tree,
and ripgrep can find it with "\b%s\b", but Grep can't.

Did you search through symref, or in console? If the former, it seems some
regexp-quoting is missing somewhere (the question mark was no
escaped). Because see what the console says:

$ rg "\bsoap-type-is-array?\b"
lisp/net/soap-client.el
950:(defun soap-type-is-array? (type)
990: (if (soap-type-is-array? type)

ChangeLog.2
19080: * lisp/net/soap-client.el (soap-type-is-array?): new defun

$ rg "\bsoap-type-is-array?\b"

^^ no matches

And

$ rg "\bsoap-type-is-array?"

has matches, of course.

semantic-symref-grep-use-template constructs such command line:

"rg ... -e \\bsoap-type-is-array\?\\b"

that finds matches.

It would be more preferable not to change the existing default logic
to avoid possible troubles. Since Grep with Basic syntax works fine,
then better not to switch to Extended syntax.

See above. But also consider what happens if a user sees that grep-program
is now customizable and ripgrep is an officially supported value. They
change it to "rg", and then suddenly their 'M-x rgrep' input has to use the
extended regexp format?

This difference could be explained in the documentation.

Worse than that, any third-party package that uses grep-find-template will
suddenly have a high chance of failing if they pass any nontrivial regexps
to it, especially if those have groupings or alternations.

This already happened after trying to customize grep-find-template
to use rg broke xref-find-references, so the problem already exists
and needs to be solved.

It's a hard problem: grep.el is not prepared for abstracting like that. If
we at least standardized it internally on Extended format, that would at
least remove one source of uncertainty and bugs. The user still can input
basic regexps interactively, we can translate them easily.

Is there a package that can translate between them reliably?

Further: seeing xref-search-program-alist, people asked for support for
other similar programs, such as ag and pt. Any solution we end up with, we
should try to ensure they are valid values of grep-program as well.

Why not, semantic-symref already supports alternative tools
such as cscope, global, idutils. So xref could support more too.

  •                     (if (equal grep-program "rg")
    
  •                         (format "(^|\\W)%s(\\W|$)"
    
  •                                 (oref tool searchfor))
    
  •                       (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
    
  •                               (oref tool searchfor))))))
    

This can work. Except the comparison should be with "grep", I think: all
other alternatives only work with the Extended format.

I'm worried about the case when the user customizes
'grep-program' to e.g. an absolute path "/bin/grep"
or "/usr/local/bin/grep", etc.

@dickmao
Copy link
Collaborator Author

dickmao commented Sep 6, 2021

On 06.08.2021 03:35, Juri Linkov wrote:

semantic-symref-grep-use-template constructs such command line:

"rg ... -e \\bsoap-type-is-array\?\\b"

that finds matches.

The correct one will probably look like

"rg ... -e \\bsoap-type-is-array\\?\\b"

(same number of backslashes before '?' as before 'b'), and it won't find
any. The one you mentioned will find false positives.

E.g., try searching for 'file-name-as-directory?'. Or 'carr?'.

See above. But also consider what happens if a user sees that grep-program
is now customizable and ripgrep is an officially supported value. They
change it to "rg", and then suddenly their 'M-x rgrep' input has to use the
extended regexp format?

This difference could be explained in the documentation.

If it comes to that, yes, but it's usually better to fix usability
problems that just document them.

Worse than that, any third-party package that uses grep-find-template will
suddenly have a high chance of failing if they pass any nontrivial regexps
to it, especially if those have groupings or alternations.

This already happened after trying to customize grep-find-template
to use rg broke xref-find-references, so the problem already exists
and needs to be solved.

The problem exists, and has been for a long time: grep.el doesn't
properly support the "alternative" search programs, which are very
popular now. Its abstraction is leaky and doesn't work with anything but
grep. But I think that means we need a better abstraction.

Let's try to make sure we don't create bigger problems when fixing it.
And "packages stop working when I customize grep-program" sounds worse
than "I can't customize grep-program to 'rg', so my searches are a bit
slower than they could have been".

It's a hard problem: grep.el is not prepared for abstracting like that. If
we at least standardized it internally on Extended format, that would at
least remove one source of uncertainty and bugs. The user still can input
basic regexps interactively, we can translate them easily.

Is there a package that can translate between them reliably?

For the limited purpose of symref/grep, we could use
xref--regexp-to-extended. It's already used in xref-matches-in-directory
and xref-matches-in-files. Better name/documentation and tests are pending.

Note that it actually translates from a (subset of) Emacs regexp to
Extended and back (it's reversible). The proper basic regexp syntax
treats '+' and '?' as normal characters unless escaped, but they're
special in Emacs regexps.

The above function is how one can use Emacs syntax (though only limited
a subset, for now) in project-find-regexp.

I also saw some commits to ELPA yesterday, that show that Consult
includes a more advanced version of this feature:

https://git.savannah.gnu.org/cgit/emacs/elpa.git/commit/?h=externals/consult&id=7bd3e44929d44cf0e17f38e943e9be2bd6014237
https://git.savannah.gnu.org/cgit/emacs/elpa.git/commit/?h=externals/consult&id=95dadd98a6a0f08955f67f1e9a7cc312435a86b8

Not sure how mature it is (seems still in development), but perhaps we
could move it to the core sooner or later. And use it instead, if it
does provide any improvement for our use case here.

Further: seeing xref-search-program-alist, people asked for support for
other similar programs, such as ag and pt. Any solution we end up with, we
should try to ensure they are valid values of grep-program as well.

Why not, semantic-symref already supports alternative tools
such as cscope, global, idutils. So xref could support more too.

It's easy enough for Xref, yes. It only has to support one single,
well-defined scenario.

  •                     (if (equal grep-program "rg")
    
  •                         (format "(^|\\W)%s(\\W|$)"
    
  •                                 (oref tool searchfor))
    
  •                       (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
    
  •                               (oref tool searchfor))))))
    

This can work. Except the comparison should be with "grep", I think: all
other alternatives only work with the Extended format.

I'm worried about the case when the user customizes
'grep-program' to e.g. an absolute path "/bin/grep"
or "/usr/local/bin/grep", etc.

(string-match "\bgrep\b" grep-program) could take care of this.

To sum up, I'm all for adding some clutches to symref/grep.el, to
support your advanced scenario, right now.

As for having grep-program customizable, perhaps we should add some new
feature/abstraction/package? To avoid breakage, and for it to be opt-in
for any new callers from Lisp.

Or indeed have templates use Extended syntax, and grep-expand-template
translate REGEXP to it. That can cause breakage for existing users,
though, those who already customize grep-find-template, etc, to their
particular programs.

@dickmao dickmao closed this Sep 8, 2021
@dickmao dickmao deleted the pr-49836 branch September 8, 2021 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant