match beginning and end of line correctly #3575

Open
wants to merge 1 commit into base: master

matthias314 (Contributor) commented Dec 14, 2024

When searching with buf:FindNext, the start of the search region currently always matches ^ and the end of the region always matches $, regardless of whether they are actually the start or end of a line. Similar problems exist with other empty-string patterns like \b. The goal of this PR is to fix this. This is done by modifying the search pattern for the first and last lines (prepending and/or appending ..

Code common to findDown and findUp has been moved to separate functions. One could tweak the specific way it's done, but it may be better to first get feedback on whether you are willing to go in this direction at all.

matthias314 force-pushed the m3/findnext-start-end branch from 6a8e479 to c0ba33e on December 19, 2024 14:15
dmaluka (Collaborator) commented Jan 15, 2025

This "padded regex" approach feels hacky. Maybe it is reliable (I haven't analyzed it carefully yet), but how about a straightforward approach instead: search not just within the given range (start..end) but within the extended range from the beginning of the first line to the end of the last line, and filter out those matches that are outside start..end?

matthias314 (Contributor, Author)

That doesn't work because FindAll and friends return only non-overlapping matches (as documented for Go's regexp package). For example, for s := "abcd" and re := regexp.MustCompile(".."), the call re.FindAllStringIndex(s, -1) returns [[0 2] [2 4]]. Hence the method you are suggesting would claim that there is no match in s[1:3], although s[1:3] == "bc" is itself a match.

dmaluka (Collaborator) commented Jan 15, 2025

Ok, fair point.

(6 review threads on internal/buffer/search.go, outdated and resolved)
matthias314 force-pushed the m3/findnext-start-end branch from 926e411 to 2e3bc77 on January 16, 2025 02:06
matthias314 (Contributor, Author)

I've force-pushed a new version:

  • the first commit is the old PR,
  • the second commit should address all your comments,
  • the third commit is a variant where I use only the two bitmasks padStart and padEnd instead of an enum with four elements. You may or may not find this more elegant.
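A sketch of how the bitmask variant might look. Only the names padStart and padEnd come from the comment; `paddedPattern` and the exact pattern it builds (a `.` on each padded side around a non-capturing group) are assumptions for illustration. The point is that two independent flags cover all four cases (no padding, start only, end only, both) without an enum value for each.

```go
package main

import "fmt"

// Two independent bit flags; their combinations replace a four-valued enum.
const (
	padStart = 1 << iota // search region starts after the beginning of the line
	padEnd               // search region ends before the end of the line
)

// paddedPattern (hypothetical) wraps expr so that exactly one extra
// character is required before and/or after the match.
func paddedPattern(expr string, mode int) string {
	p := "(?:" + expr + ")"
	if mode&padStart != 0 {
		p = "." + p
	}
	if mode&padEnd != 0 {
		p += "."
	}
	return p
}

func main() {
	fmt.Println(paddedPattern(`\bfoo`, padStart|padEnd)) // .(?:\bfoo).
}
```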

I've also rebased the PR so that one can test it together with the changes to the search functionality done in previous PRs.

(review thread on internal/buffer/search.go, outdated and resolved)
dmaluka (Collaborator) commented Jan 16, 2025

Thanks, LGTM (apart from my above comment about comments, and the fact that it's better to squash such fix-up changes into one commit, but we can squash them later).

(2 review threads on internal/buffer/search.go, outdated and resolved)
dmaluka (Collaborator) commented Jan 16, 2025

Side note: it looks like we can deduplicate the code in findUp()/findDown() even further. Both functions are identical except that one uses FindIndex() while the other uses FindAllIndex() and takes the last match. So we can merge them into one function, e.g. find() or findNext(), with a down argument.

matthias314 (Contributor, Author) commented Jan 16, 2025

I've combined findUp and findDown into findNext, with an additional argument down. However, I wonder if down is necessary at all. Could we replace it with start.LessEqual(end)? If we keep down, why don't we change the calls of findNext so that start is always less than or equal to end?

Moreover, the code at the beginning of findNext (copied from the old functions) is not fully clear to me. Why may start.X and end.X be modified (to different values!) before we test if start is greater than end?

One might think of making findNext even more general so that it can be used for ReplaceRegex, too (and also for the function FindNextSubmatch that I proposed in #3552).

EDIT: We might also want the first argument to findNext to be the full set of four regexps we're using.

(review thread on internal/buffer/search.go, outdated and resolved)
dmaluka (Collaborator) commented Jan 17, 2025

> Moreover, the code at the beginning of findNext (copied from the old functions) is not fully clear to me. Why may start.X and end.X be modified (to different values!) before we test if start is greater than end?

When start or end is beyond the last line of the buffer, i.e. its Y is greater than b.LinesNum()-1, we clamp it to be exactly at the end of the buffer, so that we don't try to search in non-existent lines. (LineBytes() returns an empty string for non-existent lines, so in most cases the search would probably work correctly even without clamping; but if the search pattern is just ^ or $, for example, we would get unwanted matches in those non-existent lines.) Setting Y to the last line is not enough: we also need to set X to the end of that line. For example, if the buffer has 100 lines, the last line has 50 characters, and the end passed to findDown() is (0, 200), we should change it to (50, 99), not to (0, 99), so that we search up to the very end of the buffer, including the last line.
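The clamping rule described above fits in a few lines. This is a sketch under stated assumptions: `Loc` here is a stand-alone struct following the (X, Y) = (column, line) convention of the discussion, and `clampToBuffer` and the `lineLen` callback are hypothetical stand-ins for micro's buffer methods.

```go
package main

import "fmt"

// Loc follows the discussion's convention: X is the column, Y the line.
type Loc struct{ X, Y int }

// clampToBuffer (hypothetical): a location past the last line is moved to
// the very end of the buffer, i.e. the end of the last line, not to
// column loc.X of the last line.
func clampToBuffer(loc Loc, numLines int, lineLen func(y int) int) Loc {
	if loc.Y > numLines-1 {
		last := numLines - 1
		return Loc{X: lineLen(last), Y: last}
	}
	return loc
}

func main() {
	// 100 lines; the last one (index 99) has 50 characters.
	lineLen := func(y int) int {
		if y == 99 {
			return 50
		}
		return 80
	}
	fmt.Println(clampToBuffer(Loc{X: 0, Y: 200}, 100, lineLen)) // {50 99}
}
```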

matthias314 force-pushed the m3/findnext-start-end branch from 8bd0cac to 3e16d5b on January 17, 2025 12:48
dmaluka (Collaborator) commented Jan 17, 2025

LGTM, but I've just noticed that it still doesn't fix the issue in the case of replace -a.

So I guess we also need a similar fix in ReplaceRegex().

(Unless we treat this as the intended behavior, but then it's inconsistent with replace without -a...)

matthias314 (Contributor, Author) commented Jan 17, 2025

I was thinking of modifying ReplaceRegex in a separate PR. But of course we could also do it here. Whatever you prefer.

2 participants