DSL: need to be able to clear regex capture groups #1401

Closed
johnkerl opened this issue Sep 25, 2023 · 4 comments · Fixed by #1451

@johnkerl
Owner

Repro in #1399

@johnkerl johnkerl self-assigned this Dec 2, 2023
@johnkerl johnkerl added the active label Dec 2, 2023
@johnkerl
Owner Author

johnkerl commented Dec 2, 2023

From #1399 by @archetyped:

Is there a way to clear regex capture groups after a match is performed so that strings like "\1" will not be replaced by a regex capture group?

Proof of concept:

echo a=testing | mlr put -q '
val = "Test something here";
search = "(something)"i;
pre_match = "\1";
match = val =~ search;
post_match = "\1";
print "Variable (Pre-Match): " . pre_match;
print "Variable (Post-Match): " . post_match;
'

Output:

Variable (Pre-Match): \1
Variable (Post-Match): something

Ideally there would be a way to set post_match to "\1" just like pre_match (rather than it being populated by the regex capture group). Escaping (e.g. "\\1") does not seem to have any effect.

After a lot of debugging, I found this behavior to be the reason why some text replacements were not working as expected. Interestingly, even a negative (!=~) match seems to cause strings such as "\1" to be replaced by capture groups from the regex match, e.g.:

if ( val !=~ search ) {
    # Stop processing
    return val;
}
# Strings defined here will be replaced by capture groups from the above regex match.
test = "\1";

The capture groups appear to even persist into function calls when a match is performed in an outer (calling) function, which makes it even harder to track down when it results in unexpected behavior.
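
To make the reported behavior concrete, here is a toy model in Python rather than Miller DSL. It is only a sketch of the symptoms described above, not Miller's implementation: `_captures`, `match`, `literal`, and `reset_captures` are hypothetical names invented for illustration, and replacing an out-of-range `\N` with an empty string is an assumption.

```python
import re

# Hypothetical stand-in for the DSL's capture state (not Miller's real data structure).
# None means "no captures set", in which case "\1" is left untouched.
_captures = None


def match(value, pattern, flags=0):
    """Model of `value =~ pattern`: on success, remember the capture groups."""
    global _captures
    m = re.search(pattern, value, flags)
    if m is not None:
        _captures = [m.group(0), *m.groups()]
    return m is not None


def literal(text):
    r"""Model of evaluating a string literal: "\N" is filled from the last match."""
    if _captures is None:
        return text

    def fill(ref):
        index = int(ref.group(1))
        # Assumption: a missing or unmatched group becomes an empty string.
        if index < len(_captures):
            return _captures[index] or ""
        return ""

    return re.sub(r"\\(\d)", fill, text)


def reset_captures():
    """What the issue asks for: return to the 'no captures set' state."""
    global _captures
    _captures = None


pre_match = literal(r"\1")
match("Test something here", "(something)", re.IGNORECASE)
post_match = literal(r"\1")
reset_captures()
after_reset = literal(r"\1")

print(pre_match)    # \1
print(post_match)   # something
print(after_reset)  # \1
```

Because the capture state in this model is shared rather than scoped to the code that performed the match, any later literal containing `\1` is affected, which mirrors the hard-to-trace replacements described above.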

@johnkerl
Owner Author

@archetyped what do you think of #1448?

@johnkerl
Owner Author

johnkerl commented Dec 19, 2023

@archetyped the reset logic is documented here:

https://miller.readthedocs.io/en/main/reference-main-regular-expressions/#resetting-captures

Please let me know if more is needed and we can re-open -- thanks!

(Also, the strmatch and strmatchx functions as in issue #283 are on PR #1448.)

@archetyped

@johnkerl strmatch/strmatchx DSL functions look great, as does the scoping for regex capture groups when using =~.

Look forward to testing the updates when released.

Thanks!

@johnkerl johnkerl removed the active label Jan 21, 2024