-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ruff
] Extend unnecessary-regular-expression to non-literal strings (RUF055
)
#14679
Merged
Merged
Changes from 16 commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
3dec810
rename ruf055.py and snapshot
ntBre efcc4cf
add is_str
ntBre 84d090d
resolve a sequence of assignments
ntBre 6a69402
simplify lifetimes
ntBre 8f9a118
reuse find_binding_value
ntBre 55cbafb
don't break literal strings
ntBre 9154fe6
clippy
ntBre 8d2539a
add test, include named expr assignment
ntBre c17e9ab
move resolve_name back to free function and use to check `sub` repl
ntBre 1817fb6
test repl resolution
ntBre 76b9bd8
update snapshot
ntBre b48fb18
Revert "add is_str" since it's unused
ntBre f778523
add back comment
ntBre 7668328
simplify lifetimes
ntBre 16a5073
rename to resolve_string_literal
ntBre a63fa11
only resolve direct references, update test
ntBre 5ca3b1f
Reapply "add is_str"
ntBre e9a6384
use is_str for repl and test annotated function argument
ntBre 41b52dd
revert is_str changes
ntBre 5afa545
check for escapes in repl
ntBre adb14d6
update test case
ntBre 2097142
Fix other edge cases
AlexWaygood File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
File renamed without changes.
17 changes: 17 additions & 0 deletions
17
crates/ruff_linter/resources/test/fixtures/ruff/RUF055_1.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
"""Test that RUF055 can follow a single str assignment for both the pattern and | ||
the replacement argument to re.sub | ||
""" | ||
|
||
import re | ||
|
||
pat1 = "needle" | ||
|
||
re.sub(pat1, "", haystack) | ||
|
||
# aliases are not followed, so this one should not trigger the rule | ||
if pat4 := pat1: | ||
re.sub(pat4, "", haystack) | ||
|
||
# also works for the `repl` argument in sub | ||
repl = "new" | ||
re.sub(r"abc", repl, haystack) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
40 changes: 40 additions & 0 deletions
40
...rc/rules/ruff/snapshots/ruff_linter__rules__ruff__tests__preview__RUF055_RUF055_1.py.snap
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
--- | ||
source: crates/ruff_linter/src/rules/ruff/mod.rs | ||
snapshot_kind: text | ||
--- | ||
RUF055_1.py:9:1: RUF055 [*] Plain string pattern passed to `re` function | ||
| | ||
7 | pat1 = "needle" | ||
8 | | ||
9 | re.sub(pat1, "", haystack) | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ RUF055 | ||
10 | | ||
11 | # aliases are not followed, so this one should not trigger the rule | ||
| | ||
= help: Replace with `haystack.replace(pat1, "")` | ||
|
||
ℹ Safe fix | ||
6 6 | | ||
7 7 | pat1 = "needle" | ||
8 8 | | ||
9 |-re.sub(pat1, "", haystack) | ||
9 |+haystack.replace(pat1, "") | ||
10 10 | | ||
11 11 | # aliases are not followed, so this one should not trigger the rule | ||
12 12 | if pat4 := pat1: | ||
|
||
RUF055_1.py:17:1: RUF055 [*] Plain string pattern passed to `re` function | ||
| | ||
15 | # also works for the `repl` argument in sub | ||
16 | repl = "new" | ||
17 | re.sub(r"abc", repl, haystack) | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RUF055 | ||
| | ||
= help: Replace with `haystack.replace("abc", repl)` | ||
|
||
ℹ Safe fix | ||
14 14 | | ||
15 15 | # also works for the `repl` argument in sub | ||
16 16 | repl = "new" | ||
17 |-re.sub(r"abc", repl, haystack) | ||
17 |+haystack.replace("abc", repl) |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, I think the need here is slightly different to the need on lines 95-96. On lines 95-96, we do need to know the value of the string in order to be able to check it doesn't have any metacharacters in it (so only a string literal will do, or something that we can resolve to a string literal). But here, we just need to know it's a string; any string will do, as long as the user isn't passing in a function.
Is that the case? If so, it might be worth adding back the
is_str
function you added in efcc4cf and using that here, rather than usingresolve_string_literal
in both places. The advantage of theis_str
technique is that it also understands basic type hints, e.g. it would understand thatre.sub()
is being passed a string for therepl
argument in something like this:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I look at it, maybe we do need to know (and analyze) the value of the
repl
string for a fully accurate analysis, though. For example, it seems like the initial version of the check that we merged yesterday emits a false-positive diagnostic (and incorrect autofix) for this:Now, this is a massive edge case -- I had to work quite hard to find it! I believe the only way you get a false positive with the rule's current logic is if there's a
\g
in the replacement string but no backslashes or metacharacters in the pattenr string, and it's almost impossible to think of a way you could plausibly have are.sub()
call with those characteristics. So maybe we shouldn't worry about this -- I'm interested in your thoughts and @MichaReiser's!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, good catch! I was working on adding
is_str
back in, but maybe instead I need to check for metacharacters inrepl
too.I thought we were safe from backreferences by avoiding
(
in the pattern, but I overlooked\g<0>
. That exact sequence seems like the only way to trigger this behavior?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes! Although we also emit a RUF055 diagnostic on invalid
re.sub()
calls like this, and maybe we should just ignore them? It feels like it might be outside of this rule's purview to autofix invalidre.sub()
calls into validstr.replace()
calls. We probably don't really know what the user intended exactly if there.sub()
call is invalid:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added back
is_str
locally, along with your function argument test case. It's really nice to handle that case, but I'm a bit bothered by this edge case too, so I could go either way. I'm interested to hear which approach you and Micha think is best overall.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you push the version with
is_str
to this PR, and we can see if it results in any more ecosystem hits? That might give us some more data on how useful it is to be able to detect that therepl
argument is a string from the function annotationThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it doesn't look like it adds any new ecosystem hits :/
I guess in that case, I'd vote for removing
is_str
again, and fixing the false positives on\g<0>
and\1
inrepl
arguments.Thanks for putting up with my pernickitiness here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, thanks for the thorough review! Should I reuse the other code to reject any metacharacters, or are references to named or numbered capture groups the only problems? I'm picturing checking for
\
followed byg
or1
through9
. That seems a bit nicer than rejecting any metacharacter like I did for the patterns but possibly less safe.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think actually we could check for
\
followed by any ASCII character except one ofabfnrtv
. Other than0-9
andg
(which both have special behaviour inrepl
strings, as we've just been discussing!), I believe those are the only ASCII escapes that will be permitted in arepl
string byre.sub()
, Anything else causesre.PatternError
to be raised -- meaning it's probably out of scope for us to emit this diagnostic on it:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think about this version. If it looks good, it might be nice to reuse this escape check for
pattern
as well instead of rejecting\
entirely.