Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ruff] Expand unnecessary-regular-expression with re.compile (RUF055) #14680

Open
sbrugman opened this issue Nov 29, 2024 · 5 comments
Open

[ruff] Expand unnecessary-regular-expression with re.compile (RUF055) #14680

sbrugman opened this issue Nov 29, 2024 · 5 comments
Labels
rule Implementing or modifying a lint rule

Comments

@sbrugman
Copy link
Contributor

sbrugman commented Nov 29, 2024

Often users use re.compile to compile a regex pattern. The returned regular expression object can be then used for the methods already caught by this rule. This is more efficient when the same pattern is reused.

Detecting these cases can be done with the same logic as in RUF055. Providing a fix involves more work, as the compiled pattern can be dynamically passed. Having the detection without the fix is already valuable. Alternatively, this could be a separate rule.

 re.compile(r"^tinyint", re.IGNORECASE)

without regex:

def tinyint(s: str) -> bool:
    return s.lower().startswith("tinyint")

Examples:

cc: @ntBre

@AlexWaygood AlexWaygood added the rule Implementing or modifying a lint rule label Nov 29, 2024
@MichaReiser
Copy link
Member

I'm not sure if I fully understand the proposal. Is it to detect any usage of re.compile(r"^tinyint", re.IGNORECASE) or only usages in boolean positions?

I'm asking because I'm not sure if the regex patterns can easily be replaced in the examples you linked because the compiled expressions are passed to some generic testing function.

@ntBre
Copy link
Contributor

ntBre commented Nov 29, 2024

Hmm, I thought this sounded straightforward at first (just catch re.compile with a literal non-meta-character pattern), but I think you really have to know where it's used to see if this is a problem. If any of the Match APIs are used, for example, the results will still be different from any plain str version.

For example, this could trigger the rule:

pat = re.compile("xyz")
if pat.match(s):
    pass

but we'd want something like this not to:

pat = re.compile("xyz")
m = pat.match(s)
# use m ...

Basically I think we'd need to extend #14679 to resolve string literals through a re.compile call.

@sbrugman
Copy link
Contributor Author

Indeed, the examples above are not as clear.

A true positive example that would be great to be able to catch: https://github.com/great-expectations/great_expectations/blob/develop/ci/checks/check_only_name_tag_snippets.py#L48

@AlexWaygood
Copy link
Member

A true positive example that would be great to be able to catch: great-expectations/great_expectations@develop/ci/checks/check_only_name_tag_snippets.py#L48

I would say even that is not entirely clear-cut: because it's a global variable, it could be read from another module, and you don't know how that other module might use the re.Pattern object

@sbrugman
Copy link
Contributor Author

Good point. Even though this example should use string operations instead of regexes, we can't be sure without analysing the other files.
An example where the compiled regex is in the function scope:
https://github.com/mit-biomimetics/Cheetah-Software/blob/master/scripts/lcm-log2smat/python/lcmlog2smat/scan_for_lcmtypes.py#L12

Some obvious cases:

https://github.com/EmpireProject/Empire/blob/master/data/agent/agent.py#L927
https://github.com/johnlane/abcde/blob/master/examples/abcde.py#L96

After looking at more examples, I'm inclined to think that a dedicated rule with warning severity (#1256) that only considers the re.compile with a literal non-meta-character pattern would better suit this proposal, rather than extending RUF055.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rule Implementing or modifying a lint rule
Projects
None yet
Development

No branches or pull requests

4 participants