-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ForVariableUsedAfterBlock
(based on wemake_python_styleguide
ControlVarUsedAfterBlockViolation
WPS441
)
#11769
base: main
Are you sure you want to change the base?
Conversation
``` cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/pylint/control_var_used_after_block.py --no-cache --select WPS441 ```
Also we can rely on undefined detection rules for the case where people use a variable before a loop that is also defined as a loop var below.
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
RUF032 | 117 | 117 | 0 | 0 | 0 |
Fix loop var being used outside block. Before this change, we were always using the last room_id's retention policy for all events being filtered. I found this with astral-sh/ruff#11769
…lVarUsedAfterBlockViolation` `WPS441`) (#17272) Fix loop var being used outside block. Before this change, we were always using the last room_id's retention policy for all events being filtered. I found this bug with the [new lint rule, `ControlVarUsedAfterBlockViolation` `WPS441`](astral-sh/ruff#11769), that I re-implemented in `ruff`. Shout-out to @reivilibre for all the help in the beginning! ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
…lVarUsedAfterBlockViolation` `WPS441`) (element-hq#17272) Fix loop var being used outside block. Before this change, we were always using the last room_id's retention policy for all events being filtered. I found this bug with the [new lint rule, `ControlVarUsedAfterBlockViolation` `WPS441`](astral-sh/ruff#11769), that I re-implemented in `ruff`. Shout-out to @reivilibre for all the help in the beginning! ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
@@ -0,0 +1,90 @@ | |||
for global_var in []: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly ping @charliermarsh 🙇
Just looking for the next steps here (whether that be in a queue, pinging another person, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution.
What's the motivation for limiting the rule to with
and for
only? Should the rule apply to all compound statements?
@AlexWaygood mentioned that there reasons to access variables from inside a with
block. So maybe we should exclude with
statements.
...ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py
Outdated
Show resolved
Hide resolved
We discussed this internally and we agreed on:
The fact that |
A commit merged to the CPython repo earlier this morning has several examples in it: python/cpython@0697188. Here's a slightly simpler example than any of the functions that commit touched, from CPython's import unittest
from dataclasses import dataclass
class FrozenTests(unittest.TestCase):
def test_non_frozen_normal_derived_from_empty_frozen(self):
@dataclass(frozen=True)
class D:
pass
class S(D):
pass
s = S()
self.assertFalse(hasattr(s, 'x'))
s.x = 5
self.assertEqual(s.x, 5)
del s.x
self.assertFalse(hasattr(s, 'x'))
with self.assertRaises(AttributeError) as cm:
del s.x
self.assertNotIsInstance(cm.exception, FrozenInstanceError) The I think tests are the places where this comes up most often. You also see this pattern a lot with |
Conflicts: crates/ruff_linter/src/codes.rs crates/ruff_python_semantic/src/model.rs
Updated ✅ |
CodSpeed Performance ReportMerging #11769 will degrade performances by 5.26%Comparing Summary
Benchmarks breakdown
|
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
The [`known_good_reference_node_ids`](astral-sh@eaf2a35) evolved first to solve the shadowed cases but my [fix later on to allow previous references](astral-sh@a7de1cb) also ended up covering the shadowed cases without me realizing (much more elegant too). See: - astral-sh#11769 (comment) - astral-sh#11769 (comment)
Conflicts: crates/ruff_linter/src/codes.rs
ControlVarUsedAfterBlockViolation
(based on wemake_python_styleguide
WPS441
)ForVariableUsedAfterBlock
(based on wemake_python_styleguide
WPS441
)
ForVariableUsedAfterBlock
(based on wemake_python_styleguide
WPS441
)ForVariableUsedAfterBlock
(based on wemake_python_styleguide
ControlVarUsedAfterBlockViolation
WPS441
)
/// The statement in which the [`Binding`] was defined. `None` if the [`Binding`] | ||
/// comes from a built-in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please double-check the accuracy of these comments. In any case, I'd like to document the cases so we don't have to run through this uncertainty in the future.
Some interesting cases from the ecosystem check Variable declared outside the loop I think the code here is correct because the variable is first defined outside the loop as being (many more) An easy fix here could be to skip the rule if a binding for the same name was already defined before the for loop. But it may also be intentional that we catch this as part of this rule. What do you think? Use in combination with try..except It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with Multiple usages in the same statement It would be great if we only flagged each variable once (at least per statement?)
**Delete statements ** Control flow This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/plotting/_matplotlib/boxplot.py#L520-L561 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look good to me but there are some common patterns that the rule isn't handling well which makes me a bit wary of shipping the rule as is. @AlexWaygood any thoughts on the examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split the conversation from #11769 up so we can better discuss each issue and resolve.
# Assign a variable before the loop | ||
room_id = 3 | ||
_ = room_id | ||
# Use the same variable name in a loop | ||
for room_id in []: | ||
_ = room_id | ||
pass | ||
|
||
# ❌ After the loop is not ok because the value is probably not what you expect | ||
_ = room_id | ||
|
||
# ❌ Augmented assignment is not allowed because the value is probably not what you expect | ||
room_id += 1 | ||
|
||
# Assigning again after the loop is ok | ||
room_id = 5 | ||
room_id += 1 | ||
_ = room_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable declared outside the loop
But it may also be intentional that we catch this as part of this rule. What do you think?
I think it's intentional that we catch the indico
example as part of the rule. We have an example in the test fixture file for this.
The other example from pandas
could be addressed with some fancy control flow analysis (see other discussion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I think there are two different reasons why you want to avoid using a loop variable after the loop has finished:
- The first reason is that the loop might be empty, which means that the variable is never assigned:
Catching these feels like a "correctness" issue, that we should hopefully be able to do without very many false positives.
iterable = [] for x in iterable: pass print(x) # NameError
- The second is cases where we can be sure that the variable is defined, but we still forbid this pattern, perhaps on grounds of readability/style (it might not be obvious to the reader that the variable is mutated; there might be clearer ways of writing the code), or because there's a chance that the author of the code might not be familiar with Python's scoping rules, and might not have intended to reassign the variable by using the loop:
Sometimes code like this can indicate a bug, but it also might be working exactly as the author of the code intended; there's a lot of code out there that deliberately mutates a variable using a loop like this.
x = 42 for x in range(5): pass print(x) # oops, now it's 4?? The author of the code expected it to still be 42
The conclusion I draw from this is that we in fact need two rules here:
- One rule to catch instances where a loop variable is used after the loop and was not defined before the loop. This is almost always bad practice, as the loop variable might not be defined.
- One more opinionated rule that catches instances where the loop variable is definitely defined, but the mutation of the variable via the loop might not be desired. This might be a bug and might not be, depending on how experienced the developer is with Python's semantics.
For now I think I'd implement the first rule; we can consider the second rule as a followup.
room_id = 5 | ||
room_id += 1 | ||
_ = room_id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use in combination with try..except
It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with
try..except
The trio
example really lends itself well to the convenience that the non-block scoping provides. I think this is probably best resolved by adding a # noqa: RUF032
comment for the lint rule to explicitly mark the case as fine.
room_id = 5 | ||
room_id += 1 | ||
_ = room_id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple usages in the same statement
It would be great if we only flagged each variable once (at least per statement?)
- src/_pytest/assertion/rewrite.py:492:40: RUF032
for
loop variable i is used outside of block- src/_pytest/assertion/rewrite.py:492:53: RUF032
for
loop variable i is used outside of block- src/_pytest/assertion/rewrite.py:492:66: RUF032
for
loop variable i is used outside of block
Seems like a decent improvement. Do we do this sort of deduplication for other rules? For the violation Diagnostic
, do we combine the TextRange
to cover the whole area?
room_id = 5 | ||
room_id += 1 | ||
_ = room_id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete statements
We may explicitly want to allow usages in delete statements (example)
Seems harmless to allow 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usages in del
statements are just as problematic as other usages, since they also result in NameError
s if the variable is undefined:
>>> iterable = []
>>> for item in iterable:
... pass
...
>>> del item
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'item' is not defined. Did you mean: 'iter'?
room_id = 5 | ||
room_id += 1 | ||
_ = room_id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Control flow
This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis
The pandas
example makes sense to allow if we can detect the context via control flow analysis. The other pandas
example makes sense for this as well since it has a return
statement to guard it every leaking below.
But I think it's good that we catch bokeh
examples as sketchy. I don't see how we could use control flow analysis to see that they're fine?
I'm assuming we don't want to actually dive into implementing the control flow analysis in this first iteration?
Summary
Add
ForVariableUsedAfterBlock
(RUF032
) based onControlVarUsedAfterBlockViolation
(WPS441
) from thewemake_python_styleguide
. This rule prevents you from usingfor
-loop variables outside of the loop block(or a.with
context variable)Example:
with
statementPart of #3845
This is spawning from a real-world pain where I accidentally used one for-loop control variable in another loop, see element-hq/synapse#17187 (comment)
Relevant wtfPython around loop variables leaking out, https://github.com/satwikkansal/wtfPython?tab=readme-ov-file#-loop-variables-leaking-out
Please bear in mind that this is my first Rust code so I'm probably doing a few things wrong 🙇. Shout out to @reivilibre who helped me in the beginning!
Test Plan
Tested against the fixture file:
Previous command before the rename
Also tested against the https://github.com/element-hq/synapse codebase. It does trigger in several spots that work because Python works this way but could be refactored to align with this rule. Some of the usages seem slightly sketchy but it requires some more context/time to determine whether they are bugs in the Synapse code.
Here is a real world bug that it caught: element-hq/synapse#17272
Dev notes
Running
ruff
against your test file with your new rule:Getting started
CONTRIBUTING.md#prerequisites
CONTRIBUTING.md#example-adding-a-new-lint-rule
Running the
pre-commit
hooks without installing to your system:python -m venv /home/eric/Downloads/ruff-venv source /home/eric/Downloads/ruff-venv/bin/activate pip install pre-commit
Debug
NodeId
/node_id
Add the following method to
crates/ruff_python_semantic/src/model.rs
and useprintln!("nodes {:#?}", checker.semantic().nodes());
. It would be really nice if the node index/ID was printed next to each one but you can manually number things to get by.Inspect the AST
Helpful related
ruff
rulesunused_variable
redefined_loop_name
too_many_nested_blocks
unused_loop_control_variable