Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441) #11769

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

MadLittleMods
Copy link

@MadLittleMods MadLittleMods commented Jun 6, 2024

Summary

Add ForVariableUsedAfterBlock (RUF032) based on ControlVarUsedAfterBlockViolation (WPS441) from the wemake_python_styleguide. This rule prevents you from using for-loop variables outside of the loop block (or a with context variable).

Example:

# For control var used outside block
for event in []:
    _ = event
    pass

# ❌
_ = event
with statement
# with statement
with None as w:
    _ = w
    pass

# ❌
_ = w

Part of #3845

This is spawning from a real-world pain where I accidentally used one for-loop control variable in another loop, see element-hq/synapse#17187 (comment)

Relevant wtfPython around loop variables leaking out, https://github.com/satwikkansal/wtfPython?tab=readme-ov-file#-loop-variables-leaking-out

Please bear in mind that this is my first Rust code so I'm probably doing a few things wrong 🙇. Shout out to @reivilibre who helped me in the beginning!

Test Plan

Tested against the fixture file:

cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/ruff/for_variable_used_after_block.py --no-cache --select RUF032 --preview
Previous command before the rename
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py --no-cache --select WPS441

Also tested against the https://github.com/element-hq/synapse codebase. It does trigger in several spots that work because Python works this way but could be refactored to align with this rule. Some of the usages seem slightly sketchy but it requires some more context/time to determine whether they are bugs in the Synapse code.

Here is a real world bug that it caught: element-hq/synapse#17272

Dev notes

Running ruff against your test file with your new rule:

cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py --no-cache --select WPS441

Getting started

Running the pre-commit hooks without installing to your system:

python -m venv /home/eric/Downloads/ruff-venv
source /home/eric/Downloads/ruff-venv/bin/activate
pip install pre-commit

Debug NodeId/node_id

Add the following method to crates/ruff_python_semantic/src/model.rs and use println!("nodes {:#?}", checker.semantic().nodes());. It would be really nice if the node index/ID was printed next to each one but you can manually number things to get by.

    #[inline]
    pub fn nodes(&self) -> &Nodes<'a> {
        &self.nodes
    }

Inspect the AST

cargo dev print-ast crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py

Helpful related ruff rules

  • unused_variable
  • redefined_loop_name
  • too_many_nested_blocks
  • unused_loop_control_variable

```
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/pylint/control_var_used_after_block.py --no-cache --select WPS441
```
Also we can rely on undefined detection rules for the case
where people use a variable before a loop that is also defined
as a loop var below.
Copy link
Contributor

github-actions bot commented Jun 6, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+117 -0 violations, +0 -0 fixes in 15 projects; 39 projects unchanged)

DisnakeDev/disnake (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ disnake/message.py:1104:28: RUF032 `for` loop variable index is used outside of block
+ disnake/message.py:1105:16: RUF032 `for` loop variable reaction is used outside of block
+ scripts/codemods/typed_permissions.py:84:33: RUF032 `for` loop variable b is used outside of block

RasaHQ/rasa (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ tests/shared/nlu/training_data/test_features.py:234:46: RUF032 `for` loop variable idx is used outside of block
+ tests/shared/nlu/training_data/test_features.py:234:67: RUF032 `for` loop variable idx is used outside of block

apache/airflow (+25 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ airflow/api/__init__.py:46:76: RUF032 `for` loop variable backend is used outside of block
+ airflow/models/param.py:284:67: RUF032 `for` loop variable k is used outside of block
+ airflow/providers/amazon/aws/sensors/s3.py:150:64: RUF032 `for` loop variable key is used outside of block
+ airflow/providers/amazon/aws/sensors/s3.py:154:41: RUF032 `for` loop variable key is used outside of block
+ airflow/providers/amazon/aws/sensors/s3.py:159:50: RUF032 `for` loop variable key is used outside of block
+ airflow/providers/apache/hive/hooks/hive.py:1015:59: RUF032 `for` loop variable i is used outside of block
+ airflow/providers/cncf/kubernetes/utils/pod_manager.py:476:60: RUF032 `for` loop variable line is used outside of block
+ airflow/providers/cncf/kubernetes/utils/pod_manager.py:479:60: RUF032 `for` loop variable line is used outside of block
+ airflow/providers/docker/operators/docker_swarm.py:221:32: RUF032 `for` loop variable line is used outside of block
+ airflow/timetables/events.py:103:47: RUF032 `for` loop variable next_event is used outside of block
... 15 additional changes omitted for project

apache/superset (+8 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ superset/cli/test_db.py:235:8: RUF032 `for` loop variable spec is used outside of block
+ superset/cli/test_db.py:238:21: RUF032 `for` loop variable spec is used outside of block
+ superset/cli/test_db.py:269:12: RUF032 `for` loop variable spec is used outside of block
+ superset/migrations/versions/2017-10-03_14-37_4736ec66ce19_.py:204:19: RUF032 `for` loop variable foreign is used outside of block
+ superset/migrations/versions/2020-08-12_00-24_978245563a02_migrate_iframe_to_dash_markdown.py:191:40: RUF032 `for` loop variable dashboard is used outside of block
+ superset/sql_parse.py:1539:53: RUF032 `for` loop variable dialect is used outside of block
+ superset/tasks/cache.py:269:31: RUF032 `for` loop variable class_ is used outside of block
+ superset/tasks/cache.py:271:20: RUF032 `for` loop variable class_ is used outside of block

bokeh/bokeh (+13 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ src/bokeh/application/handlers/document_lifecycle.py:70:13: RUF032 `for` loop variable callback is used outside of block
+ src/bokeh/embed/util.py:276:27: RUF032 `for` loop variable elementid is used outside of block
+ src/bokeh/embed/util.py:276:38: RUF032 `for` loop variable root is used outside of block
+ src/bokeh/embed/util.py:276:47: RUF032 `for` loop variable root is used outside of block
+ src/bokeh/embed/util.py:276:58: RUF032 `for` loop variable root is used outside of block
+ src/bokeh/plotting/_figure.py:483:13: RUF032 `for` loop variable kw is used outside of block
+ src/bokeh/plotting/_figure.py:484:46: RUF032 `for` loop variable kw is used outside of block
+ src/bokeh/plotting/_figure.py:488:35: RUF032 `for` loop variable kw is used outside of block
+ src/bokeh/util/compiler.py:116:38: RUF032 `for` loop variable i is used outside of block
+ tests/codebase/test_code_quality.py:112:12: RUF032 `for` loop variable line is used outside of block
... 3 additional changes omitted for project

ibis-project/ibis (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ ibis/backends/bigquery/tests/unit/udf/test_core.py:181:16: RUF032 `for` loop variable i is used outside of block

milvus-io/pymilvus (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ examples/milvus_client/rbac.py:97:17: RUF032 `for` loop variable role is used outside of block

pandas-dev/pandas (+18 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ pandas/__init__.py:19:25: RUF032 `for` loop variable _dependency is used outside of block
+ pandas/io/common.py:599:8: RUF032 `for` loop variable compression is used outside of block
+ pandas/io/common.py:600:16: RUF032 `for` loop variable compression is used outside of block
+ pandas/io/common.py:604:43: RUF032 `for` loop variable compression is used outside of block
+ pandas/io/formats/excel.py:668:25: RUF032 `for` loop variable lnum is used outside of block
+ pandas/io/formats/excel.py:673:29: RUF032 `for` loop variable lnum is used outside of block
+ pandas/io/formats/excel.py:678:27: RUF032 `for` loop variable lnum is used outside of block
+ pandas/plotting/_matplotlib/boxplot.py:557:16: RUF032 `for` loop variable ax is used outside of block
+ pandas/tests/frame/indexing/test_take.py:48:13: RUF032 `for` loop variable df is used outside of block
+ pandas/tests/frame/indexing/test_take.py:50:13: RUF032 `for` loop variable df is used outside of block
... 8 additional changes omitted for project

python-poetry/poetry (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/poetry/masonry/builders/editable.py:182:27: RUF032 `for` loop variable scripts_path is used outside of block
+ src/poetry/masonry/builders/editable.py:184:64: RUF032 `for` loop variable scripts_path is used outside of block
+ src/poetry/masonry/builders/editable.py:207:28: RUF032 `for` loop variable scripts_path is used outside of block
+ tests/utils/env/test_env.py:303:37: RUF032 `for` loop variable dist is used outside of block

rotki/rotki (+7 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ rotkehlchen/api/rest.py:559:49: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/chain/zksync_lite/manager.py:828:25: RUF032 `for` loop variable target is used outside of block
+ rotkehlchen/tests/db/test_db_upgrades.py:2603:12: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/tests/integration/test_zksynclite.py:108:25: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/tests/integration/test_zksynclite.py:91:25: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/tests/unit/globaldb/test_globaldb_consistency.py:329:75: RUF032 `for` loop variable key is used outside of block
+ rotkehlchen/tests/unit/globaldb/test_globaldb_consistency.py:329:84: RUF032 `for` loop variable packaged_field is used outside of block

zulip/zulip (+10 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ zerver/actions/streams.py:628:16: RUF032 `for` loop variable stream_id is used outside of block
+ zerver/actions/streams.py:631:71: RUF032 `for` loop variable stream_id is used outside of block
+ zerver/lib/markdown/__init__.py:1036:75: RUF032 `for` loop variable size_name is used outside of block
+ zerver/tests/test_auth_backends.py:239:32: RUF032 `for` loop variable backend_name is used outside of block
+ zerver/tests/test_auth_backends.py:256:32: RUF032 `for` loop variable backend_name is used outside of block
+ zerver/tests/test_message_send.py:1686:30: RUF032 `for` loop variable user is used outside of block
+ zerver/tests/test_message_send.py:1699:30: RUF032 `for` loop variable user is used outside of block
+ zerver/tests/test_user_topics.py:124:44: RUF032 `for` loop variable data is used outside of block
+ zerver/tests/test_user_topics.py:129:9: RUF032 `for` loop variable data is used outside of block
+ zerver/tests/test_user_topics.py:131:48: RUF032 `for` loop variable data is used outside of block

indico/indico (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ bin/utils/storage_checksums.py:43:12: RUF032 `for` loop variable row is used outside of block
+ bin/utils/storage_checksums.py:46:57: RUF032 `for` loop variable row is used outside of block

python-trio/trio (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ notes-to-self/socketpair-buffering.py:35:44: RUF032 `for` loop variable _count is used outside of block
+ src/trio/_tests/check_type_completeness.py:106:90: RUF032 `for` loop variable obj_name is used outside of block

pytest-dev/pytest (+20 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/_pytest/assertion/rewrite.py:492:40: RUF032 `for` loop variable i is used outside of block
+ src/_pytest/assertion/rewrite.py:492:53: RUF032 `for` loop variable i is used outside of block
+ src/_pytest/assertion/rewrite.py:492:66: RUF032 `for` loop variable i is used outside of block
+ src/_pytest/assertion/rewrite.py:495:12: RUF032 `for` loop variable expl is used outside of block
+ src/_pytest/assertion/rewrite.py:713:23: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/rewrite.py:713:50: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/rewrite.py:714:22: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/rewrite.py:716:22: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/truncate.py:111:37: RUF032 `for` loop variable iterated_index is used outside of block
+ src/_pytest/assertion/truncate.py:112:30: RUF032 `for` loop variable iterated_index is used outside of block
... 10 additional changes omitted for project

... Truncated remaining completed project reports due to GitHub comment length restrictions

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
RUF032 117 117 0 0 0

MadLittleMods added a commit to element-hq/synapse that referenced this pull request Jun 6, 2024
Fix loop var being used outside block.

Before this change, we were always using the last room_id's
retention policy for all events being filtered.

I found this with astral-sh/ruff#11769
MadLittleMods added a commit to element-hq/synapse that referenced this pull request Jun 10, 2024
…lVarUsedAfterBlockViolation` `WPS441`) (#17272)

Fix loop var being used outside block.

Before this change, we were always using the last room_id's retention policy for all events being filtered.

I found this bug with the [new lint rule, `ControlVarUsedAfterBlockViolation` `WPS441`](astral-sh/ruff#11769), that I re-implemented in `ruff`. Shout-out to @reivilibre for all the help in the beginning! 

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
Mic92 pushed a commit to Mic92/synapse that referenced this pull request Jun 14, 2024
…lVarUsedAfterBlockViolation` `WPS441`) (element-hq#17272)

Fix loop var being used outside block.

Before this change, we were always using the last room_id's retention policy for all events being filtered.

I found this bug with the [new lint rule, `ControlVarUsedAfterBlockViolation` `WPS441`](astral-sh/ruff#11769), that I re-implemented in `ruff`. Shout-out to @reivilibre for all the help in the beginning! 

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
@@ -0,0 +1,90 @@
for global_var in []:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly ping @charliermarsh 🙇

Just looking for the next steps here (whether that be in a queue, pinging another person, etc)

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution.

What's the motivation for limiting the rule to with and for only? Should the rule apply to all compound statements?

@AlexWaygood mentioned that there reasons to access variables from inside a with block. So maybe we should exclude with statements.

@MichaReiser
Copy link
Member

MichaReiser commented Jul 29, 2024

We discussed this internally and we agreed on:

  • Using the for introduced variable outside of a loop is likely a bug. There are cases where this is intentional, but that's when using suppressions is fine.
  • Using the with introduced variable after the with block is more common and often acceptable. @AlexWaygood would you mind sharing an example?
  • There's a similar problem related to variables introduced by match case
    import random
    
    match random.choice([True, False]):
        case True: False
        case x:
            y = 2
    
    
    print(x)

The fact that for is most likely an error and with is not justifies that this rule should be split into two rules. I suggest that we start with the for loop and add it to the RUF ruleset.

@AlexWaygood
Copy link
Member

Using the with introduced variable after the with block is more common and often acceptable. @AlexWaygood would you mind sharing an example?

A commit merged to the CPython repo earlier this morning has several examples in it: python/cpython@0697188.

Here's a slightly simpler example than any of the functions that commit touched, from CPython's dataclasses test suite: https://github.com/python/cpython/blob/9187484dd97f6beb94fc17676014706922e380e1/Lib/test/test_dataclasses/__init__.py#L3121C1-L3138C68:

import unittest
from dataclasses import dataclass

class FrozenTests(unittest.TestCase):
    def test_non_frozen_normal_derived_from_empty_frozen(self):
        @dataclass(frozen=True)
        class D:
            pass

        class S(D):
            pass

        s = S()
        self.assertFalse(hasattr(s, 'x'))
        s.x = 5
        self.assertEqual(s.x, 5)

        del s.x
        self.assertFalse(hasattr(s, 'x'))
        with self.assertRaises(AttributeError) as cm:
            del s.x
        self.assertNotIsInstance(cm.exception, FrozenInstanceError)

The with self.assertRaises block is kept as small as possible, because the author of the test wanted to assert that the AttributeError exception is raised on exactly one line: the del s.x call. If an AttributeError was raised on another line, that would indicate a bug in the test or in the library; that should fail. So the with block is only one line long; but you want to save the cm binding that's created in the with statement so that you can do some extra assertions on it after the with block has ended.

I think tests are the places where this comes up most often. You also see this pattern a lot with pytest-style tests that use with pytest.raises() context managers; it's not just unittest-style tests.

@MadLittleMods
Copy link
Author

I suggest that we start with the for loop and add it to the RUF ruleset.

Updated ✅

Copy link

codspeed-hq bot commented Aug 2, 2024

CodSpeed Performance Report

Merging #11769 will degrade performances by 5.26%

Comparing MadLittleMods:madlittlemods/control-var-used-after-block (9bcc4fe) with main (c906b01)

Summary

❌ 1 regressions
✅ 31 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main MadLittleMods:madlittlemods/control-var-used-after-block Change
linter/all-rules[numpy/globals.py] 726.2 µs 766.5 µs -5.26%

The [`known_good_reference_node_ids`](astral-sh@eaf2a35)
evolved first to solve the shadowed cases but my
[fix later on to allow previous references](astral-sh@a7de1cb)
also ended up covering the shadowed cases without
me realizing (much more elegant too).

See:

 - astral-sh#11769 (comment)
 - astral-sh#11769 (comment)
@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule preview Related to preview mode features labels Aug 9, 2024
@MadLittleMods MadLittleMods changed the title Implement ControlVarUsedAfterBlockViolation (based on wemake_python_styleguide WPS441) Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide WPS441) Aug 9, 2024
@MadLittleMods MadLittleMods changed the title Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide WPS441) Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441) Aug 9, 2024
Comment on lines +28 to +29
/// The statement in which the [`Binding`] was defined. `None` if the [`Binding`]
/// comes from a built-in.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double-check the accuracy of these comments. In any case, I'd like to document the cases so we don't have to run through this uncertainty in the future.

@MichaReiser
Copy link
Member

MichaReiser commented Aug 9, 2024

Some interesting cases from the ecosystem check

Variable declared outside the loop

I think the code here is correct because the variable is first defined outside the loop as being None.

https://github.com/indico/indico/blob/a22e0c0a9c0cf4d40d04029489dc530b4819d2ea/bin/utils/storage_checksums.py#L39-L43

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/io/common.py#L581-L599

(many more)

An easy fix here could be to skip the rule if a binding for the same name was already defined before the for loop. But it may also be intentional that we catch this as part of this rule. What do you think?

Use in combination with try..except

It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with try..except

https://github.com/python-trio/trio/blob/c9baca6045d427fd4cc0de2ca396ddeb4a0f40a5/src/trio/_tests/check_type_completeness.py#L62-L109

Multiple usages in the same statement

It would be great if we only flagged each variable once (at least per statement?)

**Delete statements **
This is interesting. We may explicitly want to allow usages in delete statements

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/__init__.py#L19

Control flow

This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/plotting/_matplotlib/boxplot.py#L520-L561
https://github.com/bokeh/bokeh/blob/829b2a75c402d0d0abd7e37ff201fbdfd949d857/src/bokeh/embed/util.py#L266-L276
https://github.com/bokeh/bokeh/blob/829b2a75c402d0d0abd7e37ff201fbdfd949d857/src/bokeh/plotting/_figure.py#L476-L486

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look good to me but there are some common patterns that the rule isn't handling well which makes me a bit wary of shipping the rule as is. @AlexWaygood any thoughts on the examples?

Copy link
Author

@MadLittleMods MadLittleMods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the conversation from #11769 up so we can better discuss each issue and resolve.

Comment on lines +31 to +48
# Assign a variable before the loop
room_id = 3
_ = room_id
# Use the same variable name in a loop
for room_id in []:
_ = room_id
pass

# ❌ After the loop is not ok because the value is probably not what you expect
_ = room_id

# ❌ Augmented assignment is not allowed because the value is probably not what you expect
room_id += 1

# Assigning again after the loop is ok
room_id = 5
room_id += 1
_ = room_id
Copy link
Author

@MadLittleMods MadLittleMods Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable declared outside the loop

But it may also be intentional that we catch this as part of this rule. What do you think?

I think it's intentional that we catch the indico example as part of the rule. We have an example in the test fixture file for this.

The other example from pandas could be addressed with some fancy control flow analysis (see other discussion)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think there are two different reasons why you want to avoid using a loop variable after the loop has finished:

  1. The first reason is that the loop might be empty, which means that the variable is never assigned:
    iterable = []
    for x in iterable:
        pass
    print(x)  # NameError
    Catching these feels like a "correctness" issue, that we should hopefully be able to do without very many false positives.
  2. The second is cases where we can be sure that the variable is defined, but we still forbid this pattern, perhaps on grounds of readability/style (it might not be obvious to the reader that the variable is mutated; there might be clearer ways of writing the code), or because there's a chance that the author of the code might not be familiar with Python's scoping rules, and might not have intended to reassign the variable by using the loop:
    x = 42
    for x in range(5):
         pass
    print(x)  # oops, now it's 4?? The author of the code expected it to still be 42
    Sometimes code like this can indicate a bug, but it also might be working exactly as the author of the code intended; there's a lot of code out there that deliberately mutates a variable using a loop like this.

The conclusion I draw from this is that we in fact need two rules here:

  1. One rule to catch instances where a loop variable is used after the loop and was not defined before the loop. This is almost always bad practice, as the loop variable might not be defined.
  2. One more opinionated rule that catches instances where the loop variable is definitely defined, but the mutation of the variable via the loop might not be desired. This might be a bug and might not be, depending on how experienced the developer is with Python's semantics.

For now I think I'd implement the first rule; we can consider the second rule as a followup.

room_id = 5
room_id += 1
_ = room_id

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use in combination with try..except

It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with try..except

The trio example really lends itself well to the convenience that the non-block scoping provides. I think this is probably best resolved by adding a # noqa: RUF032 comment for the lint rule to explicitly mark the case as fine.

room_id = 5
room_id += 1
_ = room_id

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple usages in the same statement

It would be great if we only flagged each variable once (at least per statement?)

Seems like a decent improvement. Do we do this sort of deduplication for other rules? For the violation Diagnostic, do we combine the TextRange to cover the whole area?

room_id = 5
room_id += 1
_ = room_id

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete statements

We may explicitly want to allow usages in delete statements (example)

Seems harmless to allow 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usages in del statements are just as problematic as other usages, since they also result in NameErrors if the variable is undefined:

>>> iterable = []
>>> for item in iterable:
...     pass
... 
>>> del item
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'item' is not defined. Did you mean: 'iter'?

room_id = 5
room_id += 1
_ = room_id

Copy link
Author

@MadLittleMods MadLittleMods Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Control flow

This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis

The pandas example makes sense to allow if we can detect the context via control flow analysis. The other pandas example makes sense for this as well since it has a return statement to guard it every leaking below.

But I think it's good that we catch bokeh examples as sketchy. I don't see how we could use control flow analysis to see that they're fine?


I'm assuming we don't want to actually dive into implementing the control flow analysis in this first iteration?

@MichaReiser MichaReiser added the needs-decision Awaiting a decision from a maintainer label Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Awaiting a decision from a maintainer preview Related to preview mode features rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants