Fix NFKC normalization bug when removing unused imports #12571

AlexWaygood · 2024-07-29T18:35:34Z

Summary

Our lexer eagerly applies NFKC normalization to NFKC-confusable Unicode characters; this is done to match Python's semantics at runtime, where the interpreter treats these characters as the same. When removing unused imports, however, we use libCST, which doesn't apply the same normalization (it would be incorrect for libCST to do so, since it's a CST rather than an AST). This can lead to a QualifiedName constructed from a libCST node not comparing equal to a QualifiedName constructed from a ruff_python_ast node that represents the same span of source code as the libCST node. This difference between the two parsers is the root cause of #12570.

This PR therefore takes care to apply NFKC normalization when constructing QualifiedNames from libCST nodes, so that these QualifiedNames are always comparable to QualifiedNames constructed from ruff_python_ast nodes.
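The mismatch described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code from this PR: `qualified_name` is a hypothetical helper that stands in for building a QualifiedName, and the fullwidth spelling of `os` is an assumed example input.

```python
import unicodedata


def qualified_name(dotted_path: str, *, normalize: bool) -> tuple[str, ...]:
    """Hypothetical stand-in for building a QualifiedName from source text."""
    segments = dotted_path.split(".")
    if normalize:
        # Apply NFKC to each segment, as a normalizing lexer would.
        segments = [unicodedata.normalize("NFKC", s) for s in segments]
    return tuple(segments)


# Source text that spells `os` with fullwidth letters (U+FF4F, U+FF53).
fullwidth_source = "\uff4f\uff53.path"

# What a normalizing lexer effectively sees for the same span of source.
from_normalizing_lexer = qualified_name(fullwidth_source, normalize=True)

# What a CST that preserves the source verbatim would yield.
from_raw_cst = qualified_name(fullwidth_source, normalize=False)

# Normalizing on the CST side as well makes the two names comparable again.
from_fixed_cst = qualified_name(fullwidth_source, normalize=True)

print(from_raw_cst == from_normalizing_lexer)
print(from_fixed_cst == from_normalizing_lexer)
```

Without normalization on the CST side, the first comparison fails even though both names refer to the same span of source; normalizing both sides restores equality.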

Test plan

cargo test

github-actions · 2024-07-29T18:58:12Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/Chat_finetuning_data_prep.ipynb:6:18:25: Unparenthesized generator expression cannot be used here
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_gmail.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_jira.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_doc.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_text.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_middleware_azure_function.ipynb:37:1:13: Simple statements must be separated by newlines or semicolons

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/Chat_finetuning_data_prep.ipynb:6:18:25: Unparenthesized generator expression cannot be used here
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_gmail.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_jira.ipynb:15:1:1: Expected an expression
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_doc.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_action_sharepoint_text.ipynb:28:1:5: Simple statements must be separated by newlines or semicolons
error: Failed to parse examples/chatgpt/gpt_actions_library/gpt_middleware_azure_function.ipynb:37:1:13: Simple statements must be separated by newlines or semicolons

MichaReiser · 2024-07-29T19:38:24Z

Thanks for looking into this bug. This looks interesting.

It would help me review if you could update the summary and include a short summary of what the issue was/how this PR fixes the issue.

AlexWaygood · 2024-07-29T19:43:43Z

> Thanks for looking into this bug. This looks interesting.
>
> It would help me review if you could update the summary and include a short summary of what the issue was/how this PR fixes the issue.

Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow.

MichaReiser

If I understand the change correctly, the issue is that libCST doesn't perform nfkc normalization. Is that correct?

> Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow.

Linking to the analysis from the issue in the comment should be sufficient. I only want to avoid merging the PR with a stale summary (commit message).

crates/ruff_linter/src/fix/codemods.rs

AlexWaygood · 2024-07-30T09:22:17Z

> If I understand the change correctly, the issue is that libCST doesn't perform nfkc normalization. Is that correct?

Yup, that's correct. Our lexer eagerly performs NFKC normalization to match Python's semantics, whereas libCST's lexer leaves NFKC confusables untouched. I don't think that's a bug in libCST, since it's a CST rather than an AST; but it is a difference that we need to account for when building QualifiedNames from libCST trees, since a QualifiedName represents semantic information, and a QualifiedName built from a libCST node should be comparable with a QualifiedName built from a ruff_python_ast node that's built from the same source.
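The "Python's semantics" mentioned here can be observed directly: CPython converts identifiers to NFKC while parsing, so a name written with compatibility characters is bound under its normalized spelling. A minimal sketch, using an assumed fullwidth `x` (U+FF58) as the example identifier:

```python
# Execute a one-line program whose only identifier is a fullwidth "x" (U+FF58).
namespace: dict[str, object] = {}
exec("\uff58 = 1", namespace)

# Collect the names the program bound, ignoring the injected __builtins__ entry.
bound_names = [name for name in namespace if not name.startswith("__")]

# The interpreter stored the NFKC-normalized name, not the fullwidth spelling.
print(bound_names)
print("\uff58" in namespace)
```

A tool that compares names taken verbatim from the source against names taken from a normalizing parser has to account for this, which is the difference the comment above describes.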

codspeed-hq · 2024-07-30T09:54:43Z

CodSpeed Performance Report

Merging #12571 will improve performances by ×2

Comparing alex/f401-bug-2 (ffb7fd1) with alex/f401-bug-2 (ed8e5ed)

Summary

⚡ 1 improvements
✅ 32 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `alex/f401-bug-2` | `alex/f401-bug-2` | Change |
|---|---|---|---|---|
| ⚡ | `red_knot_check_file[incremental]` | 534 µs | 264.4 µs | ×2 |

Fix NFKC normalization bug when removing unused imports

41192cb

AlexWaygood force-pushed the alex/f401-bug-2 branch from 8644e30 to 41192cb July 29, 2024 18:38

Add test

ed8e5ed

AlexWaygood marked this pull request as ready for review July 29, 2024 19:23

MichaReiser approved these changes Jul 30, 2024

comments

ffb7fd1

AlexWaygood enabled auto-merge (squash) July 30, 2024 09:49

AlexWaygood added the fixes (Related to suggested fixes for violations) and linter (Related to the linter) labels Jul 30, 2024

AlexWaygood merged commit aaa56eb into main Jul 30, 2024
18 checks passed

AlexWaygood deleted the alex/f401-bug-2 branch July 30, 2024 09:54

BrewTestBot mentioned this pull request Aug 2, 2024

ruff 0.5.6 Homebrew/homebrew-core#179432

Merged
