-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NFKC normalization bug when removing unused imports #12571
Conversation
8644e30
to
41192cb
Compare
|
Thanks for looking into this bug. This looks interesting. It would help me review if you could update the summary and include a short summary of what the issue was/how this PR fixes the issue. |
Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the change correctly, the issue is that libCST doesn't perform nfkc normalization. Is that correct?
Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow.
Linking to the analysis from the issue in the comment should be sufficient. I only want to avoid that we merge the PR with a stale summary (commit message)
Yup, that's correct. Our lexer eagerly performs NFKC normalization to match Python's semantics, whereas libCST's lexer leaves NFKC confusables untouched. I don't think that's a bug in libCST, since it's a CST rather than an AST; but it is a difference that we need to account for when building |
CodSpeed Performance ReportMerging #12571 will improve performances by ×2Comparing Summary
Benchmarks breakdown
|
Fixes #12570.
Summary
Our lexer eagerly applies NFKC normalization to NKFC-confusable unicode characters; this is done to match Python's semantics at runtime, where the interpreter views these characters as the same. When removing unused imports, however, we use
libCST
, which doesn't apply the same normalization (it would be incorrect forlibCST
to do so, since it's a CST rather than an AST). This can lead to aQualifiedName
constructed from alibCST
node not comparing equal to aQualifiedName
constructed from aruff_python_ast
node that represents the same sapn of source code as thelibCST
node. The difference here between the two parsers is the root cause of #12570.This PR therefore takes care to apply NFKC normalization when constructing
QualifiedName
s fromlibCST
nodes, so that theseQualifiedName
s are always comparable toQualifiedName
s constructed fromruff_python_ast
nodes.Test plan
cargo test