Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generator doesn't preserve individual string quote style #7799

Open
dhruvmanila opened this issue Oct 4, 2023 · 1 comment
Open

Code generator doesn't preserve individual string quote style #7799

dhruvmanila opened this issue Oct 4, 2023 · 1 comment
Assignees
Labels
bug Something isn't working fixes Related to suggested fixes for violations

Comments

@dhruvmanila
Copy link
Member

This is related to the Generator struct.

This is especially true for triple-quoted strings. The Generator will normalize them to single quotes and perform some transformations which may or may not be correct:

  • Escape the inner quotes if they match the outer ones
  • Preserve the whitespaces (for example, using \n)

Related issues:

@sanxiyn
Copy link
Contributor

sanxiyn commented Feb 15, 2024

I am interested in working on this.

AlexWaygood added a commit that referenced this issue Mar 8, 2024
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for #7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
nkxxll pushed a commit to nkxxll/ruff that referenced this issue Mar 10, 2024
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to astral-sh#10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for astral-sh#7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in astral-sh#10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
@AlexWaygood AlexWaygood self-assigned this May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixes Related to suggested fixes for violations
Projects
None yet
Development

No branches or pull requests

3 participants