Style/StringLiterals ignores strings with non-ascii characters #3017

Closed
deivid-rodriguez opened this issue Apr 7, 2016 · 8 comments · Fixed by #3189

Comments

@deivid-rodriguez
Contributor

echo '"Esp"' | rubocop --stdin --only Style/StringLiterals -

results in

Inspecting 1 file
C

Offenses:

-:1:1: C: Prefer single-quoted strings when you don't need string interpolation or special symbols.
"Esp"
^^^^^

1 file inspected, 1 offense detected

whereas

echo '"España"' | rubocop --stdin --only Style/StringLiterals -

results in

Inspecting 1 file
.

1 file inspected, no offenses detected
$ rubocop -V
0.39.0 (using Parser 2.3.0.7, running on ruby 2.3.0 x86_64-linux)
@alexdowad
Contributor

Hi! This was by design. Please see lines 231-238 of util.rb:

      # If double quoted string literals are found in Ruby code, and they are
      # not the preferred style, should they be flagged?
      def double_quotes_acceptable?(string)
        # If a string literal contains hard-to-type characters which would
        # not appear on a "normal" keyboard, then double-quotes are acceptable
        double_quotes_required?(string) ||
          string.codepoints.any? { |cp| cp < 32 || cp > 126 }
      end

...I don't remember what the reasoning was behind this code, but if you want to argue for something else, please do so. Anyways, this is not a "bug" in the sense of unintentional behavior.
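To see why "Esp" is flagged but "España" is not, here is a minimal sketch that isolates the second half of the predicate above. `hard_to_type_chars?` is a hypothetical stand-alone name used only for illustration; the `double_quotes_required?` half is left out. The ñ in "España" has codepoint 241, which is above 126, so the whole literal is treated as acceptable in double quotes.

```ruby
# Hypothetical stand-alone copy of the "hard-to-type character" check in
# double_quotes_acceptable? (the double_quotes_required? half is omitted).
def hard_to_type_chars?(string)
  # Anything outside printable ASCII (codepoints 32..126) counts as hard to type.
  string.codepoints.any? { |cp| cp < 32 || cp > 126 }
end

p 'Esp'.codepoints              # → [69, 115, 112]
p 'España'.codepoints           # → [69, 115, 112, 97, 241, 97]
p hard_to_type_chars?('Esp')    # → false, so "Esp" is flagged
p hard_to_type_chars?('España') # → true, so "España" is accepted
```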

@bbatsov
Collaborator

bbatsov commented May 31, 2016

> ...I don't remember what the reasoning was behind this code, but if you want to argue for something else, please do so. Anyways, this is not a "bug" in the sense of unintentional behavior.

I don't remember this either.

@alexdowad
Contributor

LOL. I think it was me who wrote it.

@deivid-rodriguez
Contributor Author

Yeah, I was git-blaming this and it seems so! :)

@alexdowad
Contributor

OK, I remember why this is so.

The Ruby parser doesn't differentiate between your string and "Espa\u00f1a". Both of them come out just the same in the AST which we analyze.
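A small illustration of that point, using plain Ruby `eval` as a stand-in for the parser (an assumption made only to keep the example dependency-free): the two source snippets differ, but they evaluate to the same String. The `str` node produced by the parser gem likewise holds only that resulting string value, so a check on the value alone cannot tell the two apart.

```ruby
# Two different pieces of Ruby source for the same string literal.
typed_directly = '"España"'      # the ñ is typed as-is
escaped        = '"Espa\u00f1a"' # single quotes keep the backslash, so this is the escape as source text

p typed_directly == escaped # → false (the source texts differ)

# Evaluating both yields identical strings; only the value survives, not how it was written.
p eval(typed_directly) == eval(escaped) # → true
```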

So when we find a double-quoted string literal with U+00F1 (ñ) in it, we don't flag it, because the programmer may have typed the escape sequence \u00f1, and we don't want to force them to type the funny little squiggly "n" thing instead (single-quoted strings don't interpret \u escapes, so the only single-quoted equivalent is 'España'). We might confuse some poor ignorant souls who don't know how to enter those fancy European characters. (Like... me...)
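The reason single quotes would "force" typing the ñ is that Ruby only decodes `\u` escapes inside double-quoted literals. This sketch shows that simply swapping the quotes around `Espa\u00f1a` changes the string:

```ruby
double = "Espa\u00f1a" # escape is decoded: E, s, p, a, ñ, a
single = 'Espa\u00f1a' # no decoding: the backslash, "u" and hex digits stay as-is

p double.length      # → 6
p single.length      # → 11
p double == 'España' # → true, so the single-quoted equivalent needs a literal ñ
```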

@deivid-rodriguez
Contributor Author

@alexdowad I'm not sure I quite get it. What's the technical limitation that prevents fixing this?

@alexdowad
Contributor

The "technical limitation" is that if we fix the handling of "España", we will also "fix" the handling of "Espa\u00f1a". And when I say we will "fix" it, I mean that in the sense of "breaking" it.

That is, unless you actually re-parse the source code for the string yourself, and distinguish between the 2 cases. The AST which parser gives us is the same for the 2 cases mentioned above.
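A rough sketch of that "re-parse the source" idea, assuming the caller can get the raw source text of the literal, quotes included (with the parser gem, for instance, via the node's source range, `node.loc.expression.source`). `double_quotes_needed_by_source?` is a hypothetical helper, not RuboCop's API, and it is not necessarily how the eventual fix works. It looks for backslash escapes, `#{` interpolation, or an embedded single quote in the source text, so a directly typed ñ no longer excuses double quotes while `\u00f1` still does.

```ruby
# Hypothetical helper: decide from the literal's *source text* (quotes included),
# rather than from its decoded value, whether double quotes are needed.
def double_quotes_needed_by_source?(literal_source)
  body = literal_source[1...-1] # drop the surrounding double quotes
  # Escapes such as \u00f1 or \n, #{} interpolation, and embedded single quotes
  # would change meaning (or need escaping) in a single-quoted literal.
  # Simplification: the #@ivar and #$global interpolation forms are not detected.
  body.include?('\\') || body.include?('#{') || body.include?("'")
end

p double_quotes_needed_by_source?('"España"')      # → false: ñ typed directly, so flag it
p double_quotes_needed_by_source?('"Espa\u00f1a"') # → true: the escape needs double quotes
p double_quotes_needed_by_source?('"Esp"')         # → false
```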

@deivid-rodriguez
Contributor Author

> That is, unless you actually re-parse the source code for the string yourself, and distinguish between the 2 cases.

I guess this would be the way to go, then. Maybe I'll give it a try!
