-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise non-silent warning for invalid escape sequences #77093
Comments
This is a follow-up to bpo-27364. Back in Python 3.6, a silent warning was added for all invalid escape sequences in str and bytes. It was suggested that it would remain a silent warning (which still impacts tests, while not visually annoying the average user) for two releases (3.6 and 3.7), then would be upgraded to a non-silent warning for two subsequent releases (3.8 and 3.9) before becoming a full-on syntax error. With the 3.7 feature freeze on and going, I think it's time to evaluate the approach we take for 3.8 :) I suggest upgrading the DeprecationWarning to a SyntaxWarning, which is visible by default, for 3.8 and 3.9. I have cross-linked bpo-27364 to this issue as well. -Em |
+1 for the DeprecationWarning->SyntaxWarning->SyntaxError approach from me (especially as 3.7 will make the existing deprecation warning visible in interactive shells and __main__ modules by default). |
PR 5849 changes not only the Python parser, but codecs. It shouldn't. The Python parser and codecs will go different ways. The warning in the Python parser will finally be upgraded to SyntaxError (it is already replaced with SyntaxError if the warning is raised as error). The warning in codecs will become UnicodeDecodeError or ValueError. This is why the code for emitting the warning is not shared between the parser and codecs at first place. |
I'm seeing these warnings pop up from time to time with various third party packages (such as pyparsing which is invoked by ensurepip). Am wondering whether this should be deferred for another version or so until be have good confidence that the major third party packages have all had a chance to adapt. Otherwise, this will mostly be noise for end-users. Also, it gets in the way of the end-user strategy of "backslash anything that looks special" as preferred over "remember exactly which things are special". This may be a case of "the cure is worse than the disease". |
I agree with Raymond that third party libraries are not ready for this. My biggest issue is that the way Python warns about this makes it very difficult for library authors to fix this. Most won't even notice. The problem is the warnings are only shown once, when the file is compiled. So if you run the code in any way that causes Python to compile it, a further run of 'python -Wd' will show nothing. I don't know if it's reasonable, but it would be nice if Python recompiled when given -Wd, or somehow saved the warning so it could be shown if that flag is given later. As an anecdote, for SymPy's CI, we went through five (if I am counting correctly) iterations of trying to test this. Each of the first four were subtly incorrect, until we finally managed to find the correct one (for reference, 'python -We:invalid -m compileall -f -q module/'). So most library authors who will attempt to add tests against this will get it wrong. Simply adding -Wd as you would expect is wrong. If the code is already compiled (which it probably is, e.g., if you ran setup.py), it won't show the warnings. At the very least the "correct" way to test this should be documented. Things would probably be improved if the warnings were always shown, as at least then devs will see the error once (although most will probably be confused when the warning doesn't repeat). Another problem is the information in the warnings. It seems the line number of the string is now shown, which is an improvement (https://bugs.python.org/issue28128). It would be nice if it showed the actual line and column number in the file of the invalid escape. This is especially annoying when an escape appears in a docstring. It just shows """ as the offending line. We have a lot of LaTeX in our docstrings in SymPy, so we had quite a few of these to fix. SymPy doesn't have invalid escapes anymore because I was proactive about it, but from what I've seen, most library authors haven't been. By the way, this looks like a bug (python 3.8b1): $ cat test.py
"""
a
\p
"""
$ python -We:invalid test.py
File "test.py", line 2
^
SyntaxError: invalid escape sequence \p
$ It might not be possible to see in what I pasted, but it filled my terminal with spaces. |
In the particular case of pyparsing there was a bug in the docstring. def sub(self, repl):
"""
Return Regex with an attached parse action to transform the parsed
result as if called using `re.sub(expr, repl, string) <https://docs.python.org/3/library/re.html#re.sub>`_.
Example::
make_html = Regex(r"(\w+):(.*?):").sub(r"<\1>\2</\1>")
print(make_html.transformString("h1:main title:"))
# prints "<h1>main title</h1>"
""" \1 and \2 were interpreted as control characters U+0001 and U+0002, so the user could see make_html = Regex(r"(\w+):(.*?):").sub(r"<^A>^B</^A>") or make_html = Regex(r"(\w+):(.*?):").sub(r"<☺>☻</☺>") or make_html = Regex(r"(\w+):(.*?):").sub(r"<></>") depending on the output device. So this examples proves the usefulness of the new warning. |
Aaron, you have reported several different issues in your message:
And maybe more. I think all of them deserve opening separate issues. |
I agree. Please someone else do that. I don't know what already has issues and I unfortunately don't have time right now to help out with any of this. I simply mentioned all these things as arguments why Python should not (yet) make these warnings errors, which is the point of this issue. Also, for the pyparsing example, you would have gotten lucky because it also contained \w, which is not a valid escape. If it didn't, you wouldn't be warned. So clearly this will help things, but it will also be good to have linting tools that, for example, warn about any escape sequences inside docstrings. |
[Raymond]
That's not a good strategy in the first place, though: adding an extra backslash for something that doesn't need to be escaped isn't benign - it's usually an error, since it puts that backslash into the resulting string. >>> "abc\`d" == "abc`d"
False In other words, it seems to me that getting in the way of this broken end-user strategy is a *good* thing, since it warns of possible mistakes. |
[Aaron]
If you're willing to use 3rd party libraries, then my own experience is that |
Here's an example from the current version of Bottle (0.12.17): /Users/raymond/Dropbox/Public/sj205/notes2/bottle.py:3433: SyntaxWarning: invalid escape sequence \[ These warnings would spew out during a recent Python course where we used 3.8b2. It mae it difficult to teach building templates in an iterative style. We had to abandon 3.8 and go back to 3.7 to proceed with the templating lesson. IMO, spewing out these warnings for things users can't do anything about is a usability travesty. |
Here's another example from the current build of the docs using Sphinx: (venv) ~/npython/Doc $ make html |
I think it is poor form to bombard end-users with warnings about things they can't fix. |
The warning in docutils was fixed in https://sourceforge.net/p/docutils/code/8255/ 1.5 months ago. It was an outliner, all other occurrences of \leavevmode were either with the double backslash or in raw string literals. The Bottle code was written 5 years age (bottlepy/bottle@8a5bfe2), it was a part of the raw string. But a bug was introduced when it it was backported to the 0.12 version (bottlepy/bottle@1f6fda6). |
I'm starting to see this error even for plain text strings in existing code:
These not easily suppressed messages are really annoying. We don't have to inflict this on our users. IMO, this is the least user friendly change to Python 3.8 and as far as I can tell, it is entirely unnecessary (we've lived without it for 28+ years). |
AFAICT, I'm currently one of the only people in the world using the 3.8 beta for my work on a daily basis. I've encountered these warnings multiple times in multiple contexts. If we think other people won't experience this recurring annoyance, we're in denial. The entire point of beta testing is to discover problems like this. However, the participants in this thread seem inclined to ignore the evidence and persist with the idea that inflicting this on users makes the language better. Respectfully, I disagree. |
Raymond, are you in agreement that these warnings should at some point eventually become syntax errors? |
I used to think so, but after experiencing the incessant warnings, I question the value. In inactive sessions (either with the regular REPL or the ipython REPL), they are a recurring annoyance that interferes with data exploration live demos. Perhaps, this should be left for a lint or code analysis tool. Why should we intentionally break code that is currently working fine. Another issue that I've encountered is that ASCII art becomes gets flagged. Switching to a raw string then kills the unicode escape sequences. This isn't really a "Raymond doesn't like this" concern. I think anyone who starts using 3.8 on a daily basis for non-toy examples will constantly run into this. Possibly, it catches a real error, but most often it will just be a recurring distractor, especially when teaching Python. All instructors and runners of live demos will need to memorize exactly which characters require an escape and which don't -- it creates a new burden that didn't exist before. And if the source of the problem is in an external library, the result is unactionable by the user, merely making their experience unpleasant. |
Well paradoxically, the bugs that this prevents are the ones it doesn't warn about. If someone writes '\tan(x)' thinking it is a string representing a LaTeX formula for the tangent of x, they won't realize that they actually created a string with a tab plus "an(x)". So actually I would argue that the end goal *is* to make people aware of which escape characters exist, or at the very least, always make strings raw if there's even the remotest chance they will contain a backslash character. Is it the best way to go about this? I don't know. The whole thing sort of makes me think raw strings should have been the default, but it's obviously too late to change that. I personally don't feel strongly about the warnings being enabled by default or not. My big gripe is that if you actually want the warnings they are difficult to get in a reproducible way. I'm actually surprised they are so annoying for you. Once a py file is compiled into a pyc file the warnings completely disappear, even if you want them! The fact that you can't use a real escape sequence in a raw string is annoying but not the end of the world given that it's trivial to concatenate strings. |
If you already use escape sequences in your ASCII art, what is the problem of using them for backslashes?
What is a better solution? The deprecation warning was emitted starting from 3.6. Deprecation warnings were silent by default in 3.6, but they become more visible in 3.7. This helped some projects (which give attention to warnings in they tests) to fix real bugs. But it was not enough, so other bugs are fixed only now, when the warnings become even more visible in 3.8. We see a value of warnings, they help to fix bugs. If we rollback this change, it will cause yet undiscovered bugs be hidden for more time. In Python 3.0 we make many abrupt breaking changes at once. People complained. Here is an opposite example of very gradual change: two releases with a DeprecationWarning, few more (at least two) releases with a SyntaxWarning, and finally perhaps a SyntaxError. |
I think we haven't *actually* done a proper DeprecationWarning period for this. We tried, but because of the issue with byte-compiling, the warnings were unconditionally suppressed for most users -- even the users who are diligent enough to enable warnings and look at warnings in their test suites. I can see a good argument for making the change, but if we're going to do it then it's obviously the kind of change that requires a proper deprecation period, and that hasn't happened. Maybe .pyc files need to be extended to store a list of syntax-related DeprecationWarnings and SyntaxWarnings, that are re-issued every time the .pyc is loaded? Then we'd at least have the technical capability to deprecate this properly. |
Can the pyc compilation step done by our normal package installers be made to treat this warning as an error so that it is forced into the package owners faces instead of overlooked because it was just something on stderr? This syntax warning is absolutely the right thing to surface for _owners/maintainers_ of a given piece of code. Their strings are wrong and their code quality will improve as a result. It isn't the right thing to surface to _users_ of someone else's problematic code. (though it does serve as a red flag proving that the owners of that code or person who pinned some dep to an old version haven't proactively tested with modern python) |
PR 15142 reverts that change for 3.8. |
Thank you! |
I'm leaving this open, as we may still want to do it in 3.9+, just in a less disruptive manner. (That, and how, hasn't been decided yet) Follow the thread(s) on python-dev for the latest on that. |
Are there issues tracking the things I mentioned, which should IMO happen before this becomes a hard error (making the warnings reproduce even if the file has already been compiled, and making warning message point to the correct line in multiline strings)? And is it too late to potentially get some of those things in 3.8? |
I haven't looked, so not that i'm aware of. I suggest filing one for each of those. The warning not pointing to the right line in a multiline literal sounds like a bug to me so that one, if fixed, seems reasonable for 3.8. The release manager gets to decide. Making the syntax warnings happen upon import of an already compiled pyc is more feature-ish, likely to be 3.9+. |
One possibility is to restrict the warning to a backslash followed by an alphabetic character or backslash, and that we define backslash followed by any other printable character as specifically allowed. This would catch the likely sources of errors without breaking the likes of \, or \-. Also, multiline strings have less of a need for a warning than a single line string. |
I'm not sure if this is an issue or by design, but this DeprecationWarning behaves differently to other DeprecationWarnings. A normal DeprecationWarning triggered by code in __main__ is printed by default: $ python -c 'import warnings; warnings.warn("test", DeprecationWarning)'
<string>:1: DeprecationWarning: test But this one is silent: $ python -c '"\,"'
[no output] To see this DeprecationWarning at all, I need to type: $ python -Wdefault -c '"\,"'
<string>:1: DeprecationWarning: invalid escape sequence '\,' But that enables this DeprecationWarning for all modules, not just __main__ . I've tested this with Python 3.9 on debian bullseye and the 3.10 docker image. |
The escape sequence `\|` was added to the documentation for `or_` to prevent Sphinx interpreting the pipe as the start of an inline substitution start string, but this is an illegal escape sequence that causes a warning, due to: python/cpython#77093 This causes further issues for tests that treat warnings as errors, see e.g.: numba#8095 This commit resolves the issue by rewriting the expression as inline code, and rewriting all other expressions in the docstrings for atomics as inline code for consistency.
New attempt to replace DeprecationWarning with SyntaxWarning: |
The regex was fixed in the meanwhile. With Bottle 0.12.23, there is no warning anymore:
|
I confirm that this issue is fixed in docutils 0.19. I tested Sphinx 5.3.0. There is no warning anymore:
|
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: